
Habesha College

Department of Computer Science

Analysis of Algorithms (CoSc3092)

Chapter-1

Introduction and Elementary Data Structures

1
Introduction to Algorithm analysis
 What is an algorithm?
 A step-by-step procedure to solve different kinds of problems.
 Suppose, for example, we want to make a chocolate cake: the recipe we follow is an algorithm.
 An unambiguous sequence of computational steps that transforms the input to the output.
 A computer algorithm is a detailed step-by-step method for solving a
problem by using a computer.
 An algorithm is a well-defined computational procedure.
2
....Cont’d
 A process or set of rules to be followed to achieve a desired output,
especially by a computer.
 An algorithm gives the computer a specific set of instructions, which
allows the computer to do everything it does.
 An algorithm is any well-defined computational procedure that takes
some value, or a set of values, as input and produces some value, or a
set of values, as output.

3
Algorithm Design and Analysis Process
 Understanding the problem
 Asking questions, doing a few examples by hand, thinking about
special cases, etc.
 Deciding on
– Exact vs. approximate problem solving
– Appropriate data structure
 Design an algorithm
 Proving correctness
 Analyzing an algorithm
– Time efficiency : how fast the algorithm runs
– Space efficiency: how much extra memory the algorithm needs.
 Coding an algorithm 4
Algorithm Design and Analysis Process
5
Algorithm Design Techniques/Strategies
Divide and conquer
Brute force
Decrease and conquer
Transform and conquer
Space and time tradeoffs
Greedy approach
Dynamic programming
Backtracking
Branch and bound

6
Properties of an algorithm
 Finiteness: Algorithm must complete after a finite number
of steps.
 Definiteness: Each step must be clearly defined, having
one and only one interpretation. At each point in
computation, one should be able to tell exactly what
happens next.
 Sequence: Each step must have a unique defined preceding
and succeeding step. The first step (start step) and last step
(halt step) must be clearly noted.
 Feasibility: It must be possible to perform each instruction.
 Correctness: It must compute the correct answer for all
possible legal inputs.
7
Cont.…
 Language Independence: It must not depend on any
one programming language.
 Completeness: It must solve the problem completely.
 Effectiveness: It must be possible to perform each step
exactly and in a finite amount of time.
 Efficiency: It must solve with the least amount of
computational resources such as time and space.
 Generality: Algorithm should be valid on all possible
inputs.
 Input/output: There must be a specified number of input
values, and one or more result values.
8
Cont.…
 Given 2 or more algorithms to solve the same problem,
how do we select the best one?
 Some criteria for selecting an algorithm
1) Is it easy to implement, understand, modify?
2) How long does it take to run it to completion?
3) How much computer memory does it use?

9
Analysis of algorithms
 Algorithm analysis refers to the process of determining the
amount of computing time and storage space required by
different algorithms.
 It’s a process of predicting the resource requirement of
algorithms in a given environment.
 To analyze an algorithm means:
Developing a formula for predicting how fast an
algorithm is, based on the size of the input
Developing a formula for predicting how much memory
an algorithm requires, based on the size of the input
 Running time is usually treated as the most important and
the most precious resource in most problem domains. 10
Why analyze?
– Need to recognize limitations of various algorithms for
solving a problem.
– Need to understand relationship between problem size
and running time.
» When is a running program not good enough?
– Need to learn how to analyze an algorithm's running time
without coding it.
– Need to learn techniques for writing more efficient code.
– Need to recognize bottlenecks in code as well as which
parts of code are easiest to optimize.

11
Complexity Analysis
 Complexity Analysis is the systematic study of the cost of
computation, measured either in time units or in operations
performed, or in the amount of storage space required.
 The goal is to have a meaningful measure that permits
comparison of algorithms independent of operating platform.
 There are two things to consider:
Time Complexity: is defined as the process of determining a
formula for the total time required for the execution of that
algorithm.
Space Complexity: is defined as the process of determining a
formula for predicting how much memory space will be
required for the successful execution of the algorithm.
12
Cont.….
 Analysis Rules:
1. We assume an arbitrary time unit.
2. Execution of one of the following operations takes time 1
 Assignment Operation
 Single Input/output Operation
 Single Boolean Operations
 Single Arithmetic Operations
 Function Return
 Array index operations, pointer dereferences

3. Running time of a selection statement (if, switch) is the time for the
condition evaluation + the maximum of the running times for the
individual clauses in the selection.
13
Cont.….
4. Loops: Running time for a loop is equal to the running
time for the statements inside the loop * number of iterations.
The total running time of a statement inside a group of nested
loops is the running time of the statements multiplied by the
product of the sizes of all the loops.
For nested loops, analyze inside out. Always assume that the
loop executes the maximum number of iterations possible.
5. Running time of a function call is 1 for setup + the time for
any parameter calculations + the time required for the
execution of the function body.

14
Cont.….
Example 1. Time Units to Compute
-------------------------------------------------
int count() {
  int n, i, k = 0;                  // 1 for the assignment statement: k = 0 (declarations are not counted)
  cout << "Enter an integer";       // 1 for the output statement
  cin >> n;                         // 1 for the input statement
  for (i = 0; i < n; i++)           // 1 assignment, n+1 tests, and n increments
    k = k + 1;                      // n loops of 2 units: an assignment and an addition
  return 0;                         // 1 for the return statement
}
-------------------------------------------------
T(n) = 1+1+1+(1+n+1+n)+2n+1 = 4n+6 = O(n)

15
Cont.….
Example 2. Time Units to Compute
-------------------------------
int total(int n) {
  int sum = 0;                      // 1 for the assignment statement: sum = 0
  for (int i = 1; i <= n; i++)      // 1 assignment, n+1 tests, and n increments
    sum = sum + 1;                  // n loops of 2 units: an assignment and an addition
  return sum;                       // 1 for the return statement
}
-------------------------------
T(n) = 1+(1+n+1+n)+2n+1 = 4n+4 = O(n)

16
Cont.….
Example 3. Time Units to Compute
-------------------------------------------------
int sum(int n) {
  int partial_sum = 0;                        // 1 for the assignment
  for (int i = 1; i <= n; i++) {              // 1 assignment, n+1 tests, and n increments
    partial_sum = partial_sum + (i * i * i);  // n loops of 4 units: an assignment, an addition, and two multiplications
  }
  return partial_sum;                         // 1 for the return statement
}
-------------------------------------------------
T(n) = 1+(1+n+1+n)+4n+1 = 6n+4 = O(n)

17
Cont.….
Example 4. Time Units to Compute
----------------------------
void func() {
  int n, x = 0, i = 0, j = 1;        // 1 for each of the three assignment statements
  cout << "Enter an Integer value";  // 1 for the output statement
  cin >> n;                          // 1 for the input statement
  while (i < n) {                    // n+1 tests
    x++;                             // n loops of 2 units for the
    i++;                             // two increment (addition) operations
  }
  while (j < n) {                    // n tests
    j++;                             // n-1 increments
  }
}
----------------------------
T(n) = 1+1+1+1+1+(n+1)+2n+n+(n-1) = 5n+5 = O(n)

18
CONT…

Exercise

19
Formal Approach to Analysis
 In the above examples we have seen that analysis is a bit
complex. However, it can be simplified by using a formal
approach, in which case we can ignore initializations,
loop control, and bookkeeping.
For Loops: Formally
In general, a for loop translates to a summation. The index
and bounds of the summation are the same as the index and
bounds of the for loop.
for (int i = 1; i <= N; i++) {
  sum = sum + i;
}

$\sum_{i=1}^{N} 1 = N$
Suppose we count the number of additions that are done. There is 1
addition per iteration of the loop, hence N additions in total.
20
CONT…

 Nested Loops: Formally
Nested for loops translate into multiple summations, one
for each for loop.

for (int i = 1; i <= N; i++) {
  for (int j = 1; j <= M; j++) {
    sum = sum + i + j;
  }
}

$\sum_{i=1}^{N} \sum_{j=1}^{M} 2 = \sum_{i=1}^{N} 2M = 2MN$

Again, count the number of additions. The outer summation is
for the outer for loop.

21
CONT…

Consecutive Statements: Formally
Add the running times of the separate blocks of your code.

for (int i = 1; i <= N; i++) {
  sum = sum + i;
}
for (int i = 1; i <= N; i++) {
  for (int j = 1; j <= N; j++) {
    sum = sum + i + j;
  }
}

$\sum_{i=1}^{N} 1 + \sum_{i=1}^{N} \sum_{j=1}^{N} 2 = N + 2N^2$

22
CONT…

Conditionals: Formally
1. If (test) s1 else s2: Compute the maximum of the
running times for s1 and s2.

if (test == 1) {
  for (int i = 1; i <= N; i++) {
    sum = sum + i;
  }
}
else {
  for (int i = 1; i <= N; i++) {
    for (int j = 1; j <= N; j++) {
      sum = sum + i + j;
    }
  }
}

$\max\left( \sum_{i=1}^{N} 1,\; \sum_{i=1}^{N} \sum_{j=1}^{N} 2 \right) = \max(N, 2N^2) = 2N^2$

23
Measures of Times
 In order to determine the running time of an algorithm it is
possible to define three functions Tbest(n), Tavg(n) and Tworst(n)
as the best, the average and the worst case running time of the
algorithm respectively.
Best Case (Tbest): The amount of time the algorithm takes
on the best possible set of inputs.
Example: Sort a set of numbers in increasing order; and the data is already in increasing order.
 Average Case (Tavg): The amount of time the algorithm takes on
an "average" set of inputs.
Worst Case (Tworst): The amount of time the algorithm
takes on the worst possible set of inputs.
Example: Sort a set of numbers in increasing order; and the data is in decreasing order.
24
Asymptotic Notation
 Asymptotic analysis is concerned with how the running
time of an algorithm increases with the size of the input in
the limit, as the size of the input increases without bound.
 Asymptotic notation gives us a method for classifying
functions according to their rate of growth.
 Asymptotic notation of an algorithm is a mathematical
representation of its complexity
 Algorithms perform f(n) basic operations to accomplish
task
– Identify that function
– Identify size of problem (n)
– Count number of operations in terms of n 25
CONT…

 In asymptotic notation, we use only the most significant
terms in the complexity of an algorithm and ignore the least
significant terms.
 For example, consider the following time complexities of
two algorithms:
– Algorithm 1: 5n² + 2n + 1
– Algorithm 2: 10n² + 8n + 3
Generally, when we analyze an algorithm, we consider the time
complexity for larger values of the input data (i.e. the 'n' value).
In the above two time complexities, for larger values of 'n', the term
'2n + 1' in algorithm 1 is less significant than the term '5n²', and the
term '8n + 3' in algorithm 2 is less significant than the term '10n²'. 26
CONT…

 Here, for larger values of 'n', the value of the most significant
terms (5n² and 10n²) is much larger than the value of the
least significant terms (2n + 1 and 8n + 3).
 So for larger values of 'n' we ignore the least significant
terms to represent the overall time required by an algorithm.
 In asymptotic notation, we use only the most significant
terms to represent the time complexity of an algorithm.
 Mainly, we use three types of Asymptotic Notations and
those are as follows...
 Big - Oh (O)
 Omega (Ω)
 Theta (Θ)
27
The Big-Oh Notation
 Big - Oh notation is used to define the upper bound of an
algorithm in terms of Time Complexity.
 That means Big - Oh notation always indicates the
maximum time required by an algorithm for all input
values.
 That means Big - Oh notation describes the worst case of
an algorithm time complexity.

28
Cont.…..
 Definition: f(n) = O(g(n)) iff there are two positive constants c
and n0 such that |f(n)| ≤ c |g(n)| for all n ≥ n0
 If f(n) is nonnegative, we can simplify the last condition to
0 ≤ f(n) ≤ c g(n) for all n ≥ n0
 We say that "f(n) is big-O of g(n)."
 As n increases, f(n) grows no faster than g(n). In other words,
g(n) is an asymptotic upper bound on f(n).

29
CONT…

Example: n² + n = O(n³). Proof:
 Here, we have f(n) = n² + n, and g(n) = n³
 Notice that if n ≥ 1, n ≤ n³ is clear.
 Also, notice that if n ≥ 1, n² ≤ n³ is clear.
Side Note:
In general, if a ≤ b, then n^a ≤ n^b whenever n ≥ 1. This fact is used often
in these types of proofs.
Therefore, n² + n ≤ n³ + n³ = 2n³
We have just shown that n² + n ≤ 2n³ for all n ≥ 1
 Thus, we have shown that n² + n = O(n³) (by definition of Big-O,
with n0 = 1 and c = 2.)
30
Big-Omega Notation
 Omega notation is used to define the lower bound of an
algorithm in terms of Time Complexity.
 That means Big-Omega notation always indicates the
minimum time required by an algorithm for all input values.
 That means Omega notation describes the best case of an
algorithm time complexity.

31
Cont.….
 Definition: f(n) = Ω(g(n)) iff there are two positive constants c
and n0 such that |f(n)| ≥ c |g(n)| for all n ≥ n0
 If f(n) is nonnegative, we can simplify the last condition to
0 ≤ c g(n) ≤ f(n) for all n ≥ n0
 We say that "f(n) is omega of g(n)."
 As n increases, f(n) grows no slower than g(n).
 In other words, g(n) is an asymptotic lower bound on f(n).
32
Cont.…

Example: n³ + 4n² = Ω(n²)
Proof: Here, we have f(n) = n³ + 4n², and g(n) = n²
 It is not too hard to see that if n ≥ 0, n³ ≤ n³ + 4n²
 We have already seen that if n ≥ 1, n² ≤ n³
Thus when n ≥ 1, n² ≤ n³ ≤ n³ + 4n²
Therefore, 1n² ≤ n³ + 4n² for all n ≥ 1
 Thus, we have shown that n³ + 4n² = Ω(n²) (by definition
of Big-Ω, with n0 = 1 and c = 1.)

33
Big-Theta Notation (Θ)
 Theta notation is used to define a tight bound on an
algorithm's time complexity.
 That means Theta notation indicates that the time required
by an algorithm is bounded both above and below by the
same order of growth, for all sufficiently large input values.
 In other words, Theta notation describes the exact order of
growth of an algorithm's time complexity.

34
Cont.….
Definition: f(n) = Θ(g(n)) iff there are three positive constants
c1, c2 and n0 such that c1|g(n)| ≤ |f(n)| ≤ c2|g(n)| for all n ≥ n0
 If f(n) is nonnegative, we can simplify the last condition to
0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0
We say that "f(n) is theta of g(n)."
As n increases, f(n) grows at the same rate as g(n). In other
words, g(n) is an asymptotically tight bound on f(n).

35
Cont.…
Example: n² + 5n + 7 = Θ(n²)
Proof: When n ≥ 1, n² + 5n + 7 ≤ n² + 5n² + 7n² = 13n²
When n ≥ 0, n² ≤ n² + 5n + 7
Thus, when n ≥ 1, 1n² ≤ n² + 5n + 7 ≤ 13n²
Thus, we have shown that n² + 5n + 7 = Θ(n²) (by
definition of Big-Θ, with n0 = 1, c1 = 1, and c2 = 13.)

36
Relational Properties of the
Asymptotic Notations
Transitivity
if f(n)=Θ(g(n)) and g(n)=Θ(h(n)) then f(n)=Θ(h(n)),
if f(n)=O(g(n)) and g(n)=O(h(n)) then f(n)=O(h(n)),
if f(n)=Ω(g(n)) and g(n)=Ω(h(n)) then f(n)=Ω(h(n)),
Example: if f(n) = n, g(n) = n² and h(n) = n³
n is O(n²) and n² is O(n³), then n is O(n³)
Symmetry
f(n)=Θ(g(n)) if and only if g(n)=Θ(f(n)).
Example: f(n) = n² and g(n) = n², then f(n) = Θ(n²) and g(n) = Θ(n²)
Transpose symmetry
f(n)=O(g(n)) if and only if g(n)=Ω(f(n)),
Example: f(n) = n, g(n) = n², then n is O(n²) and n² is Ω(n)
Reflexivity
f(n)=Θ(f(n)),
f(n)=O(f(n)),
37
CONT…
Exercise
1. Is 5n² - 6n = Θ(n²)?
2. Prove that n²/2 - 3n = Θ(n²)
3. Is 2n + Θ(n) = Θ(n²)?
4. Is 10n² + 4n + 2 = O(n²)?
5. Is 3n² + 4n + 1 = O(n²)?
6. f(n) = 10n + 5 and g(n) = n. Show that f(n) is O(g(n))
7. 3n + 2 = Ω(n)
8. 5n² + 3n + 20 = O(n²)
9. 10 log2n + 4 = Θ(log2n)
What constants for n0, c1, and c2 will work?

38
Review of elementary data structures
 Stacks
A stack is an ordered list in which all insertions and
deletions are made at one end, called the top.
 The operations of a stack imply that if the elements
A,B,C,D,E are inserted into a stack, in that order, then the
first element to be removed/deleted must be E.
Equivalently we say that the last element to be inserted into
the stack will be the first to be removed.
For this reason stacks are sometimes referred to as Last In
First Out (LIFO) lists.

39
Cont.….
 The simplest way to represent a stack is by using a one-
dimensional array, say STACK(1:n), where n is the maximum
number of allowable entries.
 The first or bottom element in the stack will be stored at
STACK(1), the second at STACK(2), and the i-th at STACK(i).
 Associated with the array will be a variable, typically called top,
which points to the top element in the stack.
 To test if the stack is empty we ask "if top = 0". If not, the topmost
element is at STACK(top).
 Two more substantial operations are inserting and deleting
elements.
 The corresponding procedures are given as algorithms (a) and (b);
a sketch appears below.
40
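Since the slides reference algorithms (a) and (b) only through a figure, here is a minimal C++ sketch of the two procedures over a 0-based array; the capacity n, the names push and pop, and the error handling are illustrative assumptions, not taken from the original.

#include <iostream>
using namespace std;

const int n = 100;        // maximum number of allowable entries (assumed)
int STACK[n];
int top = -1;             // -1 plays the role of "top = 0" for a 0-based array

// (a) push: insert an element at the top of the stack
bool push(int item) {
    if (top == n - 1) {
        cout << "stack is full\n";
        return false;
    }
    STACK[++top] = item;  // advance top, then store the item
    return true;
}

// (b) pop: delete the top element, returning it through item
bool pop(int &item) {
    if (top == -1) {
        cout << "stack is empty\n";
        return false;
    }
    item = STACK[top--];  // read the item, then retreat top
    return true;
}

Both procedures touch only the top index, which is why the next slide can price them at O(1).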
Cont.….
The time complexity of the push and pop operations in a stack is
O(1), because each is either top++ or top--, where top is an index
which points to the topmost element in the stack at any instant of
time.

41
Sequential Search Algorithm
int search(int A[], int N, int Num)
{
    int index = 0;
    while (index < N)
    {
        if (A[index] == Num)
            return index;   // found: return the position
        index++;            // not yet found: keep scanning
    }
    return -1;              // not found after checking all N elements
}
42
CONT…
 Best case: Record at first place in array: O(1).
 Worst case: Worst case time for serial search requires n array
accesses: O(n).
Average case: We have an array of 10 records.
The average of all these searches is:
(1+2+3+4+5+6+7+8+9+10)/10 = 5.5
Expression for the average-case running time:
(1+2+…+n)/n, where 1+2+…+n = n(n+1)/2
so the average is n(n+1)/2n = (n+1)/2
Therefore, the average case time complexity for serial search is O(n).

43
Binary Search
The array should be sorted.
The sub-array a[i:l] is searched for x.

int BinSearch(int a[], int i, int l, int x) {
    if (i > l) return -1;     // empty range: x is not present
    if (l == i) {             // one element left
        if (x == a[i]) return i;
        else return -1;
    }
    int mid = (i + l) / 2;
    if (x == a[mid]) return mid;
    else if (x < a[mid]) return BinSearch(a, i, mid - 1, x);
    else return BinSearch(a, mid + 1, l, x);
}
44
Binary Search Analysis
For an array of size N, it eliminates ½ until 1 element
remains.
– N, N/2, N/4, N/8, ..., 4, 2, 1
– How many divisions does it take?
Think of it from the other direction:
– How many times do I have to multiply by 2 to reach N?
– 1, 2, 4, 8, ..., N/4, N/2, N
– Call this number of multiplications "x".
– 2^x = N
– x = log2 N
Binary search is in the logarithmic complexity class.
Binary search is best-case : Θ(1)
Binary search is average-case : Θ(log n)
Binary search is worst-case : Θ(log n)

45
Insertion sort

5 2 4 6 1 3
2 5 4 6 1 3
2 4 5 6 1 3
2 4 5 6 1 3
1 2 4 5 6 3
1 2 3 4 5 6

46
Analysis Of insertion sort

47
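The analysis below refers to the statements of INSERTION-SORT and their costs ci; since the original figure is not reproduced, here is a sketch of the standard insertion sort (written 0-based, whereas the slides index the array from 1), with the usual cost/times annotations as comments. The cost names c1..c7 are an illustrative convention, not from the slides.

// Insertion sort; t_j denotes the number of times the while-loop
// test executes for a given j.
void insertionSort(int A[], int n) {
    for (int j = 1; j < n; j++) {       // cost c1, executed n times
        int key = A[j];                 // cost c2, executed n-1 times
        int i = j - 1;                  // cost c3, executed n-1 times
        // shift elements of the sorted prefix greater than key one slot right
        while (i >= 0 && A[i] > key) {  // cost c4, executed sum of t_j times
            A[i + 1] = A[i];            // cost c5, executed sum of (t_j - 1)
            i = i - 1;                  // cost c6, executed sum of (t_j - 1)
        }
        A[i + 1] = key;                 // cost c7, executed n-1 times
    }
}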
CONT…
The running time of the algorithm is the sum of running
times for each statement executed;
a statement that takes ci steps to execute and executes n
times will contribute cin to the total running time.
To compute T(n), the running time of INSERTION-SORT
on an input of n values, we sum the products of the cost
and times columns, obtaining

48
CONT…
 Even for inputs of a given size, an algorithm's running time may depend on which
input of that size is given.
For example, in INSERTION-SORT, the best case occurs if the array is already sorted.
For each j = 2, 3, …, n, we then find that A[i] <= key in line 5 when i has its initial
value of j - 1. Thus tj = 1 for j = 2, 3, …, n, and the best-case running time is linear in n.
We can express this running time as an + b for constants a and b that depend on the
statement costs ci; it is thus a linear function of n.
If the array is in reverse sorted order (that is, in decreasing order), the worst case
results. We must compare each element A[j] with each element in the entire sorted
subarray A[1…j-1], and so tj = j for j = 2, 3, …, n. Noting that
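The summation identities being appealed to are the standard arithmetic-series facts (the original slide shows them only as a figure):

$\sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1 \qquad \sum_{j=2}^{n} (j-1) = \frac{n(n-1)}{2}$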
49
We can express this worst-case running time as an² + bn + c for
constants a, b, and c that again depend on the statement costs
ci; it is thus a quadratic function of n.
Analysis of Insertion Sort
Best case: array is already sorted, Θ(n)
Worst case: array is in reverse sorted order, Θ(n²)
Average case: the running time is quadratic, as in the worst case, Θ(n²)
50
SETS AND DISJOINT SET UNION
 A disjoint-set data structure is also called a union–find data
structure or merge–find set.
 It keeps track of a set of elements partitioned into a number
of disjoint (non-overlapping) subsets.
 It provides near-constant-time operations to add new sets, to
merge existing sets, and to determine whether elements are
in the same set.
 It plays a key role in Kruskal's algorithm for finding
the minimum spanning tree of a graph.
 It can also be used to detect cycles in a graph.
51
CONT…
How Disjoint Set is constructed:
A disjoint-set forest consists of a number of elements, each of which
stores an id and a parent pointer.
The parent pointers of elements are arranged to form one or more trees,
each representing a set.
If an element’s parent pointer points to no other element, then the
element is the root of a tree and is the representative member of its
set.
A set may consist of only a single element. However, if the element has
a parent, the element is part of whatever set is identified by following
the chain of parents upwards until a representative element (one
without a parent) is reached at the root of the tree.

52
CONT…
Disjoint Set Operations:
MakeSet(X): This operation makes a new set by creating a new element with
a parent pointer to itself. The parent pointer to itself indicates that the element
is the representative member of its own set. The MakeSet operation
has O(1) time complexity.
Find(X): follows the chain of parent pointers from x upwards through the tree
until an element is reached whose parent is itself. This element is the root of the
tree and is the representative member of the set to which x belongs, and may
be x itself.
Union(x,y): Uses Find to determine the roots of the trees x and y belong to. If
the roots are distinct, the trees are combined by attaching the root of one to the
root of the other. If this is done naively, such as by always making x a child
of y, the height of the trees can grow as O(n). A sketch of all three operations
is given below.

53
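A minimal C++ sketch of the three operations, assuming elements are numbered 0..n-1 and using the parent-array representation shown on the next slide; the struct and member names are illustrative. The weighting rule from a later slide is included, since it is what keeps Find fast.

#include <utility>
#include <vector>
using namespace std;

struct DisjointSet {
    vector<int> parent, size;   // parent pointers and subtree sizes

    // MakeSet for elements 0..n-1: each element starts as its own root.
    DisjointSet(int n) : parent(n), size(n, 1) {
        for (int x = 0; x < n; x++) parent[x] = x;
    }

    // Find(x): follow parent pointers until an element is its own parent.
    int find(int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // Union(x, y): attach the smaller tree under the larger one
    // (the weighting rule), keeping tree heights O(log n).
    void unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;             // already in the same set
        if (size[rx] < size[ry]) swap(rx, ry);
        parent[ry] = rx;
        size[rx] += size[ry];
    }
};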
Sets and disjoint set union
Use of forests in the representation of sets: representing disjoint sets by trees.
S1 = {1,7,8,9}, S2 = {2,5,10}, S3 = {3,4,6}
[Figure: three trees; root 1 with children 7, 8, 9; root 5 with children 2, 10; root 3 with children 4, 6]
A possible set-pointer representation keeps, for each set name (S1, S2, S3), a pointer to the root of its tree.
Array representation of S1, S2 and S3, where P(i) is the parent of node i and -1 marks a root:
i: 1  2  3  4  5  6  7  8  9  10
P: -1 5  -1 3  -1 3  1  1  1  5
54
CONT…

 Weighting Rule for UNION(i, j): If the number of nodes in tree i is
less than the number in tree j, then make j the parent of i;
otherwise make i the parent of j.
[Figure: after UNION of S1 and S2 under this rule, 5 becomes a child of 1;
the tree rooted at 1 has children 7, 8, 9 and 5, with 2 and 10 below 5]
55
CONT…

Suppose we start with the singleton sets {1} {2} {3} {4} … {n}.
Operations: union(1,2), union(2,3), union(3,4), …, union(n-1,n)
followed by: find(1), find(2), find(3), …, find(n)
[Figure: the naive unions build a degenerate tree, a chain n, n-1, …, 1
of height n, so each find can cost O(n)]
56
Applications using Disjoint sets

Representing network connectivity.
Image processing.
In game algorithms.
Kruskal's minimum spanning tree.
Finding cycles in undirected graphs.

57
Hashing
 A hash function generates a signature from a data object.
 Hash functions have security and data processing applications.
 A hash table is a data structure where the storage location of data is
computed from the key using a hash function.
 Hashing is the process of converting a given key into another value.
 A hash function is used to generate the new value according to a
mathematical algorithm.
 A hash algorithm is also known as a hash function.
 The result of a hash function is known as a hash value or simply,
a hash.

58
CONT…
 A hash table is a collection of items which are stored in such
a way as to make it easy to find them later. Each position of
the hash table, often called a slot, can hold an item and is
named by an integer value starting at 0. For example, we
will have a slot named 0, a slot named 1, a slot named 2, and
so on. Initially, the hash table contains no items so every slot
is empty. The figure below shows a hash table of size m = 11.
 In other words, there are m slots in the table, named 0
through 10.
slot: 0    1    2    3    4    5    6    7    8    9    10
item: None None None None None None None None None None None

59
CONT…
 The mapping between an item and the slot where that item belongs in the hash table
is called the hash function.
 The hash function will take any item in the collection and return an integer in the
range of slot names, between 0 and m-1.
 Given a collection of items, a hash function that maps each item into a unique slot is
referred to as a perfect hash function.
 If we know the items and the collection will never change, then it is possible to
construct a perfect hash function.
 Unfortunately, given an arbitrary collection of items, there is no systematic way to
construct a perfect hash function.
 Luckily, we do not need the hash function to be perfect to still gain performance
efficiency.
 One way to always have a perfect hash function is to increase the size of the hash
table so that each possible value in the item range can be accommodated.
 This guarantees that each item will have a unique slot. Although this is practical for
small numbers of items, it is not feasible when the number of possible items is large.
60
Handling collisions
 Two or more than two keys can generate the same hash.
This phenomenon is known as a collision. There are several
ways to handle collisions.
 If the number of possible keys greatly exceeds the numbers
of records, and of computed storage locations, hash
collisions become inevitable and so have to be handled
without loss of data.
 3 approaches are used to handle collisions:
Open hashing (linear probing)
Quadratic hashing
Chained hashing
61
Open hashing
 If a key can be stored in its computed location, store it there;
else go to the next unused table location and store the record
there.
 After the highest location, rotate around to the first location
(array_element[0]).
 Use the remainder when dividing the position number by the
table size; i.e.
 array_location = position_number % array_size;
 This modulus always maps any integer to a valid
array_location.

62
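A sketch of open hashing with linear probing as just described; the EMPTY sentinel and the function name are illustrative assumptions.

#include <vector>
using namespace std;

const int EMPTY = -1;   // sentinel marking an unused table location

// Insert key by linear probing: try h(x), then the next locations in
// turn, wrapping around via the modulus; returns the slot used, or -1.
int insertLinear(vector<int> &table, int key) {
    int size = table.size();
    int home = key % size;                // h(x) = x % size
    for (int i = 0; i < size; i++) {
        int loc = (home + i) % size;      // always a valid array_location
        if (table[loc] == EMPTY) {
            table[loc] = key;
            return loc;
        }
    }
    return -1;                            // table is full
}

Inserting the keys 8, 3, 13, 6, 4, 23, 43, 10 into a table of size 10 initialized to EMPTY reproduces the layout worked out on the next slide.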
CONT…

63
Keys: 8, 3, 13, 6, 4, 23, 43, 10
h(x) = x % size
h'(x) = [h(x) + f(i)] % size, with f(i) = i, i = 0, 1, 2, 3, …

Probe traces for two of the colliding keys:
h'(13) = [h(13) + f(0)] % 10 = 3 (occupied)
h'(13) = [h(13) + f(1)] % 10 = 4
h'(4) = [h(4) + f(0)] % 10 = 4 (occupied)
h'(4) = [h(4) + f(1)] % 10 = 5

Resulting hash table:
slot: 0  1  2  3  4  5  6  7  8  9
key:  10 -  -  3  13 4  6  23 8  43
64
Quadratic Hashing
 If a location for a key is already occupied by another record,
find the next unused location by trying locations separated
from the calculated location by 1, 4, 9, 16, 25, 36, … positions
(i.e. the series of perfect squares) on from the original record
position (using the modulus operation described for open
hashing).
 The advantage of this approach is that data is less likely to
become clustered (and therefore to require more access
operations) than would occur with open hashing.

65
CONT…
 Calculating the successive squares can also be reduced to
quicker addition by virtue of the fact that the series of
quadratic offsets 0, 1, 4, 9, 16, 25, … from the origin are
separated by the series of jumps 1, 3, 5, 7, 9, … from each other.
 This approach will require special care in the sizing of the
hash table.
 If not, there is a greater risk of jumps skipping over unused
positions and revisiting previously searched ones.

66
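The same insertion routine with a quadratic probe step, a sketch under the same assumptions (and EMPTY sentinel) as the linear-probing version above; only the offset f(i) = i² changes.

// Insert key by quadratic probing: offsets of 1, 4, 9, 16, ... from home.
int insertQuadratic(vector<int> &table, int key) {
    int size = table.size();
    int home = key % size;                // h(x) = x % size
    for (int i = 0; i < size; i++) {
        int loc = (home + i * i) % size;  // f(i) = i*i
        if (table[loc] == EMPTY) {
            table[loc] = key;
            return loc;
        }
    }
    return -1;  // may fail even with free slots: see the sizing caveat above
}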
Keys: 8, 3, 13, 23, 43, 10
h(x) = x % size

Resulting hash table:
slot: 0  1  2  3  4  5  6  7  8  9
key:  10 -  43 3  13 -  -  23 8  -

67
Chained hashing
 This involves co-location of 0 or more data items using a
singly-linked list starting at the array location returned by
the hash function.
 If the array size and hash function are chosen in order to
reduce the frequency of collisions such that say, 90% of
records are the only record at their array location, then it is
probable that a further 9% will be chained in list lengths of
2, and 0.9% will be triply located, 0.09% will by quadruply
located etc.

68
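A sketch of chained hashing using an array of singly-linked lists (std::forward_list here); the struct name and methods are illustrative choices.

#include <forward_list>
#include <vector>
using namespace std;

struct ChainedHashTable {
    vector<forward_list<int>> slots;   // each slot heads a chain of keys

    ChainedHashTable(int m) : slots(m) {}

    // Insert: prepend the key to the chain at its home slot; O(1).
    void insert(int key) {
        slots[key % slots.size()].push_front(key);   // h(x) = x % m
    }

    // Search: walk the (usually very short) chain at the home slot.
    bool contains(int key) const {
        for (int k : slots[key % slots.size()])
            if (k == key) return true;
        return false;
    }
};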
CONT…
 This would result in an average number of comparisons
needed to find a single data item of approximately (0.9n +
0.09n*1.5 + 0.009n*2 + 0.0009n*2.5...)/n which is
1.0555555, or close enough to 1.0 to make little difference.
 If the hash table is an array of pointers, each pointer is either
the head address of a linked list or a null to indicate an
unused position.

69
Keys: 8, 3, 13, 6, 4, 23, 43, 10
h(x) = x % size

Resulting hash table (each slot holds a chain):
slot 0: 10
slot 3: 3 → 13 → 23 → 43
slot 4: 4
slot 6: 6
slot 8: 8
(all other slots are empty)

70
CONT…
 Hashing is also used in data encryption. Passwords can be
stored in the form of their hashes so that even if a database is
breached, plaintext passwords are not accessible. MD5,
SHA-1 and SHA-2 are popular cryptographic hashes.

71
Example: h(x) = (sum of ASCII values) % 11

72
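The worked example itself is a figure in the original; below is a sketch of the hash function it names, assuming keys are character strings.

#include <string>
using namespace std;

// h(x) = (sum of the ASCII values of the characters of x) % 11
int asciiSumHash(const string &x) {
    int sum = 0;
    for (char c : x) sum += c;   // add each character's ASCII value
    return sum % 11;             // map into the 11 slots 0..10
}

For example, asciiSumHash("cat") = (99 + 97 + 116) % 11 = 312 % 11 = 4 (the word "cat" is an illustration, not from the slides).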
CONT…

73
74
