Data Analysis & Algorithm
Algorithm D&C(p)
{
if small(p) then solve(p)
else
{
Divide p into a number of subproblems p1, p2, ..., pn
D&C(p1);
D&C(p2);
...
D&C(pn);
Combine solutions of sub problems
}
}
small() is a user-defined function that returns true if the problem is small enough to solve
directly; otherwise it returns false.
solve() is a user-defined function that computes the solution of a small problem directly.
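The control abstraction can be sketched in Python. Here divide and combine are assumed problem-specific helpers (the notes do not name them), and the instantiation that finds the maximum of a list is purely illustrative:

```python
def divide_and_conquer(p):
    """Generic divide-and-conquer control abstraction (a sketch).
    small, solve, divide and combine are supplied per problem."""
    if small(p):
        return solve(p)
    subproblems = divide(p)                       # split p into p1, p2, ..., pn
    results = [divide_and_conquer(s) for s in subproblems]
    return combine(results)                       # merge the sub-solutions

# Illustrative instantiation: find the maximum of a list.
def small(p):   return len(p) == 1
def solve(p):   return p[0]
def divide(p):  return [p[:len(p) // 2], p[len(p) // 2:]]
def combine(r): return max(r)
```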
Binary Search
Binary search is a technique used to check the presence of a value in a list of values. The
values in the list must be in sorted order. The procedure is as follows: the value to be
searched is compared with the middle value of the list. If they match, the process terminates
and the position of the value is reported. Otherwise, the list is divided into two parts based
on the middle position, and the search continues in the first part if the value to be searched
is less than the middle value, or in the second part if it is greater. This process is repeated
until the value is found or the remaining part of the list becomes empty.
Algorithm
Algorithm BinarySearch(a, n, k)
//a is an array of size n and k is value to be searched
{
s := 1;
e := n;
while s <= e do
{
m := (s+e)/2;
if k = a[m] then
{
write value is found at position m;
return;
}
else if k < a[m] then
e := m-1;
else
s := m+1;
}
write value is not found;
}
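The pseudocode above translates directly to Python; this sketch uses 0-based indices instead of the pseudocode's 1-based ones and returns the position (or -1) instead of writing a message:

```python
def binary_search(a, k):
    """Iterative binary search on a sorted list a.
    Returns the index of k, or -1 if k is absent."""
    s, e = 0, len(a) - 1          # 0-based bounds, unlike the 1-based pseudocode
    while s <= e:
        m = (s + e) // 2          # middle position of the current part
        if k == a[m]:
            return m
        elif k < a[m]:
            e = m - 1             # continue in the first part
        else:
            s = m + 1             # continue in the second part
    return -1
```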
Time complexity
To calculate time complexity, a tree called decision tree is constructed. A decision tree is
constructed based on mid position of list as well as sub lists that will be generated in searching
process.
Ex: If the number of elements in the list is n = 10, the decision tree is

            5
          /   \
        2       8
       / \     / \
      1   3   6   9
           \   \   \
            4   7   10
The decision tree indicates the number of comparisons required to identify values at different
positions of the list. For example, 1 comparison is required to identify the value at 5th position of
the list. Two comparisons are required to identify the value at 2nd position. Three comparisons
are required to identify the value at 6th position.
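These comparison counts can be verified by instrumenting binary search; comparisons_needed is an illustrative helper, not part of the notes:

```python
def comparisons_needed(n, pos):
    """Counts the three-way comparisons binary search makes to find the
    value at 1-based position pos in a sorted list of n values."""
    s, e, count = 1, n, 0
    while s <= e:
        m = (s + e) // 2
        count += 1                # one comparison against the middle value
        if pos == m:
            return count
        elif pos < m:
            e = m - 1
        else:
            s = m + 1
```

For n = 10 this reproduces the decision tree: position 5 needs 1 comparison, position 2 needs 2, and position 6 needs 3.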
Time complexity of binary search is calculated by considering 2 cases:
Case 1: Successful search (the value is present in the list)
Case 2: Unsuccessful search (the value is not present in the list)
In both cases every comparison halves the remaining part of the list, so the number of
comparisons is at most log2(n) + 1. Hence the time complexity of binary search is O(log2 n).
Merge Sort
Merge sort is one of the techniques used to arrange a list of values in ascending order. The
process for sorting a list is as follows:
Divide the list into two equal parts based on mid position. Divide each part into two equal parts
again. Continue this division process until each part contains only one value. Now, combine the
parts in the reverse direction. While combining two parts, compare the values in the two parts
and place them in sorted order.
To combine or merge two parts, use the following procedure:
Set a pointer (i) at the beginning of the first part. Set a pointer (j) at the beginning of the second part.
Compare the value at ith position with the value at jth position. If the value at ith position is less
than or equal to the value at jth position then place the value at ith position into the temporary
array and then move the ith pointer to the next position. Otherwise, place the value at jth position
into the temporary array and then move the jth pointer to the next position. Repeat this procedure
until one of the parts is exhausted. When one of the parts (first or second) is exhausted,
place the values of the other part one by one into the temporary array.
Finally, copy the values in temporary array into the original array.
Ex: sort a list of values using the merge sort technique (worked figure omitted).
Algorithm of MergeSort
Algorithm MergeSort(a, s, e)
// a is an array containing list of values to be sorted
// s is starting position and e is ending position of the list
{
if s < e then
{
m := (s+e)/2;
MergeSort(a, s, m);
MergeSort(a, m+1, e);
Merge(a, s, m, e);
}
}
Algorithm Merge(a, s, m, e)
// s is starting position of first part, m is ending position of first part
// e is ending position of second part
{
// b is an array used to store values
i := s;
j := m+1;
k := s;
while i <= m and j <= e do
{
if a[i] <= a[j] then
{
b[k] := a[i];
i := i+1;
k := k+1;
}
else
{
b[k] := a[j];
j := j+1;
k := k+1;
}
}
for x := i to m do
{
b[k] := a[x];
k := k+1;
}
for x := j to e do
{
b[k] := a[x];
k := k+1;
}
for x := s to e do
{
a[x] := b[x];
}
}
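The two algorithms above can be sketched together in Python, with 0-based indices and a local temporary list playing the role of the array b:

```python
def merge_sort(a, s, e):
    """Sorts a[s..e] in place (0-based inclusive bounds)."""
    if s < e:
        m = (s + e) // 2
        merge_sort(a, s, m)           # sort the first part
        merge_sort(a, m + 1, e)       # sort the second part
        merge(a, s, m, e)             # combine the two sorted parts

def merge(a, s, m, e):
    """Merges the sorted runs a[s..m] and a[m+1..e] via a temporary list."""
    b = []
    i, j = s, m + 1
    while i <= m and j <= e:
        if a[i] <= a[j]:
            b.append(a[i]); i += 1
        else:
            b.append(a[j]); j += 1
    b.extend(a[i:m + 1])              # leftover of the first part, if any
    b.extend(a[j:e + 1])              # leftover of the second part, if any
    a[s:e + 1] = b                    # copy back into the original array
```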
Quick Sort
Quick sort is a technique to sort a list of values. The procedure for sorting a list of values is as
follows:
Select a value (generally the first value) in the list as the pivot. Set a pointer (i) to the
starting position of the list and a pointer (j) to the ending position. Move the pointer i in
the forward direction as long as the value at the ith position is less than or equal to the
pivot value. Move the pointer j in the backward direction as long as the value at the jth
position is greater than the pivot value. If the position of i is less than the position of j,
swap the values at positions i and j and then move the i and j pointers again. Otherwise, swap
the pivot value with the value at the jth position and divide the list into two parts based on
position j. The first part includes the values from the start of the list to the (j-1)th
position and the second part includes the values from the (j+1)th position to the end of the
list. Now, sort the first and second parts separately using the same procedure.
Algorithm
Algorithm Quicksort(a, s, e)
// a is an array containing list of values to be sorted
// s is starting position and e is ending position of list
{
if s < e then
{
j := Partition(a, s, e);
Quicksort(a, s, j-1);
Quicksort(a, j+1, e);
}
}
Algorithm Partition(a, s, e)
{
p := a[s];
i := s;
j := e;
while i < j do
{
while i < e and a[i] <= p do // bound added so i cannot run past the end of the part
i := i+1;
while a[j] > p do
j := j-1;
if i < j then
{
t := a[i];
a[i] := a[j];
a[j] := t;
}
}
t := a[s];
a[s] := a[j];
a[j] := t;
return j;
}
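A Python sketch of Partition and Quicksort with 0-based indices; a bound check on i is added so the forward scan cannot run past the end of the part when the pivot is the largest value:

```python
def partition(a, s, e):
    """Partitions a[s..e] around the pivot a[s];
    returns the pivot's final position."""
    p = a[s]
    i, j = s, e
    while i < j:
        while i < e and a[i] <= p:    # scan forward over values <= pivot
            i += 1
        while a[j] > p:               # scan backward over values > pivot
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]   # swap the out-of-place pair
    a[s], a[j] = a[j], a[s]           # put the pivot into its final place
    return j

def quick_sort(a, s, e):
    """Sorts a[s..e] in place."""
    if s < e:
        j = partition(a, s, e)
        quick_sort(a, s, j - 1)
        quick_sort(a, j + 1, e)
```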
Time Complexity
Time Complexity of Partition Algorithm:
The partition algorithm moves the pointer i towards the right for a maximum of n positions and
the pointer j towards the left for a maximum of n positions. So, the time complexity of the
partition algorithm is n + n + c = 2n + c = O(n), where c is some constant.
Worst Case
The worst case is encountered when the list is already in sorted order, e.g. 10 20 30 40 50 60
with pivot 10. The pointer i stops immediately and the pointer j moves back to the pivot
position, so the pivot remains at the start of the list. When the list is divided into two
parts, the first part includes no values and the second part includes (n-1) values.
T(n)=cn+T(0)+T(n-1-0)
T(n)=cn+0+T(n-1)
T(n)=T(n-1)+cn
T(n)=T(n-2) +c(n-1) +cn
T(n)=T(n-2)+2cn-c
T(n)=T(n-3)+3cn-3c
.
.
After k substitutions
T(n)=T(n-k)+kcn-[k(k-1)/2]c
After n substitutions (k = n)
T(n)=T(0)+cn^2-[n(n-1)/2]c
T(n)=c n(n+1)/2
T(n)=O(n^2)
Best Case
The best case is encountered when the list is divided into two equal-size parts, as in the
merge sort algorithm. So, the time complexity is
T(n)=cn+T(n/2)+T(n/2)
T(n)=2T(n/2)+cn
T(n)=2[2T(n/4)+cn/2]+cn
T(n)=2^2 T(n/2^2)+cn+cn
T(n)=2^2 T(n/2^2)+2cn
=2^2 [2T(n/2^3)+cn/2^2]+2cn
=2^3 T(n/2^3)+cn+2cn
=2^3 T(n/2^3)+3cn
:
:
After k substitutions
=2^k T(n/2^k)+kcn
Assuming n=2^k, so k=log2 n:
T(n)=nT(n/n)+cn log2 n=nT(1)+cn log2 n=n+cn log2 n=O(n log2 n)
Average Case
The average case is encountered when the list is divided into two parts of arbitrary, generally
unequal, sizes. The average-case time complexity is calculated by averaging over all possible
sizes of the first and second parts.
Ex: if the list contains 8 values, after placing the pivot the possible sizes of the first and
second parts are (0, 7), (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1) and (7, 0).
Time complexity is
T(n)=cn+[T(0)+T(1)+...+T(n-1)]/n+[T(n-1)+T(n-2)+...+T(0)]/n
T(n)=cn+(2/n)[T(0)+T(1)+...+T(n-1)]
After solving the above equation, the time complexity is
T(n)=O(nlog2n)
Randomized Quick Sort
In randomized quick sort, the pivot is selected at random instead of always taking the first
value, which makes the worst case unlikely for any input order.
Algorithm
Algorithm RandomQuickSort(a, s, e)
// a is an array containing the list of values to be sorted
// s is starting position of array and e is ending position of array
{
if s < e then
{
x := Random(s, e);
t := a[x];
a[x] := a[s];
a[s] := t;
j := Partition(a, s, e);
RandomQuickSort(a, s, j-1);
RandomQuickSort(a, j+1, e);
}
}
Random(s, e) is a user-defined function that returns a random position between s and e (inclusive).
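A runnable sketch of the randomized variant; Python's random.randint plays the role of Random(s, e), and a partition function in the style of the earlier pseudocode is included so the example is self-contained:

```python
import random

def partition(a, s, e):
    """Partitions a[s..e] around the pivot a[s]; returns its final position."""
    p = a[s]
    i, j = s, e
    while i < j:
        while i < e and a[i] <= p:
            i += 1
        while a[j] > p:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
    a[s], a[j] = a[j], a[s]
    return j

def random_quick_sort(a, s, e):
    """Swaps a randomly chosen element into position s, then partitions."""
    if s < e:
        x = random.randint(s, e)      # plays the role of Random(s, e), inclusive
        a[s], a[x] = a[x], a[s]
        j = partition(a, s, e)
        random_quick_sort(a, s, j - 1)
        random_quick_sort(a, j + 1, e)
```

The result is the same sorted list for every random pivot choice; only the running time varies.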