DS Unit 5
Sequential search examines the elements of an array one by one until the desired
element or value is found or the end of the array is reached. If we search for
the element 25, it goes step by step in sequence order, comparing each element
with 25. Sequential search is applied to unsorted or unordered lists and is
suitable when there are only a few elements in the list.
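To make this concrete, here is a minimal sequential search sketch in Python (the function name and return convention are illustrative, not from the notes):

def sequential_search(arr, target):
    # scan the list from left to right until target is found
    for i, value in enumerate(arr):
        if value == target:
            return i          # index of the first match
    return -1                 # target is not in the list

print(sequential_search([12, 7, 25, 3, 9], 25))   # prints 2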
2. Binary Search
● Binary Search is used for searching an element in a sorted array.
● It is a fast search algorithm with run-time complexity of O(log n).
● It works on the principle of divide and conquer, so the data collection must already be sorted.
● The target value is compared with the middle element of the array; if they match, the search ends.
● If the target is smaller than the middle element, the search continues in the lower half; if it is larger, in the upper half.
● The interval is halved in this way until the target is found or the interval becomes empty.
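A minimal iterative sketch in Python, assuming the input list is already sorted in ascending order (names are illustrative):

def binary_search(arr, target):
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2        # middle of the current interval
        if arr[mid] == target:
            return mid                 # found at index mid
        elif arr[mid] < target:
            low = mid + 1              # continue in the upper half
        else:
            high = mid - 1             # continue in the lower half
    return -1                          # interval empty: not present

print(binary_search([3, 9, 17, 25, 38, 57], 25))   # prints 3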
Sorting
Sorting refers to arranging data in a particular format. A sorting algorithm
specifies the way to arrange data in a particular order. The most common orders
are numerical and lexicographical.
The importance of sorting lies in the fact that data searching can be
optimized to a very high level if data is stored in a sorted manner. Sorting
is also used to represent data in more readable formats. Following are some
examples of sorting in real-life scenarios −
● Telephone Directory − The telephone directory stores the telephone
numbers of people sorted by their names, so that the names can be
searched easily.
● Dictionary − The dictionary stores words in alphabetical order so
that searching for any word becomes easy.
In-place Sorting and Not-in-place Sorting
Sorting algorithms may require some extra space for comparisons and
temporary storage of a few data elements. Algorithms that do not require
any extra space are said to sort in-place, for example,
within the array itself. This is called in-place sorting. Bubble sort is an
example of in-place sorting.
However, in some sorting algorithms, the program requires extra space that is
greater than or equal to the number of elements being sorted. Sorting which uses
equal or more space is called not-in-place sorting. Merge sort is an example of
not-in-place sorting.
Stable and Not Stable Sorting
If a sorting algorithm, after sorting the contents, does not change the
relative order of elements with equal keys, it is called stable sorting. For
example, when records are sorted by one field, a stable sort keeps records
with equal values of that field in their original order.
Bubble sort
Bubble sort is a simple sorting algorithm. It is a
comparison-based algorithm in which each pair of adjacent elements is
compared and the elements are swapped if they are not in order. This
algorithm is not suitable for large data sets as its average and worst case
complexity are of Ο(n²), where n is the number of items.
How Bubble Sort Works
1. Starting from the first index, compare the first and the second elements. If
the first element is greater than the second element, they are swapped.
2. Now, compare the second and the third elements. Swap them if they are not
in order.
3. This process goes on until the last element is reached.
The same process goes on for the remaining iterations. After each iteration,
the largest element among the unsorted elements is placed at the end.
In each iteration, the comparison takes place up to the last unsorted
element.
The array is sorted when all the unsorted elements are placed at their
correct positions.
Bubble Sort Algorithm
bubbleSort(array)
  for i <- 1 to sizeOfArray - 1
    /* after each pass, the largest unsorted element sinks to the end */
    for j <- 1 to indexOfLastUnsortedElement - 1
      if array[j] > array[j+1]
        swap array[j] and array[j+1]
end bubbleSort
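The same algorithm as runnable Python; the swapped flag is a common early-exit optimization added here, not part of the pseudocode above:

def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        # after each pass, the last i elements are already in place
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:       # no swaps: the array is already sorted
            break
    return arr

print(bubble_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8]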
Selection Sort
Selection sort is a simple sorting algorithm. This sorting algorithm is an
in-place comparison-based algorithm in which the list is divided into two
parts, the sorted part at the left end and the unsorted part at the right end.
Initially, the sorted part is empty and the unsorted part is the entire list.
The smallest element is selected from the unsorted array and swapped with
the leftmost element, and that element becomes a part of the sorted array.
This process continues moving unsorted array boundary by one element to
the right.
This algorithm is not suitable for large data sets as its average and worst
case complexities are of Ο(n²), where n is the number of items.
Working of Selection sort
Consider as an example an unsorted array whose first two elements are 14 and
33 and whose smallest value is 10.
For the first position in the sorted list, the whole list is scanned
sequentially. The first position is where 14 is stored presently; searching the
whole list, we find that 10 is the lowest value.
So we swap 14 with 10. After one iteration 10, which happens to be the
minimum value in the list, appears in the first position of the sorted list.
For the second position, where 33 is residing, we start scanning the rest of
the list in a linear manner.
We find that 14 is the second lowest value in the list and it should appear at
the second place. We swap these values.
After two iterations, two least values are positioned at the beginning in a
sorted manner.
The same process is applied to the rest of the items in the array.
Pseudocode
procedure selection sort
   list : array of items
   n : size of list

   for i = 1 to n - 1
      /* set current element as minimum */
      min = i

      /* check the rest of the list for a smaller element */
      for j = i+1 to n
         if list[j] < list[min] then
            min = j
         end if
      end for

      /* swap the minimum with the first unsorted element */
      if min != i then
         swap list[min] and list[i]
      end if
   end for
end procedure
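A compact Python rendering of the pseudocode above (the demo values are illustrative):

def selection_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        min_idx = i
        # find the smallest element in the unsorted part arr[i:]
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        # move it to the boundary of the sorted part
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr

print(selection_sort([14, 33, 10, 35, 19]))   # [10, 14, 19, 33, 35]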
Insertion sort
This is an in-place comparison-based sorting algorithm. Here, a sub-list is
maintained which is always sorted. For example, the lower part of an array
is maintained to be sorted. An element which is to be inserted into this
sorted sub-list has to find its appropriate place and then be
inserted there. Hence the name, insertion sort.
The array is searched sequentially and unsorted items are moved and
inserted into the sorted sub-list (in the same array). This algorithm is not
suitable for large data sets as its average and worst case complexity are of
Ο(n²), where n is the number of items.
Working of Insertion sort
We take an unsorted array beginning 14, 33, 27, 10 for our example.
Insertion sort compares the first two elements and finds that 14 and 33 are
already in ascending order. For now, 14 is in the sorted sub-list.
Next, it compares 33 with 27 and swaps them, checking 27 against all the
elements of the sorted sub-list. Here the sorted sub-list has only one element,
14, and 27 is greater than 14. Hence, the sorted sub-list remains sorted after
the swap.
By now we have 14 and 27 in the sorted sub-list. Next, it compares 10 with 33,
and 10 is shifted left past 33, 27, and 14 to the front of the sub-list.
This process goes on until all the unsorted values are covered in the sorted
sub-list.
Now we shall see some programming aspects of insertion sort.
Pseudocode
procedure insertionSort( A : array of items )
   int holePosition
   int valueToInsert
   for i = 1 to length(A) - 1
      valueToInsert = A[i]
      holePosition = i
      /* shift larger sorted elements right to open the hole */
      while holePosition > 0 and A[holePosition-1] > valueToInsert
         A[holePosition] = A[holePosition-1]
         holePosition = holePosition - 1
      end while
      A[holePosition] = valueToInsert
   end for
end procedure
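The same procedure as runnable Python (names are illustrative):

def insertion_sort(A):
    for i in range(1, len(A)):
        value_to_insert = A[i]
        hole = i
        # shift larger elements of the sorted sub-list one step right
        while hole > 0 and A[hole - 1] > value_to_insert:
            A[hole] = A[hole - 1]
            hole -= 1
        A[hole] = value_to_insert    # drop the value into the hole
    return A

print(insertion_sort([14, 33, 27, 10]))   # [10, 14, 27, 33]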
Quick Sort
The quick sort uses divide and conquer to gain the same advantages as the
merge sort, while not using additional storage. As a trade-off, however, it is
possible that the list may not be divided in half. When this happens, we will
see that performance is diminished.
A quick sort first selects a value, which is called the pivot value. Although
there are many different ways to choose the pivot value, we will simply use
the first item in the list. The role of the pivot value is to assist with splitting
the list. The actual position where the pivot value belongs in the final sorted
list, commonly called the split point, will be used to divide the list for
subsequent calls to the quick sort.
In the example list [54, 26, 93, 17, 77, 31, 44, 55, 20], 54 will serve as our
first pivot value. Since the sorted order of this list is known, we know that 54
will eventually end up in the position currently holding 31. The partition
process will happen next. It will find the split point and at the same time move
the other items to the appropriate side of the list, either less than or greater
than the pivot value.
[Figure: completing the partition process to find the split point for 54]
Pseudocode
def quickSort(alist, first, last):
    if first < last:
        splitpoint = partition(alist, first, last)
        quickSort(alist, first, splitpoint - 1)
        quickSort(alist, splitpoint + 1, last)

def partition(alist, first, last):
    pivotvalue = alist[first]
    leftmark = first + 1
    rightmark = last
    done = False
    while not done:
        # move leftmark right past items smaller than the pivot
        while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
            leftmark = leftmark + 1
        # move rightmark left past items larger than the pivot
        while alist[rightmark] >= pivotvalue and rightmark >= leftmark:
            rightmark = rightmark - 1
        if rightmark < leftmark:
            done = True
        else:
            # out-of-place pair found: swap them
            alist[leftmark], alist[rightmark] = alist[rightmark], alist[leftmark]
    # put the pivot at the split point
    alist[first], alist[rightmark] = alist[rightmark], alist[first]
    return rightmark

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quickSort(alist, 0, len(alist) - 1)
print(alist)
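Note that always taking the first item as the pivot makes quick sort degrade toward Ο(n²) on lists that are already sorted or reverse sorted. A common remedy, not used in the code above, is the median-of-three rule: choose as pivot the median of the first, middle, and last items.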
Hashing Techniques
General Idea
▪ The ideal hash table data structure is an array of some fixed size,
containing the items
▪ A search is performed based on key
▪ Each key is mapped into some position in the range 0 to TableSize-1
▪ The mapping is called the hash function
A hash table of size m stores keys drawn from a set of N possible keys; usually, m << N
h(Ki) = an integer in [0, …, m-1], called the hash value of Ki
Hashing:
A function that transforms a key into a table index is called a hash
function. This mapping process is called hashing.
Collision
– Two keys may hash to the same slot
– Can we ensure that any two distinct keys get different cells?
• No, if N>m, where m is the size of the hash table
Solution
⮚ Task 1: Design a good hash function
– that is fast to compute,
– minimizes the number of collisions, and
– distributes the keys evenly among the cells
⮚ Task 2: Design a method to resolve the collisions when they occur
Design Hash Function
▪ A simple and reasonable strategy: h(k) = k mod m
▪ e.g. m=12, k=100, h(k)=4
▪ Requires only a single division operation (quite fast)
▪ Certain values of m should be avoided
▪ e.g. if the table size is 10 and all the keys end in zero, every key
hashes to slot 0, so this standard hash function is a bad choice
▪ It’s a good practice to set the table size m to be a prime number
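A quick sketch of the division method in Python, reusing the numbers from the bullets above:

m = 12                       # table size
def h(k):
    return k % m             # division-method hash

print(h(100))                # 4, as in the example above

# why some table sizes are bad: with m = 10, keys ending in zero
# all collide in slot 0
bad_m = 10
print([k % bad_m for k in (10, 20, 130, 700)])   # [0, 0, 0, 0]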
Deal with String-type Keys
Method 1: Add up the ASCII values of the characters in the string
(Sum of the ASCII values) % Tablesize
Problems:
▪ Different permutations of the same set of characters have
the same hash value
▪ e.g. the keys "maytas" and "satyam" give the same hash value
▪ If the table size is large, the keys are not distributed well.
e.g. Suppose m = 10007 and all the keys are eight or fewer characters long.
Since each ASCII value is <= 127, the hash function can only assume values
between 0 and 127 * 8 = 1016
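A sketch of Method 1 in Python, showing the anagram collision mentioned above:

def ascii_sum_hash(key, table_size):
    # add up the ASCII values of the characters, then reduce mod m
    return sum(ord(c) for c in key) % table_size

m = 10007
print(ascii_sum_hash("maytas", m))   # same value as below...
print(ascii_sum_hash("satyam", m))   # ...since both use the same letters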
Method 2
Hash only the first three characters of the key (this assumes the key has at
least two characters plus the NULL terminator).
– If the first 3 characters were random and the table size is
10,007 => a reasonably equitable distribution
Problem
• English is not random
• Only 28 percent of the table can actually be hashed to (assuming a
table size of 10,007)
Method 3
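A widely used third method (the formulation in Weiss's textbook, which this material appears to follow) computes a polynomial function of all the characters, using the multiplier 37 and Horner's rule, and reduces the result mod the table size. A sketch under that assumption:

def poly_hash(key, table_size):
    # Horner's rule: h = (...(k0 * 37 + k1) * 37 + k2 ...) mod m
    # the multiplier 37 follows Weiss; other small constants work too
    h = 0
    for c in key:
        h = (h * 37 + ord(c)) % table_size
    return h

m = 10007
print(poly_hash("maytas", m))
print(poly_hash("satyam", m))   # different values now: character order matters

Because every character position is weighted differently, anagrams such as "maytas" and "satyam" no longer collide systematically, and the hash values spread over the whole table.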
Primary Clustering
● A block of contiguously occupied table entries is called a cluster
● On the average, when we insert a new key K, we may hit the middle of
a cluster. Therefore, the time to insert K would be proportional to half
the size of a cluster. That is, the larger the cluster, the slower the
performance.
● Linear probing has the following disadvantages:
o Once h(K) falls into a cluster, this cluster will definitely grow in
size by one. Thus, this may worsen the performance of
insertion in the future.
o If two clusters are only separated by one entry, then inserting
one key into a cluster can merge the two clusters together.
Thus, the cluster size can increase drastically by a single
insertion. This means that the performance of insertion can
deteriorate drastically after a single insertion.
o Large clusters are easy targets for collisions.
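To make the probing and clustering behavior concrete, here is a minimal linear-probing insertion sketch in Python (the table size and keys are illustrative):

m = 11
table = [None] * m

def insert(key):
    # linear probing: on a collision, try the next slot, wrapping around
    i = key % m
    while table[i] is not None:
        i = (i + 1) % m        # each extra step walks along (and grows) a cluster
    table[i] = key

for k in (22, 33, 44, 55):      # all four keys hash to slot 0
    insert(k)
print(table)                    # [22, 33, 44, 55, None, ...] -- one cluster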
Analysis of Linear Probing
Let λ = n/m be the load factor (the fraction of occupied cells). The expected
number of probes is approximately:
– Unsuccessful search & insertion: ≈ ½ (1 + 1/(1 − λ)²)
– Successful search: ≈ ½ (1 + 1/(1 − λ))
For example, at λ = 0.5 this gives about 2.5 probes for an unsuccessful search
or insertion and 1.5 probes for a successful search.