Unit 5
In this search, a particular element is sought sequentially through the whole list. Suppose LA is a
linear array with N elements. Given no other information about LA, the most intuitive way to
search for a given ITEM in LA is to compare ITEM with each element of LA one by one.
To simplify the matter, we first assign ITEM to LA[N+1], the position following the last
element of LA. Now LA[1] is compared with LA[N+1]; if they are equal then LOC = 1, and if
not then LA[2] is compared with LA[N+1], and so on. In the end, if a match is found at a
location LOC before LA[N+1] then the search is successful; otherwise (LOC = N+1) the search
is unsuccessful. The purpose of this
initial assignment is to avoid repeatedly testing whether or not we have reached the end of the
array LA.
This method of searching is called linear search or sequential search.
(Here LA is a linear array with N elements, and ITEM is a given item of information. This
algorithm finds the location LOC of ITEM in LA.)
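The sentinel idea described above can be sketched in Python as follows. This is only an illustrative sketch: the function name linear_search, the use of a copied list to hold the sentinel, and the convention of returning 0 for an unsuccessful search are assumptions, not part of the original algorithm statement. Locations are 1-based to match LA[1], ..., LA[N].

```python
def linear_search(la, item):
    """Return the 1-based location of item in la, or 0 if it is absent.

    Returning 0 for "not found" is an assumption made for this sketch.
    """
    n = len(la)
    data = la + [item]              # sentinel: ITEM is stored at position N+1
    loc = 1
    while data[loc - 1] != item:    # the sentinel guarantees the loop stops
        loc += 1
    # loc == n + 1 means only the sentinel matched, so the search failed
    return loc if loc <= n else 0


print(linear_search([7, 3, 9, 4], 9))   # 3
print(linear_search([7, 3, 9, 4], 5))   # 0
```

Because the sentinel always matches, the loop needs no separate test for the end of the array.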
• Worst Case: Clearly the worst case occurs when ITEM is the last element in the array
LA or is not there at all. Accordingly, T(n) = O(n) is the worst-case complexity of the
linear search algorithm.
• Average Case: Here we assume that ITEM does appear in LA, and that it is equally
likely to occur at any position in the array. Accordingly, the number of comparisons
can be any of the numbers 1, 2, 3, ..., n, and each number occurs with probability p = 1/n.
Then T(n) = 1(1/n) + 2(1/n) + ... + n(1/n) = (n + 1)/2, which is about half the array and
still O(n).
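The average-case figure can be checked numerically. The sketch below (the helper name comparisons_to_find and the choice n = 10 are assumptions made for illustration) searches for every element of an array once and averages the number of comparisons, which comes out to (n + 1)/2, roughly half the array.

```python
def comparisons_to_find(data, item):
    """Count the comparisons a sequential scan makes before it finds item."""
    count = 0
    for value in data:
        count += 1
        if value == item:
            break
    return count


n = 10
data = list(range(1, n + 1))
# Each position is equally likely, so search for every element exactly once.
counts = [comparisons_to_find(data, x) for x in data]
average = sum(counts) / n
print(max(counts), average)   # 10 5.5
```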
Q2. Explain binary search. Give its algorithm and time complexity.
Binary search works on a sorted array. The basic idea is to start with an examination of
the middle element of the array. This will lead to 3 possible situations:
If this matches the target K, then search can terminate successfully, by printing out the
index of the element in the array.
On the other hand, if K<A[middle], then search can be limited to elements to the left of
A[middle]. All elements to the right of middle can be ignored.
If it turns out that K >A[middle], then further search is limited to elements to the right of
A[middle].
If all elements are exhausted and the target is not found in the array, then the method
returns a special value such as –1.
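The three situations above can be sketched as an iterative Python function. The name binary_search, the 0-based indexes, and the iterative (rather than recursive) form are assumptions of this sketch; the array is assumed to be sorted in ascending order.

```python
def binary_search(a, k):
    """Return the index of k in the sorted list a, or -1 if k is absent."""
    low, high = 0, len(a) - 1
    while low <= high:
        middle = (low + high) // 2
        if a[middle] == k:
            return middle            # match: search ends successfully
        if k < a[middle]:
            high = middle - 1        # continue in the left part only
        else:
            low = middle + 1         # continue in the right part only
    return -1                        # all elements exhausted


print(binary_search([2, 5, 8, 12, 16, 23, 38], 23))   # 5
print(binary_search([2, 5, 8, 12, 16, 23, 38], 7))    # -1
```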
Complexity Analysis:
Let us now analyze this method to determine its time complexity. Since
there are no “for” loops, we cannot use summations to express the total number of
operations. Let us examine the operations for a specific case, where the number of
elements in the array n is 64.
For this case, the BinarySearch function is called 6 times (6 elements of the array are
examined) for n = 64.
Note that 64 = 2^6.
Similarly, the BinarySearch function is called 5 times (5 elements of the array
are examined) for n = 32.
Note that 32 = 2^5.
Let us consider a more general case where n is still a power of 2. Let us say n = 2^k.
Following the above argument for 64 elements, it is easily seen that after k searches, the
while loop has executed k times and n has been reduced to size 1.
Let us assume that each run of the while loop involves at most 5 operations.
Thus the total number of operations is 5k.
The value of k can be determined from the expression
2^k = n
Taking the logarithm to base 2 of both sides gives
k = log2 n
Thus the total number of operations is 5 log2 n.
We conclude from this that the time complexity of the binary search method is O(log n),
where n is the input size, which is much more efficient than the O(n) linear search method.
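The relation between n and k can be illustrated with a few lines of Python. The sketch below (the function name halvings is an assumption) counts how many times a range of size n can be halved before it shrinks to size 1. It counts halving steps only, not the exact number of comparisons a particular search makes.

```python
import math


def halvings(n):
    """Count how many times n can be halved before the range has size 1."""
    k = 0
    while n > 1:
        n //= 2
        k += 1
    return k


for n in (32, 64, 1024):
    print(n, halvings(n), int(math.log2(n)))
# 32 5 5
# 64 6 6
# 1024 10 10
```

Doubling n adds only one more halving step, which is why the growth is logarithmic.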
The search time of each algorithm discussed so far depends on the number n of elements in
the collection S of data. Hashing, or hash addressing, is a searching technique whose
search time is essentially independent of the
number n. Assume that there is a file F of n records with a set K of keys which uniquely
determine the records in F.
Example: Suppose a company has 68 employees, and each employee is assigned a 4-digit employee
number, which is used as the primary key. We can use the employee
number as the address of the record in memory.
The search will require no comparisons, but this technique will require space for 10,000
memory locations, because the highest 4-digit number is 9999.
The general idea of using the key to determine the address of a record is an excellent one,
but it must be modified so that a great deal of space is not wasted.
This modification takes the form of a function H from the set K of keys into the set L of
memory addresses.
Such a function
H: K -> L
is called a hash function or hashing function. Unfortunately, such a function H may not yield
distinct values: it is possible that two different keys k1 and k2 will yield the same hash
address.
This situation is called a collision, and some method must be used to resolve it.
Hash Functions
The principal criteria used in selecting a hash function H:K->L are as follows:
1. The function H should be very easy and quick to compute.
2. The function H should, as far as possible, uniformly distribute the hash addresses
throughout the set L so that there are a minimum number of collisions.
There is no guarantee that the second condition will be completely fulfilled; however,
certain general techniques do help.
One technique is to “chop” a key K into pieces and combine the pieces in some way to form
the hash address H(K).
a) Division Method:
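Choose a number m larger than the number n of keys in K. The number m is usually chosen to
be a prime number or a number without small divisors, since this frequently minimizes the
number of collisions. The hash function H is defined by H(K) = K (mod m), where K (mod m)
denotes the remainder when K is divided by m.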
b) Midsquare Method: The key K is squared. Then the hash function H is defined by H(K)
= l, where l is obtained by deleting digits from both ends of K^2. The same positions of K^2
must be used for all the keys.
c) Folding Method: The key K is partitioned into a number of parts K1, ..., Kr, where each
part, except possibly the last, has the same number of digits as the required address. Then the
parts are added together, ignoring the last carry. That is, H(K) = K1 + K2 + ... + Kr, where
the leading-digit carries, if any, are ignored.
Sometimes the even-numbered parts K2, K4, ... are each reversed before the addition.
Example:
Suppose a company has 68 employees and each employee is assigned a unique 4-digit employee
number. Suppose L consists of 100 two-digit addresses, i.e. 00, 01, 02, ..., 99.
Solution:
Division Method:
In case the memory addresses begin with 01 rather than 00, we choose the function
H(K) = K (mod m) + 1
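A minimal sketch of the division method follows. The value m = 97 (a prime just below 100) is an assumption chosen for illustration, since the notes do not state m here; the keys 3205, 7148 and 2345 are the ones used in the folding example of this section. The function name hash_division and the first_address parameter are also hypothetical.

```python
def hash_division(key, m=97, first_address=0):
    """Division-method hash: key modulo m, shifted by the first address.

    m = 97 is an assumed value; first_address=1 gives H(K) = K (mod m) + 1.
    """
    return key % m + first_address


for key in (3205, 7148, 2345):
    print(key, hash_division(key), hash_division(key, first_address=1))
# 3205 4 5
# 7148 67 68
# 2345 17 18
```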
Midsquare Method:
K    :     3205      7148     2345
K^2  : 10272025  51093904  5499025
H(K) :       72        93       99
The fourth and fifth digits of K^2, counting from the right, are chosen for the hash address.
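The midsquare rule for this example can be sketched as below. The function name hash_midsquare is hypothetical, and the keys 3205, 7148 and 2345 are taken from the folding example of this section. Integer division by 1000 drops the three rightmost digits of the square, and the remainder modulo 100 keeps the next two, that is, the fourth and fifth digits from the right.

```python
def hash_midsquare(key):
    """Midsquare hash: 4th and 5th digits of key**2, counted from the right."""
    square = key * key
    return (square // 1000) % 100


for key in (3205, 7148, 2345):
    print(key, key * key, hash_midsquare(key))
# 3205 10272025 72
# 7148 51093904 93
# 2345 5499025 99
```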
Folding Method:
Chopping the key K into two parts and adding yields the following hash addresses:
H(3205) = 32 + 05 = 37
H(7148) = 71 + 48 = 19 (119 with the leading carry 1 ignored)
H(2345) = 23 + 45 = 68
Alternatively, the folding method can be performed by reversing the second part before adding:
H(3205) = 32 + 50 = 82
H(7148) = 71 + 84 = 55 (155 with the leading carry 1 ignored)
H(2345) = 23 + 54 = 77
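Both folding variants can be sketched with one small function. The name hash_folding and the reverse_even_parts flag are assumptions made for this sketch, which handles only 4-digit keys split into two 2-digit parts, as in the example.

```python
def hash_folding(key, reverse_even_parts=False):
    """Fold a 4-digit key into a two-digit address."""
    digits = f"{key:04d}"                 # keep leading zeros, e.g. 0512
    parts = [digits[0:2], digits[2:4]]
    if reverse_even_parts:
        parts[1] = parts[1][::-1]         # reverse the second (even) part
    total = sum(int(p) for p in parts)
    return total % 100                    # drop the leading carry, if any


for key in (3205, 7148, 2345):
    print(key, hash_folding(key), hash_folding(key, reverse_even_parts=True))
# 3205 37 82
# 7148 19 55
# 2345 68 77
```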