Utkarsh Garg Roll No. 12771 Collaborated With: Kartik Agrawal (12344)

The document discusses the lower bound for sorting algorithms when there are duplicate keys. It defines the entropy formula H and shows that the height h of a decision tree for a comparison-based sorting algorithm must be at least nH-n. This is derived by relating the minimum number of leaves in a tree of height h to the number of possible permutations of keys, taking into account the duplicate keys.

Problem 1.
1.)
Solution:
The worst case for the stack depth of NR-Qsort occurs when, after every partition, the indices bracketing the partition with fewer elements are pushed first and the indices bracketing the partition with more elements are pushed above them, so that the larger partition is always processed next.
For example: at the first iteration, the partition is done such that 2 elements are in one partition and (n-3) elements are in the other (the pivot accounts for the remaining element). The indices bracketing the smaller partition are put into the stack first. If this method of partitioning and pushing pairs of indices is followed for the rest of the iterations, it gives the worst case scenario.
Explanation:
At the first iteration, the indices of the smaller partition (size 2) are pushed first and those of the larger partition (size n-3) are pushed second.
At the second iteration, the partition of size n-3 is popped.
Now, again, a smaller partition of size 2 and a larger partition of size n-6 are formed. The indices of the smaller one are pushed first and those of the larger one are pushed second.
Every iteration therefore leaves one more pair of size 2 waiting on the stack, so about n/3 pairs accumulate and the maximum stack depth is O(n).
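The worst-case running time satisfies
$$T(n) = T(n-1) + T(1) + O(n),$$
where the O(n) term is the cost of the partition routine. Unrolling the recurrence with constants c and d:
$$\begin{aligned}
T(n) &\le T(n-1) + cn \\
&\le T(n-2) + c(n-1) + cn \\
&\le T(n-3) + c(n-2) + c(n-1) + cn \\
&\;\;\vdots \\
&\le d + c + 2c + 3c + \dots + nc \\
&= O(n^2)
\end{aligned}$$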
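The linear growth of the stack in this scenario can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it tracks sub-array sizes instead of real data, it assumes every partition produces a small part of at most 2 elements plus the rest, and the function name `worst_case_stack_depth` is hypothetical.

```python
def worst_case_stack_depth(n):
    """Simulate the bad push order on sub-array sizes only.

    Assumption: every partition of `size` elements yields a pivot, a small
    part of at most 2 elements, and the rest. The small part is pushed
    first, so the large part ends up on top and is popped next.
    """
    stack = [n]
    max_depth = len(stack)
    while stack:
        size = stack.pop()
        small = min(2, size - 1)      # smaller partition (pivot excluded)
        large = size - 1 - small      # larger partition
        if small >= 2:                # only ranges with 2+ elements are pushed
            stack.append(small)
        if large >= 2:
            stack.append(large)
        max_depth = max(max_depth, len(stack))
    return max_depth


if __name__ == "__main__":
    for n in (30, 300, 3000):
        print(n, worst_case_stack_depth(n))
```

Under these assumptions the reported depth grows as roughly n/3, which is linear in n.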
2.)
Solution.
In this case the larger partition is pushed first and the smaller partition is pushed above it, so the partition processed next is always the one with fewer elements and it contains at most n/2 elements. If this method of pushing indices is followed throughout, then in the worst case the partitions on top of the stack have sizes at most n/2, n/4, n/8 and so on. Hence the maximum stack depth equals the number of such levels, which is log(n), and the worst case depth of the stack is O(log(n)).
NR-Qsort(A, n)
{
    Stack S                          // S is a stack of pairs of indices
    Push(S, (1, n))
    while S is not empty
    {
        (p, r) = Pop(S)
        q = Partition(A, p, r)
        if (q - p > r - q)           // left partition is larger: push it first
        {
            if (q - 1 > p)
                Push(S, (p, q - 1))
            if (r > q + 1)
                Push(S, (q + 1, r))
        }
        else                         // right partition is at least as large: push it first
        {
            if (r > q + 1)
                Push(S, (q + 1, r))
            if (q - 1 > p)
                Push(S, (p, q - 1))
        }
    }
}
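The pivot in the partition routine is selected at random, each element being chosen with probability 1/n. The number of comparisons $C_n$ then satisfies
$$C_n = n + 1 + \frac{1}{n}\sum_{k=1}^{n}\left(C_{k-1} + C_{n-k}\right),$$
which gives $C_n = O(n \log n)$.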
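The NR-Qsort pseudocode above can be sketched in Python to observe the logarithmic stack depth. This is a minimal sketch, not a definitive implementation: it uses 0-based indices, a Lomuto partition with a random pivot (the text does not say which partition scheme is used), and the names `partition` and `nr_qsort` are illustrative.

```python
import random


def partition(a, p, r):
    """Lomuto partition of a[p..r] around a uniformly random pivot."""
    k = random.randint(p, r)
    a[k], a[r] = a[r], a[k]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def nr_qsort(a):
    """Sort list a in place and return the maximum stack depth reached."""
    if len(a) < 2:
        return 0
    stack = [(0, len(a) - 1)]
    max_depth = 1
    while stack:
        p, r = stack.pop()
        q = partition(a, p, r)
        left, right = (p, q - 1), (q + 1, r)
        # The larger range goes in first so the smaller one is popped next.
        if q - p > r - q:
            first, second = left, right
        else:
            first, second = right, left
        for lo, hi in (first, second):
            if hi > lo:               # push only ranges with 2+ elements
                stack.append((lo, hi))
        max_depth = max(max_depth, len(stack))
    return max_depth


if __name__ == "__main__":
    data = [random.randrange(10**6) for _ in range(100_000)]
    print("max stack depth:", nr_qsort(data))
```

Because the range popped next holds at most half of the elements of the range it came from, the depth printed for n elements stays within about log2(n), whatever pivots are drawn.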

Problem 2. Kendall tau distance between permutations
1.)
Solution:
We use a variable, different, which is updated during the merge step of merge sort whenever the comparison of elements of the two sub-arrays finds a smaller element residing in the right sub-array.
Basically, we are calculating the total number of inversions in the permutation, which is its Kendall tau distance from the identity permutation. An inversion is a pair of indices i<j with P[i]>P[j].
Pseudo Code:
int different = 0                       // global integer

Copy(P, p, q, Q)                        // copies P[p..q] into Q and adds ∞ at the end as a sentinel
{
    for i = 1 to q-p+1
        Q[i] = P[p+i-1]
    Q[q-p+2] = ∞
}

Kendall_tau_Merge(P, p, q, r)
{
    t1 = q-p+1
    t2 = r-q
    // L[1..t1+1] and R[1..t2+1] hold copies of the left and right partitions
    Copy(P, p, q, L)
    Copy(P, q+1, r, R)
    i = 1
    j = 1
    for k = p to r
        if L[i] ≤ R[j]
            P[k] = L[i]
            i = i+1
        else
            P[k] = R[j]
            different = different + (t1-i+1)    // L[i..t1] are all greater than R[j]
            j = j+1
}

int Kendall_tau_MergeSort(P, p, r)      // returns the count of the number of inversions
{
    if (p < r)
        q = floor((p+r)/2)
        Kendall_tau_MergeSort(P, p, q)
        Kendall_tau_MergeSort(P, q+1, r)
        Kendall_tau_Merge(P, p, q, r)
    return different
}

int main
{
    int Kendall_tau_distance
    Take as input a permutation other than the identity permutation in array P
    Kendall_tau_distance = Kendall_tau_MergeSort(P, p, r)    // p and r are the first and last indices of P
}
Time Complexity:
The algorithm works the same way as Mergesort; only a constant number of extra computations are needed per merge step because of the variable different.
Hence, it is an O(n log n) time algorithm:
$$T(n) = 2T(n/2) + \Theta(n) + c_1\,\Theta(1) \quad \text{for } n \ge 2,$$
that is,
$$T(n) = 2T(n/2) + cn$$
In the recursion tree, nodes are expanded until each leaf node is a problem of size 1, having cost c.
The top level contributes cn, the next level contributes c(n/2)+c(n/2)=cn, and so on, and there are 1+log n levels in total.
Hence,
$$T(n) = \Theta(n \log n)$$

Eg.
A = 2, 3, 1
Identity permutation = 1, 2, 3
For A, the first partition gives L = 2, 3 and R = 1.
L further partitions into 2 and 3. Since 2<3, there is no change in different.
Now, merge L = 2, 3 with R = 1.
Since 2>1, different increases by t1-i+1 = 2-1+1 = 2.
This gives 2 as the Kendall tau distance between A and the identity permutation.
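The inversion count from part 1 can be sketched in Python as follows. This is an illustrative variant rather than a transcription of the pseudocode: it returns the count instead of using a global variable, it works on list slices instead of sentinel-terminated copies, and the name `count_inversions` is hypothetical.

```python
def count_inversions(perm):
    """Return (sorted copy of perm, number of inversions) via merge sort."""
    if len(perm) <= 1:
        return list(perm), 0
    mid = len(perm) // 2
    left, inv_left = count_inversions(perm[:mid])
    right, inv_right = count_inversions(perm[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left.
            merged.append(right[j])
            inversions += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


if __name__ == "__main__":
    print(count_inversions([2, 3, 1])[1])   # distance of 2, 3, 1 from the identity
```

For A = 2, 3, 1 the sketch reports 2 inversions, in agreement with the worked example, even though it splits the array at a different midpoint.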

2.)
Solution:
Given two permutations, we have to find the Kendall tau distance between them.

Description:
Take one of the permutations and label its elements with the labels 1, 2, 3, 4, ..., n in order of position.
Now, consider the second permutation and give each of its elements the same label that the element was given in the first permutation, as follows:
A = 5, 2, 4, 1, 3
B = 3, 2, 4, 1, 5
for i = 1 to 5        // labeling of first array
    C[A[i]] = i
for i = 1 to 5        // labeling of second array
    D[i] = C[B[i]]

Now, we have to compute the number of inversions in the labels assigned to the second permutation, hence we apply the algorithm used in part (1) to array D.
Time complexity:
The assignment of labels takes O(n) time and the subsequent algorithm on D takes Θ(n log n) time.
Hence, T(n) = Θ(n log n)
Eg.
A=5, 2, 4, 1, 3
B=3, 2, 4, 1, 5
C=4, 2, 5, 3, 1
D=5, 2, 3, 4, 1
Now apply algorithm used in part(1) on D which gives Kendall tau distance between A and B as 7
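The labeling step followed by inversion counting can be sketched in Python as below. It is a minimal sketch under the assumption that both inputs are permutations of the same distinct elements; the names `relabel`, `sort_and_count` and `kendall_tau` are illustrative, and the inversion counter is a return-value variant of the merge sort approach from part (1).

```python
def relabel(a, b):
    """Build D: label each element of b with its 1-based position in a."""
    label = {value: pos for pos, value in enumerate(a, start=1)}  # C[A[i]] = i
    return [label[value] for value in b]                           # D[i] = C[B[i]]


def sort_and_count(seq):
    """Return (sorted copy of seq, number of inversions) via merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = sort_and_count(seq[:mid])
    right, inv_right = sort_and_count(seq[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            inversions += len(left) - i   # all remaining left items exceed right[j]
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau(a, b):
    """Kendall tau distance between permutations a and b."""
    return sort_and_count(relabel(a, b))[1]


if __name__ == "__main__":
    A = [5, 2, 4, 1, 3]
    B = [3, 2, 4, 1, 5]
    print(relabel(A, B), kendall_tau(A, B))
```

On the example arrays the relabeled array is 5, 2, 3, 4, 1 and the distance is 7, matching the values worked out by hand.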

Problem 3 : Lower bound for sorting with equal keys.
Solution:

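The entropy is
$$H = -\left(p_1 \log_2 p_1 + p_2 \log_2 p_2 + \dots + p_k \log_2 p_k\right),$$
where $p_i = f_i/n$ and $f_i$ is the number of keys equal to the $i$-th of the $k$ distinct key values. Below, $\log$ denotes $\log_2$.

Considering a decision tree for a comparison sort algorithm, we know that if the height of the tree is $h$ then it can have at most $2^h$ leaves.
Suppose the height is $h$ and the tree has $l$ leaves. The number of leaves must be at least the number of distinct permutations of the keys, which is $n!/(f_1!\,f_2!\cdots f_k!)$.
Hence,
$$2^h \ge l \ge \frac{n!}{f_1!\,f_2!\cdots f_k!}$$
$$2^h \ge \frac{n!}{f_1!\,f_2!\cdots f_k!}$$
$$h \ge \log(n!) - \sum_{i=1}^{k}\log(f_i!)$$
and $\log(n!) \sim n\log n - (n - 1/2) + \Theta(1)$.

Hence,
$$h \ge n\log n - (n - 1/2) - \sum_{i=1}^{k}\left[f_i\log f_i - (f_i - 1/2)\right] - (k-1)\,\Theta(1)$$
$$h \ge n\log n - \sum_{i=1}^{k} f_i\log f_i + \left(\sum_{i=1}^{k} f_i - n\right) - \frac{k-1}{2} + \Theta(1)\,(1-k)$$
Since $\sum_{i=1}^{k} f_i = n$, the term in parentheses is zero and $n\log n - \sum_{i=1}^{k} f_i\log f_i = -n\sum_{i=1}^{k}(f_i/n)\log(f_i/n)$, so
$$h \ge -n\sum_{i=1}^{k}\frac{f_i}{n}\log\frac{f_i}{n} - \frac{k-1}{2} + \Theta(1)\,(1-k)$$
Now, since $(k-1)/2 < n$,
therefore,
$$h \ge nH - n$$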
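The quantities in this argument can be evaluated numerically for small inputs. The sketch below is only an illustration: the function name `entropy_and_leaf_bound` is hypothetical, and it simply computes H, the expression nH - n, the number of distinct arrangements n!/(f1!*f2!*..fk!), and the smallest height h for which 2^h reaches that number, without asserting how the values compare.

```python
import math


def entropy_and_leaf_bound(freqs):
    """freqs[i] is f_i, the number of keys equal to the i-th distinct value.

    Returns (H, n*H - n, number of distinct arrangements, smallest h with
    2**h >= that number).
    """
    n = sum(freqs)
    entropy = -sum((f / n) * math.log2(f / n) for f in freqs)
    arrangements = math.factorial(n)
    for f in freqs:
        arrangements //= math.factorial(f)   # exact: each step stays an integer
    min_height = math.ceil(math.log2(arrangements))
    return entropy, n * entropy - n, arrangements, min_height


if __name__ == "__main__":
    for freqs in ([2, 2], [5, 3, 2], [1] * 8):
        print(freqs, entropy_and_leaf_bound(freqs))
```

Printing the values side by side for a few frequency vectors makes it easier to see how the counting bound on h and the expression nH - n move together as the keys become more or less repetitive.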
