Lecture 12 AG

The document discusses different sorting algorithms like insertion sort, shell sort, heapsort, and recursive sorting. It provides details on how each algorithm works including pseudocode and analysis of time complexity. Key sorting algorithms discussed achieve O(n log n) time like heapsort, while shell sort can achieve O(n^1.2) time in some cases.


15-211 Fundamental Structures of Computer Science
Introduction to Sorting
Ananda Guna
February 20, 2003

Announcements

§ Homework #4 is available
Ø Due on Monday, March 17, 11:59pm
Ø Get started now!

§ Quiz #2
Ø Available on Tuesday, Feb. 25
Ø Some questions will be easier if you have some parts of HW4 working

§ Read
Ø Chapter 8

History

History of sorting from Knuth's book:

Hollerith's sorting machine, developed in 1901-1904, used radix sort.

Card sorter: an electrical connection with a vat of mercury was made through the holes. First the 0s pop out, then the 1s, etc.

For 2-column numerical data, one would sort on the units column first, then re-insert the cards into the machine and sort on the tens column.

History Ctd…

From the NIST web site:

Hollerith's system, including punch, tabulator, and sorter, allowed the official 1890 population count to be tallied in six months, and in another two years all the census data was completed and defined; the cost was $5 million below the forecasts and saved more than two years' time.

His later machines mechanized the card-feeding process, added numbers, and sorted cards, in addition to merely counting data.

In 1896 Hollerith founded the Tabulating Machine Company, forerunner of the Computing-Tabulating-Recording Company (CTR). He served as a consulting engineer with CTR until retiring in 1921.

In 1924 CTR changed its name to IBM, the International Business Machines Corporation.

Comparison-based sorting

We assume:
• Items are stored in an array.
• Items can be moved around in the array.
• We can compare any two array elements.

A comparison has 3 possible outcomes: <, =, >.

Flips and inversions

An unsorted array:

24 47 13 99 105 222

Two elements are inverted if A[i] > A[j] for i < j. An inversion between adjacent elements (here 47 and 13) is called a flip.
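As an illustration (a hypothetical helper, not from the slides), the inversion count of an array can be computed straight from the definition:

```python
def count_inversions(a):
    """Count pairs (i, j) with i < j and a[i] > a[j]."""
    n = len(a)
    return sum(1 for i in range(n)
                 for j in range(i + 1, n)
                 if a[i] > a[j])
```

For the array above, only (47, 13) and (24, 13) are out of order, so the count is 2.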

Insertion sort

for i = 1 to n-1 do
    insert a[i] in the proper place in a[0:i-1]

Example trace on 105 47 13 99 30 222:

105  47  13  99  30 222
 47 105  13  99  30 222
 13  47 105  99  30 222
 13  47  99 105  30 222
 13  30  47  99 105 222

So the sub-array A[0..k-1] is sorted for k = 1,…,n after k-1 steps.

Proof using loop invariants

for i = 1 to n-1 do {
    // Invariant 1: A[0..i-1] is a sorted permutation of the original A[0..i-1]
    j = i-1; key = A[i];
    while (j >= 0 && A[j] > key) {
        // Invariant 2: A[j..i-1] are all larger than the key
        A[j+1] = A[j--];
    }
    A[j+1] = key;
}

§ Proof is left as an exercise. Argue the correctness of the algorithm by proving that the loop invariants hold, then draw conclusions from what this implies upon termination of the loops.

Analysis of Insertion Sort

§ In the ith step we do at least 1 comparison, at most (i-1) comparisons, and on average i/2 (call this Ci).
§ Mi, the number of moves at the ith step, is Ci + 2.
§ Obtain formulas for Cmin, Cave, Cmax and the same for Mmin, Mave, Mmax.
§ Exercise: When is the worst case attained? The best case? What types of data sets?
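The pseudocode above can be rendered as a runnable sketch (Python here, purely for illustration), with an assert that spot-checks Invariant 1 at the top of each iteration:

```python
def insertion_sort(a):
    """Sort the list a in place, mirroring the slide's pseudocode."""
    for i in range(1, len(a)):
        # Invariant 1: a[0..i-1] is a sorted permutation of the original a[0..i-1]
        assert all(a[k] <= a[k + 1] for k in range(i - 1))
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]   # shift larger elements one slot right
            j -= 1
        a[j + 1] = key        # key lands just after the last element <= key
    return a
```

For example, `insertion_sort([105, 47, 13, 99, 30, 222])` reproduces the trace above.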

How fast is insertion sort?

• Each step of insertion sort reduces the number of inversions (each shift removes exactly one).
• It takes O(#inversions + N) steps, which is very fast if the array is nearly sorted to begin with, i.e. has few inversions.
• We can slightly improve performance by doing binary insertion.

How long does it take to sort?

§ Can we do better than O(n^2)?
Ø In the worst case?
Ø In the average case?
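Binary insertion locates each element's slot with binary search, cutting comparisons to O(log i) per step; the data moves are unchanged, so the overall bound stays quadratic. A sketch using the standard-library bisect module (an assumption of this example, not the course's code):

```python
import bisect

def binary_insertion_sort(a):
    """Insertion sort that finds each insertion slot by binary search."""
    for i in range(1, len(a)):
        key = a[i]
        pos = bisect.bisect_right(a, key, 0, i)  # O(log i) comparisons
        a[pos + 1:i + 1] = a[pos:i]              # shift right; still O(i) moves
        a[pos] = key
    return a
```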

Heapsort

§ Remember heaps:
Ø buildHeap has O(n) worst-case running time: Σ 2^i (h-i) = O(n).
Ø deleteMin has O(log n) worst-case running time. So n deleteMins would give a sorted list.

§ Heapsort:
Ø Build heap: O(n)
Ø DeleteMin until empty: O(n log n)
Ø Total worst case: O(n log n)

N^2 vs N log N

[Graph comparing the growth of N^2 and N log N.]
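The two phases can be sketched with Python's heapq module (an illustration; heapify is the O(n) buildHeap, each heappop is a deleteMin):

```python
import heapq

def heapsort(a):
    """Build a min-heap in O(n), then deleteMin n times: O(n log n) total."""
    heap = list(a)
    heapq.heapify(heap)       # buildHeap: O(n)
    return [heapq.heappop(heap) for _ in range(len(heap))]  # n deleteMins
```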

Sorting in O(n log n)

§ Heapsort establishes the fact that sorting can be accomplished in O(n log n) worst-case running time.

§ In fact, later we will see that it is possible to prove that any comparison-based sorting algorithm requires at least Ω(n log n) time in the worst case.

Heapsort in practice

§ The average-case analysis for heapsort is somewhat complex.

§ In practice, heapsort consistently tends to use nearly n log n comparisons. What if the array is sorted? What is the performance?

§ So, while the worst case is better than n^2, other algorithms sometimes work better.

Shellsort

§ A refinement of insertion sort proposed by D. L. Shell in 1959.
§ Define a k-sort as a process that sorts items that are k positions apart.
§ So one can do a series of k-sorts to achieve relatively few movements of data.
§ A 1-sort is really the insertion sort. But by then most items are in place.

Shell Sort algorithm

original      105  47  13  17  30 222   5  19
After 4-sort   30  47   5  17 105 222  13  19
After 2-sort    5  17  13  19  30  47 105 222
After 1-sort    5  13  17  19  30  47 105 222
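A minimal sketch: each k-sort is an insertion sort on elements k apart, using the gap sequence 4, 2, 1 from the trace above (the gap sequence is a parameter; any sequence ending in 1 works):

```python
def shellsort(a, gaps=(4, 2, 1)):
    """k-sort the list for each gap k; the final gap must be 1."""
    for k in gaps:
        # insertion sort restricted to elements k positions apart
        for i in range(k, len(a)):
            key = a[i]
            j = i - k
            while j >= 0 and a[j] > key:
                a[j + k] = a[j]
                j -= k
            a[j + k] = key
    return a
```

Running `shellsort([105, 47, 13, 17, 30, 222, 5, 19])` passes through exactly the 4-sort and 2-sort rows shown in the table.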

Shell Sort Analysis

§ Each pass benefits from the previous:
Ø Each i-sort combines two groups sorted in the previous 2i-sort.
§ Any sequence of increments (h1, h2, …) is fine as long as the last one is 1:
Ø ht = 1, hi+1 < hi
Ø Each h-sort is an insertion sort.
§ Very difficult mathematical analysis.

Shell Sort Analysis ctd..

§ It is shown that for the sequence 1, 3, 7, 15, 31, … given by
Ø ht = 1, hk-1 = 2hk + 1, and t = log n - 1,
Ø Shell Sort is O(n^1.2).
Ø Proof available but difficult. Ignore till 15-451.

Recursive sorting

§ If the array is length 1, then done.

§ If the array is length N > 1, then split it in half and sort each half.
Ø Then combine the results.

Divide-and-conquer

[Diagram illustrating the divide-and-conquer pattern.]

Divide-and-conquer is big

§ We will see several examples of divide-and-conquer in this course.

Analysis of recursive sorting

§ Let T(n) be the time required to sort n elements.

§ Suppose also that it takes time n to combine the two sorted arrays.

§ Then we get the recurrence on the next slide.

Recurrence relation

§ Such "recursive sorting" is characterized by the following recurrence relation:
Ø T(1) = 1
Ø T(n) = 2T(n/2) + n

§ How do we solve such equations?

A solution

§ A solution for
Ø T(1) = 1
Ø T(N) = 2T(N/2) + N
§ is given by
Ø T(N) = N log N + N
Ø which is O(N log N).

Exact solutions

§ It is sometimes possible to derive closed-form solutions to recurrence relations.

§ Several methods exist for doing this.

Repeated substitution method

§ One technique is to use repeated substitution.
Ø T(N) = 2T(N/2) + N
Ø 2T(N/2) = 2(2T(N/4) + N/2)
Ø         = 4T(N/4) + N
Ø T(N) = 4T(N/4) + 2N
Ø 4T(N/4) = 4(2T(N/8) + N/4)
Ø         = 8T(N/8) + N
Ø T(N) = 8T(N/8) + 3N
Ø …
Ø T(N) = 2^k T(N/2^k) + kN
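The substitution can be sanity-checked numerically: for N a power of two, the recurrence and the closed form N log N + N agree exactly (a small sketch; log2 is the base-2 logarithm):

```python
from math import log2

def T(n):
    """The recurrence T(1) = 1, T(N) = 2 T(N/2) + N, for N a power of 2."""
    return 1 if n == 1 else 2 * T(n // 2) + n

# closed form claimed by the substitution: N log N + N
for k in range(11):
    n = 2 ** k
    assert T(n) == n * log2(n) + n
```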

Repeated substitution, cont'd

§ We end up with
Ø T(N) = 2^k T(N/2^k) + kN, for all k ≥ 1.

§ Let's use k = log N.
Ø Note that 2^(log N) = N.

§ So:
Ø T(N) = N T(1) + N log N
Ø      = N log N + N

Mergesort

§ Mergesort is the most basic recursive sorting algorithm.
Ø Divide the array in halves A and B.
Ø Recursively mergesort each half.
Ø Combine A and B by successively looking at the first elements of A and B and moving the smaller one to the result array.

§ Note: one should be careful to avoid creating lots of intermediate result arrays.

Mergesort

[Diagram: recursive splitting of the array into halves. But we don't actually want to create all of these arrays!]

Mergesort

[Diagram: the split tracked with simple L and R indexes instead of new arrays.]

§ Use simple indexes to perform the split.

§ Use a single extra array to hold each intermediate result.

Analysis of mergesort

§ Mergesort generates almost exactly the same recurrence relation shown before:
Ø T(1) = 1
Ø T(N) = 2T(N/2) + N - 1, for N > 1

§ Thus, mergesort is O(N log N).
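A sketch of mergesort following both guidelines above: index ranges perform the split, and one shared auxiliary array holds each merge result (Python here, for illustration):

```python
def mergesort(a):
    """Mergesort using index ranges and a single auxiliary array."""
    tmp = [None] * len(a)

    def sort(lo, hi):                # sorts the slice a[lo:hi]
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid, hi)
        # merge a[lo:mid] and a[mid:hi] into tmp, then copy back
        i, j = lo, mid
        for k in range(lo, hi):
            if j >= hi or (i < mid and a[i] <= a[j]):
                tmp[k] = a[i]; i += 1
            else:
                tmp[k] = a[j]; j += 1
        a[lo:hi] = tmp[lo:hi]

    sort(0, len(a))
    return a
```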

Upper bounds for rec. relations

§ Divide-and-conquer algorithms are very useful in practice.

§ Furthermore, they all tend to generate similar recurrence relations.

§ As a result, approximate upper-bound solutions are well known for recurrence relations derived from divide-and-conquer algorithms.

Divide-and-Conquer Theorem

§ Theorem: Let a, b, c ≥ 0. The recurrence relation
Ø T(1) = b
Ø T(N) = aT(N/c) + bN
Ø for any N which is a power of c
§ has upper-bound solutions
Ø T(N) = O(N)             if a < c
Ø T(N) = O(N log N)       if a = c
Ø T(N) = O(N^(log_c a))   if a > c

§ (For recursive sorting: a = 2, b = 1, c = 2, so the a = c case applies.)

Upper-bounds

§ Corollary: Dividing a problem into p pieces, each of size N/p, using only a linear amount of work, results in an O(N log N) algorithm.

§ Proof of this theorem later in the semester.

Checking a solution

§ It is sometimes also useful to check that a solution is valid.
Ø This can be done by induction.

§ Base case:
Ø T(1) = 1 log 1 + 1 = 1

§ Inductive case:
Ø Assume T(M) = M log M + M, for all M < N.
Ø T(N) = 2T(N/2) + N
Ø      = 2((N/2)(log(N/2)) + N/2) + N
Ø      = N(log N - log 2) + 2N
Ø      = N log N - N + 2N
Ø      = N log N + N
Quicksort

§ Quicksort was invented in 1960 by Tony Hoare.

§ Although it has O(N^2) worst-case performance, on average it is O(N log N).

§ More importantly, it is the fastest known comparison-based sorting algorithm in practice.

Quicksort idea

§ Choose a pivot.

§ Rearrange so that the pivot is in the "right" spot.

§ Recurse on each half and conquer!

Quicksort algorithm

§ If array A has 1 (or 0) elements, then done.
§ Choose a pivot element x from A.
§ Divide A - {x} into two arrays:
Ø B = {y ∈ A | y ≤ x}
Ø C = {y ∈ A | y ≥ x}
§ Quicksort arrays B and C.
§ Result is B + {x} + C.

Example:

105 47 13 17 30 222 5 19          (pivot 19)
├─ 5 17 13                        (pivot 13)
│  ├─ 5
│  └─ 17
└─ 47 30 222 105                  (pivot 47)
   ├─ 30
   └─ 222 105                     (pivot 105)
      └─ 222
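The B / {x} / C description translates almost directly into code. A sketch (last element as pivot, an arbitrary choice; here duplicates of the pivot go to B):

```python
def quicksort(a):
    """Quicksort per the slide: pick a pivot, split, recurse, concatenate."""
    if len(a) <= 1:
        return a
    x = a[-1]                             # pivot: last element (arbitrary choice)
    b = [y for y in a[:-1] if y <= x]     # B = {y in A - {x} | y <= x}
    c = [y for y in a[:-1] if y > x]      # C = {y in A - {x} | y > x}
    return quicksort(b) + [x] + quicksort(c)
```

On the slide's array, `quicksort([105, 47, 13, 17, 30, 222, 5, 19])` first splits into [13, 17, 5] and [105, 47, 30, 222] around pivot 19, matching the tree above.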

Quicksort algorithm

[Same quicksort tree as on the previous slide.]

In practice, insertion sort is used once the subarrays get "small enough".

Doing quicksort in place

85 24 63 50 17 31 96 45        choose pivot 50; swap it to the end

85 24 63 45 17 31 96 50
L                    R

85 24 63 45 17 31 96 50        L stops at 85 (≥ 50), R stops at 31 (≤ 50)
L              R

31 24 63 45 17 85 96 50        after swapping 85 and 31
   L        R

Doing quicksort in place

31 24 63 45 17 85 96 50
   L        R

31 24 17 45 63 85 96 50        after swapping 63 and 17
      L  R

31 24 17 45 63 85 96 50        L and R have crossed
      R  L

31 24 17 45 50 85 96 63        swap the pivot into place

Quicksort is fast but hard to do

§ Quicksort, in the early 1960's, was famous for being incorrectly implemented many times.
Ø More about invariants next time.

§ Quicksort is very fast in practice.
Ø Faster than mergesort because quicksort can be done "in place".
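The L/R scan in the trace can be sketched as an in-place partition plus the recursive driver (an illustration with hypothetical names; invariants are discussed next lecture):

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    L, R = lo, hi - 1
    while True:
        while L <= R and a[L] < pivot:   # L scans right for an element >= pivot
            L += 1
        while R >= L and a[R] > pivot:   # R scans left for an element <= pivot
            R -= 1
        if L >= R:                       # pointers crossed: done scanning
            break
        a[L], a[R] = a[R], a[L]
        L += 1; R -= 1
    a[L], a[hi] = a[hi], a[L]            # put the pivot in its "right" spot
    return L

def quicksort_inplace(a, lo=0, hi=None):
    """Recursive quicksort over index ranges; no extra arrays."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort_inplace(a, lo, p - 1)
        quicksort_inplace(a, p + 1, hi)
    return a
```

On the traced array (pivot 50 already at the end), `partition([85, 24, 63, 45, 17, 31, 96, 50], 0, 7)` performs exactly the swaps shown and leaves 50 at index 4.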

Informal analysis

§ If there are duplicate elements, then the algorithm does not specify which subarray, B or C, should get them.
Ø Ideally, they split down the middle.

§ Also, it is not specified how to choose the pivot.
Ø Ideally it would be the median value of the array, but this would be expensive to compute.

§ As a result, it is possible that quicksort will show O(N^2) behavior.

Worst-case behavior

With the pivots shown, each partition peels off only one element:

105 47 13 17 30 222 5 19        pivot: 5
  47 13 17 30 222 19 105        pivot: 13
    47 105 17 30 222 19         pivot: 17
      47 105 19 30 222          pivot: 19
        …

Analysis of quicksort

§ Assume a random pivot.
Ø T(0) = 1
Ø T(1) = 1
Ø T(N) = T(i) + T(N-i-1) + cN, for N > 1
• where i is the size of the left subarray.

§ See the book for details on this solution.

Worst-case analysis

§ If the pivot is always the smallest element, then:
Ø T(0) = 1
Ø T(1) = 1
Ø T(N) = T(0) + T(N-1) + cN, for N > 1
Ø      ≅ T(N-1) + cN
Ø      = O(N^2)
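The worst-case recurrence telescopes: T(N) ≅ T(1) + c(2 + 3 + … + N), a quadratic sum. A quick numeric sketch with c = 1 (a hypothetical constant):

```python
def T_worst(n, c=1):
    """T(1) = 1; T(N) = T(N-1) + c*N for N > 1."""
    t = 1
    for k in range(2, n + 1):
        t += c * k
    return t

# with c = 1 this telescopes to exactly N(N+1)/2, i.e. Theta(N^2)
for n in (1, 2, 10, 100):
    assert T_worst(n) == n * (n + 1) // 2
```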

Best-case analysis

§ In the best case, the pivot is always the median element.

§ In that case, the splits are always "down the middle".

§ Hence, same behavior as mergesort.

§ That is, O(N log N).

Average-case analysis

§ Consider the quicksort tree:

105 47 13 17 30 222 5 19          (pivot 19)
├─ 5 17 13                        (pivot 13)
│  ├─ 5
│  └─ 17
└─ 47 30 222 105                  (pivot 47)
   ├─ 30
   └─ 222 105                     (pivot 105)
      └─ 222

Average-case analysis

§ The time spent at each level of the tree is O(N).

§ So, on average, how many levels?
Ø That is, what is the expected height of the tree?
Ø If on average there are O(log N) levels, then quicksort is O(N log N) on average.

Summary of quicksort

§ A fast sorting algorithm in practice.

§ Can be implemented in-place.

§ But it is O(N^2) in the worst case.

§ Average-case performance?
