Introduction to Algorithms and Data Structures
© 2016
(Fourth edition)
Contents

Contents
List of Figures
List of Tables
Preface
2 Efficiency of Sorting
2.1 The problem of sorting
2.2 Insertion sort
2.3 Mergesort
2.4 Quicksort
2.5 Heapsort
2.6 Data selection
2.7 Lower complexity bound for sorting
2.8 Notes
3 Efficiency of Searching
3.1 The problem of searching
3.2 Sorted lists and binary search
3.3 Binary search trees
3.4 Self-balancing binary and multiway search trees
3.5 Hash tables
3.6 Notes
Bibliography
Index
Introduction to the Fourth Edition
The fourth edition follows the third edition, incorporating fixes for the errata
discovered in the third edition.
Michael J. Dinneen
Georgy Gimel’farb
Mark C. Wilson
February 2016
Introduction to the Third Edition
The focus for this third edition has been to make an electronic version that students
can read on the tablets and laptops that they bring to lectures. The main changes
from the second edition are:
Michael J. Dinneen
Georgy Gimel’farb
Mark C. Wilson
March 2013
Introduction to the Second Edition
Writing a second edition is a thankless task, as is well known to authors. Much of the
time is spent on small improvements that are not obvious to readers. We have taken
considerable efforts to correct a large number of errors found in the first edition, and
to improve explanation and presentation throughout the book, while retaining the
philosophy behind the original. As far as material goes, the main changes are:
• more exercises and solutions to many of them;
• a new section on maximum matching (Section 5.9);
• a new section on string searching (Part III);
• a Java graph library updated to Java 1.6 and freely available for download.
The web site http://www.cs.auckland.ac.nz/textbookCS220/ for the book pro-
vides additional material including source code. Readers finding errors are encour-
aged to contact us after viewing the errata page at this web site.
In addition to the acknowledgments in the first edition, we thank Sonny Datt for
help with updating the Java graph library, Andrew Hay for help with exercise solu-
tions and Cris Calude for comments. Rob Randtoul (PlasmaDesign.co.uk) kindly
allowed us to use his cube artwork for the book’s cover. Finally, we thank
MJD all students who have struggled to learn from the first edition and have given
us feedback, either positive or negative;
GLG my wife Natasha and all the family for their permanent help and support;
MCW my wife Golbon and sons Yusef and Yahya, for their sacrifices during the writ-
ing of this book, and the joy they bring to my life even in the toughest times.
31 October 2008
Introduction to the First Edition
This book is an expanded, and, we hope, improved version of the coursebook for
the course COMPSCI 220 which we have taught several times in recent years at the
University of Auckland.
We have taken the step of producing this book because there is no single text
available that covers the syllabus of the above course at the level required. Indeed,
we are not aware of any other book that covers all the topics presented here. Our
aim has been to produce a book that is straightforward, concise, and inexpensive,
and suitable for self-study (although a teacher will definitely add value, particularly
where the exercises are concerned). It is an introduction to some key areas at the
theoretical end of computer science, which nevertheless have many practical appli-
cations and are an essential part of any computer science student’s education.
The material in the book is all rather standard. The novelty is in the combina-
tion of topics and some of the presentation. Part I deals with the basics of algorithm
analysis, tools that predict the performance of programs without wasting time im-
plementing them. Part II covers many of the standard fast graph algorithms that
have applications in many different areas of computer science and science in gen-
eral. Part III introduces the theory of formal languages, shifting the focus from what
can be computed quickly to what families of strings can be recognized easily by a
particular type of machine.
The book is designed to be read cover-to-cover. In particular Part I should come
first. However, one can read Part III before Part II with little chance of confusion.
To make best use of the book, one must do the exercises. They vary in difficulty
from routine to tricky. No solutions are provided. This policy may be changed in a
later edition.
The prerequisites for this book are similar to those of the above course, namely
two semesters of programming in a structured language such as Java (currently used
at Auckland). The book contains several appendices which may fill in any gaps in
the reader’s background.
A limited bibliography is given. There are so many texts covering some of the
topics here that to list all of them is pointless. Since we are not claiming novelty
of material, references to research literature are mostly unnecessary and we have
omitted them. More advanced books (some listed in our bibliography) can provide
more references as a student’s knowledge increases.
A few explanatory notes to the reader about this textbook are in order.
We describe algorithms using a pseudocode similar to, but not exactly like, many
structured languages such as Java or C++. Loops and control structures are indented
in fairly traditional fashion. We do not formally define our pseudocode or comment
style (this might make an interesting exercise for a reader who has mastered Part III).
We make considerable use of the idea of ADT (abstract data type). An abstract
data type is a mathematically specified collection of objects together with opera-
tions that can be performed on them, subject to certain rules. An ADT is completely
independent of any computer programming implementation and is a mathematical
structure similar to those studied in pure mathematics. Examples in this book in-
clude digraphs and graphs, along with queues, priority queues, stacks, and lists. A
data structure is simply a higher level entity composed of the elementary memory
addresses related in some way. Examples include arrays, arrays of arrays (matrices),
linked lists, doubly linked lists, etc.
The difference between a data structure and an abstract data type is exemplified
by the difference between a standard linear array and what we call a list. An array is
a basic data structure common to most programming languages, consisting of con-
tiguous memory addresses. To find an element in an array, or insert an element, or
delete an element, we directly use the address of the element. There are no secrets
in an array. By contrast, a list is an ADT. A list is specified by a set S of elements
from some universal set U, together with operations insert, delete, size, isEmpty
and so on (the exact definition depends on who is doing the defining). We denote
the result of the operation as S.isEmpty(), for example. The operations must sat-
isfy certain rules, for example: S.isEmpty() returns a boolean value TRUE or FALSE;
S.insert(x, r) requires that x belong to U and r be an integer between 0 and S.size(),
and returns a list; for any admissible x and r we have S.isEmpty(S.insert(x, r)) =
FALSE, etc. We are not interested in how the operations are to be carried out, only
in what they do. Readers familiar with languages that facilitate object-based and
object-oriented programming will recognize ADTs as, essentially, what are called
classes in Java or C++.
A list can be implemented using an array (to be more efficient, we would also
have an extra integer variable recording the array size). The insert operation, for ex-
ample, can be achieved by accessing the correct memory address of the r-th element
of the array, allocating more space at the end of the array, shifting along some ele-
ments by one, and assigning the element to be inserted to the address vacated by the
shifting. We would also update the size variable by 1. These details are unimportant
in many programming applications. However they are somewhat important when
discussing complexity as we do in Part I. While ADTs allow us to concentrate on algo-
rithms without worrying about details of programming implementation, we cannot
ignore data structures forever, simply because some implementations of ADT oper-
ations are more efficient than others.
In summary, we use ADTs to sweep programming details under the carpet as long
as we can, but we must face them eventually.
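To make the array-based list implementation just sketched concrete, here is a minimal Java illustration. The interface and class names (ListADT, ArrayBackedList) and the exact method signatures are my own choices for this sketch; they are not taken from the book's library.

interface ListADT<E> {
    boolean isEmpty();
    int size();
    void insert(E x, int r);   // insert x so that it becomes the r-th element, 0 <= r <= size()
    E delete(int r);           // remove and return the r-th element
}

class ArrayBackedList<E> implements ListADT<E> {
    private Object[] data = new Object[8];
    private int n = 0;                        // the extra size variable mentioned above

    public boolean isEmpty() { return n == 0; }
    public int size() { return n; }

    public void insert(E x, int r) {
        if (r < 0 || r > n) throw new IndexOutOfBoundsException();
        if (n == data.length)                 // allocate more space at the end
            data = java.util.Arrays.copyOf(data, 2 * data.length);
        for (int i = n; i > r; i--)           // shift some elements along by one
            data[i] = data[i - 1];
        data[r] = x;                          // assign the inserted element to the vacated position
        n++;                                  // update the size variable by 1
    }

    @SuppressWarnings("unchecked")
    public E delete(int r) {
        if (r < 0 || r >= n) throw new IndexOutOfBoundsException();
        E removed = (E) data[r];
        for (int i = r; i < n - 1; i++)       // close the gap left by the deleted element
            data[i] = data[i + 1];
        data[--n] = null;
        return removed;
    }
}

After these operations an empty list reports isEmpty() = TRUE, and inserting any admissible element makes it FALSE, matching the rule S.isEmpty(S.insert(x, r)) = FALSE stated above.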
A book of this type, written by three authors with different writing styles under
some time pressure, will inevitably contain mistakes. We have been helped to mini-
mize the number of errors by the student participants in the COMPSCI 220 course-
book error-finding competition, and our colleagues Joshua Arulanandham and An-
dre Nies, to whom we are very grateful.
Our presentation has benefitted from the input of our colleagues who have taught
COMPSCI 220 in the recent and past years, with special acknowledgement due to
John Hamer and the late Michael Lennon.
10 February 2004
Part I
Definition 1.2 (informal). The running time (or computing time) of an algorithm is
the number of its elementary operations.
Example 1.4 (Sums of subarrays). The problem is to compute, for each subarray
a[j..j + m − 1] of size m in an array a of size n, the partial sum of its elements
s[j] = ∑_(k=0)^(m−1) a[j + k]; j = 0, . . . , n − m. The total number of these subarrays is n − m + 1. At first
glance, we need to compute n − m + 1 sums, each of m items, so that the running time
is proportional to m(n − m + 1). If m is fixed, the time still depends linearly on n.
But if m grows with n as a fraction of n, such as m = n/2, then T(n) = c (n/2)(n/2 + 1)
= 0.25cn^2 + 0.5cn. The relative weight of the linear part, 0.5cn, decreases quickly with
respect to the quadratic one as n increases. For example, if T(n) = 0.25n^2 + 0.5n, we
see in the last column of Table 1.1 the rapid decrease of the ratio of the two terms.
Table 1.1: Relative growth of linear and quadratic terms in an expression.
Thus, for large n only the quadratic term becomes important and the running
time is roughly proportional to n2 , or is quadratic in n. Such algorithms are some-
times called quadratic algorithms in terms of relative changes of running time with
respect to changes of the data size: if T (n) ≈ cn2 then T (10) ≈ 100T (1), or T (100) ≈
10000T (1), or T (100) ≈ 100T (10).
algorithm slowSums
  Input: array a[0..2m − 1]
begin
  array s[0..m]
  for i ← 0 to m do
    s[i] ← 0
    for j ← 0 to m − 1 do
      s[i] ← s[i] + a[i + j]
    end for
  end for
  return s
end
The “brute-force” quadratic algorithm has two nested loops (see Figure 1.2). Let
us analyse it to find out whether it can be simplified. It is easily seen that repeated
computations in the innermost loop are unnecessary. Two successive sums s[i] and
s[i − 1] differ only by two elements: s[i] = s[i − 1] + a[i + m − 1] − a[i − 1]. Thus we need
not repeatedly add m items together after getting the very first sum s[0]. Each next
sum is formed from the current one by using only two elementary operations (ad-
dition and subtraction). Thus T (n) = c(m + 2(n − m)) = c(2n − m). In the first paren-
theses, the first term m relates to computing the first sum s[0], and the second term
2(n − m) reflects that n − m other sums are computed with only two operations per
sum. Therefore, the running time for this better organized computation is always
linear in n for each value m, either fixed or growing with n. The time for comput-
ing all the sums of the contiguous subsequences is less than twice that taken for the
single sum of all n items in Example 1.3
The linear algorithm in Figure 1.3 excludes the innermost loop of the quadratic
algorithm. Now two simple loops, doing m and 2(n − m) elementary operations, re-
spectively, replace the previous nested loop performing m(n − m + 1) operations.
algorithm fastSums
  Input: array a[0..2m − 1]
begin
  array s[0..m]
  s[0] ← 0
  for j ← 0 to m − 1 do
    s[0] ← s[0] + a[j]
  end for
  for i ← 1 to m do
    s[i] ← s[i − 1] + a[i + m − 1] − a[i − 1]
  end for
  return s
end
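The same two-loop idea translates directly into Java (the language otherwise used in this book). The method name slidingSums and the decision to handle an arbitrary array length n, rather than exactly n = 2m, are mine, not the book's:

// Returns s[j] = a[j] + ... + a[j + m - 1] for j = 0, ..., n - m:
// m - 1 additions for the first sum, then two operations per further sum.
static int[] slidingSums(int[] a, int m) {
    int n = a.length;
    int[] s = new int[n - m + 1];
    for (int k = 0; k < m; k++)
        s[0] += a[k];                               // the very first sum s[0]
    for (int j = 1; j <= n - m; j++)
        s[j] = s[j - 1] + a[j + m - 1] - a[j - 1];  // add the new item, subtract the dropped one
    return s;
}

For an array of length n = 2m this returns the same values s[0..m] as fastSums above.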
Such an outcome is typical for algorithm analysis. In many cases, a careful analy-
sis of the problem allows us to replace a straightforward “brute-force” solution with
much more effective one. But there are no “standard” ways to reach this goal. To ex-
clude unnecessary computation, we have to perform a thorough investigation of the
problem and find hidden relationships between the input data and desired outputs.
In so doing, we should exploit all the tools we have learnt. This book presents many
examples where analysis tools are indeed useful, but knowing how to analyse and
solve each particular problem is still close to an art. The more examples and tools
are mastered, the more the art is learnt.
Exercises
Exercise 1.1.1. A quadratic algorithm with processing time T (n) = cn2 uses 500 ele-
mentary operations for processing 10 data items. How many will it use for processing
1000 data items?
Exercise 1.1.2. Algorithms A and B use exactly TA (n) = cA n lg n and TB (n) = cB n2 ele-
mentary operations, respectively, for a problem of size n. Find the fastest algorithm
for processing n = 2^20 data items if A and B spend 10 and 1 operations, respectively,
to process 2^10 = 1024 items.
Additional conditions for executing inner loops only for special values of the
outer variables also decrease running time.
Example 1.6. Let us roughly estimate the running time of the following nested loops:
m ← 2
for j ← 1 to n do
  if j = m then
    m ← 2m
    for i ← 1 to n do
      . . . constant number of elementary operations
    end for
  end if
end for
m ← 1
for j ← 1 step j ← j + 1 until n do
  if j = m then m ← m · (n − 1)
    for i ← 0 step i ← i + 1 until n − 1 do
      . . . constant number of elementary operations
    end for
  end if
end for
Exercise 1.2.2. What is the running time for the following code fragment as a func-
tion of n?
for i ← 1 step i ← 2 ∗ i while i < n do
  for j ← 1 step j ← 2 ∗ j while j < n do
    if j = 2 ∗ i then
      for k ← 0 step k ← k + 1 while k < n do
        . . . constant number of elementary operations
      end for
    else
      for k ← 1 step k ← 3 ∗ k while k < n do
        . . . constant number of elementary operations
      end for
    end if
  end for
end for
• The input data size, or the number n of individual data items in a single data
instance to be processed when solving a given problem. Obviously, how to
measure the data size depends on the problem: n means the number of items
to sort (in sorting applications), number of nodes (vertices) or arcs (edges) in
graph algorithms, number of picture elements (pixels) in image processing,
length of a character string in text processing, and so on.
The running time of a program which implements the algorithm is c f (n) where
c is a constant factor depending on a computer, language, operating system, and
compiler. Even if we don’t know the value of the factor c, we are able to answer
the important question: if the input size increases from n = n1 to n = n2 , how does
the relative running time of the program change, all other things being equal? The
answer is obvious: the running time increases by a factor of T(n2)/T(n1) = (c f(n2))/(c f(n1)) = f(n2)/f(n1).
As we have already seen, the approximate running time for large input sizes gives
enough information to distinguish between a good and a bad algorithm. Also, the
constant c above can rarely be determined. We need some mathematical notation to
avoid having to say “of the order of . . .” or “roughly proportional to . . .”, and to make
this intuition precise.
The standard mathematical tools “Big Oh” (O), “Big Theta” (Θ), and “Big Omega”
(Ω) do precisely this.
Note. Actually, the above letter O is a capital “omicron” (all letters in this notation
are Greek letters). However, since the Greek omicron and the English “O” are indis-
tinguishable in most fonts, we read O() as “Big Oh” rather than “Big Omicron”.
The algorithms are analysed under the following assumption: if the running time
of an algorithm as a function of n differs only by a constant factor from the running
time for another algorithm, then the two algorithms have essentially the same time
complexity. Functions that measure running time, T (n), have nonnegative values
because time is nonnegative, T (n) ≥ 0. The integer argument n (data size) is also
nonnegative.
Definition 1.7 (Big Oh). Let f (n) and g(n) be nonnegative-valued functions defined
on nonnegative integers n. Then g(n) is O( f (n)) (read “g(n) is Big Oh of f (n)”) iff there
exists a positive real constant c and a positive integer n0 such that g(n) ≤ c f (n) for all
n > n0 .
Note. We use the notation “iff ” as an abbreviation of “if and only if”.
In other words, if g(n) is O( f (n)) then an algorithm with running time g(n) runs for
large n at least as fast, to within a constant factor, as an algorithm with running time
f (n). Usually the term “asymptotically” is used in this context to describe behaviour
of functions for sufficiently large values of n. This term means that g(n) for large n
may approach closer and closer to c · f (n). Thus, O( f (n)) specifies an asymptotic
upper bound.
Note. Sometimes the “Big Oh” property is denoted g(n) = O( f (n)), but we should not
assume that the function g(n) is equal to something called “Big Oh” of f (n). This
notation really means g(n) ∈ O( f (n)), that is, g(n) is a member of the set O( f (n)) of
functions which are increasing, in essence, with the same or lesser rate as n tends to
infinity (n → ∞). In terms of graphs of these functions, g(n) is O( f (n)) iff there exists
a constant c such that the graph of g(n) is always below or at the graph of c f (n) after
a certain point, n0 .
Example 1.8. Function g(n) = 100 log10 n in Figure 1.4 is O(n) because the graph g(n)
is always below the graph of f (n) = n if n > 238 or of f (n) = 0.3n if n > 1000, etc.
Figure 1.4: The graphs of g(n) = 100 log10 n, f(n) = n, and f(n) = 0.3n for 0 ≤ n ≤ 1200, with the crossover points n0 marked.
Definition 1.9 (Big Omega). The function g(n) is Ω( f (n)) iff there exists a positive
real constant c and a positive integer n0 such that g(n) ≥ c f (n) for all n > n0 .
“Big Omega” is complementary to “Big Oh” and generalises the concept of “lower
bound” (≥) in the same way as “Big Oh” generalises the concept of “upper bound”
(≤): if g(n) is O( f (n)) then f (n) is Ω(g(n)), and vice versa.
Definition 1.10 (Big Theta). The function g(n) is Θ( f (n)) iff there exist two positive
real constants c1 and c2 and a positive integer n0 such that c1 f (n) ≤ g(n) ≤ c2 f (n) for
all n > n0 .
Whenever two functions, f (n) and g(n), are actually of the same order, g(n) is
Θ( f (n)), they are each “Big Oh” of the other: f (n) is O(g(n)) and g(n) is O( f (n)). In
other words, f (n) is both an asymptotic upper and lower bound for g(n). The “Big
Theta” property means f (n) and g(n) have asymptotically tight bounds and are in
some sense equivalent for our purposes.
In line with the above definitions, g(n) is O( f (n)) iff g(n) grows at most as fast as
f (n) to within a constant factor, g(n) is Ω( f (n)) iff g(n) grows at least as fast as f (n) to
within a constant factor, and g(n) is Θ( f (n)) iff g(n) and f (n) grow at the same rate to
within a constant factor.
“Big Oh”, “Big Theta”, and “Big Omega” notation formally capture two crucial
ideas in comparing algorithms: the exact function, g, is not very important because
it can be multiplied by any arbitrary positive constant, c, and the relative behaviour
of two functions is compared only asymptotically, for large n, but not near the origin
where it may make no sense. Of course, if the constants involved are very large, the
asymptotic behaviour loses practical interest. In most cases, however, the constants
remain fairly small.
In analysing running time, “Big Oh” g(n) ∈ O( f (n)), “Big Omega” g(n) ∈ Ω( f (n)),
and “Big Theta” g(n) ∈ Θ( f (n)) definitions are mostly used with g(n) equal to “exact”
running time on inputs of size n and f (n) equal to a rough approximation to running
time (like log n, n, n2 , and so on).
To prove that some function g(n) is O( f (n)), Ω( f (n)), or Θ( f (n)) using the defi-
nitions we need to find the constants c, n0 or c1 , c2 , n0 specified in Definitions 1.7,
1.9, 1.10. Sometimes the proof is given only by a chain of inequalities, starting with
f (n). In other cases it may involve more intricate techniques, such as mathemati-
cal induction. Usually the manipulations are quite simple. To prove that g(n) is not
O( f (n)), Ω( f (n)), or Θ( f (n)) we have to show the desired constants do not exist, that
is, their assumed existence leads to a contradiction.
Example 1.11. To prove that the linear function g(n) = an + b, a > 0, is O(n), we form
the following chain of inequalities: g(n) ≤ an + |b| ≤ (a + |b|)n for all n ≥ 1. Thus,
Definition 1.7 with c = a + |b| and n0 = 1 shows that an + b is O(n).
"Big Oh" hides constant factors, so that both 10^(−10) n and 10^10 n are O(n). It is pointless
to write something like O(2n) or O(an + b) because this still means O(n). Also,
only the dominant terms as n → ∞ need be shown as the argument of “Big Oh”, “Big
Omega”, or “Big Theta”.
Example 1.13. The exponential function g(n) = 2^(n+k), where k is a constant, is O(2^n)
because 2^(n+k) = 2^k · 2^n for all n. Generally, m^(n+k) is O(l^n) for l ≥ m > 1, because
m^(n+k) ≤ l^(n+k) = l^k · l^n for any constant k.
Example 1.14. For each m > 1, the logarithmic function g(n) = log_m(n) has the same
rate of increase as lg(n) because log_m(n) = log_m(2) lg(n) for all n > 0. Therefore we may
omit the logarithm base when using the "Big-Oh" and "Big Theta" notation: log_m n is
Θ(log n).
Constant factors are ignored, and only the powers and functions are taken into
account. It is this ignoring of constant factors that motivates such a notation.
Lemma 1.19 (Limit Rule). Suppose limn→∞ f (n)/g(n) exists (may be ∞), and denote
the limit by L. Then:
• if L = 0, then f(n) is O(g(n)) but f(n) is not Ω(g(n));
• if 0 < L < ∞, then f(n) is Θ(g(n));
• if L = ∞, then f(n) is Ω(g(n)) but f(n) is not O(g(n)).
Proof. If L = 0 then from the definition of limit, in particular there is some n0 such
that f (n)/g(n) ≤ 1 for all n ≥ n0 . Thus f (n) ≤ g(n) for all such n, and f (n) is O(g(n)) by
definition. On the other hand, for each c > 0, it is not the case that f (n) ≥ cg(n) for
all n past some threshold value n1 , so that f (n) is not Ω(g(n)). The other two parts are
proved in the analogous way.
To compute the limit if it exists, the standard L’Hôpital’s rule of calculus is useful
(see Section D.5).
More specific relations follow directly from the basic ones.
Example 1.20. Higher powers of n grow more quickly than lower powers: nk is O(nl )
if 0 ≤ k ≤ l. This follows directly from the limit rule since nk /nl = nk−l has limit 1 if
k = l and 0 if k < l.
Example 1.21. The growth rate of a polynomial is given by the growth rate of its
leading term (ignoring the leading coefficient by the scaling feature): if Pk (n) is a
polynomial of exact degree k then Pk (n) is Θ(nk ). This follows easily from the limit
rule as in the preceding example.
Example 1.22. Exponential functions grow more quickly than powers: nk is O(bn ),
for all b > 1, n > 1, and k ≥ 0. The restrictions on b, n, and k merely ensure that
both functions are increasing. This result can be proved by induction or by using the
limit-L’Hôpital approach above.
Example 1.23. Logarithmic functions grow more slowly than powers: logb n is O(nk )
for all b > 1, k > 0. This is the inverse of the preceding feature. Thus, as a result, log n
is O(n) and n log n is O(n2 ).
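As a sketch of how the limit-L'Hôpital approach mentioned in Example 1.22 justifies this claim (the derivation below is mine, not quoted from the book):

\[
\lim_{n\to\infty}\frac{\log_b n}{n^k}
  = \lim_{n\to\infty}\frac{\ln n}{n^k \ln b}
  = \lim_{n\to\infty}\frac{1/n}{k\,n^{k-1}\ln b}
  = \lim_{n\to\infty}\frac{1}{k\,n^{k}\ln b} = 0,
\]

so by the limit rule, \(\log_b n\) is \(O(n^k)\) but not \(\Omega(n^k)\), for every \(b > 1\) and \(k > 0\).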
Exercises
Exercise 1.3.1. Prove that 10n3 − 5n + 15 is not O(n2 ).
Exercise 1.3.4. Prove that f (n) is Θ(g(n)) if and only if both f (n) is O(g(n) and f (n) is
Ω(g(n)).
Exercise 1.3.5. Using the definition, show that each function f (n) in Table 1.3 stands
in “Big-Oh” relation to the preceding one, that is, n is O(n log n), n log n is O(n1.5 ), and
so forth.
Exercise 1.3.7. Decide on how to reformulate the Rule of Sums (Lemma 1.17) for
“Big Omega” and “Big Theta” notation.
Exercise 1.3.8. Reformulate and prove Lemmas 1.15–1.18 for “Big Omega” notation.
An algorithm is called polynomial time if its running time T (n) is O(nk ) where k is
some fixed positive integer. A computational problem is considered intractable iff
no deterministic algorithm with polynomial time complexity exists for it. But many
problems are classed as intractable only because a polynomial solution is unknown,
and it is a very challenging task to find such a solution for one of them.
Table 1.2: Relative growth of running time T (n) when the input size increases from n = 8 to
n = 1024 provided that T (8) = 1.
Table 1.3: The largest data sizes n that can be processed by an algorithm with time complexity
f (n) provided that T (10) = 1 minute.
Table 1.3 is even more expressive in showing how the time complexity of an algo-
rithm affects the size of problems the algorithm can solve (we again use log2 = lg). A
linear algorithm solving a problem of size n = 10 in exactly one minute will process
about 5.26 million data items per year and 10 times more if we can wait a decade. But
an exponential algorithm with T (10) = 1 minute will deal only with 29 data items af-
ter a year of running and add only 3 more items after a decade. Suppose we have
computers 10,000 times faster (this is approximately the ratio of a week to a minute).
Then we can solve a problem 10,000 times, 100 times, or 21.5 times larger than before
if our algorithm is linear, quadratic, or cubic, respectively. But for exponential algo-
rithms, our progress is much worse: we can add only 13 more input values if T(n) is
Θ(2^n).
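The factors just quoted follow from solving f(n') = 10,000 f(n) for the new problem size n'; a sketch of the arithmetic:

\[
\begin{aligned}
f(n) = n:\quad & n' = 10\,000\,n;\\
f(n) = n^2:\quad & n' = \sqrt{10\,000}\;n = 100\,n;\\
f(n) = n^3:\quad & n' = 10\,000^{1/3}\,n \approx 21.5\,n;\\
f(n) = 2^n:\quad & 2^{n'} = 10\,000\cdot 2^{n}, \text{ so } n' = n + \lg 10\,000 \approx n + 13.3 .
\end{aligned}
\]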
Therefore, if our algorithm has a constant, logarithmic, log-square, linear, or even
“n log n” time complexity we may be happy and start writing a program with no doubt
that it will meet at least some practical demands. Of course, before taking the plunge,
it is better to check whether the hidden constant c, giving the computation volume
per data item, is sufficiently small in our case. Unfortunately, order relations can be
drastically misleading: for instance, two linear functions 10−4 n and 1010 n are of the
same order O(n), but we should not claim an algorithm with the latter time complex-
ity as a big success.
Therefore, we should follow a simple rule: roughly estimate the computation vol-
ume per data item for the algorithms after comparing their time complexities in a
“Big-Oh” sense! We may estimate the computation volume simply by counting the
number of elementary operations per data item.
In any case we should be very careful even with simple quadratic or cubic algo-
rithms, and especially with exponential algorithms. If the running time is speeded
up in Table 1.3 so that it takes one second per ten data items in all the cases, then we
will still wait about 12 days (2^20 = 1,048,576 seconds) for processing only 30 items by
the exponential algorithm. Estimate yourself whether it is practical to wait until 40
items are processed.
In practice, quadratic and cubic algorithms cannot be used if the input size ex-
ceeds tens of thousands or thousands of items, respectively, and exponential algo-
rithms should be avoided whenever possible unless we always have to process data
of very small size. Because even the most ingenious programming cannot make an
inefficient algorithm fast (we would merely change the value of the hidden constant
c slightly, but not the asymptotic order of the running time), it is better to spend more
time to search for efficient algorithms, even at the expense of a less elegant software
implementation, than to spend time writing a very elegant implementation of an
inefficient algorithm.
Worst-case and average-case performance
We have introduced asymptotic notation in order to measure the running time
of an algorithm. This is expressed in terms of elementary operations. “Big Oh”, “Big
Omega” and “Big Theta” notations allow us to state upper, lower and tight asymp-
totic bounds on running time that are independent of inputs and implementation
details. Thus we can classify algorithms by performance, and search for the “best”
algorithms for solving a particular problem.
However, we have so far neglected one important point. In general, the running
time varies not only according to the size of the input, but the input itself. The ex-
amples in Section 1.4 were unusual in that this was not the case. But later we shall
see many examples where it does occur. For example, some sorting algorithms take
almost no time if the input is already sorted in the desired order, but much longer if
it is not.
If we wish to compare two different algorithms for the same problem, it will be
very complicated to consider their performance on all possible inputs. We need a
simple measure of running time.
The two most common measures of an algorithm are the worst-case running
time, and the average-case running time.
The worst-case running time has several advantages. If we can show, for example,
that our algorithm runs in time O(n log n) no matter what input of size n we consider,
we can be confident that even if we have an “unlucky” input given to our program,
it will not fail to run fairly quickly. For so-called “mission-critical” applications this
is an essential requirement. In addition, an upper bound on the worst-case running
time is usually fairly easy to find.
The main drawback of the worst-case running time as a measure is that it may be
too pessimistic. The real running time might be much lower than an “upper bound”,
the input data causing the worst case may be unlikely to be met in practice, and the
constants c and n0 of the asymptotic notation are unknown and may not be small.
There are many algorithms for which it is difficult to specify the worst-case input.
But even if it is known, the inputs actually encountered in practice may lead to much
lower running times. We shall see later that the most widely used fast sorting algo-
rithm, quicksort, has worst-case quadratic running time, Θ(n2 ), but its running time
for “random” inputs encountered in practice is Θ(n log n).
By contrast, the average-case running time is not as easy to define. The use of
the word “average” shows us that probability is involved. We need to specify a prob-
ability distribution on the inputs. Sometimes this is not too difficult. Often we can
assume that every input of size n is equally likely, and this makes the mathematical
analysis easier. But sometimes an assumption of this sort may not reflect the inputs
encountered in practice. Even if it does, the average-case analysis may be a rather
difficult mathematical challenge requiring intricate and detailed arguments. And of
course the worst-case complexity may be very bad even if the average case complex-
ity is good, so there may be considerable risk involved in using the algorithm.
Whichever measure we adopt for a given algorithm, our goal is to show that its
running time is Θ( f ) for some function f and there is no algorithm with running
time Θ(g) for any function g that grows more slowly than f when n → ∞. In this case
our algorithm is asymptotically optimal for the given problem.
Proving that no other algorithm can be asymptotically better than ours is usually
a difficult matter: we must carefully construct a formal mathematical model of a
computer and derive a lower bound on the complexity of every algorithm to solve
the given problem. In this book we will not pursue this topic much. If our analysis
does show that an upper bound for our algorithm matches the lower one for the
problem, then we need not try to invent a faster one.
Exercises
Exercise 1.4.1. Add columns to Table 1.3 corresponding to one century (10 decades)
and one millennium (10 centuries).
Exercise 1.4.2. Add rows to Table 1.2 for algorithms with time complexity f (n) =
lg lg n and f (n) = n2 lg n.
Example 1.25 (Fibonacci numbers). These are defined by one of the most famous
recurrence relations: F(n) = F(n − 1) + F(n − 2); F(1) = 1, and F(2) = 1. The last two
equations are called the base of the recurrence or initial condition. The recurrence
relation uniquely defines the function F(n) at any number n because any particular
value of the function is easily obtained by generating all the preceding values until
the desired term is produced, for example, F(3) = F(2) + F(1) = 2; F(4) = F(3) +
F(2) = 3, and so forth. Unfortunately, to compute F(10000), we need to perform
9998 additions.
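Generating all the preceding values is easy to write in Java; the sketch below (mine, not the book's code) performs exactly n − 2 additions for n > 2, which is where the figure of 9998 additions for F(10000) comes from. (A Java long overflows long before n = 10000, so arbitrary-precision arithmetic would be needed in practice.)

// Computes the n-th Fibonacci number by generating F(3), F(4), ..., F(n) in turn.
static long fibonacci(int n) {
    if (n <= 2) return 1;             // base of the recurrence: F(1) = F(2) = 1
    long prev = 1, curr = 1;          // F(n - 2) and F(n - 1) while the loop runs
    for (int i = 3; i <= n; i++) {    // one addition per iteration: n - 2 additions in total
        long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}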
Example 1.26. One more recurrence relation is T (n) = 2T (n − 1) + 1 with the base
condition T (0) = 0. Here, T (1) = 2 · 0 + 1 = 1, T (2) = 2 · 1 + 1 = 3, T (3) = 2 · 3 + 1 = 7,
T (4) = 2 · 7 + 1 = 15, and so on.
T(n) = 2^2 (2T(n − 3) + 1) + 2 + 1 = 2^3 T(n − 3) + 2^2 + 2 + 1
Step 3 substitute T(n − 3) = 2T(n − 4) + 1:
T(n) = 2^3 (2T(n − 4) + 1) + 2^2 + 2 + 1
     = 2^4 T(n − 4) + 2^3 + 2^2 + 2 + 1
Step . . . . . .
Step n − 2 . . .
As shown in Figure 1.5, rather than successively substituting the terms T(n − 1),
T(n − 2), . . . , T(2), T(1), it is more convenient to write down a sequence of the scaled
relationships for T(n), 2T(n − 1), 2^2 T(n − 2), . . . , 2^(n−1) T(1), respectively, then individually
sum the left and right columns, and eliminate similar terms in both sums (the terms
are scaled to facilitate their direct elimination). Such a solution is called telescoping
because the recurrence unfolds like a telescopic tube.
Although telescoping is not a powerful technique, it returns the desired explicit
forms of most of the basic recurrences that we need in this book (see Examples 1.29–
1.32 below). But it is helpless in the case of the Fibonacci recurrence because after
proper scaling of terms and reducing similar terms in the left and right sums, tele-
scoping returns just the same initial recurrence.
Example 1.29. T (n) = T (n − 1) + n; T (0) = 1.
This relation arises when a recursive algorithm loops through the input to elimi-
nate one item and is easily solved by telescoping:
T (n) = T (n − 1) + n
T (n − 1) = T (n − 2) + (n − 1)
...
T (1) = T (0) + 1
Figure 1.5: Telescoping of the basic recurrence T(n) = 2T(n − 1) + 1 with the base condition T(0) = 0. Successive substitutions T(n − 1) = 2T(n − 2) + 1, T(n − 2) = 2T(n − 3) + 1, . . . , T(1) = 2T(0) + 1 are scaled, the left-side and right-side columns are summed, and the common terms are cancelled, leaving the explicit relationship T(n) = 2^n T(0) + 2^(n−1) + · · · + 4 + 2 + 1 = 2^n − 1.
By summing the left and right columns and eliminating the similar terms, we obtain
T(n) = T(0) + 1 + 2 + · · · + (n − 2) + (n − 1) + n = n(n + 1)/2 + 1, so that T(n) is Θ(n^2).
T(2^m) = T(2^(m−1)) + 1
T(2^(m−1)) = T(2^(m−2)) + 1
. . .
T(2^1) = T(2^0) + 1

T(2^m) = T(2^(m−1)) + n
T(2^(m−1)) = T(2^(m−2)) + n/2
T(2^(m−2)) = T(2^(m−3)) + n/4
. . .
T(2) = T(1) + 2
T(1) = T(0) + 1

T(2^m) = 2T(2^(m−1)) + 2^m
T(2^(m−1)) = 2T(2^(m−2)) + 2^(m−1)
. . .
T(2) = 2T(1) + 2

so that

T(2^m)/2^m = T(2^(m−1))/2^(m−1) + 1
T(2^(m−1))/2^(m−1) = T(2^(m−2))/2^(m−2) + 1
. . .
T(2)/2 = T(1)/1 + 1.
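Summing each column of the chains above and cancelling the common left- and right-side terms gives the closed forms below (a sketch; the base values T(0) and T(1) are left symbolic because the statements of the corresponding examples are abbreviated here, and n = 2^m so that m = lg n):

\[
\begin{aligned}
T(2^m) &= T(1) + m, && \text{so } T(n) \text{ is } \Theta(\log n);\\
T(2^m) &= T(0) + 1 + 2 + \dots + \tfrac{n}{2} + n = T(0) + 2n - 1, && \text{so } T(n) \text{ is } \Theta(n);\\
\frac{T(2^m)}{2^m} &= \frac{T(1)}{1} + m, \text{ that is, } T(n) = n\,T(1) + n \lg n, && \text{so } T(n) \text{ is } \Theta(n\log n).
\end{aligned}
\]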
There exist very helpful parallels between the differentiation / integration in cal-
culus and recurrence analysis by telescoping.
• The difference equation T(n) − 2T(n − 1) = c, rewritten as T(n) − T(n − 1) = T(n − 1) + c,
resembles the differential equation dT(x)/dx = T(x). Telescoping of the difference
equation results in the formula T(n) = c(2^n − 1), whereas integration of the
differential equation produces the analogous exponential one, T(x) = ce^x.
• The difference equation T(n) − T(n − 1) = cn has the differential analogue dT(x)/dx = cx,
and both equations have similar solutions, T(n) = c n(n + 1)/2 and T(x) = (c/2) x^2,
respectively.
The parallels between difference and differential equations may help us in deriving
the desired closed-form solutions of complicated recurrences.
Exercise 1.5.1. Show that the solution in Example 1.31 is also in Ω(n) for general n.
Exercise 1.5.2. Show that the solution T (n) to Example 1.32 is no more than n lg n +
n − 1 for every n ≥ 1. Hint: try induction on n.
Exercise 1.5.4. The running time T (n) of a slightly different algorithm is given by the
recurrence T(n) = kT(n/k) + ckn; T(1) = 0. Derive the explicit expression for T(n) in
terms of c, n, and k under the same assumption n = k^m, and find the time complexity of
this algorithm in the “Big-Oh” sense.
Large constants have to be taken into account when an algorithm is very com-
plex, or when we must discriminate between cheap or expensive access to input
data items, or when there may be lack of sufficient memory for storing large data
sets, etc. But even when constants and lower-order terms are considered, the per-
formance predicted by our analysis may differ from the empirical results. Recall that
for very large inputs, even the asymptotic analysis may break down, because some
operations (like addition of large numbers) can no longer be considered as elemen-
tary.
In order to analyse algorithm performance we have used a simplified mathemat-
ical model involving elementary operations. In the past, this allowed for fairly accu-
rate analysis of the actual running time of a program implementing a given algorithm.
Unfortunately, the situation has become more complicated in recent years. Sophis-
ticated behaviour of computer hardware such as pipelining and caching means that
the time for elementary operations can vary wildly, making these models less useful
for detailed prediction. Nevertheless, the basic distinction between linear, quadratic,
cubic and exponential time is still as relevant as ever. In other words, the crude
differences captured by the Big-Oh notation give us a very good way of comparing
algorithms; comparing two linear time algorithms, for example, will require more
experimentation.
We can use worst-case and average-case analysis to obtain some meaningful es-
timates of possible algorithm performance. But we must remember that both re-
currences and asymptotic “Big-Oh”, “Big-Omega”, and “Big-Theta” notation are just
mathematical tools used to model certain aspects of algorithms. Like all models,
they are not universally valid and so the mathematical model and the real algorithm
may behave quite differently.
Exercises
Exercise 1.6.1. Algorithms A and B use TA (n) = 5n log10 n and TB (n) = 40n elementary
operations, respectively, for a problem of size n. Which algorithm has better per-
formance in the “Big Oh” sense? Work out exact conditions when each algorithm
outperforms the other.
Exercise 1.6.2. We have to choose one of two algorithms, A and B, to process a
database containing 10^9 records. The average running time of the algorithms is
TA(n) = 0.001n and TB(n) = 500√n, respectively. Which algorithm should be used,
assuming the application is such that we can tolerate the risk of an occasional long
running time?
1.7 Notes
The word algorithm relates to the surname of the great mathematician Muham-
mad ibn Musa al-Khwarizmi, whose life spanned approximately the period 780–850.
His works, translated from Arabic into Latin, for the first time exposed Europeans
to new mathematical ideas such as the Hindu positional decimal notation and step-
by-step rules for addition, subtraction, multiplication, and division of decimal num-
bers. The translation converted his surname into “Algorismus”, and the computa-
tional rules took on this name. Of course, mathematical algorithms existed well
before the term itself. For instance, Euclid’s algorithm for computing the greatest
common divisor of two positive integers was devised over 1000 years before.
The Big-Oh notation was used as long ago as 1894 by Paul Bachmann and then
Edmund Landau for use in number theory. However the other asymptotic notations
Big-Omega and Big-Theta were introduced in 1976 by Donald Knuth (at time of writ-
ing, perhaps the world’s greatest living computer scientist).
Algorithms running in Θ(n log n) time are sometimes called linearithmic, to match
“logarithmic”, “linear”, “quadratic”, etc.
The quadratic equation for φ in Example 1.28 is called the characteristic equa-
tion of the recurrence. A similar technique can be used for solving any constant
coefficient linear recurrence of the form F(n) = ∑_(k=1)^K a_k F(n − k), where K is a fixed
positive integer and the ak are constants.
Chapter 2
Efficiency of Sorting
Sorting rearranges input data according to a particular linear order (see Section D.3
for definitions of order and ordering relations). The most common examples are the
usual dictionary (lexicographic) order on strings, and the usual order on integers.
Once data is sorted, many other problems become easier to solve. Some of these
include: finding an item, finding whether any duplicate items exist, finding the fre-
quency of each distinct item, finding order statistics such as the maximum, mini-
mum, median and quartiles. There are many other interesting applications of sort-
ing, and many different sorting algorithms, each with their own strengths and weak-
nesses. In this chapter we describe and analyse some popular sorting algorithms.
• are the items only related by the order relation, or do they have other restric-
tions (for example, are they all integers from the range 1 to 1000);
• can they be placed into an internal (fast) computer memory or must they be
sorted in external (slow) memory, such as on disk (so called external sorting ).
No one algorithm is the best for all possible situations, and so it is important to
understand the strengths and weaknesses of several algorithms.
As far as computer implementation is concerned, sorting makes sense only for
linear data structures. We will consider lists (see Section C.1 for a review of ba-
sic concepts) which have a first element (the head), a last element (the tail) and a
method of accessing the next element in constant time (an iterator). This includes
array-based lists, and singly- and doubly-linked lists. For some applications we will
need a method of accessing the previous element quickly; singly-linked lists do not
provide this. Also, array-based lists allow fast random access. The element at any
given position may be retrieved in constant time, whereas linked list structures do
not allow this.
Exercises
Exercise 2.1.1. The well-known and obvious selection sort algorithm proceeds as
follows. We split the input list into a head and tail sublist. The head (“sorted”) sublist
is initially empty, and the tail (“unsorted”) sublist is the whole list. The algorithm
successively scans through the tail sublist to find the minimum element and moves
it to the end of the head sublist. It terminates when the tail sublist becomes empty.
(Java code for an array implementation is found in Section A.1).
How many comparisons are required by selection sort in order to sort the input
list (6, 4, 2, 5, 3, 1) ?
Exercise 2.1.2. Show that selection sort uses the same number of comparisons on
every input of a fixed size. How many does it use, exactly, for an input of size n?
Before each step i = 1, 2, . . . , n − 1, the sorted and unsorted parts have i and n −
i elements, respectively. The first element of the unsorted sublist is moved to the
correct position in the sorted sublist by exhaustive backward search, by comparing
it to each element in turn until the right place is reached.
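Java code for some sorting algorithms appears in Appendix A; purely as an illustrative sketch (not presented as the book's implementation), the backward search just described can be written as follows:

// Insertion sort: before step i the prefix a[0..i-1] is sorted; a[i] is moved
// backward past every larger element until its correct position is reached.
static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int x = a[i];                  // first element of the unsorted part
        int j = i - 1;
        while (j >= 0 && a[j] > x) {   // one comparison per backward move
            a[j + 1] = a[j];           // shift the larger element one position forward
            j--;
        }
        a[j + 1] = x;
    }
}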
Example 2.2. Table 2.1 shows the execution of insertion sort. Variables Ci and Mi
denote the number of comparisons and number of positions to move backward, re-
spectively, at the ith iteration. Elements in the sorted part are italicized, the currently
sorted element is underlined, and the element to sort next is boldfaced.
Table 2.1: Sample execution of insertion sort.
i Ci Mi Data to sort
25 8 2 91 70 50 20 31 15 65
1 1 1 8 25 2 91 70 50 20 31 15 65
2 2 2 2 8 25 91 70 50 20 31 15 65
3 1 0 2 8 25 91 70 50 20 31 15 65
4 2 1 2 8 25 70 91 50 20 31 15 65
5 3 2 2 8 25 50 70 91 20 31 15 65
6 5 4 2 8 20 25 50 70 91 31 15 65
7 4 3 2 8 20 25 31 50 70 91 15 65
8 7 6 2 8 15 20 25 31 50 70 91 65
9 3 2 2 8 15 20 25 31 50 65 70 91
Since the best case is so much better than the worst, we might hope that on aver-
age, for random input, insertion sort would perform well. Unfortunately, this is not
true.
Proof. We first calculate the average number Ci of comparisons at the ith step. At the
beginning of this step, i elements of the head sublist are already sorted and the next
element has to be inserted into the sorted part. This element will move backward j
steps, for some j with 0 ≤ j ≤ i. If 0 ≤ j ≤ i − 1, the number of comparisons used will
be j + 1. But if j = i (it ends up at the head of the list), there will be only i comparisons
(since no final comparison is needed).
Assuming all possible inputs are equally likely, the value of j will be equally likely
to take any value 0, . . . , i. Thus the expected number of comparisons will be
Ci = (1 + 2 + · · · + (i − 1) + i + i)/(i + 1) = (i(i + 1)/2 + i)/(i + 1) = i/2 + i/(i + 1).
The running time of insertion sort is strongly related to inversions. The number
of inversions of a list is one measure of how far it is from being sorted.
Definition 2.5. An inversion in a list a is an ordered pair of positions (i, j) such that
i < j but a[i] > a[ j].
Example 2.6. The list (3, 2, 5) has only one inversion corresponding to the pair (3, 2),
the list (5, 2, 3) has two inversions, namely, (5, 2) and (5, 3), the list (3, 2, 5, 1) has four
inversions (3, 2), (3, 1), (2, 1), and (5, 1), and so on.
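Counts like these can be checked by testing every ordered pair of positions, exactly as in Definition 2.5. The following Java sketch (illustrative only) does so in quadratic time:

// Counts inversions by checking every ordered pair of positions (i, j) with i < j.
static int countInversions(int[] a) {
    int count = 0;
    for (int i = 0; i < a.length; i++)
        for (int j = i + 1; j < a.length; j++)
            if (a[i] > a[j]) count++;  // the pair (i, j) is an inversion
    return count;
}

For example, countInversions(new int[] {3, 2, 5, 1}) returns 4, matching the count above.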
Example 2.7. Table 2.2 shows the number of inversions, Ii , for each element a[i] of
the list in Table 2.1 with respect to all preceding elements a[0], . . . , a[i − 1] (Ci and Mi
are the same as in Table 2.1).
Note that Ii = Mi in Table 2.1. This is not merely a coincidence—it is always true.
See Exercise 2.2.4.
The total number of inversions I = ∑_(i=1)^(n−1) Ii in a list to be sorted by insertion sort
is equal to the total number of positions an element moves backward during the
sort. The total number of data comparisons C = ∑_(i=1)^(n−1) Ci is also equal to the total
number of inversions plus at most n − 1. For the initial list in Tables 2.1 and 2.2,
I = 21 and C = 28.
Table 2.2: Number of inversions Ii , comparisons Ci and data moves Mi for each element a[i] in
sample list.
Index i            0   1   2   3   4   5   6   7   8   9
List element a[i]  25   8   2  91  70  50  20  31  15  65
Ii                  –   1   2   0   1   2   4   3   6   2
Ci                  –   1   2   1   2   3   5   4   7   3
Mi                  –   1   2   0   1   2   4   3   6   2
Exercise 2.2.3. Prove that the worst-case time complexity of insertion sort is Θ(n2 )
and the best case is Θ(n).
Exercise 2.2.4. Prove that the number of inversions, Ii , of an element a[i] with respect
to the preceding elements, a[0], . . . , a[i − 1], in the initial list is equal to the number of
positions moved backward by a[i] in the execution of insertion sort.
Exercise 2.2.5. Suppose a sorting algorithm swaps elements a[i] and a[i + gap] of a
list a which were originally out of order. Prove that the number of inversions in the
list is reduced by at least 1 and at most 2 gap − 1.
Exercise 2.2.6. Bubble sort works as follows to sort an array. There is a sorted left
subarray and unsorted right subarray; the left subarray is initially empty. At each
iteration we step through the right subarray, comparing each pair of neighbours in
turn, and swapping them if they are out of order. At the end of each such pass, the
sorted subarray has increased in size by 1, and we repeat the entire procedure from
the beginning of the unsorted subarray. (Java code is found in Section A.1.)
Prove that the average time complexity of bubble sort is Θ(n2 ), and that bubble
sort never makes fewer comparisons than insertion sort.
2.3 Mergesort
This algorithm exploits a recursive divide-and-conquer approach resulting in a
worst-case running time of Θ(n log n), the best asymptotic behaviour that we have
seen so far. Its best, worst, and average cases are very similar, making it a very good
choice if predictable runtime is important. Versions of mergesort are particularly
good for sorting data with slow access times, such as data that cannot be held in
internal memory or are stored in linked lists.
Mergesort is based on the following basic idea.
• If the list has fewer than two elements, it is already sorted and there is nothing to do.
• Otherwise, separate the list into two lists of equal or nearly equal size and re-
cursively sort the first and second halves separately.
• Finally, merge the two sorted halves into one sorted list.
Clearly, almost all the work is in the merge step, which we should make as effi-
cient as possible. Obviously any merge must take at least time that is linear in the
total size of the two lists in the worst case, since every element must be looked at in
order to determine the correct ordering. We can in fact achieve a linear time merge,
as we see in the next section.
Analysis of mergesort
Lemma 2.8. Mergesort is correct.
Proof. We use induction on the size n of the list. If n = 0 or 1, the result is obviously
correct. Otherwise, mergesort calls itself recursively on two sublists each of which
has size less than n. By induction, these lists are correctly sorted. Provided that the
merge step is correct, the top level call of mergesort then returns the correct answer.
Almost all the work occurs in the merge steps, so we need to perform those effi-
ciently.
Theorem 2.9. Two input sorted lists A and B of size nA and nB , respectively, can be
merged into an output sorted list C of size nC = nA + nB in linear time.
Proof. We first show that the number of comparisons needed is linear in n. Let i,
j, and k be pointers to current positions in the lists A, B, and C, respectively. Ini-
tially, the pointers are at the first positions, i = 0, j = 0, and k = 0. Each time the
smaller of the two elements A[i] and B[ j] is copied to the current entry C[k], and the
corresponding pointers k and either i or j are incremented by 1. After one of the
input lists is exhausted, the rest of the other list is directly copied to list C. Each
comparison advances the pointer k so that the maximum number of comparisons is
nA + nB − 1.
All other operations also take linear time.
The above proof can be visualized easily if we think of the lists as piles of playing
cards placed face up. At each step, we choose the smaller of the two top cards and
move it to the temporary pile.
Example 2.10. If a = (2, 8, 25, 70, 91) and b = (15, 20, 31, 50, 65), then merge into c =
(2, 8, 15, 20, 25, 31, 50, 65, 70, 91) as follows.
Step 1 a[0] = 2 and b[0] = 15 are compared, 2 < 15, and 2 is copied to c, that is, c[0] ← 2,
i ← 0 + 1, and k ← 0 + 1.
Step 2 a[1] = 8 and b[0] = 15 are compared to copy 8 to c, that is, c[1] ← 8, i ← 1 + 1,
and k ← 1 + 1.
Step 3 a[2] = 25 and b[0] = 15 are compared and 15 is copied to c so that c[2] ← 15,
j ← 0 + 1, and k ← 2 + 1.
Step 4 a[2] = 25 and b[1] = 20 are compared and 20 is copied to c: c[3] ← 20, j ← 1 + 1,
and k ← 3 + 1.
Step 5 a[2] = 25 and b[2] = 31 are compared, and 25 is copied to c: c[4] ← 25, i ← 2 + 1,
and k ← 4 + 1.
The process continues as follows: comparing a[3] = 70 and b[2] = 31, a[3] = 70 and
b[3] = 50, and a[3] = 70 and b[4] = 65 results in c[5] ← (b[2] = 31), c[6] ← (b[3] = 50),
and c[7] ← (b[4] = 65), respectively. Because the list b is exhausted, the rest of the list
a is then copied to c, c[8] ← (a[3] = 70) and c[9] ← (a[4] = 91).
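The pointer-based procedure used in the proof of Theorem 2.9 and traced in Example 2.10 can be sketched in Java as follows; the method merges two already sorted arrays into a new one and is an illustration, not the book's library code:

// Merges sorted arrays a and b into a new sorted array c,
// using at most a.length + b.length - 1 comparisons.
static int[] merge(int[] a, int[] b) {
    int[] c = new int[a.length + b.length];
    int i = 0, j = 0, k = 0;
    while (i < a.length && j < b.length)
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];  // copy the smaller current element
    while (i < a.length) c[k++] = a[i++];           // copy the rest of a, if any
    while (j < b.length) c[k++] = b[j++];           // copy the rest of b, if any
    return c;
}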
We can now see that the running time of mergesort is much better asymptotically
than the naive algorithms that we have previously seen.
Theorem 2.11. The running time of mergesort on an input list of size n is Θ(n log n)
in the best, worst, and average case.
algorithm mergeSort
  Input: array a[0..n − 1]; array indices l, r; array t[0..n − 1]
  sorts the subarray a[l..r]
begin
  if l < r then
    m ← ⌊(l + r)/2⌋
    mergeSort(a, l, m, t)
    mergeSort(a, m + 1, r, t)
    merge(a, l, m + 1, r, t)
  end if
end
It is easy to see that the recursive version simply divides the list until it reaches
lists of size 1, then merges these repeatedly. We can eliminate the recursion in a
straightforward manner. We first merge lists of size 1 into lists of size 2, then lists of
size 2 into lists of size 4, and so on. This is often called straight mergesort .
Example 2.12. Starting with the input list (1, 5, 7, 3, 6, 4, 2) we merge repeatedly. The
merged sublists are shown with parentheses:
(1, 5) (3, 7) (4, 6) (2)
(1, 3, 5, 7) (2, 4, 6)
(1, 2, 3, 4, 5, 6, 7)
This method works particularly well for linked lists, because the merge steps can
be implemented simply by redefining pointers, without using the extra space re-
quired when using arrays (see Exercise 2.3.4).
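For arrays, straight mergesort can be sketched in Java as below (again an illustration under my own naming, not the book's code); a linked-list variant would avoid the temporary array t, as discussed above.

// Straight (bottom-up) mergesort: merge runs of width 1, 2, 4, ... until the array is sorted.
static void straightMergeSort(int[] a) {
    int n = a.length;
    int[] t = new int[n];
    for (int width = 1; width < n; width *= 2) {
        for (int lo = 0; lo < n; lo += 2 * width) {
            int mid = Math.min(lo + width, n);
            int hi = Math.min(lo + 2 * width, n);
            int i = lo, j = mid, k = lo;
            // merge the sorted runs a[lo..mid-1] and a[mid..hi-1] into t[lo..hi-1]
            while (i < mid && j < hi) t[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
            while (i < mid) t[k++] = a[i++];
            while (j < hi) t[k++] = a[j++];
        }
        System.arraycopy(t, 0, a, 0, n);  // the merged runs become the input of the next pass
    }
}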
Exercises
Exercise 2.3.1. What is the minimum number of comparisons needed when merg-
ing two nonempty sorted lists of total size n into a single list?
Exercise 2.3.2. Give two sorted lists of size 8 whose merging requires the maximum
number of comparisons.
algorithm merge
  Input: array a[0..n − 1]; array indices l, r; array index s; array t[0..n − 1]
  merges the two sorted subarrays a[l..s − 1] and a[s..r] into a[l..r]
begin
  i ← l; j ← s; k ← l
  while i ≤ s − 1 and j ≤ r do
    if a[i] ≤ a[j] then t[k] ← a[i]; k ← k + 1; i ← i + 1
    else t[k] ← a[j]; k ← k + 1; j ← j + 1
    end if
  end while
  while i ≤ s − 1 do    copy the rest of the first half
    t[k] ← a[i]; k ← k + 1; i ← i + 1
  end while
  while j ≤ r do    copy the rest of the second half
    t[k] ← a[j]; k ← k + 1; j ← j + 1
  end while
  return a ← t
end
Exercise 2.3.3. The 2-way merge in this section can be generalized easily to a k-
way merge for any positive integer k. The running time of such a merge is c(k − 1)n.
Assuming that the running time of insertion sort is cn2 with the same scaling factor
c, analyse the asymptotic running time of the following sorting algorithm (you may
assume that n is an exact power of k).
Find the optimum value of k to get the fastest sort and compare its worst/average
case asymptotic running time with that of insertion sort and mergesort.
Exercise 2.3.4. Explain how to merge two sorted linked lists in linear time into a
bigger sorted linked list, using only a constant amount of extra space.
2.4 Quicksort
This algorithm is also based on the divide-and-conquer paradigm. Unlike merge-
sort, quicksort dynamically forms subarrays depending on the input, rather than
sorting and merging predetermined subarrays. Almost all the work of mergesort was
in the combining of solutions to subproblems, whereas with quicksort, almost all the
work is in the division into subproblems.
Quicksort is very fast in practice on “random” data and is widely used in software
libraries. Unfortunately it is not suitable for mission-critical applications, because it
has very bad worst case behaviour, and that behaviour can sometimes be triggered
more often than an analysis based on random input would suggest.
Basic quicksort is recursive and consists of the following four steps.
• If the list has fewer than two elements, it is already sorted; stop.
• Otherwise, choose one element of the list as the pivot.
• Partition the remaining elements into a "head" sublist of elements no greater than
the pivot and a "tail" sublist of elements no smaller than the pivot.
• Finally, return the result of quicksort of the "head" sublist, followed by the
pivot, followed by the result of quicksort of the "tail" sublist.
The first step takes into account that recursive dynamic partitioning may pro-
duce empty or single-item sublists. The choice of a pivot at the next step is most
critical because the wrong choice may lead to quadratic time complexity while a
good choice of pivot equalizes both sublists in size (and leads to “n log n” time com-
plexity). Note that we must specify in any implementation what to do with items
equal to the pivot. The third step is where the main work of the algorithm is done,
and obviously we need to specify exactly how to achieve the partitioning step (we do
this below). The final step involves two recursive calls to the same algorithm, with
smaller input.
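One concrete way to fill in these steps is sketched below in Java. Several partitioning schemes are possible (some are discussed later in this section); this sketch simply takes the last element as the pivot and is not presented as the book's preferred method.

// Quicksort of a[l..r]: partition about the last element, then sort the two sublists.
static void quickSort(int[] a, int l, int r) {
    if (l >= r) return;                       // a sublist of size 0 or 1 is already sorted
    int p = a[r];                             // pivot choice: the last element (can be a poor choice)
    int i = l - 1;
    for (int j = l; j < r; j++)               // n - 1 comparisons with the pivot
        if (a[j] <= p) {                      // grow the "head" part of elements <= pivot
            i++;
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    int tmp = a[i + 1]; a[i + 1] = a[r]; a[r] = tmp;   // place the pivot between the sublists
    quickSort(a, l, i);                       // sort the "head" sublist
    quickSort(a, i + 2, r);                   // sort the "tail" sublist
}

A call quickSort(a, 0, a.length - 1) sorts the whole array; with this pivot choice an already sorted input triggers the quadratic worst case analysed next.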
Analysis of quicksort
All analysis depends on assumptions about the pivot selection and partitioning
methods used. In particular, in order to partition a list about a pivot element as
described above, we must compare each element of the list to the pivot, so at least
n − 1 comparisons are required. This is the right order: it turns out that there are
several methods for partitioning that use Θ(n) comparisons (we shall see some of
them below).
Proof. We use mathematical induction on the size of the list. If the size is 1, the al-
gorithm is clearly correct. Suppose then that n ≥ 2 and the algorithm works correctly
on lists of size smaller than n. Suppose that a is a list of size n, p is the pivot ele-
ment, and i is the position of p after partitioning. Due to the partitioning principle
of quicksort, all elements of the head sublist are no greater than p, and all elements