Algorithm
Segment 1
References:
1. Introduction to Algorithms- Cormen
2. Computer Algorithms- Sahni
3. Algorithms-S. Dasgupta
Definition
• The algorithm is one of the fundamental concepts in
computer science
• Informally
1) An algorithm is a set of steps that define how a task is
performed
-Examples include algorithms for constructing model airplanes,
for operating washing machines and for playing music.
2) An algorithm is any well-defined computational procedure
that takes some value, or set of values, as input and produces
some value or set of values as output. An algorithm is thus a
sequence of computational steps that transform the input into the
output.
- An example of this type is the sorting problem (i.e., sort a
sequence of numbers into non-decreasing order)
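For example, given the input sequence ⟨31, 41, 59, 26, 41, 58⟩, a
sorting algorithm returns the output ⟨26, 31, 41, 41, 58, 59⟩.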
What kinds of problems are solved by
algorithms?
Sorting is by no means the only computational problem for
which algorithms have been developed. Practical
applications of algorithms are ubiquitous and include the
following examples:
• The Human Genome Project has made great progress toward
the goals of identifying all the 100,000 genes in human DNA,
determining the sequences of the 3 billion chemical base pairs
that make up human DNA, storing this information in databases,
and developing tools for data analysis. Each of these steps
requires sophisticated algorithms.
What kinds of problems are solved by
algorithms?
• The Internet enables people all around the world to quickly
access and retrieve large amounts of information. With the aid of
clever algorithms, sites on the Internet are able to manage and
manipulate this large volume of data. Examples of problems that
make essential use of algorithms include finding good routes on
which the data will travel and using a search engine to quickly find
pages on which particular information resides.
• Electronic commerce enables goods and services to be negotiated
and exchanged electronically, and it depends on the privacy of
personal information such as credit card numbers, passwords, and
bank statements. The core technologies used in electronic commerce
include public-key cryptography and digital signatures, which are
based on numerical algorithms and number theory.
What kinds of problems are solved by
algorithms?
• Manufacturing and other commercial enterprises often need to
allocate scarce resources in the most beneficial way.
• An oil company may wish to know where to place its wells in order to
maximize its expected profit.
• A political candidate may want to determine where to spend money
buying campaign advertising in order to maximize the chances of
winning an election.
• An airline may wish to assign crews to flights in the least expensive
way possible, making sure that each flight is covered and that
government regulations regarding crew scheduling are met.
• An Internet service provider may wish to determine where to place
additional resources in order to serve its customers more effectively.
All of these are examples of problems that can be solved using linear
programming.
What kinds of problems are solved by
algorithms? (Some Specific Problems)
• We are given a road map on which the distance between each pair
of adjacent intersections is marked, and we wish to determine the
shortest route from one intersection to another. The number of
possible routes can be huge, even if we disallow routes that cross
over themselves. How do we choose which of all possible routes is
the shortest?
• We are given two ordered sequences of symbols, X = ⟨x1, x2, …,
xm⟩ and Y = ⟨y1, y2, …, yn⟩, and we wish to find a longest
common subsequence of X and Y. A subsequence of X is just X
with some (or possibly all or none) of its elements removed.
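For example, ⟨B, C, A⟩ is a subsequence of ⟨A, B, C, B, A⟩.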
• We are given n points in the plane, and we wish to find the
convex hull of these points.
Definition
• Formally
- An algorithm is an ordered set of unambiguous,
executable steps, defining a terminating process
• Question
• In what sense do the steps described by the following
list of instructions fail to constitute an algorithm?
Step 1. Take a coin out of your pocket and put it on the
table
Step 2. Return to Step 1
• A machine-compatible representation of an algorithm is
called a program
Criteria/Properties
•All algorithms must satisfy the following criteria:
1. Input: Zero or more quantities are externally supplied.
2. Output: At least one quantity is produced.
3. Definiteness: Each instruction is clear and unambiguous.
4. Finiteness: If we trace out the instructions of an
algorithm, then for all cases, the algorithm terminates
after a finite number of steps.
5. Effectiveness: Every instruction must be very basic so
that it can be carried out, in principle, by a person using
only pencil and paper. It must be feasible.
•Algorithms that are definite and effective are also called computational
procedures. One important example of a computational procedure is the
operating system of a digital computer.
Areas of Study
• The study of algorithms includes many important and
active areas of research. There are four distinct areas of
study one can identify
1. How to devise algorithms: an understanding of the
algorithmic structures and their representation in the form of
Pseudocode or flowcharts.
- algorithm discovery - problem solving approaches/techniques
2. How to validate algorithms: determining the correctness
3. How to analyze algorithms: determining the time and
storage an algorithm requires.
4. How to test a program: debugging
• Algorithm specification
- Pseudocode: algorithms are represented with precisely
defined textual structures
- Flowcharts
History
• The word algorithm comes from the name of a Persian
author, Abu Ja’far Mohammad ibn Musa al Khowarizmi,
who wrote a text book on mathematics.
• Began as a subject in mathematics.
• Greek mathematician Euclid invented an algorithm for
finding the greatest common divisor (gcd) of two
positive integers (between 400 and 300 B.C.) – often
considered the first meaningful algorithm ever discovered.
• The weaving loom, invented in 1801 by the Frenchman Joseph
Jacquard, wove patterns specified by punched cards
• Charles Babbage, an English mathematician, designed the
difference engine and the analytical engine, machines capable
of executing algorithms
Algorithms as a technology
• Suppose computers were infinitely fast and computer memory was
free. Would you have any reason to study algorithms?
• Of course, computers may be fast, but they are not infinitely fast. And
memory may be inexpensive, but it is not free. Computing time is
therefore a bounded resource, and so is space in memory. You should
use these resources wisely, and algorithms that are efficient in terms
of time or space will help you do so.
• Analysis of algorithms, or performance analysis, refers to the task of
determining how much computing time and storage an algorithm
requires. Generally, by analyzing several candidate algorithms for a
problem, the most efficient one can be identified.
Algorithms as a technology
Efficiency
• Different algorithms devised to solve the same problem often
differ dramatically in their efficiency. These differences can be
much more significant than differences due to hardware and
software.
We will see two algorithms for sorting. The first, insertion
sort, takes time roughly equal to c1·n² to sort n items,
where c1 is a constant that does not depend on n. The second,
merge sort, takes time roughly equal to c2·n·log2 n, where c2 is
another constant that also does not depend on n. Insertion sort
typically has a smaller constant factor than merge sort, so that
c1 < c2. Let us pit a faster computer (computer A) running
insertion sort against a slower computer (computer B) running
merge sort. Each computer must sort an array of 10 million
numbers.
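For concreteness (the machine speeds and constants here are assumed
figures for illustration, not the slides' own): if computer A executes
10^10 instructions per second with c1 = 2, insertion sort needs about
2·(10^7)²/10^10 = 20,000 seconds (more than 5.5 hours); if computer B
executes only 10^7 instructions per second with c2 = 50, merge sort
needs about 50·10^7·log2(10^7)/10^7 ≈ 1,163 seconds (under 20 minutes).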
Algorithms as a technology
[A table of running times for various complexity functions, assuming
1M operations/sec, appeared here]
Insertion Sort
We start with insertion sort, which is an efficient algorithm
for sorting a small number of elements.
Insertion Sort (Cont.)
[The INSERTION-SORT pseudocode and a step-by-step example trace
appeared here]
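Since the pseudocode itself is not reproduced above, here is a minimal
C sketch of insertion sort (the function and variable names are mine;
it uses 0-based indexing, while the discussion below follows the
book's 1-based A[1 .. n]):

// Sort A[0..n-1] into non-decreasing order by insertion sort.
void insertion_sort(int A[], int n)
{
    for (int j = 1; j < n; j++) {     // A[0..j-1] is already sorted
        int key = A[j];               // the next "card" to insert
        int i = j - 1;
        while (i >= 0 && A[i] > key) {
            A[i + 1] = A[i];          // shift larger elements one place right
            i--;
        }
        A[i + 1] = key;               // drop key into its proper position
    }
}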
Loop invariants and the correctness of
insertion sort
Figure 2.2 (of Cormen) shows how this algorithm works for
A = ⟨5, 2, 4, 6, 1, 3⟩. The index j indicates the "current card"
being inserted into the hand. At the beginning of each iteration of
the for loop, which is indexed by j, the subarray consisting of
elements A[1 .. j-1] constitutes the currently sorted hand, and
the remaining subarray A[j+1 .. n] corresponds to the pile of
cards still on the table. In fact, elements A[1 .. j-1] are the
elements originally in positions 1 through j-1, but now in
sorted order. We state these properties of A[1 .. j-1] formally
as a loop invariant.
Loop invariants and the correctness of
insertion sort
We use loop invariants to help us understand why an
algorithm is correct. We must show three things about a
loop invariant:
Initialization: It is true prior to the first iteration of the loop.
Maintenance: If it is true before an iteration of the loop, it
remains true before the next iteration.
Termination: When the loop terminates, the invariant gives us
a useful property that helps show that the algorithm is correct.
Correctness of insertion sort
Initialization: We start by showing that the loop invariant holds
before the first loop iteration, when j=2. The subarray A[1.. j-1],
therefore, consists of just the single element A[1], which is in fact the
original element in A[1]. Moreover, this subarray is sorted, which
shows that the loop invariant holds prior to the first iteration of the
loop.
Maintenance: Next we tackle the second property: showing that
each iteration maintains the loop invariant. Informally, the body of
the for loop works by moving A[j-1], A[j-2], A[j-3], and so on by
one position to the right until it finds the proper position for A[j]
(lines 4–7), at which point it inserts the value of A[j] (line 8). The
subarray A[1 .. j] then consists of the elements originally in
A[1 .. j], but in sorted order. Incrementing j for the next iteration
of the for loop then preserves the loop invariant.
Correctness of insertion sort
Termination: Finally, we examine what happens when the loop
terminates. The condition causing the for loop to terminate is that
j > A.length = n. Because each loop iteration increases j by 1, we
must have j = n+1 at that time. Substituting n+1 for j in the wording
of the loop invariant, we have that the subarray A[1 .. n] consists of
the elements originally in A[1 .. n], but in sorted order. Observing
that the subarray A[1 .. n] is the entire array, we conclude that the
entire array is sorted. Hence, the algorithm is correct.
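To make the invariant tangible (an illustration of mine, not the
slides'), one can assert it at the top of every iteration; the helper
name below is hypothetical:

#include <assert.h>

// Loop-invariant check: A[0..j-1] must be sorted at the top of each iteration.
static void assert_sorted_prefix(const int A[], int j)
{
    for (int i = 1; i < j; i++)
        assert(A[i - 1] <= A[i]);
}

void insertion_sort_checked(int A[], int n)
{
    for (int j = 1; j < n; j++) {
        assert_sorted_prefix(A, j);   // initialization (j == 1) and maintenance
        int key = A[j];
        int i = j - 1;
        while (i >= 0 && A[i] > key) {
            A[i + 1] = A[i];
            i--;
        }
        A[i + 1] = key;
    }
    assert_sorted_prefix(A, n);       // termination: the whole array is sorted
}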
Pseudocode conventions
Homework: See pages 19 & 20 of Cormen.
Rewrite the INSERTION-SORT procedure to sort into
nonincreasing instead of nondecreasing order.
Analyzing algorithms
Analyzing an algorithm has come to mean predicting the
resources that the algorithm requires. Occasionally, resources
such as memory, communication bandwidth, or computer
hardware are of primary concern, but most often it is
computational time that we want to measure.
Generally, by analyzing several candidate algorithms for a
problem, we can identify a most efficient one.
Such analysis may indicate more than one viable candidate, but
we can often discard several inferior algorithms in the process.
Level of analysis
• Time taken by an algorithm?
– Number of seconds a program takes to run? No…
– Too machine dependent: not measuring the algorithm
so much as the program, machine, implementation
• Here, not interested in low level machine details
– Want technology-independent answers
• Interested rather in measuring
– Which data take most space
– Which operations take most time, in a device-independent way
Level of analysis (Cont.)
• Notion of elementary operation
– 1 assignment, 1 addition, 1 comparison (test), …
• Determine how many elementary operations the
algorithm performs for inputs of different sizes
(problem sizes), or for same-size inputs in different orders
• Difference thus between performance and
complexity
– Complexity affects performance, but not vice versa
– Interested in how resource requirements scale as
problem gets larger
Best, worst, average cases
• Hardly ever true that algorithm takes same time
on different instances of same size
• Choice: best, worst or average case analysis
• Best case analysis
– Time taken on best input of size n
• Worst case analysis
– Time taken on worst input of size n
• Average case analysis
– Time taken as average of time taken on inputs of
size n
Which to choose?
• Usually interested in worst case scenario:
– The most operations that might be made for some
problem size
• Worst case is the only safe analysis – guaranteed
upper bound (best case too optimistic)
• Average case analysis harder
– Usually have to assume some probability distribution
of the data, more sophisticated model
– E.g. if looking for a specific letter in a random list of
letters, might expect letter to appear 1/26 of time (for
English)
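A small sketch (mine, not the slides') showing how the three cases
arise in linear search; the function name is an assumption:

// Linear search: return the index of key in a[0..n-1], or -1 if absent.
// Best case: key is at a[0] (1 comparison).
// Worst case: key is absent (n comparisons).
// Average case (key present, position uniform): about n/2 comparisons.
int linear_search(const int a[], int n, int key)
{
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return -1;
}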
int Max(int A[], int n)
// returns the max value in array A of n elements (assumes n >= 1)
{
    int i, max = A[0];
    for (i = 1; i < n; i++)
        if (max < A[i])
            max = A[i];
    return max;
}
• Worst case: the THEN branch (max = A[i]) executes on every
iteration, e.g. when A is in increasing order
Scale, order vs exactness
• When looking at complexity, we usually ignore the exact
number of operations and the cost of small, infrequent ones
• We are interested instead in how the number of operations
relates to the problem size, especially for big problems
Analysis example
// sum first n integers in array a
int sum(int a[], int n)
{
int sum = 0;
for (int j = 0; j < n; j++)
sum = sum + a[j];
return(sum);
}
General methodology
1. Characterize the size of the input
Input is an array containing at least n integers
Thus, size of input is n
2. Count how many operations (steps) are
taken in the algorithm for an input of size n
1 step is an elementary operation
+, <, a[j] (indexing into an array), =, …
Detail of analysis
int sum(int a[], int n) {
    int sum = 0;                  // op 1: one assignment
    for (int j = 0; j < n; j++)   // op 2: init (once); ops 3, 7: test and increment (each iteration)
        sum = sum + a[j];         // ops 4, 5, 6: index, add, assign (each iteration)
    return(sum);                  // op 8: one return
}
• 1,2,8: only happen once (so 3 such operations)
• 3,4,5,6,7: happen once for each iteration of loop (5 * n)
• Total operations: 5n + 3
• Complexity function: f(n) = 5n + 3
How does size affect running time?
• 5n + 3 is an estimate of running time for
different values of n
n Operations
10 53
100 503
1000 5003
1000000 5000003
• As n grows, number of operations grows in
linear proportion to n, for this sum function
Summary of methodology
• Count operations/steps taken by algorithm
• Use count to derive formula, based on size
of problem n
– Another formula might be, e.g.: n²
– Or 2ⁿ or n²/2 or (n+1)² or 7n² + 12n + 4
• Use formula to help understand overall
efficiency
Usefulness?
• What if we kept doubling size of n?
n       log2 n   5n      n log2 n   n²        2ⁿ
8       3        40      24         64        256
16      4        80      64         256       65536
32      5        160     160        1024      ~10^9
64      6        320     384        4096      ~10^19
128     7        640     896        16384     ~10^38
256     8        1280    2048       65536     ~10^77
10000   ~13      50000   ~10^5      10^8      ~10^3010
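A table like this can be generated mechanically; a minimal C sketch
(since 2ⁿ overflows machine integers quickly, it is reported as an
approximate power of 10):

#include <stdio.h>
#include <math.h>

int main(void)
{
    // As n doubles, print the growth of each complexity function.
    for (int n = 8; n <= 256; n *= 2) {
        double lg = log2((double)n);
        printf("n=%4d  log2n=%2.0f  5n=%5d  nlog2n=%5.0f  n^2=%6d  2^n~10^%.0f\n",
               n, lg, 5 * n, n * lg, n * n, n * log10(2.0));
    }
    return 0;
}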
Analyzing Insertion Sort
[The INSERTION-SORT pseudocode, annotated with the cost ci and the
number of times each line executes, appeared here]
Analyzing Insertion Sort
The running time of the algorithm is the sum of the running times
of the statements executed; a statement that takes ci steps to
execute and executes n times contributes ci·n to the total running
time. To compute T(n), the running time of INSERTION-SORT on an
input of n values, we sum the products of the cost and times
columns.
Analyzing Insertion Sort
• Even for inputs of a given size, an algorithm’s running time may
depend on which input of that size is given.
• The best case occurs if the array is already sorted. For each i = 2,
3, …, n, we then find that A[j] <= key in the while loop when j has
its initial value of i-1. Thus ti = 1 for i = 2, 3, …, n, and the
best-case running time is a linear function of n.
Analyzing Insertion Sort
• If the array is in reverse sorted order—that is, in decreasing order—
the worst case results. We must compare each element A[j] with each
element in the entire sorted subarray A[1 .. j-1], and so tj = j for
j = 2, 3, …, n. Noting that Σ(j=2..n) j = n(n+1)/2 − 1 and
Σ(j=2..n) (j−1) = n(n−1)/2, both sums are quadratic in n.
Analyzing Insertion Sort
We find that in the worst case, the running time of
INSERTION-SORT is a quadratic function of n, i.e. of the form
an² + bn + c for constants a, b, and c.
Worst-case and average-case analysis
Why concentrate on the worst case?
1. The worst-case running time gives an upper bound on the running
time for any input; knowing it guarantees the algorithm can never
take longer.
2. For some algorithms, the worst case occurs fairly often. For
example, in searching a database for a particular piece of
information, the searching algorithm’s worst case will often occur
when the information is not present in the database. In some
applications, searches for absent information may be frequent.
3. The “average case” is often roughly as bad as the worst case.
Suppose that we randomly choose n numbers and apply insertion
sort. How long does it take to determine where in subarray A[1.. j-1]
to insert element A[j]? On average, half the elements in A[1…j-1]
are less than A[j], and half the elements are greater. On average,
therefore, we check half of the subarray A[1…j-1], and so tj is about
j/2. The resulting average-case running time turns out to be a
quadratic function of the input size, just like the worst-case running
time.
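To see why (a step the slide leaves implicit): summing tj ≈ j/2 over
j = 2, …, n gives roughly (1/2)·(n(n+1)/2 − 1) ≈ n²/4 comparisons in
total, which is quadratic in n.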
•The scope of average-case analysis is limited, because it may not be
apparent what constitutes an “average” input for a particular problem.
Asymptotic Notation
• The notations we use to describe the asymptotic
running time of an algorithm are defined in terms of
functions whose domains are the set of natural
numbers N={0,1,2,…}. Such notations are convenient
for describing the worst-case running-time function
T(n), which is usually defined only on integer input
sizes.
• Three basic asymptotic notations
1. O-notation (Big “oh”): When we have an asymptotic
upper bound, we use O-notation. For a given function
g(n), we denote by O(g(n)) (pronounced “big-oh of g
of n” or sometimes just “oh of g of n”) the set of
functions
O(g(n)) ={f(n): there exist positive constants c and N
such that 0 <= f(n) <= c g(n) for all n>=N}
Asymptotic Notation (cont.)
2. Ω-notation (Big “omega”): Ω-notation provides an asymptotic
lower bound. For a given function g(n), we denote by Ω(g(n))
(pronounced “big-omega of g of n” or sometimes just “omega of
g of n”) the set of functions
Ω(g(n)) ={f(n): there exist positive constants c and N such that
0 <= c g(n) <= f(n) for all n>=N}.
3. Θ-notation (Big “theta”): Θ-notation asymptotically bounds a
function from above and below. For a given function g(n), we
denote by Θ(g(n)) (pronounced “big-theta of g of n” or
sometimes just “theta of g of n”) the set of functions
Θ(g(n)) ={f(n): there exist positive constants c1, c2 and N such
that 0 <= c1 g(n) <= f(n) <= c2 g(n) for all n>=N}.
Asymptotic Notation (cont.)
Big-O: 3n + 2 = O(n), as 3n + 2 <= 4n for all n >= 2.
100n + 6 = O(n), as 100n + 6 <= 101n for all n >= 6.
10n² + 4n + 2 = O(n²), as 10n² + 4n + 2 <= 11n² for all n >= 5.
1000n² + 100n − 6 = O(n²), as 1000n² + 100n − 6 <= 1001n² for all
n >= 100.
3n + 2 ≠ O(1), as 3n + 2 is not less than or equal to c for any
constant c and all n >= N. Also 10n² + 4n + 2 ≠ O(n).
Big-Ω: 3n + 2 = Ω(n), as 3n + 2 >= 3n for all n >= 1.
100n + 6 = Ω(n), as 100n + 6 >= 100n for all n >= 1.
10n² + 4n + 2 = Ω(n²), as 10n² + 4n + 2 >= n² for all n >= 1. Also
observe that 3n + 3 = Ω(1), 10n² + 4n + 2 = Ω(n), and
10n² + 4n + 2 = Ω(1).
Big-Θ: 3n + 2 = Θ(n), as 3n + 2 >= 3n for all n >= 2 and
3n + 2 <= 4n for all n >= 2, so c1 = 3, c2 = 4, and N = 2.
10n² + 4n + 2 = Θ(n²). 10 log n + 4 = Θ(log n). 3n + 2 ≠ Θ(1).
3n + 3 ≠ Θ(n²). 10n² + 4n + 2 ≠ Θ(n). 10n² + 4n + 2 ≠ Θ(1).
Asymptotic Notation (cont.)
• For more examples, see pages 29–33 of the book by Sahni.
• Homework: (i) Is 2^(n+1) = O(2^n)? (ii) Is 2^(2n) = O(2^n)?
Some commonly found orders of growth
O(1)        Constant (bounded)
O(log n)    Logarithmic
O(n)        Linear
O(n log n)  Log linear
O(n²)       Quadratic
O(n³)       Cubic
O(2ⁿ)       Exponential
O(n!)       Exponential (factorial)
O(nⁿ)       Exponential
(The orders up to O(n³) are polynomially bounded; complexity
increases down the list.)
polynomial good, exponential bad
[Figure: growth of the polynomial n versus the exponential 2ⁿ]
Determining complexities
• Simple statements: O(1)
– Assignments of simple data types: int x = y;
– Arithmetic operations: x = 5 * y;
– Indexing into an array: array[i];
– (de)referencing pointers: listptr = list->next;
– Declarations of simple data types: int x, y;
– Simple conditional tests: if (i < 12)
• But not e.g. if (SomeFunction() < 25)
If…then…else statements
• Worst case is slowest possibility
if (cond) {
    sequence of statements 1      // say this is O(n)
}
else {
    sequence of statements 2      // say this is O(1)
}
• Overall, this if…then…else is thus O(n)
Loops
• Two parts to loop analysis
– How many iterations?
– How many steps for each iteration?
int sum=0;
for (int j=0; j < n; j++)
sum = sum + j;
• Loop executes n times
• 4 steps per iteration: O(1)
• Total time here is n * O(1) = O(n * 1) = O(n)
Loops: simple
int sum=0;
for (int j=0; j < 100; j++)
sum = sum + j;
• Loop executes 100 times
• 4 steps per iteration: O(1)
• Total time here is
100 * O(1) = O(100 * 1) = O(1)
• Thus, faster than the previous loop once n exceeds 100
Loops: while loops
#define TRUE 1
#define FALSE 0
int done = FALSE, n = 22;
double result = 1;            // initialized before use; double avoids integer overflow
while (!done) {
    result = result * n;
    n--;
    if (n == 1) done = TRUE;
}
• Loop terminates when done==TRUE
• This happens after n − 1 iterations (about n)
• O(1) time per iteration
• O(n) total time here
Loops: nested
for (i=0; i < n; i++) {
for (j=0; j<m; j++) {
sequence of statements
}
}
• Outer loop executes n times
• For every outer, inner executes m times
• Thus, inner loop total = n * m times
• Complexity is O(n*m)
• Where the inner condition is j < n (a common
case), total complexity is O(n²)
Sequences of statements
• For a sequence, determine individual times
and add up
for (j=0; j<n; j++)
    for (k=0; k<j; k++)       // O(n²)
        sum = sum + j*k;
• The innermost statement executes 0 + 1 + … + (n−1) = n(n−1)/2
times, which is O(n²)
Example 1
• Prove 7n² + 12n is O(n²)
• For n > 1: 7n² + 12n < 7n² + 12n²
• Add: 7n² + 12n² = 19n²
• Thus, let c be 19 (to be sure, let us say 20)
• Checking for N = 1: take n = 2
• Is 7·2² + 12·2 < 20·2²?
– Yes: 52 < 80
• Thus 7n² + 12n ≤ 20n² for all n ≥ 2
• Proved: 7n² + 12n is O(n²)
Example 2
• Prove (n + 1)² is O(n²)
• (n + 1)² expands to n² + 2n + 1
• n² + 2n + 1 ≤ n² + 2n² + n² (for n ≥ 1)
• n² + 2n² + n² = 4n²
• Thus, c is 4
• We can find N = 1 as before
• Arrive at: (n + 1)² ≤ 4n² for all n ≥ 1
• Thus, (n + 1)² is O(n²)
More on Example 2
• Could also pick N = 3 and c = 2
• As (n + 1)² ≤ 2n² for all n ≥ 3
• Because 2n + 1 ≤ n² for all n ≥ 3
Example 3
• Prove 3n² + 5 is O(n²)
• We choose constant c = 4 and N = 3
• Why? Because for all n ≥ 3:
3n² + 5 ≤ 4n²
Thus, by our definition, 3n² + 5 ∈ O(n²)
• Now, prove 3n² + 5 is not O(n) – easy:
• Whatever constant c and value N one
chooses, we can always find a value of n
greater than N such that 3n² + 5 is greater
than c·n
Example 4
• Prove 2n + 4 has order n, i.e. 2n + 4 ∈ O(n)
• Clearly, 2n ≤ 2n for all n
• 4 ≤ n if n ≥ 4
• 2n + 4 ≤ 3n for all n ≥ 4
• Thus, our constants are c = 3 and N = 4
• Re-express: 2n + 4 ≤ c·n for all n ≥ N
• Thus, by our definition
• 2n + 4 ∈ O(n)
Example 5
• Prove 17n² + 45n + 46 is O(n²)
• 17n² ≤ 17n² for all n
• 45n ≤ n² for n ≥ 45
• 46 ≤ n² for n ≥ 7
• 17n² + 45n + 46 ≤ 17n² + n² + n²
• So, 17n² + 45n + 46 ≤ 19n²
• Thus c = 19 and N = 45
• Thus 17n² + 45n + 46 ≤ 19n² for all n ≥ 45
• Proved: 17n² + 45n + 46 is O(n²)
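Such bounds can also be spot-checked numerically; a hedged C sketch
(the function names are mine, and a passing check only illustrates the
bound — the algebra above does the proving):

#include <stdio.h>

// Spot-check f(n) <= c*g(n) for n = N..limit.
// A failure refutes the claimed bound; success merely illustrates it.
int check_bound(long long (*f)(long long), long long (*g)(long long),
                long long c, long long N, long long limit)
{
    for (long long n = N; n <= limit; n++)
        if (f(n) > c * g(n))
            return 0;   // counterexample found
    return 1;
}

static long long f5(long long n) { return 17*n*n + 45*n + 46; }
static long long g5(long long n) { return n*n; }

int main(void)
{
    // Example 5: 17n² + 45n + 46 <= 19n² for all n >= 45
    printf("%s\n", check_bound(f5, g5, 19, 45, 1000000) ? "holds" : "fails");
    return 0;
}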