
Unit I Introduction to Algorithm2 (1)

The document provides an introduction to algorithms, focusing on their design and analysis, particularly in terms of cost and performance. It discusses the nature of algorithms, their applications in various fields such as sorting, data analysis, and electronic commerce, and introduces the concept of insertion sort as a specific algorithm. Additionally, it emphasizes the importance of analyzing algorithms to predict resource requirements and efficiency based on input size and characteristics.

Introduction to Algorithms

Design and Analysis of Algorithms

• Analysis: predict the cost of an algorithm in
terms of resources and performance

• Design: design algorithms which minimize the
cost

L1.2
Algorithms
• An algorithm is any well-defined computational procedure
that takes some value, or set of values, as input and produces
some value, or set of values, as output in a finite amount of
time.
• An algorithm is thus a sequence of computational steps that
transform the input into the output.
• An algorithm is a tool for solving a well-specified computational
problem. The statement of the problem specifies in general
terms the desired input/output relationship for problem
instances, typically of arbitrarily large size.
• The algorithm describes a specific computational procedure
for achieving that input/ output relationship for all problem
instances.
L1.3
The problem of sorting

Input: sequence a1, a2, …, an of n numbers.

Output: permutation a'1, a'2, …, a'n of the
input sequence such that a'1 ≤ a'2 ≤ … ≤ a'n.

Example:
Input: 8 2 4 9 3 6
Output: 2 3 4 6 8 9
L1.4
The problem of sorting
• Given the input sequence 8 2 4 9 3 6, a correct sorting
algorithm returns as output the sequence 2 3 4 6 8 9.
• Such an input sequence is called an instance of the sorting
problem.
• In general, an instance of a problem consists of the input
(satisfying whatever constraints are imposed in the problem
statement) needed to compute a solution to the problem.
• An algorithm can be specified in English, as a computer
program, or even as a hardware design. The only requirement
is that the specification must provide a precise description of
the computational procedure to be followed.
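The input/output relationship of the sorting problem can itself be written down as a small check. The following is a minimal sketch in Python; the function name `is_valid_sort_output` is hypothetical and not part of the slides. It tests the two conditions in the problem statement: the output is a permutation of the input, and it is in nondecreasing order.

```python
from collections import Counter


def is_valid_sort_output(inp, out):
    """Return True if `out` is a nondecreasing permutation of `inp`."""
    # Same multiset of values: `out` is a permutation of `inp`.
    same_elements = Counter(inp) == Counter(out)
    # Each element is <= its successor: a'1 <= a'2 <= ... <= a'n.
    nondecreasing = all(out[k] <= out[k + 1] for k in range(len(out) - 1))
    return same_elements and nondecreasing


instance = [8, 2, 4, 9, 3, 6]  # the instance used on the slide
print(is_valid_sort_output(instance, [2, 3, 4, 6, 8, 9]))  # True
print(is_valid_sort_output(instance, [2, 3, 4, 6, 8]))     # False: 9 is missing
```

Any correct sorting algorithm, however it is specified, must produce an output that passes this check for every instance.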

L1.5
What kinds of problems are
solved by algorithms?
• The Human Genome Project has made great progress
toward the goals of identifying all the roughly 30,000 genes
in human DNA, determining the sequences of the roughly 3
billion chemical base pairs that make up human DNA,
storing this information in databases, and developing tools
for data analysis.
• https://youtu.be/MNFUf8dqk68

L1.6
What kinds of problems are
solved by algorithms?
• The internet enables people all around
the world to quickly access and
retrieve large amounts of information.
• With the aid of clever algorithms, sites
on the internet are able to manage and
manipulate this large volume of data.
• Examples of problems that make
essential use of algorithms include
finding good routes on which the data
travels, and using a search engine to
quickly find pages on which particular
information resides.

L1.7
What kinds of problems are
solved by algorithms?
• Electronic commerce enables
goods and services to be negotiated
and exchanged electronically, and it
depends on the privacy of personal
information such as credit card
numbers, passwords, and bank
statements.
• The core technologies used in
electronic commerce include public-
key cryptography and digital
signatures which are based on
numerical algorithms and number
theory.

L1.8
What kinds of problems are
solved by algorithms?
• Manufacturing and other commercial enterprises often need to
allocate scarce resources in the most beneficial way. An oil company
might wish to know where to place its wells in order to maximize its
expected profit.
• A political candidate might want to determine where to spend
money buying campaign advertising in order to maximize the
chances of winning an election.
• An airline might wish to assign crews to flights in the least
expensive way possible, making sure that each flight is covered and
that government regulations regarding crew scheduling are met.
• An internet service provider might wish to determine where to
place additional resources in order to serve its customers more
effectively.

L1.9
What kinds of problems are
solved by algorithms?
• You have a road map on which the distance between each
pair of adjacent intersections is marked, and you wish to
determine the shortest route from one intersection to another.
• The number of possible routes can be huge, even if you
disallow routes that cross over themselves.
• How can you choose which of all possible routes is the
shortest?
• You can start by modeling the road map (which is itself a
model of the actual roads) as a graph.
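Once the road map is modeled as a graph, a shortest route can be found without enumerating every route. The slides do not name a method here; the sketch below uses Dijkstra's algorithm as one well-known choice, and the function name `shortest_distance` and the tiny `road_map` are made up for illustration. It assumes all distances are nonnegative.

```python
import heapq


def shortest_distance(graph, source, target):
    """Dijkstra's algorithm; graph maps node -> {neighbor: nonnegative distance}."""
    dist = {source: 0}
    frontier = [(0, source)]  # min-heap of (distance so far, intersection)
    while frontier:
        d, u = heapq.heappop(frontier)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter route to u was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(frontier, (nd, v))
    return float("inf")  # target is unreachable


# Hypothetical map: intersections A..D with distances between adjacent ones.
road_map = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}
print(shortest_distance(road_map, "A", "D"))  # 8, via A -> C -> B -> D
```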

L1.10
What kinds of problems are
solved by algorithms?
• Given a mechanical design in terms of a library of parts, where
each part may include instances of other parts, list the parts in
order so that each part appears before any part that uses it.
• If the design comprises n parts, then there are n! possible orders,
where n! denotes the factorial function.
• Because the factorial function grows faster than even an
exponential function, you cannot feasibly generate each possible
order and then verify that, within that order, each part appears
before the parts using it (unless you have only a few parts).
• This problem is an instance of topological sorting.
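A valid order can be produced directly, without trying all n! candidates. The following is a minimal sketch of one standard approach (Kahn's algorithm, which the slides do not name); the function name `topological_order` and the sample `design` are hypothetical, and the sketch assumes every part that is used also appears as a key of the dictionary.

```python
from collections import deque


def topological_order(uses):
    """`uses` maps each part to the list of parts it includes.

    Returns an order in which every part appears before any part that uses it.
    """
    # Number of included parts not yet placed in the order, per part.
    remaining = {part: len(subparts) for part, subparts in uses.items()}
    # Reverse map: for each part, which parts use it.
    used_by = {part: [] for part in uses}
    for part, subparts in uses.items():
        for sub in subparts:
            used_by[sub].append(part)

    ready = deque(part for part, count in remaining.items() if count == 0)
    order = []
    while ready:
        part = ready.popleft()
        order.append(part)
        for parent in used_by[part]:
            remaining[parent] -= 1
            if remaining[parent] == 0:
                ready.append(parent)
    if len(order) != len(uses):
        raise ValueError("cyclic design: no valid order exists")
    return order


design = {
    "bike": ["frame", "wheel"],
    "wheel": ["spoke", "rim"],
    "frame": [],
    "spoke": [],
    "rim": [],
}
order = topological_order(design)
print(order)
```

Each part and each "uses" relationship is handled a constant number of times, so the work grows linearly with the size of the design rather than factorially.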

L1.11
What kinds of problems are
solved by algorithms?
• A doctor needs to determine whether an image represents a
cancerous tumor or a benign one.
• The doctor has available images of many other tumors, some of
which are known to be cancerous and some of which are known to
be benign.
• A cancerous tumor is likely to be more similar to other cancerous
tumors than to benign tumors, and a benign tumor is more likely to
be similar to other benign tumors.
• By using a clustering algorithm, the doctor can identify which
outcome is more likely.

L1.12
What kinds of problems are
solved by algorithms?
• You need to compress a large file
containing text so that it occupies
less space.
• Many ways to do so are known,
including ‘LZW compression’,
which looks for repeating
character sequences.
• Another is ‘Huffman coding’, which
encodes characters by bit sequences of
various lengths, with characters
occurring more frequently
encoded by shorter bit sequences.

L1.13
Insertion sort

Input: sequence a1, a2, …, an of n numbers.


Output: permutation a'1, a'2, …, a'n of the
input sequence such that a'1 ≤ a'2 ≤ … ≤ a'n.

Example:
Input: 8 2 4 9 3 6
Output: 2 3 4 6 8 9
L1.14
Insertion sort
• The numbers to be sorted are also known as the
keys.
• Although the problem is conceptually about
sorting a sequence, the input comes in the form
of an array with n elements.
• When we want to sort numbers, it’s often because
they are the keys associated with other data,
which we call satellite data.
• Together, a key and satellite data form a record.
Insertion sort
• For example, consider a spreadsheet containing
student records with many associated pieces of
data such as age, grade-point average, and
number of courses taken.
• Any one of these quantities could be a key, but
when the spreadsheet sorts, it moves the
associated record (the satellite data) with the
key.
• When describing a sorting algorithm, we focus
on the keys, but it is important to remember that
there usually is associated satellite data.
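The relationship between keys and satellite data can be illustrated with a few records. This is a small sketch using Python's built-in `sorted`; the student names and values are invented for the example. Whichever field is chosen as the key, each whole record moves with it.

```python
# Hypothetical student records: any one field can serve as the sort key.
records = [
    {"name": "Asha", "age": 21, "gpa": 3.1, "courses": 14},
    {"name": "Ben", "age": 19, "gpa": 3.8, "courses": 9},
    {"name": "Chen", "age": 20, "gpa": 3.5, "courses": 11},
]

# The key decides the order; the rest of the record (the satellite data)
# travels along with it.
by_gpa = sorted(records, key=lambda record: record["gpa"])
by_age = sorted(records, key=lambda record: record["age"])

print([r["name"] for r in by_gpa])  # ['Asha', 'Chen', 'Ben']
print([r["name"] for r in by_age])  # ['Ben', 'Chen', 'Asha']
```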
Insertion sort
• Insertion sort is an efficient algorithm for sorting a small
number of elements.
• Insertion sort works the way you might sort a hand of
playing cards.
• Start with an empty left hand and the cards in a pile on
the table.
• Pick up the first card in the pile and hold it with your left
hand.
• Then, with your right hand, remove one card at a time
from the pile, and insert it into the correct position in
your left hand.
Insertion sort
• As Figure 2.1 illustrates, you find the correct position for a card
by comparing it with each of the cards already in your left
hand, starting at the right and moving left.
• As soon as you see a card in your left hand whose value is less
than or equal to the card you’re holding in your right hand,
insert the card that you’re holding in your right hand just to the
right of this card in your left hand.
• If all the cards in your left hand have values greater than the card in
your right hand, then place this card as the leftmost card in your
left hand.
• At all times, the cards held in your left hand are sorted, and
these cards were originally the top cards of the pile on the
table.
Example of insertion sort
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
2 3 4 8 9 6
2 3 4 6 8 9 done

L1.29
Insertion sort
Pseudocode:
INSERTION-SORT (A, n)    ⊳ A[1 . . n]
    for j ← 2 to n
        do key ← A[j]
            i ← j − 1
            while i > 0 and A[i] > key
                do A[i+1] ← A[i]
                    i ← i − 1
            A[i+1] ← key
[Figure: array A with indices 1, i, j, n marked, showing the sorted prefix and the key]
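The pseudocode translates almost line for line into a real language. The following is a minimal sketch in Python; the only deliberate change is that Python lists are 0-indexed, so the outer loop starts at index 1 and the inner test becomes `i >= 0`.

```python
def insertion_sort(a):
    """Sort the list `a` in place into nondecreasing order and return it."""
    for j in range(1, len(a)):
        key = a[j]
        # a[0..j-1] is already sorted; shift larger elements one slot right.
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        # Drop the key into the gap that was opened up.
        a[i + 1] = key
    return a


print(insertion_sort([8, 2, 4, 9, 3, 6]))  # [2, 3, 4, 6, 8, 9]
```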
L1.30
Analyzing algorithms
• Analyzing an algorithm has come to mean predicting the
resources that the algorithm requires.
• You might consider resources such as memory,
communication bandwidth, or energy consumption.
• Most often, however, you’ll want to measure computational
time.
• If you analyze several candidate algorithms for a problem you
can identify the most efficient one.
• There might be more than just one viable candidate, but you
can often rule out several inferior algorithms in the process.
Analyzing algorithms
• Before you can analyze an algorithm, you need a model of the
technology that it runs on, including the resources of that
technology and a way to express their costs.
• Most of this book assumes a generic one-processor, random-
access machine (RAM) model of computation as the
implementation technology, with the understanding that algorithms
are implemented as computer programs.
• In the RAM model, instructions execute one after another, with
no concurrent operations.
• The RAM model assumes that each instruction takes the same
amount of time as any other instruction and that each data access
using the value of a variable or storing into a variable takes the
same amount of time as any other data access.
• In other words, in the RAM model each instruction or data
access takes a constant amount of time, even indexing into an
array.
Analysis of insertion sort
• How long does the INSERTION-SORT procedure take? One way
to tell would be for you to run it on your computer and time how
long it takes to run.
• Of course, you’d first have to implement it in a real programming
language, since you cannot run our pseudocode directly. What would
such a timing test tell you?
• You would find out how long insertion sort takes to run on your
particular computer, on that particular input, under the
particular implementation that you created, with the particular
compiler or interpreter that you ran, with the particular
libraries that you linked in, and with the particular background
tasks that were running on your computer concurrently with your
timing test (such as checking for incoming information over a
network).
• If you run insertion sort again on your computer with the same input
you might even get a different timing result.
Analysis of insertion sort
• How do we analyze insertion sort? First, let’s acknowledge
that the running time depends on the input.
• Sorting a thousand numbers takes longer than sorting three
numbers.
• Moreover, insertion sort can take different amounts of time
to sort two input arrays of the same size, depending on how
nearly sorted they already are.
• Even though the running time can depend on many features
of the input, we’ll focus on the one that has been shown to
have the greatest effect, namely the size of the input, and
describe the running time of a program as a function of
the size of its input.
• To do so, we need to define the terms ‘running time’ and
‘input size’ more carefully.
Analysis of insertion sort
• The best notion for input size depends on the problem being
studied.
• For many problems, such as sorting or computing discrete Fourier
transforms, the most natural measure is the number of items in the
input, for example, the number n of items being sorted.
• The running time of an algorithm on a particular input is the
number of instructions and data accesses executed.
• How we account for these costs should be independent of any
particular computer, but within the framework of the RAM model.
• For the moment, let us adopt the following view. A constant
amount of time is required to execute each line of our pseudocode.
• One line might take more or less time than another line, but we’ll
assume that each execution of the kth line takes c_k time, where c_k
is a constant.
Analysis of insertion sort
• To analyze the INSERTION-SORT procedure, let’s view it on
the following page with the time cost of each statement and the
number of times each statement is executed.
• For each i = 2, 3, …, n, let t_i denote the number of times the
while loop test in line 5 is executed for that value of i. (This
analysis uses i for the outer for-loop index and j for the inner
while-loop index, the reverse of the INSERTION-SORT slide above.)
• When a for or while loop exits in the usual way (because the
test in the loop header comes up FALSE), the test is executed
one time more than the loop body.
• Because comments are not executable statements, assume that
they take no time.
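The cost table these bullets refer to did not survive in this copy of the slides. As a sketch, assume the usual eight-line numbering of INSERTION-SORT in which line 3 is a comment, line 5 is the while-loop test, and lines 6 and 7 form the loop body (an assumption, though it matches the constants c_1, c_2, c_4, c_5, c_8 used in the later slides). Multiplying each line's cost by the number of times it runs and summing gives:

```latex
T(n) = c_1 n + c_2 (n-1) + c_4 (n-1)
     + c_5 \sum_{i=2}^{n} t_i
     + c_6 \sum_{i=2}^{n} (t_i - 1)
     + c_7 \sum_{i=2}^{n} (t_i - 1)
     + c_8 (n-1)

% Best case (array already sorted): t_i = 1 for every i, so the
% c_6 and c_7 terms vanish and
T(n) = (c_1 + c_2 + c_4 + c_5 + c_8)\, n - (c_2 + c_4 + c_5 + c_8)
```

Under the same assumption, the for-loop test in line 1 runs n times (one more than its body), while lines 2, 4, and 8 each run n − 1 times.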
Analysis of insertion sort
In the best case, when the array is already sorted, we can express the
running time as an + b for constants a and b that depend on the statement
costs c_k (where a = c_1 + c_2 + c_4 + c_5 + c_8 and
b = −(c_2 + c_4 + c_5 + c_8)).
The running time is thus a linear function of n. The worst case arises
when the array is in reverse sorted order, that is, it starts out in
decreasing order.
The procedure must compare each element A[i] with each element
in the entire sorted subarray A[1 : i − 1], and so t_i = i for i = 2,
3, …, n. (The procedure finds that A[j] > key every time in line 5,
and the while loop exits only when j reaches 0.) Noting that
Σ_{i=2}^{n} i = n(n + 1)/2 − 1 and Σ_{i=2}^{n} (i − 1) = n(n − 1)/2,
the worst-case running time works out to a quadratic function of n.

L1.41
Worst-case and average-case
analysis
• The worst-case running time of an algorithm gives an upper
bound on the running time for any input.
• If you know it, then you have a guarantee that the algorithm
never takes any longer.
• You need not make some educated guess about the running time
and hope that it never gets much worse.
• This feature is especially important for real-time computing, in
which operations must complete by a deadline.

L1.42
Analysis of insertion sort

L1.43
Running time

• The running time depends on the input: an
already sorted sequence is easier to sort.
• Major Simplifying Convention:
Parameterize the running time by the size of
the input, since short sequences are easier to
sort than long ones.
    T_A(n) = time of A on length n inputs
• Generally, we seek upper bounds on the
running time, to have a guarantee of
performance.
L1.44
Kinds of analyses
Worst-case: (usually)
• T(n) = maximum time of algorithm
on any input of size n.
Average-case: (sometimes)
• T(n) = expected time of algorithm
over all inputs of size n.
• Need assumption of statistical
distribution of inputs.
Best-case: (NEVER)
• Cheat with a slow algorithm that
works fast on some input.
L1.45
Machine-independent time
What is insertion sort’s worst-case time?

BIG IDEAS:
• Ignore machine dependent constants,
otherwise impossible to verify and to compare algorithms

• Look at growth of T(n) as n → ∞ .

“Asymptotic Analysis”
L1.46
Θ-notation

DEF:
Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and
            n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n)
            for all n ≥ n0 }
Basic manipulations:
• Drop low-order terms; ignore leading constants.
• Example: 3n³ + 90n² − 5n + 6046 = Θ(n³)
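The definition can be checked numerically for the example polynomial. In the sketch below, the constants c1 = 3, c2 = 4, and n0 = 91 are one workable choice made for this illustration, not values given on the slide. A finite loop cannot prove the bound for all n, but it shows the sandwich holding from n0 onward and failing just before it.

```python
def f(n):
    """The example polynomial from the slide."""
    return 3 * n**3 + 90 * n**2 - 5 * n + 6046


# Assumed witnesses for f(n) = Theta(n^3): c1 * n^3 <= f(n) <= c2 * n^3 for n >= n0.
c1, c2, n0 = 3, 4, 91

holds = all(c1 * n**3 <= f(n) <= c2 * n**3 for n in range(n0, 10_000))
print(holds)                 # True
print(f(90) <= c2 * 90**3)   # False: the upper bound does not yet hold at n = 90
```

The lower bound holds for every n ≥ 1 because 90n² − 5n + 6046 is always positive; the upper bound needs n³ to outgrow the low-order terms, which is why n0 cannot be arbitrarily small.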

L1.47
Asymptotic performance
When n gets large enough, a Θ(n²) algorithm
always beats a Θ(n³) algorithm.
[Figure: plot of T(n) against n, with the crossover point n0 marked]
• Asymptotic analysis is a useful tool to help to
structure our thinking toward better algorithms.
• We shouldn’t ignore asymptotically slower
algorithms, however.
• Real-world design situations often call for a …
L1.48
Insertion sort analysis
Worst case: Input reverse sorted.
    T(n) = Σ_{j=2}^{n} Θ(j) = Θ(n²)    [arithmetic series]
Average case: All permutations equally likely.
    T(n) = Σ_{j=2}^{n} Θ(j/2) = Θ(n²)
Is insertion sort a fast sorting algorithm?
• Moderately so, for small n.
• Not at all, for large n.
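These growth rates can be observed by counting work instead of timing it. The sketch below instruments a 0-indexed insertion sort to count element shifts (one per execution of the inner loop body); the function name `count_shifts`, the size n = 200, and the 50 random trials are choices made for this illustration.

```python
import random


def count_shifts(a):
    """Insertion-sort a copy of `a`; return how many element shifts it performs."""
    a = list(a)
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = key
    return shifts


n = 200
worst = count_shifts(range(n, 0, -1))   # reverse sorted input
best = count_shifts(range(n))           # already sorted input
random.seed(0)
trials = 50
average = sum(count_shifts(random.sample(range(n), n)) for _ in range(trials)) / trials

print(worst, n * (n - 1) // 2)   # both 19900: every pair is out of order
print(best)                      # 0: nothing ever needs to move
print(average, n * (n - 1) / 4)  # the measured mean is close to n(n-1)/4
```

The worst case performs exactly n(n − 1)/2 shifts and a random permutation about half that many, so both are Θ(n²), in line with the sums above.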
L1.49
Example 2: Integer
Multiplication

• Let X = A B and Y = C D where A, B, C
and D are n/2-bit integers
• Simple Method: XY = (2^(n/2) A + B)(2^(n/2) C + D)
• Running Time Recurrence
    T(n) < 4T(n/2) + 100n

• Solution: T(n) = Θ(n²)


L1.50
Better Integer Multiplication

• Let X = A B and Y = C D where A, B, C and D
are n/2-bit integers
• Karatsuba:
    XY = (2^(n/2) + 2^n) AC − 2^(n/2) (A − B)(C − D) + (2^(n/2) + 1) BD
• Running Time Recurrence
    T(n) < 3T(n/2) + 100n

• Solution: T(n) = O(n^(lg 3)), where lg 3 ≈ 1.585

L1.51
Example 3:Merge sort

MERGE-SORT A[1 . . n]
1. If n = 1, done.
2. Recursively sort A[1 . . ⌈n/2⌉]
   and A[⌈n/2⌉+1 . . n].
3. “Merge” the 2 sorted lists.

Key subroutine: MERGE
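The three steps, together with the MERGE subroutine, can be sketched in Python as follows. This version returns a new list rather than sorting in place, which is a simplification chosen for the example; the split point uses ⌈n/2⌉ as in step 2.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in linear time."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Repeatedly take the smaller of the two front elements.
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # One list is exhausted; the rest of the other is already in order.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(a):
    """Return a sorted copy of `a`."""
    if len(a) <= 1:
        return list(a)               # step 1: nothing to do
    mid = (len(a) + 1) // 2          # ceil(n/2)
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))  # steps 2 and 3


print(merge([2, 7, 13, 20], [1, 9, 11, 12]))  # [1, 2, 7, 9, 11, 12, 13, 20]
print(merge_sort([8, 2, 4, 9, 3, 6]))         # [2, 3, 4, 6, 8, 9]
```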

L1.52
Merging two sorted arrays
Input arrays, each already sorted: 2 7 13 20 and 1 9 11 12
At each step, compare the smallest remaining element of each
array and move the smaller of the two to the output:
1
1 2
1 2 7
1 2 7 9
1 2 7 9 11
1 2 7 9 11 12
(the remaining 13 and 20 are then appended in order)

Time = Θ(n) to merge a total
of n elements (linear time).
L1.65
Analyzing merge sort

T(n)        MERGE-SORT A[1 . . n]
Θ(1)        1. If n = 1, done.
2T(n/2)     2. Recursively sort A[1 . . ⌈n/2⌉]
               and A[⌈n/2⌉+1 . . n].
Θ(n)        3. “Merge” the 2 sorted lists
Sloppiness: Should be T(⌈n/2⌉) + T(⌊n/2⌋),
but it turns out not to matter asymptotically.

L1.66
Recurrence for merge sort
T(n) = Θ(1)               if n = 1;
T(n) = 2T(n/2) + Θ(n)     if n > 1.
• We shall usually omit stating the base
case when T(n) = Θ(1) for sufficiently
small n, but only when it has no effect on
the asymptotic solution to the recurrence.
• Lecture 2 provides several ways to find a
good upper bound on T(n).
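Before solving the recurrence formally, it can be evaluated numerically to see how it grows. The sketch below fixes concrete stand-ins for the Θ terms, T(1) = 1 and a merge cost of exactly c·n with c = 1; both are assumptions made for this illustration, and n is restricted to powers of two so that n/2 is always exact.

```python
from functools import lru_cache
from math import log2

C = 1  # assumed constant in the c*n merge cost


@lru_cache(maxsize=None)
def T(n):
    """Evaluate T(n) = 2 T(n/2) + C*n with T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + C * n


for k in (4, 8, 12, 16):
    n = 2**k
    # The ratio T(n) / (n lg n) settles toward a constant as n grows.
    print(n, T(n), T(n) / (n * log2(n)))
```

With these stand-ins the recurrence has the exact solution T(n) = n lg n + n for powers of two, so the printed ratio is 1 + 1/lg n and approaches 1, which is the behavior Θ(n lg n) describes.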

L1.67
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
Level 0:  cn                        (level total: cn)
Level 1:  cn/2  cn/2                (level total: cn)
Level 2:  cn/4  cn/4  cn/4  cn/4    (level total: cn)
…
Leaves:   Θ(1) each, #leaves = n    (level total: Θ(n))
Height: h = lg n
Total = Θ(n lg n)
L1.78
Conclusions
• Θ(n lg n) grows more slowly than Θ(n²).
• Therefore, merge sort asymptotically
beats insertion sort in the worst case.
• In practice, merge sort beats insertion
sort for n > 30 or so.

L1.79
