Unit 8

This document discusses algorithmic complexity and big O notation. It begins by explaining that while computers are fast, data sets can become very large, so algorithm efficiency is still important. It then discusses measuring efficiency by counting the number of operations rather than timing, as operation counts are independent of machine or implementation. Different complexity classes like O(1), O(log n), O(n), and O(n^2) are introduced based on how the number of operations grows with input size. Rules for combining complexity classes like addition and multiplication are covered. Overall, the document provides an introduction to analyzing algorithms using big O notation to describe their asymptotic runtime complexity.


1

▪ Measuring orders of growth of algorithms


▪ Big “Oh” notation
▪ Complexity classes
▪ Complexity classes examples

2
• computers are fast and getting faster – so maybe efficient programs
don’t matter?
• but data sets can be very large (e.g., in 2014, Google served
30,000,000,000,000 pages, covering 100,000,000 GB – how long to
search by brute force?)
• thus, simple solutions may simply not scale with size in an
acceptable manner.

• how can we decide which option for a program is most efficient?

• separate time and space efficiency of a program

• Trade off between them:


• can sometimes pre-compute and store results, then use "lookup" to
retrieve them (e.g., memoization for Fibonacci – see the sketch below)
• will focus on time efficiency 3
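
As a concrete illustration of the pre-compute/lookup idea, here is a minimal
memoization sketch for Fibonacci (illustrative code, not from the slides;
the name fib_memo is our own):

def fib_memo(n, memo=None):
    # compute the n-th Fibonacci number, caching each result in a dict
    if memo is None:
        memo = {}
    if n <= 1:
        return n
    if n not in memo:
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]  # every value is computed once, then retrieved by lookup

With the cache, each fib_memo(k) is computed only once, so the call for n does
O(n) work instead of the exponential work of the naive recursion seen later
in this unit.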
Challenges in understanding efficiency of solution to a
computational problem:
• a program can be implemented in many different
ways
• you can solve a problem using only a handful of
different algorithms
• would like to separate choices of implementation
from choices of more abstract algorithm

4
▪ measure with timer
▪ count the operations.
▪ abstract notion of order of growth

5
• use the time module

import time

def c_to_f(c):
    return c*9/5 + 32

# start clock, call function, stop clock
# (time.perf_counter replaces time.clock, which was removed in Python 3.8)
t0 = time.perf_counter()
c_to_f(100000)
t1 = time.perf_counter() - t0
print(t1, "s")

6
GOAL: to evaluate different algorithms
• running time varies between algorithms
• running time varies between implementations
• running time varies between computers
• running time is not predictable based on small
inputs

▪ time varies for different inputs, but we cannot really
express a relationship between inputs and time

7
COUNTING OPERATIONS
• assume these steps take constant time:
  • mathematical operations
  • comparisons
  • assignments
  • accessing objects in memory
• then count the number of operations executed as
function of size of input

def c_to_f(c):
    return c*9.0/5 + 32

def mysum(x):
    total = 0
    for i in range(x+1):
        total += i
    return total

mysum → 1 + 3x ops (1 to set total, then 3 ops per iteration: set i, add, assign)

8
GOAL: to evaluate different algorithms
• count depends on algorithm
• count depends on implementation
• count independent of computers
• no clear definition of which operation to
count

• count varies for different inputs, but we can come
up with a relationship between inputs and the count

9
• Timing and counting evaluate implementations.
• Timing evaluates machines.

• want to evaluate algorithm.


• want to evaluate scalability.
• want to evaluate in terms of input size.

10
• Going to focus on idea of counting operations in an
algorithm, but not worry about small variations in
implementation (e.g., whether we take 3 or 4 primitive
operations to execute the steps of a loop)
• Going to focus on how algorithm performs when size of
problem gets arbitrarily large
• Want to relate time needed to complete a computation,
measured this way, against the size of the input to the
problem
• Need to decide what to measure, given that actual
number of steps may depend on specifics of trial

11
• want to express efficiency in terms of size of input,
so need to decide what your input is
• could be an integer
-- mysum(x)
• could be length of list
-- list_sum(L)
• you decide when multiple parameters to a function
-- search_for_elmt(L, e)

12
• a function that searches for an element in a list
def search_for_elmt(L, e):
    for i in L:
        if i == e:
            return True
    return False
• when e is first element in the list → BEST CASE
• when e is not in list → WORST CASE
• when look through about half of the elements in list → AVERAGE CASE
• want to measure this behavior in a general way

13
• suppose you are given a list L of some length
len(L)
• best case: minimum running time over all possible
inputs of a given size, len(L)
• constant for search_for_elmt
• first element in any list
• average case: average running time over all possible
inputs of a given size, len(L)
• practical measure

• worst case: maximum running time over all possible


inputs of a given size, len(L)
• linear in length of list for search_for_elmt
• must search entire list and not find it
14
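
To make the three cases concrete, a small sketch (assuming the
search_for_elmt defined above) that exercises each one:

L = list(range(1000))
search_for_elmt(L, 0)     # best case: e is the first element, constant work
search_for_elmt(L, 1001)  # worst case: e absent, all len(L) elements examined
search_for_elmt(L, 500)   # roughly average case: about half the list examined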
Goals:
• want to evaluate program’s efficiency when input is very
big
• want to express the growth of program’s run time as
input size grows
• want to put an upper bound on growth – as tight as
possible
• do not need to be precise: “order of” not “exact” growth
• we will look at largest factors in run time (which
sections of the program will take the longest to run?)
• thus, generally we want tight upper bound on
growth, as function of size of input, in worst case
15
• Big Oh notation measures an upper bound on the
asymptotic growth, often called order of growth

• Big Oh or O() is used to describe worst case


• worst case occurs often and is the bottleneck when a
program runs

• express rate of growth of program relative to the input


size

• evaluate algorithm NOT machine or implementation

16
def fact_iter(n):
    """assumes n an int >= 0"""
    answer = 1
    while n > 1:
        answer *= n
        n -= 1
    return answer

• computes factorial
• number of steps:
▪ worst case asymptotic complexity:
• ignore additive constants
• ignore multiplicative constants

17
• Interested in describing how amount of time needed
grows as size of (input to) problem grows
• Thus, given an expression for the number of operations
needed to compute an algorithm, want to know
asymptotic behavior as size of problem gets large
• Hence, will focus on term that grows most rapidly in a
sum of terms
• And will ignore multiplicative constants, since want to
know how rapidly time required increases as increase
size of input

18
• drop constants and multiplicative factors
• focus on dominant terms

O(n^2) : n^2 + 2n + 2
O(n^2) : n^2 + 100000n + 3^1000
O(n) : log(n) + n + 4
O(n log n) : 0.0001*n*log(n) + 300n
O(3^n) : 2n^30 + 3^n

19
20
• combine complexity classes
• analyze statements inside function
• apply some rules, focus on dominant term
Law of Addition for O():
• used with sequential statements
• O(f(n)) + O(g(n)) is O( f(n) + g(n) )
• for example,
for i in range(n):
print('a')
for j in range(n*n):
print('b')
is O(n) + O(n*n) = O(n + n^2) = O(n^2) because of dominant term
21
▪ combine complexity classes
• analyze statements inside functions
• apply some rules, focus on dominant term
Law of Multiplication for O():
• used with nested statements/loops
• O(f(n)) * O(g(n)) is O( f(n) * g(n) )
• for example,
for i in range(n):
for j in range(n):
print('a')
is O(n)*O(n) = O(n*n) = O(n^2) because the outer loop goes n
times and the inner loop goes n times for every outer loop
iteration.
22
• O(1) denotes constant running time
• O(log n) denotes logarithmic running time
• O(n) denotes linear running time
• O(n log n) denotes log-linear running time
• O(n^c) denotes polynomial running time (c is a
constant)
• O(c^n) denotes exponential running time (c is a constant
being raised to a power based on size of input)

23
O(1) : constant

O(log n) : logarithmic

O(n) : linear

O(n log n) : log-linear

O(n^c) : polynomial

O(c^n) : exponential

24
CLASS       n=10   n=100         n=1000         n=1000000

O(1)        1      1             1              1

O(log n)    1      2             3              6

O(n)        10     100           1000           1000000

O(n log n)  10     200           3000           6000000

O(n^2)      100    10000         1000000        1000000000000

O(2^n)      1024   ~1.27*10^30   ~1.07*10^301   have fun!

25
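
The smaller entries in this table can be reproduced with a short script
(a sketch; note the table takes log base 10, and the O(2^n) column above is
approximated for large n):

import math

def growth_counts(n):
    # operation counts for each complexity class, log taken base 10 as in the table
    logn = round(math.log10(n))
    return {'O(1)': 1, 'O(log n)': logn, 'O(n)': n,
            'O(n log n)': n * logn, 'O(n^2)': n**2}

for n in (10, 100, 1000, 1000000):
    print(n, growth_counts(n))

# 2**n is left out of the loop: 2**100 already has 31 digits,
# and 2**1000000 has over 300,000 digits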
• complexity independent of inputs

• very few interesting algorithms in this class, but can
often have pieces that fit this class

• can have loops or recursive calls, but ONLY IF
number of iterations or calls independent of size of
input (see the sketch below)

26
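
An illustrative sketch of that last point: a function containing a loop that
is still O(1), because the number of iterations is fixed rather than tied to
the input size (hypothetical example, not from the slides):

def sum_first_three(L):
    # assumes len(L) >= 3; the loop always runs exactly 3 times,
    # no matter how long L is, so the work is constant: O(1)
    total = 0
    for i in range(3):
        total += L[i]
    return total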
• Simple iterative loop algorithms are typically
linear in complexity

27
UNSORTED
def linear_search(L, e):
    found = False
    for i in range(len(L)):
        if e == L[i]:
            found = True
    return found

▪ must look through all elements to decide it's not there
▪ O(len(L)) for the loop * O(1) to test if e == L[i]
  ◦ O(1 + 4n + 1) = O(4n + 2) = O(n)
▪ overall complexity is O(n) – where n is len(L)
28
SORTED
def search(L, e):
    for i in range(len(L)):
        if L[i] == e:
            return True
        if L[i] > e:
            return False
    return False
• must only look until reach a number greater than e
• O(len(L)) for the loop * O(1) to test if e == L[i]
• overall complexity is O(n) – where n is len(L)
• NOTE: order of growth is same, though run time
may differ for two search methods

29
§ searching a list in sequence to see if an element is present

§ add characters of a string, assumed to be composed of
decimal digits

def addDigits(s):
    val = 0
    for c in s:
        val += int(c)
    return val

O(len(s)) → O(n)

30
• complexity often depends on number of iterations
def fact_iter(n):
    prod = 1
    for i in range(1, n+1):
        prod *= i
    return prod
• number of times around loop is n
• number of operations inside loop is a constant
• (in this case, 3 – set i, multiply, set prod)
• O(1 + 3n + 1) = O(3n + 2) = O(n)
• overall just O(n)

31
▪ simple loops are linear in complexity
▪ what about loops that have loops within them?

32
§ determine if one list is a subset of the second, i.e., every element
of the first appears in the second (assume no duplicates)

def isSubset(L1, L2):
    for e1 in L1:
        matched = False
        for e2 in L2:
            if e1 == e2:
                matched = True
                break
        if not matched:
            return False
    return True

33
(isSubset as above)
• outer loop executed len(L1) times
• each iteration will execute inner loop up to len(L2)
times, with constant number of operations
• O(len(L1)*len(L2))
• worst case when L1 and L2 same length: O(len(L1)^2)

34
find intersection of two lists, return a list with each element
appearing only once

def intersect(L1, L2):
    tmp = []
    for e1 in L1:
        for e2 in L2:
            if e1 == e2:
                tmp.append(e1)
    res = []
    for e in tmp:
        if not(e in res):
            res.append(e)
    return res

35
(intersect as above)
• first nested loop takes len(L1)*len(L2) steps
• second loop takes at most len(L1) steps
• determining if element is in a list might take len(L1) steps
• if we assume lists are of roughly same length,
then O(len(L1)^2)

36
def g(n):
    """ assume n >= 0 """
    x = 0
    for i in range(n):
        for j in range(n):
            x += 1
    return x

▪ computes n^2 very inefficiently
▪ when dealing with nested loops, look at the ranges
▪ nested loops, each iterating n times
▪ O(n^2)

37
• complexity grows as log of size of one of its
inputs
• example:
• Bisection search
• Binary search of a list

38
• suppose we want to know if a particular
element is present in a list
• Linear search just “walk down” the list,
checking each element
• complexity was linear in length of the list
• suppose we know that the list is ordered
from smallest to largest (can we
do better?)

39
1. pick an index, i, that divides list in half
2. ask if L[i] == e
3. if not, ask if L[i] is larger or smaller than e
4. depending on answer, search left or right half of L for e

a new version of a divide-and-conquer algorithm:

▪ break into smaller version of problem (smaller list), plus
some simple operations
▪ answer to smaller version is answer to original problem

40
• finish looking through the list when 1 = n/2^i, so i = log n
• complexity of recursion is O(log n) – where n is len(L)

41
def bisect_search1(L, e):
    if L == []:
        return False
    elif len(L) == 1:
        return L[0] == e
    else:
        half = len(L)//2
        if L[half] > e:
            return bisect_search1(L[:half], e)
        else:
            return bisect_search1(L[half:], e)

42
▪ Implementation 1 – bisect_search1
  • O(log n) bisection search calls
    • on each recursive call, size of range to be searched is cut in half
    • if original range is of size n, in worst case down to range of size 1
      when n/(2^k) = 1; or when k = log n
  • O(n) for each bisection search call to copy list
    • this is the cost to set up each call, so do this for each level of
      recursion
  • O(log n) * O(n) → O(n log n)
  • if we are really careful, note that length of list to be
    copied is also halved on each recursive call
    • turns out that total cost to copy is O(n) and this dominates the
      log n cost due to the recursive calls

43
• still reduce size of problem by factor of two on each step

• but just keep track of low and high portion of list to be searched

• avoid copying the list

• complexity of recursion is again O(log n) – where n is len(L)
44
def bisect_search2(L, e):
    def bisect_search_helper(L, e, low, high):
        if high == low:
            return L[low] == e
        mid = (low + high)//2
        if L[mid] == e:
            return True
        elif L[mid] > e:
            if low == mid: #nothing left to search
                return False
            else:
                return bisect_search_helper(L, e, low, mid - 1)
        else:
            return bisect_search_helper(L, e, mid + 1, high)
    if len(L) == 0:
        return False
    else:
        return bisect_search_helper(L, e, 0, len(L) - 1)

45
▪ Implementation 2 – bisect_search2 and its helper
  • O(log n) bisection search calls
    • on each recursive call, size of range to be searched is cut in half
    • if original range is of size n, in worst case down to range of size 1
      when n/(2^k) = 1; or when k = log n
  • pass list and indices as parameters
    • list never copied, just re-passed as a pointer
    • thus O(1) work on each recursive call
  • O(log n) * O(1) → O(log n)

46
def intToStr(i):
    digits = '0123456789'
    if i == 0:
        return '0'
    result = ''
    while i > 0:
        result = digits[i%10] + result
        i = i//10
    return result

47
LOGARITHMIC COMPLEXITY
(intToStr as above)
• only have to look at the loop, as there are no function calls
• within while loop, constant number of steps
• how many times through loop? how many times can
one divide i by 10?
• O(log(i))

48
▪ complexity can depend on number of iterative calls
def fact_iter(n):
    prod = 1
    for i in range(1, n+1):
        prod *= i
    return prod
▪ overall O(n) – n times round loop, constant cost
each time

49
def fact_recur(n):
    """ assume n >= 0 """
    if n <= 1:
        return 1
    else:
        return n*fact_recur(n - 1)
• computes factorial recursively
• if you time it, may notice that it runs a bit slower
than iterative version due to function calls
• still O(n) because the number of function calls
is linear in n, and constant effort to set up each call
• iterative and recursive factorial
implementations have the same order of
growth
50
▪ many practical algorithms are log-linear
▪ a very commonly used log-linear algorithm is
merge sort (see the sketch below)

51
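
A minimal merge sort sketch (illustrative, not from these slides) showing
where the O(n log n) comes from: the list is halved O(log n) times, and each
level of halving does O(n) work in merge:

def merge(left, right):
    # merge two sorted lists in O(len(left) + len(right)) steps
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # at most one of these two
    result.extend(right[j:])  # extends adds anything
    return result

def merge_sort(L):
    # O(log n) levels of splitting, O(n) merging per level -> O(n log n)
    if len(L) <= 1:
        return L
    mid = len(L) // 2
    return merge(merge_sort(L[:mid]), merge_sort(L[mid:]))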
• most common polynomial algorithms are
quadratic, i.e., complexity grows with square
of size of input.
• commonly occurs when we have nested
loops or recursive function calls.

52
• recursive functions where more than one recursive
call for each size of problem
• Towers of Hanoi
• many important problems are inherently
exponential
• unfortunate, as cost can be high
• will lead us to consider approximate solutions as may
provide reasonable answer more quickly

53
▪ let t_n denote time to solve tower of size n
▪ t_n = 2t_(n-1) + 1
      = 2(2t_(n-2) + 1) + 1
      = 4t_(n-2) + 2 + 1
      = 4(2t_(n-3) + 1) + 2 + 1
      = 8t_(n-3) + 4 + 2 + 1
      = 2^k t_(n-k) + 2^(k-1) + ... + 4 + 2 + 1
      = 2^(n-1) + 2^(n-2) + ... + 4 + 2 + 1
      = 2^n - 1
▪ geometric growth: if a = 2^(n-1) + ... + 2 + 1,
  then 2a = 2^n + 2^(n-1) + ... + 2, so a = 2^n - 1
▪ so order of growth is O(2^n)

54
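
A standard recursive solution (a sketch; the slides only give the recurrence)
makes the structure of t_n = 2t_(n-1) + 1 visible: two recursive calls on
n-1 disks plus one move:

def hanoi(n, src, dst, spare):
    # move n disks from peg src to peg dst; makes 2^n - 1 moves in total
    if n == 1:
        print('move disk from', src, 'to', dst)
    else:
        hanoi(n - 1, src, spare, dst)            # t(n-1): move n-1 disks aside
        print('move disk from', src, 'to', dst)  # +1: move the largest disk
        hanoi(n - 1, spare, dst, src)            # t(n-1): move them back on top

hanoi(3, 'A', 'C', 'B')  # prints 2^3 - 1 = 7 moves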
• given a set of integers (with no repeats), want to
generate the collection of all possible subsets –
called the power set
• {1, 2, 3, 4} would generate
• {}, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4},
{3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3,
4}
• order doesn’t matter
• {}, {1}, {2}, {1, 2}, {3}, {1, 3}, {2, 3}, {1, 2, 3}, {4}, {1, 4},
{2,4}, {1, 2, 4}, {3, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}

55
• we want to generate the power set of integers from 1 to n

• assume we can generate power set of integers from 1 to n-1

• then all of those subsets belong to the bigger power set (choosing not
to include n); and all of those subsets with n added to each of them
also belong to the bigger power set (choosing to include n)

• {}, {1}, {2}, {1, 2}, {3}, {1, 3}, {2, 3}, {1, 2, 3}, {4}, {1, 4}, {2,4},
{1, 2, 4}, {3, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}

• nice recursive description.

56
def genSubsets(L):
    res = []
    if len(L) == 0:
        return [[]]                # list of empty list
    smaller = genSubsets(L[:-1])   # all subsets without last element
    extra = L[-1:]                 # create a list of just last element
    new = []
    for small in smaller:
        new.append(small+extra)    # for all smaller solutions, add one with last element
    return smaller+new             # combine those with last element and those without

57
EXPONENTIAL COMPLEXITY
(genSubsets as above)
• assuming append is constant time
• time includes time to solve smaller problem, plus time
needed to make a copy of all elements in smaller problem

58
• but important to think about size of smaller
• know that for a set of size k there are 2^k cases
• how can we deduce overall complexity?

59
▪ let t_n denote time to solve problem of size n
▪ let s_n denote size of solution for problem of size n
▪ t_n = t_(n-1) + s_(n-1) + c (where c is some constant number of operations)
▪ t_n = t_(n-1) + 2^(n-1) + c
      = t_(n-2) + 2^(n-2) + c + 2^(n-1) + c
      = t_(n-k) + 2^(n-k) + ... + 2^(n-1) + kc
      = t_0 + 2^0 + ... + 2^(n-1) + nc
      = 1 + 2^n - 1 + nc
  (using the geometric sum: for x ≠ 1, the sum of x^i for i = 0..n-1
  is (1 - x^n)/(1 - x), so the powers of 2 sum to 2^n - 1)
▪ thus computing the power set is O(2^n)
60
▪ O(1) – code does not depend on size of problem
▪ O(log n) – reduce problem in half each time
through process
▪ O(n) – simple iterative or recursive programs
▪ O(n log n) – divide-and-conquer algorithms such as merge sort
▪ O(n^c) – nested loops or recursive calls
▪ O(c^n) – multiple recursive calls at each level

61
COMPLEXITY OF ITERATIVE FIBONACCI

def fib_iter(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        fib_i = 0
        fib_ii = 1
        for i in range(n-1):
            tmp = fib_i
            fib_i = fib_ii
            fib_ii = tmp + fib_ii
        return fib_ii

▪ best case: O(1)
▪ worst case: O(1) + O(n) + O(1) → O(n)

62
def fib_recur(n):
    """ assumes n an int >= 0 """
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib_recur(n-1) + fib_recur(n-2)

▪ worst case: O(2^n)

63
fib(5)
├─ fib(4)
│   ├─ fib(3)
│   │   ├─ fib(2)
│   │   └─ fib(1)
│   └─ fib(2)
└─ fib(3)
    ├─ fib(2)
    └─ fib(1)

• actually can do a bit better than 2^n since the tree
of cases thins out to the right
• but complexity is still exponential (see the call-counting
sketch below)

64
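
To see the thinning concretely, a small sketch (hypothetical helper, not from
the slides) that counts the calls made by the naive recursion:

def count_fib_calls(n):
    # number of calls satisfies C(n) = 1 + C(n-1) + C(n-2),
    # which grows roughly like 1.618^n -- less than 2^n, but still exponential
    if n <= 1:
        return 1
    return 1 + count_fib_calls(n - 1) + count_fib_calls(n - 2)

print(count_fib_calls(20), 2**20)  # 21891 vs 1048576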
▪ compare efficiency of algorithms
• Notation that describes growth
• lower order of growth is better
• independent of machine or specific implementation

▪ use Big Oh
• describe order of growth
• Asymptotic notation
• upper bound
• worst case analysis

65
▪ Lists: n is len(L)
  • index O(1)
  • store O(1)
  • length O(1)
  • append O(1)
  • == O(n)
  • remove O(n)
  • copy O(n)
  • reverse O(n)
  • iteration O(n)
  • in list O(n)

▪ Dictionaries: n is len(d)
  worst case:
  • index O(n)
  • store O(n)
  • length O(n)
  • delete O(n)
  • iteration O(n)
  average case:
  • index O(1)
  • store O(1)
  • delete O(1)
  • iteration O(n)

66
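
A quick timing sketch (illustrative; absolute numbers depend on the machine)
contrasting the O(n) membership test on a list with the average-case O(1)
lookup in a dictionary:

import time

n = 1000000
L = list(range(n))
d = dict.fromkeys(L)

t0 = time.perf_counter()
(n - 1) in L  # worst case for a list: scans all n elements, O(n)
list_time = time.perf_counter() - t0

t0 = time.perf_counter()
(n - 1) in d  # hash table lookup, O(1) on average
dict_time = time.perf_counter() - t0

print('list:', list_time, 's  dict:', dict_time, 's')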
