5SENG003W - Algorithms, Week 1

Dr. Klaus Draeger


Introduction

Algorithms are everywhere:

- Web search
- Computer Graphics
- Cryptography
- Image recognition
- Security
- Recommendations
- ...
Some main topics

- We will see ways of analysing and designing algorithms:
  - Big-O notation
  - Some important complexity classes (logarithmic, linear, quadratic, exponential, ...)
  - How to determine them empirically (doubling hypothesis)
  - Strategies (greedy, divide-and-conquer)
- We will also focus on the relationship between algorithms and data structures:
  - Linear vs non-linear structures
  - Indexed vs linked structures
Some logistics

- In-person lectures
  - Live lecture recordings available on Blackboard later
- Tutorials in labs
- One in-class test, one coursework
  - Worth 50% each
  - Need to score at least 30 in each and at least 40 on average
What is an algorithm?

- General idea: a set of instructions to solve a problem
  - Find a solution (search in a data set, solve equations, ...)
  - Find an optimal solution (shortest path, minimal solution of equations, ...)
  - Transform data (sort a data set, multiply matrices, ...)
- Instructions include
  - Atomic operations such as assignments
  - Decisions (branching or loops)
- Can be represented in different ways, including pseudocode and flowcharts
Pseudocode

- Similar to a program, but without irrelevant details
- Example: one of the earliest known algorithms is Euclid's algorithm for computing greatest common divisors:

  Input: x, y (positive integers)
  Output: gcd of x and y

  while x > 0
      if x < y
          y := y - x
      else
          x := x - y
  output y
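For comparison with actual code, a minimal runnable Java version of the same algorithm (assuming positive integer inputs) could look like this:

    public class Gcd {
        // Euclid's algorithm by repeated subtraction, following the pseudocode above.
        // Assumes x and y are positive integers.
        static int gcd(int x, int y) {
            while (x > 0) {
                if (x < y)
                    y = y - x;   // reduce the larger value by the smaller one
                else
                    x = x - y;
            }
            return y;
        }

        public static void main(String[] args) {
            System.out.println(gcd(48, 36)); // prints 12
        }
    }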
Flowcharts

- Diagrams representing the flow of an algorithm
  - Boxes represent instructions and decisions
- A flowchart for Euclid's algorithm could look like this:
  [flowchart diagram omitted]
Analysis of algorithms

- A central question about algorithms is how much time they take to solve a problem
- We can also ask about other resources, such as memory or bandwidth usage
- By "time" we can mean
  - Actual (milli-, micro-, ...) seconds
    - Directly measurable
    - But depends on implementation details, hardware, etc.
  - Number of atomic operations
    - Can be determined by analysing the loop structure
    - More advanced tools like the Master Theorem exist for recursive algorithms
    - Not straightforward: on which input? There are many different inputs, of arbitrary sizes
Analysis of algorithms

- Algorithmic analysis
  - Considers the worst-case (i.e. maximal) time required for any input of size n
  - Average-case and best-case complexity are also sometimes used
  - Focuses on the order of growth: for inputs of size n, does the time required grow like log n? n^2? 2^n?
- A related (harder) question is the complexity of the problem itself: what is the best we can hope for from any algorithm?
Orders of growth

- Suppose we have a mathematical function in the variable n, like f(n) = 5 · 2^n + 3 · n^4 + n · log n
- This could be the number of steps some algorithm needs on an input of size n
- Only the fastest-growing term is relevant
  - Among powers of n: the one with the highest exponent
  - 2^n grows faster than any power of n, log n more slowly
- The hierarchy looks like this:
  1 < log n < n < n · log n < n^2 < ... < 2^n < n · 2^n < ...
- Constant factors are irrelevant: for the function f, the relevant term is 2^n, not 5 · 2^n
- So the order of growth of f is 2^n
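As a quick check of why only the fastest-growing term (rather than the constant factor) determines the order of growth of f:

  f(n) / 2^n = (5 · 2^n + 3 · n^4 + n · log n) / 2^n = 5 + 3 · n^4 / 2^n + n · log n / 2^n → 5 as n → ∞

so f grows at the same rate as 2^n, up to the constant factor 5.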
Big-O and Theta notation

- In order to describe the complexity of an algorithm or problem, we use Big-O and Theta (Θ) notation
- Suppose f is some function of n
- An algorithm
  - is in O(f(n)) if its (worst-case) runtime for inputs of size n grows no faster than f(n)
  - is in Θ(f(n)) if its (worst-case) runtime for inputs of size n grows at the same rate as f(n), i.e. no faster and no slower
- Similarly, a problem
  - is in O(f(n)) if the time needed to solve it for inputs of size n grows no faster than f(n)
  - is in Θ(f(n)) if the time needed to solve it for inputs of size n grows at the same rate as f(n)
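More formally, "grows no faster" and "at the same rate" can be pinned down as follows:

  - f is in O(g) if there are constants c > 0 and n0 such that f(n) ≤ c · g(n) for all n ≥ n0
  - f is in Θ(g) if there are constants c1, c2 > 0 and n0 such that c1 · g(n) ≤ f(n) ≤ c2 · g(n) for all n ≥ n0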
Big-O and Theta notation: example

- One important problem we will encounter in more detail later is sorting an array
- Atomic operations are
  - Comparing two array entries
  - Assigning a new value (to an array entry or auxiliary variable)
  - Swapping two array entries
- It can be proven that the (comparison-based) sorting problem is in Θ(n · log n)
- We will see some algorithms whose complexity is in O(n · log n), i.e. as good as possible
Examples

- Complexity classes for some typical code samples:
  - Constant (O(1)): an atomic statement like

        a = 1;

  - Linear (O(n)):

        for(int i = 0; i < n; i++)
            a[i] = 1;

  - Quadratic (O(n^2)):

        for(int i = 0; i < n; i++)
            for(int j = 0; j < n; j++)
                a[i][j] = 1;

    or also

        for(int i = 0; i < n; i++)
            for(int j = 0; j < i; j++)
                a[i][j] = 1;

    (here the inner loop body runs 0 + 1 + ... + (n-1) = n(n-1)/2 times in total, which is still O(n^2))
Examples

- Generally, e nested loops which all run up to n are in O(n^e)
- More involved example:

      for(int i = 0; i < n; i++)
          for(int j = 0; j < n; j++)
              for(int k = 0; k < 3; k++)
                  for(int l = 0; l < i; l++)
                      for(int m = 0; m < k; m++)
                          x++;

  - i runs up to n, so it counts
  - j runs up to n, so it counts
  - k runs up to 3, independent of n, so it doesn't count
  - l runs up to i, which runs up to n, so it counts
  - m runs up to k, which is independent of n, so it doesn't count
- So this example takes O(n^3) steps
Examples: logarithmic factors

- Complexity classes containing logarithmic factors, like
  - logarithmic (O(log n))
  - linearithmic (O(n · log n))
  typically arise from divide-and-conquer algorithms, which we will see later
- The core idea is to solve the problem by
  - cutting the data into halves
  - solving sub-problems on each half
  (a minimal example is sketched below)
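Binary search in a sorted array halves the remaining range on every iteration, so it needs at most about log2(n) steps, i.e. it is in O(log n). A minimal Java sketch:

    public class BinarySearchDemo {
        // Returns an index of key in the sorted array a, or -1 if it is not present.
        static int binarySearch(int[] a, int key) {
            int lo = 0, hi = a.length - 1;
            while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;   // middle of the remaining range
                if (a[mid] == key)
                    return mid;                 // found
                else if (a[mid] < key)
                    lo = mid + 1;               // discard the lower half
                else
                    hi = mid - 1;               // discard the upper half
            }
            return -1;                          // not found
        }

        public static void main(String[] args) {
            int[] a = {2, 3, 5, 7, 11, 13};
            System.out.println(binarySearch(a, 11)); // prints 4
        }
    }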
Examples: exponential factors

- Complexity classes containing exponential factors usually arise when
  - we have a number of variables (or array entries, or ...) growing with n
  - we have to go through all combinations of values to find the solution
- Example: in the satisfiability (SAT) problem
  - we are given a logical formula like
    (¬A ∨ B ∨ C) ∧ (A ∨ ¬C ∨ ¬D) ∧ (¬B ∨ D ∨ E) ∧ ...
  - we want to know if there is some way of assigning true or false to A, B, C, ... which makes the formula true
  - the straightforward algorithm is to just try all combinations (sketched below)
  - for n variables, that is 2^n combinations
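A sketch of this brute-force search, assuming (purely for illustration) that the formula is given as clauses of integer literals, where literal i stands for variable i and -i for its negation:

    public class BruteForceSat {
        // Tries all 2^n truth assignments of the n variables (numbered 1..n).
        // A clause is an array of literals: literal i means variable i is true,
        // literal -i means variable i is false.
        static boolean satisfiable(int[][] clauses, int n) {
            for (long mask = 0; mask < (1L << n); mask++) {      // 2^n combinations
                boolean allClausesTrue = true;
                for (int[] clause : clauses) {
                    boolean clauseTrue = false;
                    for (int lit : clause) {
                        boolean value = ((mask >> (Math.abs(lit) - 1)) & 1) == 1;
                        if (lit > 0 ? value : !value) {          // literal satisfied?
                            clauseTrue = true;
                            break;
                        }
                    }
                    if (!clauseTrue) {                           // one false clause is enough
                        allClausesTrue = false;
                        break;
                    }
                }
                if (allClausesTrue)
                    return true;                                 // found a satisfying assignment
            }
            return false;                                        // none of the 2^n assignments works
        }
    }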
The empirical approach
- We can get evidence regarding complexity by sampling runtimes of an implementation on a variety of inputs
- This is an example of the empirical approach used throughout science
- Advantage: easy to do, given an implementation of the algorithm and a way to obtain suitable inputs
- Some caveats:
  - It depends on the chosen inputs
    - Many problems have easy special cases with lower complexity
    - How do we ensure that our inputs are representative?
  - It depends on implementation details
    - You are essentially testing the implementation
    - This can help find implementation errors if you know what the complexity of the algorithm should actually be
  - It depends on hardware (memory, cache), the amount of CPU used by other processes, ...
The empirical approach

- Basic idea: if the complexity of an implementation is
  - logarithmic, then repeatedly doubling the input size will always increase the runtime by the same amount
  - linear/quadratic/cubic, then repeatedly doubling the input size will always multiply the runtime by 2/4/8, respectively
  - exponential, then repeatedly increasing the input size by a fixed amount will always multiply the runtime by some fixed factor
- These relations will usually only hold approximately
- We need enough data points to draw any conclusions
- A sketch of such an experiment is shown below
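Here the method underTest is just a placeholder for whatever implementation is actually being measured (it does dummy linear work); the input size is repeatedly doubled and the ratio of successive runtimes is printed:

    public class DoublingExperiment {
        // Placeholder for the implementation being measured.
        static void underTest(int n) {
            double sum = 0;
            for (int i = 0; i < n; i++)
                sum += Math.sqrt(i);
        }

        public static void main(String[] args) {
            long previous = 0;
            for (int n = 100_000; n <= 12_800_000; n *= 2) {
                long start = System.nanoTime();
                underTest(n);
                long elapsed = System.nanoTime() - start;
                if (previous > 0)
                    System.out.printf("n = %8d  time = %10d ns  ratio = %.2f%n",
                            n, elapsed, (double) elapsed / previous);
                previous = elapsed;
            }
        }
    }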
The empirical approach

- Suppose we have measured these runtimes:

  Input size    Runtime (Algorithm 1)
  100           7
  200           13
  400           27
  800           52

- The input size doubles from row to row
- Dividing each runtime by the previous one gives
  13/7 = 1.857, 27/13 = 2.077, 52/27 = 1.926
  so the runtime approximately doubles each time
- This is evidence that the algorithm's complexity is linear
The empirical approach

- Suppose we have measured these runtimes:

  Input size    Runtime (Algorithm 2)
  100           17
  200           22
  400           28
  800           33

- The input size doubles from row to row
- Taking the differences between successive runtimes gives
  22 − 17 = 5, 28 − 22 = 6, 33 − 28 = 5
  i.e. they stay roughly constant, apart from small fluctuations
- This is evidence that the algorithm's complexity is logarithmic
