Week 10 Randomised Algorithms, Algorithm and Data Ethics, Course Review
Randomised Algorithms
Algorithms employ randomness to
Randomness
Randomness is also useful
Sidetrack: Random Numbers
How can a computer pick a number at random?
• it cannot
The most widely-used technique is called the Linear Congruential Generator (LCG)
LCG is not good for applications that need extremely high-quality random numbers
• the period length is too short
• period length … the length of the sequence before it starts to repeat itself
• a short period means the numbers are correlated
Trivial example (the sequence repeats after 30 numbers):
11, 28, 29, 9, 6, 4, 13, 19, 23, 5, 24, 16, 21, 14, 30, 20, 3, 2, 22, 25,
27, 18, 12, 8, 26, 7, 15, 10, 17, 1, 11, 28, 29, 9, 6, 4, 13, 19, 23, 5,
24, 16, 21, 14, 30, 20, 3, 2, 22, 25, 27, 18, 12, 8, 26, 7, 15, 10, 17, 1,
11, 28, 29, 9, 6, 4, 13, 19, 23, 5, 24, 16, 21, 14, 30, 20, 3, 2, 22, 25, 27,
18, 12, 8, 26, 7, 15, 10, 17, 1, 11, 28, 29, 9, 6, 4, 13, 19, 23, 5, 24, 16,
21, 14, 30, 20, 3, 2, 22, 25, 27, 18, 12, 8, 26, 7, 15, 10, 17, 1, 11, 28,
29, 9, 6, 4, 13, 19, 23, 5, 24, 16, 21, 14, 30, 20, 3, 2, 22, 25, 27, 18,
12, 8, 26, 7, 15, 10, 17, 1, ...
A second sequence with a much shorter period (it repeats after only 4 numbers):
12, 24, 18, 6, 12, 24, 18, 6, 12, 24, 18, 6, 12, 24, 18, 6, 12, 24, 18, 6,
12, 24, 18, 6, 12, 24, 18, 6, 12, 24, 18, 6, ...
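Both sequences can be reproduced with a few lines of C. The sketch below assumes the usual LCG recurrence X(n+1) = (a·X(n) + c) mod m. The parameter values are not given in the text above; a = 11, c = 0, m = 31 with seed 11 (first sequence) and a = 12, c = 0, m = 30 with seed 12 (second sequence) are assumptions chosen because they generate exactly the numbers listed. The function names lcg_next and lcg_period are hypothetical.

```c
#include <assert.h>

// One step of a linear congruential generator: X(n+1) = (a*X(n) + c) mod m.
unsigned lcg_next(unsigned a, unsigned c, unsigned m, unsigned x) {
    assert(m > 0);
    return (a * x + c) % m;
}

// Number of steps until the generator returns to its seed.
// Assumes the seed lies on a cycle, which holds for both examples here.
unsigned lcg_period(unsigned a, unsigned c, unsigned m, unsigned seed) {
    unsigned x = seed, steps = 0;
    do {
        x = lcg_next(a, c, m, x);
        steps++;
    } while (x != seed && steps < m);  // there are at most m distinct states
    return steps;
}
```

Starting from x = 11 and repeatedly calling lcg_next(11, 0, 31, x) yields 28, 29, 9, 6, ... as in the first sequence; lcg_period reports how many numbers are produced before the sequence starts over.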
Most compilers use LCG-based algorithms that are slightly more involved; see www.mscs.dal.ca/~selinger/random/ for details (including a short C program that produces exactly the same pseudo-random numbers as gcc for any given seed value)
Using the remainder to compute a random number is not the best way:
Some applications require more sophisticated, cryptographically secure pseudo random numbers
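#include <stdlib.h>
#include <stdio.h>

#define RUNS 100000   // number of games simulated (assumed value, not given in the original)
#define BET  10       // stake per game (assumed value, not given in the original)

int main(void) {
   srand(1234567);                 // choose arbitrary seed
   int coin1, coin2, n, sum = 0;
   for (n = 0; n < RUNS; n++) {
      do {                         // toss both coins until they show the same face
         coin1 = rand() % 2;
         coin2 = rand() % 2;
      } while (coin1 != coin2);
      if (coin1==1 && coin2==1)    // both coins show 1: win the bet
         sum += BET;
      else                         // both coins show 0: lose the bet
         sum -= BET;
   }
   printf("Final result: %d\n", sum);
   printf("Average outcome: %f\n", (float) sum / RUNS);
   return 0;
}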
Seeding
There is one significant problem:
• every time you run a program with the same seed, you get exactly the same sequence of 'random' numbers (why?)
To vary the output, give the random seeder a starting point that varies with time
#include <time.h>
time(NULL) // returns the time as the number of seconds
// since the Epoch, 1970-01-01 00:00:00 +0000
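A minimal sketch of both points, assuming the standard srand/rand pair from <stdlib.h>: seeding from the clock varies the sequence between runs, while re-seeding with a fixed value replays it. The helper names seed_from_clock and same_seed_repeats are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

// Seed the generator from the clock, so that runs started in different
// seconds use different seeds.
void seed_from_clock(void) {
    srand((unsigned) time(NULL));
}

// Return 1 if seeding twice with the same value reproduces the same
// first n numbers, 0 otherwise.
int same_seed_repeats(unsigned seed, int n) {
    int first[16];
    assert(n >= 0 && n <= 16);
    srand(seed);
    for (int i = 0; i < n; i++)
        first[i] = rand();
    srand(seed);                       // restart from the same seed
    for (int i = 0; i < n; i++)
        if (rand() != first[i])
            return 0;
    return 1;
}
```

Two runs started within the same second still receive the same seed, so this is only suitable where such repeats do not matter.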
Randomised Algorithms
Analysis of Randomised Algorithms
Math needed for the analysis of randomised algorithms:
• Combinatorics
  ◦ number of ways to choose k objects from n objects: $\binom{n}{k} = \frac{n \cdot (n-1) \cdot \ldots \cdot (n-k+1)}{1 \cdot 2 \cdot \ldots \cdot k}$
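The product formula translates directly into a loop. The following is a sketch only; the function name choose is hypothetical, and no attempt is made to detect overflow for large n.

```c
#include <assert.h>

// Number of ways to choose k objects from n objects, computed as
// n*(n-1)*...*(n-k+1) / (1*2*...*k).
// After step i the value equals "n-k+i choose i", so every division is exact.
unsigned long long choose(unsigned n, unsigned k) {
    assert(k <= n);
    unsigned long long result = 1;
    for (unsigned i = 1; i <= k; i++)
        result = result * (n - k + i) / i;
    return result;
}
```

For example, choose(6, 2) gives the number of vertex pairs in a graph with 6 vertices.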
1. ½
2. ½
3. Yes
4. 1 – ½·½ = ¾
5. 2 times
Note that $2 = \frac{1}{1/2}$ is the infinite sum $\frac{1}{2} \cdot 1 + (1-\frac{1}{2}) \cdot \frac{1}{2} \cdot 2 + (1-\frac{1}{2})^2 \cdot \frac{1}{2} \cdot 3 + (1-\frac{1}{2})^3 \cdot \frac{1}{2} \cdot 4 + \ldots$
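The value 2 is an instance of the general result that the expected number of trials until the first success is 1/p when each trial succeeds with probability p. A short derivation, where q = 1 − p is a symbol introduced here only for brevity and 0 < p ≤ 1 is assumed:

```latex
\begin{align*}
E &= \sum_{i=1}^{\infty} i \cdot q^{\,i-1} \cdot p, \qquad q = 1 - p \\
  &= p \cdot \frac{d}{dq} \sum_{i=0}^{\infty} q^{\,i}
   = p \cdot \frac{d}{dq} \, \frac{1}{1-q}
   = \frac{p}{(1-q)^{2}}
   = \frac{1}{p}
\end{align*}
```

With p = 1/2 this gives E = 2, matching the sum above.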
findKey(L,k):
| Input array L, key k
| Output some element in L with key k
|
| repeat
| | randomly select e∈L
| until key(e)=k
| return e
Analysis:
• p … ratio of elements in L with key k (e.g. $p = \frac{1}{3}$)
• Probability of success: 1 (if p > 0)
• Expected runtime: $\frac{1}{p}$
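The pseudocode could be written in C roughly as follows. This is a sketch under simplifying assumptions: the elements are plain integers that act as their own keys, the index is drawn with rand() % n (simple, though not perfectly uniform), and the name find_key is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

// Keep sampling random positions until one holds the key k.
// Returns the index of some element equal to k.
// Like the pseudocode, this never terminates if k does not occur in L.
int find_key(const int L[], int n, int k) {
    assert(n > 0);
    int i;
    do {
        i = rand() % n;          // randomly select an element of L
    } while (L[i] != k);
    return i;
}
```

If a fraction p of the entries equals k, each iteration succeeds with probability p, which is where the expected 1/p iterations come from.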
If we cannot guarantee that the array contains any elements with key k …
findKey(L,k,d):
| Input array L, key k, maximum #attempts d
| Output some element in L with key k
|
| repeat
| | if d=0 then
| | | return failure
| | end if
| | randomly select e∈L
| | d=d-1
| until key(e)=k
| return e
Analysis:
• Expected runtime: $\left( \sum_{i=1}^{d-1} i \cdot (1-p)^{i-1} \cdot p \right) + d \cdot (1-p)^{d-1}$
  ◦ O(1) if d is a constant
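To get a feel for the expected-runtime formula, it can be evaluated numerically. The sketch below computes the sum term by term; the function name expected_attempts is hypothetical.

```c
#include <assert.h>

// Expected number of attempts of the bounded search, evaluated directly as
//   sum_{i=1}^{d-1} i*(1-p)^(i-1)*p  +  d*(1-p)^(d-1).
double expected_attempts(double p, int d) {
    assert(d >= 1 && p >= 0.0 && p <= 1.0);
    double q_pow = 1.0;            // holds (1-p)^(i-1)
    double total = 0.0;
    for (int i = 1; i < d; i++) {
        total += i * q_pow * p;    // first success exactly on attempt i
        q_pow *= 1.0 - p;
    }
    return total + d * q_pow;      // attempt d is reached with prob. (1-p)^(d-1)
}
```

The result never exceeds d, and for growing d it approaches the unbounded value 1/p; for p = 0.5 and d = 3 it evaluates to 1.75.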
Randomised Quicksort
Quicksort applies divide and conquer to sorting:
• Divide
  ◦ pick a pivot element
  ◦ move all elements smaller than the pivot to its left
  ◦ move all elements greater than the pivot to its right
• Conquer
  ◦ sort the elements on the left
  ◦ sort the elements on the right
Non-randomised Quicksort
Divide ...
partition(array,low,high):
| Input array, index range low..high
| Output selects array[low] as pivot element
| moves all smaller elements between low+1..high to its left
| moves all larger elements between low+1..high to its right
| returns new position of pivot element
|
| pivot_item=array[low], left=low+1, right=high
| repeat
| | right = find index of rightmost element <= pivot_item
| | left = find index of leftmost element > pivot_item // left=right if none
| | if left<right then
| | | swap array[left] with array[right]
| | end if
| until left≥right
| if low<right then
| | swap array[low] with array[right] // right is final position for pivot
| end if
| return right
Quicksort(array,low,high):
| Input array, index range low..high
| Output array[low..high] sorted
|
| if high > low then // termination condition low >= high
| | pivot = partition(array,low,high)
| | Quicksort(array,low,pivot-1)
| | Quicksort(array,pivot+1,high)
| end if
1 2 | 3 | 5 4 6
1 2 | 3 | 4 | 5 | 6
Worst-case Running Time
Worst case for Quicksort occurs when the pivot is the unique minimum or maximum element:
• One of the intervals low..pivot-1 and pivot+1..high is of size n-1 and the other is of size 0
  ⇒ running time is proportional to n + (n-1) + … + 2 + 1
• Hence the worst case for non-randomised Quicksort is O(n²)
1 2 3 4 5 6
1 | 2 3 4 5 6
1 | 2 | 3 4 5 6
...
1 | 2 | 3 | 4 | 5 | 6
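The quadratic behaviour on sorted input can be observed by counting comparisons. The sketch below is an illustration only: it uses a simpler single-scan partition than the pseudocode above (the pivot is still the first element), and all names are hypothetical. With this scheme, sorting an already sorted array of n elements performs (n-1) + (n-2) + … + 1 = n·(n-1)/2 comparisons.

```c
#include <assert.h>

long comparisons = 0;   // counts element-versus-pivot comparisons

void swap_ints(int *x, int *y) { int t = *x; *x = *y; *y = t; }

// Partition a[low..high] around its first element (no randomisation).
int partition_first(int a[], int low, int high) {
    int pivot_item = a[low];
    int last_small = low;                  // a[low+1..last_small] < pivot_item
    for (int j = low + 1; j <= high; j++) {
        comparisons++;
        if (a[j] < pivot_item) {
            last_small++;
            swap_ints(&a[last_small], &a[j]);
        }
    }
    swap_ints(&a[low], &a[last_small]);    // pivot moves to its final position
    return last_small;
}

void quicksort_first(int a[], int low, int high) {
    if (high > low) {
        int pivot = partition_first(a, low, high);
        quicksort_first(a, low, pivot - 1);
        quicksort_first(a, pivot + 1, high);
    }
}

// Sort the already sorted array 1..n and report the number of comparisons.
long comparisons_on_sorted_input(int n) {
    int a[1000];
    assert(n >= 1 && n <= 1000);
    for (int i = 0; i < n; i++)
        a[i] = i + 1;
    comparisons = 0;
    quicksort_first(a, 0, n - 1);
    return comparisons;
}
```

For the six-element example above the count is 15, and for n = 100 it is already 4950.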
Randomised Quicksort
partition(array,low,high):
| Input array, index range low..high
| Output randomly select a pivot element from array[low..high]
| moves all smaller elements between low..high to its left
| moves all larger elements between low..high to its right
| returns new position of pivot element
|
| randomly select pivot_index∈[low..high]
| pivot_item=array[pivot_index], swap array[low] with array[pivot_index]
| left=low+1, right=high
| repeat
| | right = find index of rightmost element <= pivot_item
| | left = find index of leftmost element > pivot_item // left=right if none
| | if left<right then
| | | swap array[left] with array[right]
| | end if
| until left≥right
| if low<right then
| | swap array[low] with array[right] // right is final position for pivot
| end if
| return right
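One possible C rendering of the randomised partition together with the Quicksort driver is sketched below. It follows the pseudocode's two-scan scheme; the pivot index is drawn with rand() % range, which is a simplification, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

void swap_items(int *x, int *y) { int t = *x; *x = *y; *y = t; }

// Partition a[low..high] around a randomly chosen pivot and return the
// pivot's final position.
int random_partition(int a[], int low, int high) {
    assert(low <= high);
    int pivot_index = low + rand() % (high - low + 1);
    swap_items(&a[low], &a[pivot_index]);     // park the pivot at a[low]
    int pivot_item = a[low];
    int left = low + 1, right = high;
    for (;;) {
        // rightmost element <= pivot_item (stops at a[low] at the latest)
        while (a[right] > pivot_item)
            right--;
        // leftmost element > pivot_item; left == right if there is none
        while (left < right && a[left] <= pivot_item)
            left++;
        if (left >= right)
            break;
        swap_items(&a[left], &a[right]);
    }
    swap_items(&a[low], &a[right]);           // right is the pivot's final position
    return right;
}

void randomised_quicksort(int a[], int low, int high) {
    if (high > low) {
        int pivot = random_partition(a, low, high);
        randomised_quicksort(a, low, pivot - 1);
        randomised_quicksort(a, pivot + 1, high);
    }
}
```

Because the pivot is chosen at random, no fixed input (such as an already sorted array) triggers the worst case on every run.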
Analysis:
6 3 7 5 8 2 4 1
4 3 6 5 1 2 | 7 | 8
4 3 6 5 1 2 | 7 | 8
1 2 | 3 | 5 6 4 | 7 | 8
n … size of array
From probability theory we know that the expected number of coin tosses required in order to get k heads is 2·k
Minimum Cut Problem
Given:
Cut of a graph …
Example:
[Figure: example graph with a cut separating the vertex sets S and T]
Contraction
Contracting edge e = {v,w} …
• remove edge e
• replace vertices v and w by new node n
• replace all edges {x,v}, {x,w} by {x,n}
Randomised algorithm for graph contraction = repeated edge contraction until 2 vertices remain
contract(G):
| Input graph G = (V,E) with |V|≥2 vertices
| Output cut of G
|
| while |V|>2 do
| | randomly select e∈E
| | contract edge e in G
| end while
| return the only cut in G
Apply the contraction algorithm twice to the following graph, with different random choices:
Analysis:
V … number of vertices
Theorem. Every graph has $2^{V-1}-1$ cuts. At most $\binom{V}{2}$ of them are minimum cuts.
• Probability that contract returns a minimum cut: $\geq 1 / \binom{V}{2}$
• This is much higher than the probability of picking a minimum cut at random, which is $\leq \binom{V}{2} / (2^{V-1}-1)$
• Single edge contraction can be implemented in O(V) time on an adjacency-list representation ⇒ total running time: O(V²)
Karger's Algorithm
Idea: Repeat random graph contraction several times and take the best cut found
MinCut(G):
| Input graph G with V≥2 vertices
| Output smallest cut found
|
| min_weight=∞, d=0
| repeat
| | cut=contract(G)
| | if weight(cut)<min_weight then
| | | min_cut=cut, min_weight=weight(cut)
| | end if
| | d=d+1
| until d > binomial(V,2)·ln V
| return min_cut
Analysis:
V … number of vertices
E … number of edges
• Probability of success: $\geq 1 - \frac{1}{V}$
  ◦ probability of not finding a minimum cut when the contraction algorithm is repeated $d = \binom{V}{2} \cdot \ln V$ times: $\leq \left[ 1 - 1/\binom{V}{2} \right]^{d} \leq \frac{1}{e^{\ln V}} = \frac{1}{V}$
• Total running time: O(E·d) = O(E·V²·log V)
  ◦ assuming edge contraction implemented in O(E)
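The repeated-contraction idea can be sketched in C as follows. This is an illustration under stated assumptions, not the implementation analysed above: the graph is a connected, unweighted edge list, merged vertices are tracked with a small union-find array instead of an adjacency list, and the number of repetitions is passed in as runs (standing in for d). All type and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MAX_V 64

typedef struct { int v, w; } Edge;

int parent[MAX_V];   // union-find forest: merged vertices share a root

int find_root(int v) {
    while (parent[v] != v)
        v = parent[v];
    return v;
}

// One run of random contraction on a connected graph. Returns the number
// of edges crossing the resulting cut (every edge has weight 1 here).
int contract(const Edge edges[], int nE, int nV) {
    for (int v = 0; v < nV; v++)
        parent[v] = v;
    int remaining = nV;
    while (remaining > 2) {
        const Edge *e = &edges[rand() % nE];
        int a = find_root(e->v), b = find_root(e->w);
        if (a == b)
            continue;        // edge lies inside a merged node: draw again
        parent[a] = b;       // contract e by merging its two end points
        remaining--;
    }
    int crossing = 0;
    for (int i = 0; i < nE; i++)
        if (find_root(edges[i].v) != find_root(edges[i].w))
            crossing++;
    return crossing;
}

// Repeat the contraction and keep the smallest cut weight found.
int min_cut(const Edge edges[], int nE, int nV, int runs) {
    assert(nV >= 2 && nV <= MAX_V && nE >= 1 && runs >= 1);
    int best = INT_MAX;
    for (int d = 0; d < runs; d++) {
        int weight = contract(edges, nE, nV);
        if (weight < best)
            best = weight;
    }
    return best;
}
```

Redrawing an edge whose end points are already merged is equivalent to choosing uniformly among the edges that remain after contraction. The result is always the weight of some cut, so it can only overestimate the minimum; more runs make an overestimate less likely.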
Sidetrack: Maxflow and Mincut
Given: flow network G=(V,E) with
ω(S,T) = 4
Randomised Algorithms for NP-hard Problems
Many NP-hard problems can be tackled by randomised algorithms that
Examples:
• travelling salesman
• constraint satisfaction problems, satisfiability
• … and many more
Simulation
In some problem scenarios
Example: Area inside a Curve
Scenario:
i.e. we approximate the area within the curve by using the ratio of points inside the curve against those outside
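A small simulation of this idea is sketched below. The curve used in the scenario is not reproduced here, so the example assumes a stand-in: the quarter disc x² + y² ≤ 1 inside the unit square, whose true area is π/4 ≈ 0.785. The estimate is the fraction of all sampled points that fall inside the curve, multiplied by the area of the enclosing square (which is 1). The function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

// Estimate the area of the quarter disc x*x + y*y <= 1 by sampling n
// random points in the unit square and counting how many fall inside.
double estimate_quarter_disc_area(int n) {
    assert(n > 0);
    int inside = 0;
    for (int i = 0; i < n; i++) {
        double x = (double) rand() / RAND_MAX;   // uniform in [0,1]
        double y = (double) rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)
            inside++;
    }
    return (double) inside / n;    // times the square's area, which is 1
}
```

The accuracy improves only slowly with the number of points: the typical error shrinks in proportion to 1/√n.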
Summary
• Analysis of randomised algorithms
  ◦ probability of success
  ◦ expected runtime
• Suggested reading:
  ◦ Moffat, Ch. 9.3, 9.5
Data Breaches
Major incidents …
Millions of people's Facebook profiles were used for political purposes without their consent
⇒ Cambridge Analytica went bust as a consequence
Businesses and organisations must comply with the Australian Privacy Principles:
Data (Mis-)use
In 2012 several newspapers reported that …
• Target used data analysis to predict whether female customers were likely to be pregnant
• Target then sent coupons by mail
• A Minneapolis man thus found out about the pregnancy of his teenage daughter
• Respect privacy
  ◦ Store only the minimum amount of personal information necessary
  ◦ Prevent re-identification of anonymised data
• Carefully analyse the consequences of data aggregation
• Access data only when authorised or compelled by the public good
  ◦ Whistleblower Manning's disclosure of classified military data to Wikileaks (2010-11)
  ◦ Paradise papers that disclosed offshore investments (2017)
Costly Software Errors
NASA's Mars Climate Orbiter …
• launched 11/12/1998
• reached Mars on 23/9/1999
• came too close to surface and disintegrated
Cause of failure:
Sidetrack: Year 2038 Problem
Recall:
#include <time.h>
time(NULL) // returns the time as the number of seconds
// since the Epoch, 1970-01-01 00:00:00 +0000
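On systems that store this count of seconds in a signed 32-bit integer, the largest representable value is 2³¹ − 1, which is reached on 19 January 2038 at 03:14:07 UTC; one second later the counter overflows. The sketch below formats that last representable second. It assumes a platform where gmtime is available; the function name last_32bit_second is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

// Write the UTC date and time of the last second that a signed 32-bit
// counter can represent into buf. Returns the number of characters
// written, or 0 on failure.
size_t last_32bit_second(char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    time_t t = (time_t) INT32_MAX;       // 2^31 - 1 seconds after the Epoch
    struct tm *utc = gmtime(&t);
    if (utc == NULL)
        return 0;
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", utc);
}
```

Systems that use a 64-bit time_t are not affected by this particular limit.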
• Software engineers shall ensure that their products meet the highest professional standards possible
  ◦ Strive to fully understand the specifications for software
  ◦ Ensure that specifications have been well documented and satisfy the users' requirements
  ◦ Ensure adequate testing, debugging, and review of software and related documents
• passenger jet V9 2937 and cargo jet QY 611 on collision course at 36,000 feet
• ground air traffic controller instructed V9 pilot to descend
• seconds later, the automatic Traffic Collision Avoidance System (TCAS)
  ◦ instructed V9 2937 to climb
  ◦ instructed QY 611 to descend
• flight 611's pilot followed TCAS, flight 2937's pilot ignored TCAS
• all 71 people on board the two planes killed
⇒ Collision would not have occurred had both pilots followed TCAS
The TCAS …
What algorithm would you use for reaching an agreement (climb vs. descend)?
Moral Dilemmas
How to program an autonomous car …
Variations:
Course Review
Goal:
Assessment Summary
LAB = mark for programs/quizzes (out of 8+8)
MIDTRM = mark for mid-term test (out of 12)
ASST1 = mark for large assignment (out of 12)
EXAM = mark for final exam (out of 60)
Lectures, problem sets and assignments have built you up to this point.
Must start test between 1:45pm and 1:50pm to get the full 135 minutes (= 2hrs+15mins)
  ◦ give me some feedback on how you might like the course to run in the future
  ◦ even if that is "Exactly the same. I liked how it was run."
Revision Strategy
• Re-read lecture slides and example programs
• Read the corresponding chapters in the recommended textbooks
• Review/solve problem sets
• Attempt the prac exam on Moodle
• Invent your own variations of the weekly exercises (problem solving is a skill that improves with practice)
Supplementary Exam
If you attend an exam
• you are making a statement that you are "fit and healthy enough"
• it is your only chance to pass (i.e. no second chances)
Note: Exam has been designed to account for up to 10mins of non-severe technical issues
Assessment
Assessment is about determining how well you understand the syllabus of this course.
Failure is a fact of life. For example, my scientific papers or project proposals get rejected sometimes too.
Summing Up …
So What Was the Real Point?
The aim was for you to become a better computer scientist
• more confident in your own ability to design data structures and algorithms
• with an expanded set of fundamental structures and algorithms to draw on
• able to analyse and justify your choices
• ultimately, enjoying the software design and development process
Finally …
Book 2
The Ancient Masters
😀 👍 🎓
Produced: 20 Apr 2022