06 Binsearch
Searching an Array
Linear Search
Go through the array position by position until we find x
Linear Search on Sorted Arrays
Stop early if we find an element greater than x
This is still O(n) in the worst case
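A minimal sketch of linear search with the early exit, written in plain C rather than C0, with the precondition rendered as an `assert`. The names `linsearch_sorted` and `EXAMPLE` are illustrative and not taken from the slides.

```c
#include <assert.h>

/* Example data (an assumption): a small sorted array. */
const int EXAMPLE[7] = {2, 3, 5, 9, 11, 13, 17};

/* Linear search on a sorted array A[0,n): returns an index i with
   A[i] == x, or -1 if x does not occur in A[0,n). */
int linsearch_sorted(int x, const int *A, int n) {
  assert(n >= 0);                /* stands in for a //@requires contract */
  for (int i = 0; i < n; i++) {
    if (A[i] == x) return i;     /* found x at position i */
    if (A[i] > x) return -1;     /* sorted: everything further right is larger */
  }
  return -1;                     /* x is larger than every element */
}
```

Searching for 4 stops at index 2 (the element 5) instead of scanning all seven elements, but searching for a value larger than 17 still visits every position.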
Binary Search
Why better?
o we are throwing out half of the array each time!
   with linear search, we were throwing out just one element!
o if the array has length n, we can halve it only about log n times (log base 2)
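The halving argument can be tried out directly. The sketch below, in plain C, counts how many times a length can be halved (rounding down) before nothing is left; the function name `halvings` is hypothetical.

```c
#include <assert.h>

/* Number of times a segment of length n can be halved (rounding down)
   before it becomes empty. */
int halvings(int n) {
  assert(n >= 0);
  int count = 0;
  while (n > 0) {
    n = n / 2;     /* throw out (at least) half of what is left */
    count++;
  }
  return count;
}
```

For n = 1000 the count is 10, whereas discarding one element at a time takes 1000 steps.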
A Cautionary Tale
Jon Bentley
Read more at
https://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/
More of a Cautionary Tale
Joshua Bloch
Read more at
https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
Even More of a Cautionary Tale
Researchers find a bug in Joshua Bloch’s code for TimSort
o Implemented it in a language with contracts (JML, the Java Modeling Language)
o Tried to prove correctness using the KeY theorem prover
JML has some of the same contract mechanisms as C0 (and a few more)
(we borrowed our contracts from them)
Read more at
http://www.envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
Piece of cake?
Implementing binary search is not as simple as it sounds
o many professionals have failed!
Binary Search
A is sorted
Looking for x = 4

index:  0   1   2   3   4   5   6   7
A:      2   3   5   9   11  13  17

find midpoint of A[0,7)
 • index 3
 • A[3] = 9
4 < 9
 • ignore A[4,7)
 • ignore also A[3]
find midpoint of A[0,3)
 • index 1
 • A[1] = 3
3 < 4
 • ignore A[0,1)
 • ignore also A[1]
find midpoint of A[2,3)
 • index 2
 • A[2] = 5
4 < 5
 • ignore A[3,3)
 • ignore also A[2]
nothing left!
 • A[2,2) is empty
 • 4 isn’t in A
Binary Search
A is sorted
At each step, we
o examine a segment A[lo, hi)
o find its midpoint mid
o compare x = 4 with A[mid]

index:  0   1   2   3   4   5   6   7
A:      2   3   5   9   11  13  17

lo = 0, hi = 7
find midpoint of A[lo,hi)
 • index mid = 3
 • A[mid] = 9
4 < A[mid]
 • ignore A[mid+1,hi)
 • ignore also A[mid]
lo = 0, hi = 3
find midpoint of A[lo,hi)
 • index mid = 1
 • A[mid] = 3
A[mid] < 4
 • ignore A[lo,mid)
 • ignore also A[mid]
lo = 2, hi = 3
find midpoint of A[lo,hi)
 • index mid = 2
 • A[mid] = 5
4 < A[mid]
 • ignore A[mid+1,hi)
 • ignore also A[mid]
lo = 2, hi = 2
nothing left!
 • A[lo,hi) is empty
 • 4 isn’t in A
Binary Search
Let’s look for x = 11
At each step, we
o examine a segment A[lo, hi)
o find its midpoint mid
o compare x = 11 with A[mid]

index:  0   1   2   3   4   5   6   7
A:      2   3   5   9   11  13  17

lo = 0, hi = 7
find midpoint of A[lo,hi)
 • index mid = 3
 • A[mid] = 9
A[mid] < 11
 • ignore A[lo,mid)
 • ignore also A[mid]
lo = 4, hi = 7
find midpoint of A[lo,hi)
 • index mid = 5
 • A[mid] = 13
11 < A[mid]
 • ignore A[mid+1,hi)
 • ignore also A[mid]
lo = 4, hi = 5
find midpoint of A[lo,hi)
 • index mid = 4
 • A[mid] = 11
11 = A[mid]
 • found!
 • return 4
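The traces for x = 4 and x = 11 can be replayed mechanically. The sketch below is plain C, not C0; it assumes the midpoint is rounded down, and the names `probe_at` and `TRACE_A` are hypothetical. It reports which index is examined at a given step of the search.

```c
#include <assert.h>

/* The example array from the traces. */
const int TRACE_A[7] = {2, 3, 5, 9, 11, 13, 17};

/* Index examined at step `step` (counting from 0) when searching the
   sorted array A[0,n) for x, or -1 if the search ends before that step. */
int probe_at(int x, const int *A, int n, int step) {
  assert(n >= 0 && step >= 0);
  int lo = 0;
  int hi = n;
  for (int k = 0; lo < hi; k++) {
    int mid = lo + (hi - lo) / 2;   /* midpoint of A[lo,hi), rounded down */
    if (k == step) return mid;
    if (A[mid] == x) return -1;     /* found x before reaching `step` */
    if (A[mid] < x) lo = mid + 1;   /* ignore A[lo,mid) and A[mid] */
    else hi = mid;                  /* ignore A[mid+1,hi) and A[mid] */
  }
  return -1;                        /* segment became empty */
}
```

Under these assumptions the search for 4 examines indices 3, 1, 2 and then runs out of elements, while the search for 11 examines 3, 5, 4 and stops there.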
Implementing Binary Search
Setting up Binary Search
returns -1 if x not found
What do we Know at Each Step?
At an arbitrary iteration, the picture is:
      0         lo         hi         n
A:    |    …    |          |    …     |

Adding Loop Invariants
0 ≤ lo ≤ hi ≤ n
A[0, lo) < x
x < A[hi, n)
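These three invariants can be written as executable checks. The sketch below is plain C; the names `gt_seg` and `lt_seg` follow the contracts used in this deck's code, but their bodies here are an assumption, and `invariants_hold` and `SEG_A` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Example array (an assumption): the one used in the traces. */
const int SEG_A[7] = {2, 3, 5, 9, 11, 13, 17};

/* x is greater than every element of A[lo,hi), i.e. A[lo,hi) < x. */
bool gt_seg(int x, const int *A, int lo, int hi) {
  assert(0 <= lo && lo <= hi);
  for (int i = lo; i < hi; i++) {
    if (!(x > A[i])) return false;
  }
  return true;                     /* also true for an empty segment */
}

/* x is less than every element of A[lo,hi), i.e. x < A[lo,hi). */
bool lt_seg(int x, const int *A, int lo, int hi) {
  assert(0 <= lo && lo <= hi);
  for (int i = lo; i < hi; i++) {
    if (!(x < A[i])) return false;
  }
  return true;
}

/* All three loop invariants for a state (lo, hi) on an array of length n.
   The bounds are tested first so the segment checks stay in range. */
bool invariants_hold(int x, const int *A, int n, int lo, int hi) {
  return 0 <= lo && lo <= hi && hi <= n
      && gt_seg(x, A, 0, lo)
      && lt_seg(x, A, hi, n);
}
```

For x = 4 the states (0,7), (0,3), (2,3) and (2,2) all satisfy the invariants, while a state such as (3,7) does not, because A[2] = 5 is not less than 4.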
Are these Useful Loop Invariants?
Can they help prove the postcondition?

 5.         || (0 <= \result && \result < n && A[\result] == x); @*/
 6. {
 7.   int lo = 0;
 8.   int hi = n;
 9.   while (lo < hi)
10.     //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
11.     //@loop_invariant gt_seg(x, A, 0, lo);
12.     //@loop_invariant lt_seg(x, A, hi, n);
13.   {
14.     …
15.   }
18. }

E. x ∉ A[0,lo)     by math on D
F. x < A[hi,n)     by line 12 (LI 3)
G. x ∉ A[hi,n)     by math on F
H. x ∉ A[0,n)      by math on C, E, G
Adding the Body

int binsearch(int x, int[] A, int n)
//@requires n == \length(A);
//@requires is_sorted(A, 0, n);
/*@ensures (\result == -1 && !is_in(x, A, 0, n))
        || (0 <= \result && \result < n && A[\result] == x); @*/
{
  int lo = 0;
  int hi = n;
  while (lo < hi)
    //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
    //@loop_invariant gt_seg(x, A, 0, lo);
    //@loop_invariant lt_seg(x, A, hi, n);
  {
    int mid = (lo + hi) / 2;      // by high-school math
    if (A[mid] == x) return mid;
    if (A[mid] < x) {
      lo = mid + 1;
    } else {
      // if A[mid] is not == x and not < x, then A[mid] > x
      //@assert A[mid] > x;
      hi = mid;
    }
  }
  //@assert lo == hi;
  return -1;
}
 1. int binsearch(int x, int[] A, int n)
 2. //@requires n == \length(A);
 4. /*@ensures … @*/
 5. {
 6.   int lo = 0;
 7.   int hi = n;
 8.   while (lo < hi)
 9.     //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
10.     //@loop_invariant gt_seg(x, A, 0, lo);

A[mid] must be in bounds
o 0 ≤ mid < \length(A)
o not mid ≤ hi
   otherwise we could have mid == \length(A) by lines 2, 9
Candidate assertion: lo <= mid && mid < hi
o We will check it later
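Before proving the candidate assertion, it can be tested exhaustively on small values. The sketch below is plain C with hypothetical function names; it uses the slide's formula mid = (lo + hi) / 2 and keeps the bounds small so that lo + hi cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>

/* Checks lo <= mid && mid < hi for every pair 0 <= lo < hi <= limit,
   where mid = (lo + hi) / 2. The condition lo < hi is the loop guard. */
bool mid_in_range_upto(int limit) {
  assert(0 <= limit && limit <= 10000);   /* small enough to avoid overflow */
  for (int hi = 1; hi <= limit; hi++) {
    for (int lo = 0; lo < hi; lo++) {
      int mid = (lo + hi) / 2;
      if (!(lo <= mid && mid < hi)) return false;
    }
  }
  return true;
}

/* The strict half of the assertion, for a single pair of small values. */
bool mid_below_hi(int lo, int hi) {
  assert(0 <= lo && 0 <= hi && hi <= 10000 && lo <= 10000);
  return (lo + hi) / 2 < hi;
}
```

The strict bound mid < hi depends on the guard lo < hi: with lo == hi the midpoint equals hi, which is exactly the case where mid could reach \length(A).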
 1. int binsearch(int x, int[] A, int n)
 2. //@requires n == \length(A);
 6.   int lo = 0;
 7.   int hi = n;
 8.   while (lo < hi)
 9.     //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
10.     //@loop_invariant gt_seg(x, A, 0, lo);
11.     //@loop_invariant lt_seg(x, A, hi, n);
12.   {
13.     int mid = (lo + hi) / 2;
14.     //@assert lo <= mid && mid < hi;   // Added
15.     if (A[mid] == x) return mid;
16.     if (A[mid] < x) {
17.       lo = mid + 1;
18.     } else { //@assert A[mid] > x;
19.       hi = mid;
20.     }
21.   }
22.   //@assert lo == hi;
23.   return -1;
24. }

INIT: unchanged
PRES
To show: if 0 ≤ lo ≤ hi ≤ n, then 0 ≤ lo’ ≤ hi’ ≤ n
o if A[mid] == x, nothing to prove
o if A[mid] < x
   A. lo’ = mid+1      by line 17
   B. hi’ = hi         (unchanged)
   C. 0 ≤ lo           by line 9 (LI1)
   D. lo ≤ mid         by line 14 (to be checked)
   E. mid < hi         by line 14 (to be checked)
   F. mid < mid+1      by math on E (no overflow)
   G. 0 ≤ lo’          by A, C, D, F
   H. lo’ ≤ hi’        by math on A, B, E
   I. hi’ ≤ n          by B and assumption
o if A[mid] > x: left as exercise
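The preservation argument, including the case left as an exercise, can be sanity-checked by brute force. The sketch below is plain C with hypothetical names `li1` and `li1_preserved`; it is a test on small inputs under the stated bounds, not a replacement for the proof.

```c
#include <assert.h>
#include <stdbool.h>

/* The first loop invariant: 0 <= lo <= hi <= n. */
bool li1(int lo, int hi, int n) {
  return 0 <= lo && lo <= hi && hi <= n;
}

/* For every state with 0 <= lo < hi <= n (invariant plus loop guard),
   checks that both possible updates lead to a state satisfying li1. */
bool li1_preserved(int n) {
  assert(0 <= n && n <= 10000);               /* keeps lo + hi small */
  for (int hi = 1; hi <= n; hi++) {
    for (int lo = 0; lo < hi; lo++) {
      int mid = (lo + hi) / 2;
      if (!li1(mid + 1, hi, n)) return false; /* case A[mid] < x: lo' = mid+1 */
      if (!li1(lo, mid, n)) return false;     /* case A[mid] > x: hi' = mid */
    }
  }
  return true;
}
```

The array contents play no role here: preservation of this invariant depends only on how lo and hi are updated.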
PRES (continued)
To show: if A[0, lo) < x, …

The quantity hi-lo decreases in an arbitrary iteration of the loop and …
then 0 ≤ hi’ - lo’ < hi - lo

…
int mid = lo + (hi - lo) / 2;        // Joshua Bloch’s fix
//@assert lo <= mid && mid < hi;
…
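To see what the fix buys, the sketch below (plain C, hypothetical function names) detects when lo + hi would not fit in an `int` and computes the midpoint in the rewritten form. It assumes `long long` is wider than `int`, which holds on common platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if lo + hi does not fit in an int. The sum is computed in
   long long so that the check itself cannot overflow (assumption:
   long long is wider than int). */
bool sum_overflows(int lo, int hi) {
  long long sum = (long long)lo + (long long)hi;
  return sum > INT_MAX || sum < INT_MIN;
}

/* Midpoint in the rewritten form. With 0 <= lo <= hi, the difference
   hi - lo is between 0 and hi, so no intermediate value overflows. */
int safe_mid(int lo, int hi) {
  assert(0 <= lo && lo <= hi);
  return lo + (hi - lo) / 2;
}
```

For lo = INT_MAX - 1 and hi = INT_MAX the sum lo + hi overflows, yet the rewritten formula still returns INT_MAX - 1, which satisfies lo <= mid && mid < hi.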
Final Code for binsearch
o Safe
o Correct

int binsearch(int x, int[] A, int n)
//@requires n == \length(A);
//@requires is_sorted(A, 0, n);
/*@ensures (\result == -1 && !is_in(x, A, 0, n))
        || (0 <= \result && \result < n && A[\result] == x); @*/
{
  int lo = 0;
  int hi = n;
  while (lo < hi)
    //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
    //@loop_invariant gt_seg(x, A, 0, lo);
    //@loop_invariant lt_seg(x, A, hi, n);
  {
    int mid = lo + (hi - lo) / 2;
    //@assert lo <= mid && mid < hi;
    if (A[mid] == x) return mid;
    if (A[mid] < x) {
      lo = mid + 1;
    } else { //@assert A[mid] > x;
      hi = mid;
    }
  }
  //@assert lo == hi;
  return -1;
}
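For readers without a C0 compiler, here is a rough plain C rendering of the same function, with some contracts turned into `assert` calls and the postcondition checked by a separate linear scan. The helpers `post_holds`, `post_holds_for_range` and the array `FINAL_A` are assumptions made for this sketch; the segment invariants are omitted to keep it short.

```c
#include <assert.h>
#include <stdbool.h>

/* Example array (an assumption): the one used in the traces. */
const int FINAL_A[7] = {2, 3, 5, 9, 11, 13, 17};

/* Binary search on sorted A[0,n): index of x, or -1 if x is absent. */
int binsearch(int x, const int *A, int n) {
  assert(n >= 0);
  int lo = 0;
  int hi = n;
  while (lo < hi) {
    assert(0 <= lo && lo <= hi && hi <= n);   /* first loop invariant */
    int mid = lo + (hi - lo) / 2;
    assert(lo <= mid && mid < hi);
    if (A[mid] == x) return mid;
    if (A[mid] < x) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  assert(lo == hi);
  return -1;
}

/* The postcondition, checked by linear scan (stands in for is_in). */
bool post_holds(int x, const int *A, int n, int result) {
  if (result == -1) {
    for (int i = 0; i < n; i++) {
      if (A[i] == x) return false;   /* x is present but was not found */
    }
    return true;
  }
  return 0 <= result && result < n && A[result] == x;
}

/* Checks the postcondition for every target x with from <= x <= to. */
bool post_holds_for_range(const int *A, int n, int from, int to) {
  for (int x = from; x <= to; x++) {
    if (!post_holds(x, A, n, binsearch(x, A, n))) return false;
  }
  return true;
}
```

Running the range check over targets below, between and above the array's elements exercises both the found and the not-found branch of the postcondition.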
Complexity of Binary Search

int binsearch(int x, int[] A, int n)
//@requires n == \length(A);
{
  int lo = 0;
  int hi = n;
  while (lo < hi) {
    int mid = lo + (hi - lo) / 2;
    if (A[mid] == x) return mid;
    if (A[mid] < x) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return -1;
}
(Contracts omitted)

Given an array of size n,
o we halve the segment considered at each iteration
o we can do this at most log n times before hitting the empty array
Each iteration has constant cost
Complexity of binary search is O(log n)
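The bound can be observed by counting loop iterations. The sketch below is plain C with hypothetical names; it builds the array 0, 2, 4, … so that the targets -1 through 2n cover every hit and every gap, and reports the largest iteration count seen.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of loop iterations binary search performs on sorted A[0,n)
   when looking for x. */
int binsearch_iterations(int x, const int *A, int n) {
  assert(n >= 0);
  int lo = 0;
  int hi = n;
  int iters = 0;
  while (lo < hi) {
    iters++;
    int mid = lo + (hi - lo) / 2;
    if (A[mid] == x) return iters;
    if (A[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  return iters;
}

/* Largest iteration count over all targets for an array of length n.
   Returns -1 if the array cannot be allocated. */
int worst_case_iterations(int n) {
  assert(0 <= n && n <= 1000000);
  int *A = malloc((size_t)(n > 0 ? n : 1) * sizeof *A);
  if (A == NULL) return -1;
  for (int i = 0; i < n; i++) A[i] = 2 * i;   /* sorted: 0, 2, 4, ... */
  int worst = 0;
  for (int x = -1; x <= 2 * n; x++) {
    int iters = binsearch_iterations(x, A, n);
    if (iters > worst) worst = iters;
  }
  free(A);
  return worst;
}
```

For n = 1000 the worst case is 10 iterations and for n = 1024 it is 11, in line with roughly log n halvings plus one final step on a one-element segment.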
The Logarithmic Advantage
Is O(log n) a Big Deal?
Visualizing Linear and Binary Search
Binary Search: O(log n)
Linear Search: O(n)
Visualizing Linear and Binary Search
2^m
m = log n
Drawing for small values of m
m = 4, 5, 6, 7, 8, 9, 10
Searching with Ants
Place items 1 cm apart
Horizontally
Vertically
Ant walks 1 cm/s
m sec
2^m sec
Searching 1000 items with Ants
2^10 cm ≈ 10 m
17 minutes
10 seconds
better
1 Million Items
20 cm
2^20 cm ≈ 10 km
12 days
20 seconds
2 Billion
31 cm
2^31 cm ≈ 20,000 km
63 years
31 seconds
way better!
35 Billion Items
35 cm
2^35 cm ≈ 376,289 km
forget about it
35 seconds
To the Sun
44 cm
2^44 cm ≈ 149,600,000 km
44 seconds
To the Next Star
62 cm
Proxima Centauri
62 seconds
To the Next Galaxy
74 cm
74 seconds
The Observable Universe
96 cm
96 seconds
All the Atoms in the Universe
265 cm
10^80 cm
Is O(log n) a Big Deal?
YES
Constant for practical purposes
o It takes just 265 steps to search all atoms in the universe!
log n is really neat if you are a computer scientist!
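The ant figures follow from simple arithmetic: at 1 cm/s with items 1 cm apart, walking past 2^m items takes 2^m seconds, while walking m steps takes m seconds. A small sketch in plain C, with hypothetical function names and whole-unit rounding down:

```c
#include <assert.h>
#include <stdint.h>

/* Seconds for the ant to walk past n = 2^m items (the linear walk). */
uint64_t linear_seconds(int m) {
  assert(0 <= m && m < 64);        /* 2^m must fit in 64 bits */
  return (uint64_t)1 << m;
}

/* Seconds for the ant to walk m steps (the logarithmic walk). */
uint64_t binary_seconds(int m) {
  assert(0 <= m);
  return (uint64_t)m;
}

/* Whole minutes and whole days in a number of seconds, rounded down. */
uint64_t whole_minutes(uint64_t seconds) { return seconds / 60; }
uint64_t whole_days(uint64_t seconds)    { return seconds / 86400; }
```

With m = 10 the linear walk takes 1024 seconds, about 17 minutes, against 10 seconds; with m = 20 it takes about 12 days against 20 seconds.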