06 Binsearch

Uploaded by

Andrew Hellman
Copyright
© © All Rights Reserved
Binary Search

Searching an Array

Linear Search
• Go through the array position by position until we find x

    int search(int x, int[] A, int n)
    //@requires n == \length(A);
    /*@ensures (\result == -1 && !is_in(x, A, 0, n))
            || (0 <= \result && \result < n && A[\result] == x); @*/
    {
      for (int i = 0; i < n; i++) {   // loop invariants omitted
        if (A[i] == x) return i;
      }
      return -1;
    }

• Worst case complexity: O(n)

Linear Search on Sorted Arrays
• Stop early if we find an element greater than x

    int search(int x, int[] A, int n)
    //@requires n == \length(A);
    //@requires is_sorted(A, 0, n);
    /*@ensures (\result == -1 && !is_in(x, A, 0, n))
            || (0 <= \result && \result < n && A[\result] == x); @*/
    {
      for (int i = 0; i < n; i++) {   // loop invariants omitted
        if (A[i] == x) return i;
        if (x < A[i]) return -1;
        //@assert A[i] < x;
      }
      return -1;
    }

• Worst case complexity: still O(n)
  o e.g., if x is larger than any element in A

Can we do Better on Sorted Arrays?
• Look in the middle! Piece of cake!
  o compare the midpoint element with x
  o if found, great!
  o if x is smaller, look for x in the lower half
  o if x is bigger, look for x in the upper half
• This is Binary Search
• Why better?
  o we are throwing out half of the array each time!
    - with linear search, we were throwing out just one element!
  o if the array has length n, we can halve it only about log n times (log base 2) before nothing is left

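As a rough illustration of the "halve it only log n times" claim, the following C sketch counts how many integer halvings it takes to bring a length down to 0. The function name `halvings` is made up for this example and does not appear in the slides.

```c
#include <assert.h>

/* Hypothetical helper: number of times n can be halved
   (integer division by 2) before it reaches 0. */
int halvings(int n) {
  assert(n >= 0);
  int count = 0;
  while (n > 0) {
    n = n / 2;
    count++;
  }
  return count;
}
```

For a 7-element array this gives 3, and for a million elements it gives 20, while a linear scan may have to look at every element.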
A Cautionary Tale
• Only 10% of programmers can write binary search
  o 90% had bugs!
• Binary search dates back to 1946 (at least)
  o First correct description in 1962
• Jon Bentley wrote the definitive binary search
• and proved it correct

[Photo: Jon Bentley, Algorithms professor at CMU in the 1980s]

Read more at
https://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/

More of a Cautionary Tale
• Joshua Bloch finds a bug in Jon Bentley's definitive binary search!
  o that Bentley had proved correct!!!
• Went on to implement several searching and sorting algorithms used in Android, Java and Python
  o e.g., TimSort

[Photo: Joshua Bloch]
  • student of Jon Bentley
  • works at Google
  • occasionally adjunct prof. at CMU

Read more at
https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html

Even More of a Cautionary Tale
• Researchers find a bug in Joshua Bloch's code for TimSort
  o Implemented it in a language with contracts (JML, the Java Modeling Language)
  o Tried to prove correctness using the KeY theorem prover

Note: JML has some of the same contract mechanisms as C0 (and a few more); we borrowed our contracts from them.

Read more at
http://www.envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/

Piece of cake?
• Implementing binary search is not as simple as it sounds
  o many professionals have failed!
• We want to proceed carefully and methodically
• Contracts will be our guide!

Binary Search

• A is sorted
• Looking for x = 4

    index:  0  1  2  3  4   5   6    (7 = end of array)
    A:      2  3  5  9  11  13  17

1. Find midpoint of A[0,7): index 3, A[3] = 9
   4 < 9: ignore A[4,7), ignore also A[3]
2. Find midpoint of A[0,3): index 1, A[1] = 3
   3 < 4: ignore A[0,1), ignore also A[1]
3. Find midpoint of A[2,3): index 2, A[2] = 5
   4 < 5: ignore A[3,3), ignore also A[2]
4. Nothing left! A[2,2) is empty, so 4 isn't in A

Binary Search
• A is sorted
• At each step, we
  o examine a segment A[lo, hi)
  o find its midpoint mid
  o compare x = 4 with A[mid]

    index:  0  1  2  3  4   5   6    (7 = end of array)
    A:      2  3  5  9  11  13  17

1. lo = 0, hi = 7. Find midpoint of A[lo,hi): mid = 3, A[mid] = 9
   4 < A[mid]: ignore A[mid+1,hi), ignore also A[mid]
2. lo = 0, hi = 3. Find midpoint of A[lo,hi): mid = 1, A[mid] = 3
   A[mid] < 4: ignore A[lo,mid), ignore also A[mid]
3. lo = 2, hi = 3. Find midpoint of A[lo,hi): mid = 2, A[mid] = 5
   4 < A[mid]: ignore A[mid+1,hi), ignore also A[mid]
4. lo = 2, hi = 2. Nothing left! A[lo,hi) is empty, so 4 isn't in A

Binary Search
• Let's look for x = 11
• At each step, we
  o examine a segment A[lo, hi)
  o find its midpoint mid
  o compare x = 11 with A[mid]

    index:  0  1  2  3  4   5   6    (7 = end of array)
    A:      2  3  5  9  11  13  17

1. lo = 0, hi = 7. Find midpoint of A[lo,hi): mid = 3, A[mid] = 9
   A[mid] < 11: ignore A[lo,mid), ignore also A[mid]
2. lo = 4, hi = 7. Find midpoint of A[lo,hi): mid = 5, A[mid] = 13
   11 < A[mid]: ignore A[mid+1,hi), ignore also A[mid]
3. lo = 4, hi = 5. Find midpoint of A[lo,hi): mid = 4, A[mid] = 11
   11 = A[mid]: found! return 4

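The two traces above can be replayed with a small C sketch. It is not the slides' code: `binsearch_trace`, the `mids` buffer and the `steps` counter are names invented here so that the midpoints examined can be inspected afterwards.

```c
#include <assert.h>

/* Sketch of the search loop traced above, in plain C.
   Assumption: `mids` is a caller-provided buffer (not in the slides)
   large enough to record every midpoint examined; *steps receives
   the number of midpoints recorded. */
int binsearch_trace(int x, const int *A, int n, int *mids, int *steps) {
  int lo = 0;
  int hi = n;
  *steps = 0;
  while (lo < hi) {
    int mid = lo + (hi - lo) / 2;   /* midpoint of A[lo, hi) */
    mids[(*steps)++] = mid;
    if (A[mid] == x) return mid;    /* found */
    if (A[mid] < x) {
      lo = mid + 1;                 /* ignore A[lo, mid) and A[mid] */
    } else {
      hi = mid;                     /* ignore A[mid+1, hi) and A[mid] */
    }
  }
  return -1;                        /* A[lo, hi) is empty */
}
```

On A = {2, 3, 5, 9, 11, 13, 17}, searching for 4 visits midpoints 3, 1, 2 and returns -1, and searching for 11 visits 3, 5, 4 and returns 4, matching the traces.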
Implementing Binary Search

Setting up Binary Search
• Same contracts as linear search: different algorithm to solve the same problem

    int binsearch(int x, int[] A, int n)
    //@requires n == \length(A);
    //@requires is_sorted(A, 0, n);
    /*@ensures (\result == -1 && !is_in(x, A, 0, n))
            || (0 <= \result && \result < n && A[\result] == x); @*/
    {
      int lo = 0;        // lo starts at 0,
      int hi = n;        // hi at n
      while (lo < hi)
      {
        …                // bunch of steps
      }
      return -1;         // returns -1 if x not found
    }

What do we Know at Each Step?
• At an arbitrary iteration, the picture is:

         0              lo                      hi             n
    A:   |  Too small!  |  If x is in A,        |  Too big!    |
         |              |  it's got to be here  |              |
           A[0, lo) < x                           x < A[hi, n)

• These are candidate loop invariants:
  o gt_seg(x, A, 0, lo): that's A[0, lo) < x
  o lt_seg(x, A, hi, n): that's x < A[hi, n)
  o and of course 0 <= lo && lo <= hi && hi <= n

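To make the two segment predicates concrete, here is one possible reading of them in plain C. The slides name `gt_seg` and `lt_seg` but do not show their definitions, so the bodies below are an assumption based on the stated meanings "A[lo, hi) < x" and "x < A[lo, hi)".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C versions of the specification functions named above.
   The course library's actual definitions are not shown in the slides. */

/* True when x is greater than every element of A[lo, hi). */
bool gt_seg(int x, const int *A, int lo, int hi) {
  assert(0 <= lo && lo <= hi);
  for (int i = lo; i < hi; i++) {
    if (!(A[i] < x)) return false;
  }
  return true;   /* vacuously true for an empty segment */
}

/* True when x is less than every element of A[lo, hi). */
bool lt_seg(int x, const int *A, int lo, int hi) {
  assert(0 <= lo && lo <= hi);
  for (int i = lo; i < hi; i++) {
    if (!(x < A[i])) return false;
  }
  return true;   /* vacuously true for an empty segment */
}
```

Both functions return true on an empty segment, which is what makes the invariants hold when lo is 0 and hi is n.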
Adding Loop Invariants
(Picture: 0 ≤ lo ≤ hi ≤ n, A[0, lo) < x, x < A[hi, n))

    int binsearch(int x, int[] A, int n)
    //@requires n == \length(A);
    //@requires is_sorted(A, 0, n);
    /*@ensures (\result == -1 && !is_in(x, A, 0, n))
            || (0 <= \result && \result < n && A[\result] == x); @*/
    {
      int lo = 0;
      int hi = n;
      while (lo < hi)
      //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
      //@loop_invariant gt_seg(x, A, 0, lo);
      //@loop_invariant lt_seg(x, A, hi, n);
      {

      }
      return -1;
    }

Are these Useful Loop Invariants?
Can they help prove the postcondition?
(Picture: 0 ≤ lo ≤ hi ≤ n, A[0, lo) < x, x < A[hi, n))

• Is return -1 correct? (assuming invariants are valid)
• To show: if preconditions are met, then x ∉ A[0, n)
  A. lo ≥ hi          by line 9 (negation of loop guard)
  B. lo ≤ hi          by line 10 (LI 1)
  C. lo = hi          by math on A, B
  D. A[0,lo) < x      by line 11 (LI 2)
  E. x ∉ A[0,lo)      by math on D
  F. x < A[hi,n)      by line 12 (LI 3)
  G. x ∉ A[hi,n)      by math on F
  H. x ∉ A[0,n)       by math on C, E, G
• This is a standard EXIT argument

     1. int binsearch(int x, int[] A, int n)
     2. //@requires n == \length(A);
     3. //@requires is_sorted(A, 0, n);
     4. /*@ensures (\result == -1 && !is_in(x, A, 0, n))
     5.         || (0 <= \result && \result < n && A[\result] == x); @*/
     6. {
     7.   int lo = 0;
     8.   int hi = n;
     9.   while (lo < hi)
    10.   //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
    11.   //@loop_invariant gt_seg(x, A, 0, lo);
    12.   //@loop_invariant lt_seg(x, A, hi, n);
    13.   {
    14.     …
    15.   }
    16.   return -1;
    17. }

Are the Loop Invariants Valid?
(Picture: 0 ≤ lo ≤ hi ≤ n, A[0, lo) < x, x < A[hi, n))

INIT
  o lo = 0 by line 7 and hi = n by line 8
• To show: 0 ≤ 0      by math
• To show: 0 ≤ n      by line 2 (preconditions) and \length
• To show: n ≤ n      by math
• To show: A[0, 0) < x
• To show: x < A[n, n)
  o both by math (empty intervals)

PRES
• Trivial
  o body is empty
  o nothing changes!!!

     1. int binsearch(int x, int[] A, int n)
     2. //@requires n == \length(A);
     3. //@requires is_sorted(A, 0, n);
     4. /*@ensures (\result == -1 && !is_in(x, A, 0, n))
     5.         || (0 <= \result && \result < n && A[\result] == x); @*/
     6. {
     7.   int lo = 0;
     8.   int hi = n;
     9.   while (lo < hi)
    10.   //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
    11.   //@loop_invariant gt_seg(x, A, 0, lo);
    12.   //@loop_invariant lt_seg(x, A, hi, n);
    13.   {
    14.
    15.   }
    16.   //@assert lo == hi;      (from correctness proof)
    17.   return -1;
    18. }

Is binsearch Correct?
• EXIT ✓
• INIT ✓
• PRES ✓
• Termination ✗
  o Infinite loop!
• Let's implement what happens in a binary search step
  o compute the midpoint
  o compare its value to x

     1. int binsearch(int x, int[] A, int n)
     2. //@requires n == \length(A);
     3. //@requires is_sorted(A, 0, n);
     4. /*@ensures (\result == -1 && !is_in(x, A, 0, n))
     5.         || (0 <= \result && \result < n && A[\result] == x); @*/
     6. {
     7.   int lo = 0;
     8.   int hi = n;
     9.   while (lo < hi)
    10.   //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
    11.   //@loop_invariant gt_seg(x, A, 0, lo);
    12.   //@loop_invariant lt_seg(x, A, hi, n);
    13.   {
    14.     …
    15.   }
    16.   //@assert lo == hi;
    17.   return -1;
    18. }

Adding the Body

    int binsearch(int x, int[] A, int n)
    //@requires n == \length(A);
    //@requires is_sorted(A, 0, n);
    /*@ensures (\result == -1 && !is_in(x, A, 0, n))
            || (0 <= \result && \result < n && A[\result] == x); @*/
    {
      int lo = 0;
      int hi = n;
      while (lo < hi)
      //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
      //@loop_invariant gt_seg(x, A, 0, lo);
      //@loop_invariant lt_seg(x, A, hi, n);
      {
        int mid = (lo + hi) / 2;      // by high-school math
        if (A[mid] == x) return mid;
        if (A[mid] < x) {
          lo = mid + 1;
        } else { //@assert A[mid] > x;
          // if A[mid] is not == x and not < x, then A[mid] > x
          hi = mid;
        }
      }
      //@assert lo == hi;
      return -1;
    }

Is it Safe?
• A[mid] must be in bounds
  o 0 ≤ mid < \length(A)

         0            lo        mid        hi            n
    A:   |     …      |                    |      …      |
           A[0, lo) < x                      x < A[hi, n)

• We expect lo ≤ mid < hi
  o not mid ≤ hi
    - otherwise we could have mid == \length(A) by lines 2, 9
• Candidate assertion: lo <= mid && mid < hi
  o We will check it later

     1. int binsearch(int x, int[] A, int n)
     2. //@requires n == \length(A);
     3. //@requires is_sorted(A, 0, n);
     4. /*@ensures … @*/
     5. {
     6.   int lo = 0;
     7.   int hi = n;
     8.   while (lo < hi)
     9.   //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
    10.   //@loop_invariant gt_seg(x, A, 0, lo);
    11.   //@loop_invariant lt_seg(x, A, hi, n);
    12.   {
    13.     int mid = (lo + hi) / 2;
    14.
    15.     if (A[mid] == x) return mid;
    16.     if (A[mid] < x) {
    17.       lo = mid + 1;
    18.     } else { //@assert A[mid] > x;
    19.       hi = mid;
    20.     }
    21.   }
    22.   //@assert lo == hi;
    23.   return -1;
    24. }

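Before proving the candidate assertion, it can be sanity-checked by brute force on small segments. The C sketch below is only evidence, not a proof, and it deliberately stays with small values of lo and hi; `midpoint_in_range` and its `limit` parameter are names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: for every 0 <= lo < hi <= limit, does the
   midpoint (lo + hi) / 2 satisfy lo <= mid && mid < hi? */
bool midpoint_in_range(int limit) {
  assert(0 <= limit && limit <= 10000);   /* keep lo + hi small */
  for (int hi = 1; hi <= limit; hi++) {
    for (int lo = 0; lo < hi; lo++) {
      int mid = (lo + hi) / 2;
      if (!(lo <= mid && mid < hi)) return false;
    }
  }
  return true;
}
```

The check relies on lo < hi, which is exactly what the loop guard provides inside the loop body.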
Are the LI Valid?
INIT: unchanged

PRES
• To show: if 0 ≤ lo ≤ hi ≤ n, then 0 ≤ lo' ≤ hi' ≤ n
  o if A[mid] == x, nothing to prove
  o if A[mid] < x
    A. lo' = mid+1       by line 17
    B. hi' = hi          (unchanged)
    C. 0 ≤ lo            by line 9 (LI 1)
    D. lo ≤ mid          by line 14 (to be checked)
    E. mid < hi          by line 14 (to be checked)
    F. mid < mid+1       by math on E (no overflow)
    G. 0 ≤ lo'           by A, C, D, F
    H. lo' ≤ hi'         by math on A, B, E
    I. hi' ≤ n           by B and assumption
  o if A[mid] > x: left as exercise

     1. int binsearch(int x, int[] A, int n)
     2. //@requires n == \length(A);
     3. //@requires is_sorted(A, 0, n);
     4. /*@ensures … @*/
     5. {
     6.   int lo = 0;
     7.   int hi = n;
     8.   while (lo < hi)
     9.   //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
    10.   //@loop_invariant gt_seg(x, A, 0, lo);
    11.   //@loop_invariant lt_seg(x, A, hi, n);
    12.   {
    13.     int mid = (lo + hi) / 2;
    14.     //@assert lo <= mid && mid < hi;   // Added
    15.     if (A[mid] == x) return mid;
    16.     if (A[mid] < x) {
    17.       lo = mid + 1;
    18.     } else { //@assert A[mid] > x;
    19.       hi = mid;
    20.     }
    21.   }
    22.   //@assert lo == hi;
    23.   return -1;
    24. }

Are the LI Valid?
PRES (continued)
• To show: if A[0, lo) < x, then A[0, lo') < x
  o if A[mid] == x, nothing to prove
  o if A[mid] < x
    A. lo' = mid+1           by line 17
    B. A[0,n) sorted         by line 3
    C. A[0,mid) ≤ A[mid]     by B
    D. A[0, mid+1) < x       by math on C and line 16
  o if A[mid] > x
    A. lo' = lo              (unchanged)
    B. A[0,lo) < x           by assumption
• To show: if x < A[hi, n), then x < A[hi', n)
  o left as exercise

     1. int binsearch(int x, int[] A, int n)
     2. //@requires n == \length(A);
     3. //@requires is_sorted(A, 0, n);
     4. /*@ensures … @*/
     5. {
     6.   int lo = 0;
     7.   int hi = n;
     8.   while (lo < hi)
     9.   //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
    10.   //@loop_invariant gt_seg(x, A, 0, lo);
    11.   //@loop_invariant lt_seg(x, A, hi, n);
    12.   {
    13.     int mid = (lo + hi) / 2;
    14.     //@assert lo <= mid && mid < hi;
    15.     if (A[mid] == x) return mid;
    16.     if (A[mid] < x) {
    17.       lo = mid + 1;
    18.     } else { //@assert A[mid] > x;
    19.       hi = mid;
    20.     }
    21.   }
    22.   //@assert lo == hi;
    23.   return -1;
    24. }

Does it Terminate?
The quantity hi - lo decreases in an arbitrary iteration of the loop and never gets smaller than 0
• This is the usual operational argument
• We can also give a point-to argument
• To show: if 0 < hi - lo, then 0 ≤ hi' - lo' < hi - lo
  o if A[mid] == x, nothing to prove
  o if A[mid] < x
    A. hi' - lo' = hi - (mid+1)      by line 17 (and hi unchanged)
    B.           < hi - mid          by math
    C.           ≤ hi - lo           by line 14 (to be checked)
    D. hi' - lo' = hi - (mid+1) ≥ (mid+1) - (mid+1) = 0
                                     by lines 17, 16, 14 and math
  o if A[mid] > x: left as exercise

     1. int binsearch(int x, int[] A, int n)
     2. //@requires n == \length(A);
     3. //@requires is_sorted(A, 0, n);
     4. /*@ensures … @*/
     5. {
     6.   int lo = 0;
     7.   int hi = n;
     8.   while (lo < hi)
     9.   //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
    10.   //@loop_invariant gt_seg(x, A, 0, lo);
    11.   //@loop_invariant lt_seg(x, A, hi, n);
    12.   {
    13.     int mid = (lo + hi) / 2;
    14.     //@assert lo <= mid && mid < hi;
    15.     if (A[mid] == x) return mid;
    16.     if (A[mid] < x) {
    17.       lo = mid + 1;
    18.     } else { //@assert A[mid] > x;
    19.       hi = mid;
    20.     }
    21.   }
    22.   //@assert lo == hi;
    23.   return -1;
    24. }

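The termination argument can also be checked at run time. The C sketch below asserts, in every iteration that does not return, that the measure hi - lo strictly decreases and stays non-negative. The function name `binsearch_measured` and the `iters` out-parameter are additions for this example, and the midpoint is written as lo + (hi - lo) / 2, which gives the same value as (lo + hi) / 2 whenever the sum does not overflow.

```c
#include <assert.h>

/* Sketch: binary search that checks the termination measure hi - lo.
   *iters (an added out-parameter, not part of the slides' binsearch)
   receives the number of loop iterations performed. */
int binsearch_measured(int x, const int *A, int n, int *iters) {
  int lo = 0;
  int hi = n;
  *iters = 0;
  while (lo < hi) {
    int before = hi - lo;            /* measure at start of iteration */
    int mid = lo + (hi - lo) / 2;
    (*iters)++;
    if (A[mid] == x) return mid;
    if (A[mid] < x) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
    int after = hi - lo;             /* measure at end of iteration */
    assert(0 <= after && after < before);
  }
  return -1;
}
```

A passing run is not a proof, but a failing assertion would point directly at the branch where the measure does not shrink.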
The Midpoint Assertion

    int mid = (lo + hi) / 2;              // by high-school math
    //@assert lo <= mid && mid < hi;

• We need to show that lo <= mid && mid < hi
• … but is it true? Counterexample:

    Linux Terminal
    # coin -l util
    --> int lo = int_max() - 2;
    lo is 2147483645 (int)
    --> int hi = int_max();
    hi is 2147483647 (int)
    --> int mid = (lo + hi) / 2;
    mid is -2 (int)

  o We expect mid == int_max() - 1 == 2147483646
  o but we get mid == -2 !!!! lo + hi overflows!
• This is Jon Bentley's bug!
  o Google was the first company to need arrays that big
    - and Joshua Bloch worked there

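The coin session above can be imitated in plain C, with one caveat: signed overflow is undefined behavior in C, so the sketch forms the sum in 64 bits and then wraps it by hand to mimic 32-bit two's-complement addition. The function name `wrapped_mid` is invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simulates (lo + hi) / 2 with 32-bit wraparound addition.
   The sum is computed in 64 bits because overflowing a signed int
   directly would be undefined behavior in C. */
int32_t wrapped_mid(int32_t lo, int32_t hi) {
  int64_t sum = (int64_t)lo + (int64_t)hi;
  if (sum > INT32_MAX) sum -= 4294967296LL;   /* wrap around by 2^32 */
  if (sum < INT32_MIN) sum += 4294967296LL;
  return (int32_t)(sum / 2);                  /* truncates toward zero */
}
```

With lo = 2147483645 and hi = 2147483647 the true sum is 4294967292, which wraps to -4, so the computed midpoint is -2, as in the session.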
The Midpoint Assertion
• Can we compute the midpoint without overflow?

    int mid = lo + (hi - lo) / 2;         // Joshua Bloch's fix
    //@assert lo <= mid && mid < hi;

  o Does it work? Left as exercise
    - show that (lo + hi) / 2 is mathematically equal to lo + (hi - lo) / 2
    - show that lo + (hi - lo) / 2 never overflows for 0 ≤ lo ≤ hi
• What about int mid = lo / 2 + hi / 2; ?
  o never overflows,
  o but not mathematically equal to (lo + hi) / 2 (left as exercise)

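A small C sketch can make the difference between the two candidate formulas visible. The function names `mid_safe` and `mid_halves` are invented here, and the precondition 0 <= lo <= hi is an assumption taken from the first loop invariant.

```c
#include <assert.h>
#include <limits.h>

/* Overflow-free midpoint; assumes 0 <= lo <= hi, so hi - lo fits
   in an int and lo + (hi - lo) / 2 is at most hi. */
int mid_safe(int lo, int hi) {
  assert(0 <= lo && lo <= hi);
  return lo + (hi - lo) / 2;
}

/* Alternative from the slide: also overflow-free, but each half is
   rounded separately, so it can differ from (lo + hi) / 2. */
int mid_halves(int lo, int hi) {
  return lo / 2 + hi / 2;
}
```

For lo = 1 and hi = 3 the first formula gives 2, the mathematical midpoint, while the second gives 0 + 1 = 1. Near the top of the int range, `mid_safe(INT_MAX - 2, INT_MAX)` is `INT_MAX - 1`, the value the earlier counterexample expected.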
Final Code for binsearch
• Safe
• Correct

    int binsearch(int x, int[] A, int n)
    //@requires n == \length(A);
    //@requires is_sorted(A, 0, n);
    /*@ensures (\result == -1 && !is_in(x, A, 0, n))
            || (0 <= \result && \result < n && A[\result] == x); @*/
    {
      int lo = 0;
      int hi = n;
      while (lo < hi)
      //@loop_invariant 0 <= lo && lo <= hi && hi <= n;
      //@loop_invariant gt_seg(x, A, 0, lo);
      //@loop_invariant lt_seg(x, A, hi, n);
      {
        int mid = lo + (hi - lo) / 2;
        //@assert lo <= mid && mid < hi;
        if (A[mid] == x) return mid;
        if (A[mid] < x) {
          lo = mid + 1;
        } else { //@assert A[mid] > x;
          hi = mid;
        }
      }
      //@assert lo == hi;
      return -1;
    }

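For readers without a C0 compiler, the final code can be approximated in plain C by turning some of the contracts into runtime assertions. This is a sketch under stated assumptions: the `is_sorted` body is a guess at the specification function's meaning, C cannot check `n == \length(A)` so the caller must guarantee it, and the two segment invariants are left out to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed helper (its definition is not in the slides):
   is A[lo, hi) sorted in ascending order? */
bool is_sorted(const int *A, int lo, int hi) {
  for (int i = lo; i + 1 < hi; i++) {
    if (A[i] > A[i + 1]) return false;
  }
  return true;
}

/* C rendering of the final binsearch; contracts become asserts.
   The caller must pass n equal to the number of elements in A. */
int binsearch(int x, const int *A, int n) {
  assert(n >= 0 && is_sorted(A, 0, n));      /* preconditions */
  int lo = 0;
  int hi = n;
  while (lo < hi) {
    /* first loop invariant, checked at the top of each iteration */
    assert(0 <= lo && lo <= hi && hi <= n);
    int mid = lo + (hi - lo) / 2;
    assert(lo <= mid && mid < hi);
    if (A[mid] == x) return mid;
    if (A[mid] < x) {
      lo = mid + 1;
    } else {
      assert(A[mid] > x);
      hi = mid;
    }
  }
  assert(lo == hi);
  return -1;
}
```

Checking `is_sorted` costs O(n), so in C the precondition assert would normally be compiled out (for example with `NDEBUG`) once the callers are trusted.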
Complexity of Binary Search
• Given an array of size n,
  o we halve the segment considered at each iteration
  o we can do this at most log n times before hitting the empty array
• Each iteration has constant cost
• Complexity of binary search is O(log n)

    int binsearch(int x, int[] A, int n)
    //@requires n == \length(A);
    {
      int lo = 0;
      int hi = n;
      while (lo < hi) {
        int mid = lo + (hi - lo) / 2;

        if (A[mid] == x) return mid;
        if (A[mid] < x) {
          lo = mid + 1;
        } else {
          hi = mid;
        }
      }
      return -1;        // contracts omitted
    }

The Logarithmic Advantage

Is O(log n) a Big Deal?
• Just some boring functions we learned in math classes?
• What does log n mean in practice?

Visualizing Linear and Binary Search
[Figure: Binary Search, O(log n), versus Linear Search, O(n)]

Visualizing Linear and Binary Search
[Figure: dimensions labeled 2^m and m = log n]

Drawing for small values of m
[Figure: drawings for m = 4, 5, 6, 7, 8, 9, 10]
• What do you notice?

Searching with Ants
• Place items 1 cm apart
  o Horizontally
  o Vertically
• Ant walks 1 cm/s

[Figure: one walk takes m sec, the other 2^m sec]

Searching 1000 items with Ants
• 2^10 cm ≈ 10 m: 17 minutes
• 10 seconds: better

1 Million Items
• 20 cm: 20 seconds
• 2^20 cm ≈ 10 km: 12 days

2 Billion
• 31 cm: 31 seconds
• 2^31 cm ≈ 20,000 km: 63 years
• way better!

35 Billion Items
• 35 cm: 35 seconds
• 2^35 cm ≈ 376,289 km: forget about it

To the Sun
• 44 cm: 44 seconds
• 2^44 cm ≈ 149,600,000 km

To the Next Star
• 62 cm: 62 seconds
• 2^62 cm ≈ 4.24 light-years (Proxima Centauri)

To the Next Galaxy
• 74 cm: 74 seconds
• 2^74 cm ≈ 25,000 light-years (Canis Major Dwarf)

The Observable Universe
• 96 cm: 96 seconds
• 2^96 cm ≈ 92 billion light-years

All the Atoms in the Universe
• 265 cm: 265 seconds
• 10^80 cm
• There is nothing else we could possibly search …

Is O(log n) a Big Deal?
YES
• Constant for practical purposes
  o It takes just 265 steps to search all atoms in the universe!
• log n is really neat if you are a computer scientist!
