Python and Algorithms : University of New York at Stony Brook
This text was written purely for fun (I know, I know, this is a
broad definition of the word fun...) with no pretensions of being
anything big, so please forgive me (or better, let me know) if you
find any typo or mistake. I am not a computer scientist by
training (I am actually an almost-I-swear-it-is-close-Ph.D. in
Physics), so this may make things a little less usual (or risky?).
I hope you have fun!
1 Numbers
1.1 Integers
1.2 Floats
1.3 Complex Numbers
1.4 The fractions Module
1.5 The decimal Module
1.6 Other Representations
1.7 Additional Exercises
5 Object-Oriented Design
5.1 Classes and Objects
5.2 Principles of OOP
5.3 Python Design Patterns
5.4 Additional Exercises
8 Sorting
8.1 Quadratic Sort
8.2 Linear Sort
8.3 Loglinear Sort
8.4 Comparison Between Sorting Methods
8.5 Additional Exercises
9 Searching
9.1 Sequential Search
9.2 Binary Search
9.3 Additional Exercises
Chapter 1
Numbers
When you learn a new language, the first thing you usually do (after our
dear hello world) is to play with some arithmetic operations. Numbers
can be integers, floating-point numbers, or complex numbers. They are usually
given in decimal representation but can be represented in any base, such as
binary, hexadecimal, or octal. In this chapter we will learn how Python deals
with numbers.
1.1 Integers
Python represents integers (positive and negative whole numbers) using the
int (immutable) type. For immutable objects, there is no difference between
a variable and an object reference.
The size of Python's integers is limited only by the machine memory, not
by a fixed number of bytes. (The fixed-size "plain" integers of Python 2 did
depend on the C or Java compiler that Python was built with, and were usually
at least 32 bits, i.e., 4 bytes, long1.) To see how many bits an integer needs
to be represented, starting in Python 3.1, the int.bit_length() method is
available:
>>> (999).bit_length()
10
>>> s = "11"
>>> d = int(s)
>>> print(d)
11
>>> b = int(s, 2)
>>> print(b)
3
1.2 Floats
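Numbers with a fractional part are represented by the immutable type
float. In the case of single precision, a 32-bit float is represented by 1
bit for the sign (negative being 1, positive being 0) + 8 bits for the exponent
+ 23 bits for the significant digits (the mantissa, or significand). In the case
of double precision (which is what Python's float uses on virtually all
platforms), a 64-bit float has 11 bits for the exponent and 52 stored bits for
the mantissa (53 bits of precision, counting the implicit leading 1). Also, the
exponent is usually represented using the biased notation, where you add a
fixed number to the original value: 127 for single precision (1023 for double
precision)3.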
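To make the single-precision layout concrete, the sketch below uses the standard struct module to pack a value as a 32-bit float and read the three fields back out. The helper name float32_fields is hypothetical, and the example value -6.25 (= -1.5625 * 2**2) was chosen only because it is exactly representable.

```python
import struct

def float32_fields(x):
    """Return (sign, stored exponent, mantissa bits) of x as a 32-bit float."""
    # Pack as big-endian single precision, then reread the 4 bytes as an int
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF        # 23 bits; the leading 1 is implicit
    return sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(-6.25)
print(sign, exponent - 127, hex(mantissa))  # 1 2 0x480000
```

Subtracting 127 from the stored exponent recovers the true exponent 2, and 0x480000 holds the fraction bits .1001 of 1.1001 (binary for 1.5625).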
Comparing Floats
We should never compare floats for exact equality, nor trust the last digits
of a subtraction between them. The reason for this is that floats are
represented as binary fractions, and there are many numbers that are exact in
a decimal base but not exact in a binary base (for example, the decimal 0.1).
Equality tests should instead be done in terms of some predefined precision.
For example, we can use the same approach that Python's unittest module uses
in assertAlmostEqual():
>>> def a(x, y, places=7):
...     return round(abs(x-y), places) == 0
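A quick check of the rule above, using the classic 0.1 + 0.2 case. almost_equal() is the same function as a() under a more descriptive (hypothetical) name; math.isclose(), available since Python 3.5, is shown for comparison and uses a relative tolerance rather than a fixed number of decimal places.

```python
import math

def almost_equal(x, y, places=7):
    # Same rule as unittest's assertAlmostEqual: round the difference
    return round(abs(x - y), places) == 0

print(0.1 + 0.2 == 0.3)              # False: neither side is exact in binary
print(almost_equal(0.1 + 0.2, 0.3))  # True
print(math.isclose(0.1 + 0.2, 0.3))  # True (default rel_tol is 1e-09)
```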
The complex data type is an immutable type that holds a pair of floats,
z = 3 + 4j, with attributes such as z.real and z.imag and methods such as
z.conjugate(). Complex numbers are built into the language; the cmath
module provides complex-number versions of most of the trigonometric and
logarithmic functions that are in the math module, plus some
complex-number-specific functions such as cmath.phase(), cmath.polar(), and
cmath.rect(), and the constants cmath.pi and cmath.e.
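A short illustration of these attributes and functions: polar() and rect() convert between rectangular and polar form, and the round trip is checked with cmath.isclose() (Python 3.5+) because it is only exact up to rounding.

```python
import cmath

z = 3 + 4j
print(z.real, z.imag)        # 3.0 4.0
print(z.conjugate())         # (3-4j)
print(abs(z))                # 5.0, the modulus
r, phi = cmath.polar(z)      # modulus and phase (in radians)
w = cmath.rect(r, phi)       # back to rectangular form
print(cmath.isclose(w, z))   # True
```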
from fractions import Fraction

def float_to_fractions(number):
    # as_integer_ratio() gives the exact numerator and denominator of the float
    return Fraction(*number.as_integer_ratio())

if __name__ == "__main__":
    test_testing_floats()
4
All the code shown in this book is labeled with the path where you can find
it in my git repository. Also notice that, when you write your own code, the
PEP 8 (Python Enhancement Proposal) guidelines recommend four spaces per level
of indentation, and only spaces (no tabs). This is not always visible here
because of the way LaTeX formats the text.
While the math and cmath modules are not suitable for use with the decimal
module, Decimal objects have their own methods, such as
decimal.Decimal.exp(), which are enough for most problems.
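For instance, Decimal arithmetic on values built from strings is exact in base 10, and the precision of operations such as exp() is set by a context. localcontext() is used in this sketch so that the precision change does not leak into the rest of the program.

```python
from decimal import Decimal, localcontext

print(0.1 + 0.2)                        # 0.30000000000000004
print(Decimal("0.1") + Decimal("0.2"))  # 0.3

with localcontext() as ctx:
    ctx.prec = 10                       # 10 significant digits
    e = Decimal(1).exp()
print(e)                                # 2.718281828
```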
def test_convert_to_decimal():
    number, base = 1001, 2
    assert(convert_to_decimal(number, base) == 9)
    print("Tests passed!")

if __name__ == "__main__":
    test_convert_to_decimal()
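The body of convert_to_decimal() is not reproduced in this excerpt. Below is a minimal sketch consistent with the test above, assuming the input is an integer whose decimal digits are the digits in the given base (so 1001 means binary 1001); it is a reconstruction, not necessarily the author's code.

```python
def convert_to_decimal(number, base):
    """Read the decimal digits of number as digits in base (1001, 2 -> 9)."""
    multiplier, result = 1, 0
    while number > 0:
        result += (number % 10) * multiplier  # take the last digit
        multiplier *= base                    # next position is worth base times more
        number //= 10                         # drop the last digit
    return result

print(convert_to_decimal(1001, 2))  # 9
```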
By swapping the roles of 10 and the base in our previous method, we can
create a function that converts from a decimal number to a number in another
base (2 ≤ base ≤ 10):
[general_problems/numbers/convert_from_decimal.py]
def test_convert_from_decimal():
    number, base = 9, 2
    assert(convert_from_decimal(number, base) == 1001)
    print("Tests passed!")

if __name__ == "__main__":
    test_convert_from_decimal()
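Only the test of convert_from_decimal() appears in this excerpt. A sketch that follows the recipe just described (swap the roles of 10 and the base), assuming the result is written as a decimal integer, might be:

```python
def convert_from_decimal(number, base):
    """Write number in base (2 <= base <= 10), returned as a decimal integer."""
    multiplier, result = 1, 0
    while number > 0:
        result += (number % base) * multiplier  # next digit in the new base
        multiplier *= 10                        # shift one decimal position left
        number //= base
    return result

print(convert_from_decimal(9, 2))  # 1001
```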
def test_convert_from_decimal_larger_bases():
    number, base = 31, 16
    assert(convert_from_decimal_larger_bases(number, base) == '1F')
    print("Tests passed!")

if __name__ == "__main__":
    test_convert_from_decimal_larger_bases()
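For bases above 10 the digits no longer fit in a decimal integer, which is why the test above expects the string '1F'. A minimal sketch, assuming the result is returned as a string built from a digit table (limited to base 16 here by assumption):

```python
def convert_from_decimal_larger_bases(number, base):
    """Write a non-negative number in base (2 <= base <= 16) as a string."""
    digits = "0123456789ABCDEF"
    if number == 0:
        return "0"
    result = ""
    while number > 0:
        result = digits[number % base] + result  # prepend the next digit
        number //= base
    return result

print(convert_from_decimal_larger_bases(31, 16))  # 1F
```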
if __name__ == "__main__":
    test_convert_dec_to_any_base_rec()
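The recursive converter tested here is not shown either. Assuming it takes (number, base) and returns a string like the previous function (the signature is a guess), a recursive version can peel off the last digit and recurse on the rest:

```python
def convert_dec_to_any_base_rec(number, base):
    """Recursively write a non-negative number in base (2 <= base <= 16)."""
    digits = "0123456789ABCDEF"
    if number < base:                  # base case: a single digit
        return digits[number]
    # digits of number // base, followed by the last digit
    return convert_dec_to_any_base_rec(number // base, base) + digits[number % base]

print(convert_dec_to_any_base_rec(31, 16))  # 1F
```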
def test_finding_gcd():
    number1 = 21
    number2 = 12
    assert(finding_gcd(number1, number2) == 3)
    print("Tests passed!")

if __name__ == "__main__":
    test_finding_gcd()
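finding_gcd() itself is not reproduced here. Euclid's algorithm is the usual way to implement it and is shown below as a sketch; the standard library also provides math.gcd() since Python 3.5.

```python
import math

def finding_gcd(a, b):
    """Greatest common divisor by Euclid's algorithm."""
    while b != 0:
        a, b = b, a % b   # gcd(a, b) == gcd(b, a mod b)
    return a

print(finding_gcd(21, 12), math.gcd(21, 12))  # 3 3
```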
import random

def testing_random():
    """testing the module random"""
    values = [1, 2, 3, 4]
    print(random.choice(values))
    print(random.choice(values))
    print(random.choice(values))
    print(random.sample(values, 2))
    print(random.sample(values, 3))
    # shuffle in place
    random.shuffle(values)
    print(values)

if __name__ == "__main__":
    testing_random()
Fibonacci Sequences
The module below shows how to find the nth number in a Fibonacci sequence
in three ways: (a) with a recursive O(2^n) runtime; (b) with an iterative O(n)
runtime; and (c) using a formula that gives an O(1) runtime but is not precise
after around the 70th element:
[general_problems/numbers/find_fibonacci_seq.py]
import math

def find_fibonacci_seq_rec(n):
    if n < 2:
        return n
    return find_fibonacci_seq_rec(n - 1) + find_fibonacci_seq_rec(n - 2)

def find_fibonacci_seq_iter(n):
    if n < 2:
        return n
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

def find_fibonacci_seq_form(n):
    # Binet's formula: F(n) is the integer closest to phi**n / sqrt(5)
    sq5 = math.sqrt(5)
    phi = (1 + sq5) / 2
    return int(round(phi ** n / sq5))

def test_find_fib():
    n = 10
    assert(find_fibonacci_seq_rec(n) == 55)
    assert(find_fibonacci_seq_iter(n) == 55)
    assert(find_fibonacci_seq_form(n) == 55)
    print("Tests passed!")

if __name__ == "__main__":
    test_find_fib()
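A fourth option, not part of the original module, keeps the recursive definition but caches results with functools.lru_cache, so each value is computed only once and the runtime drops to O(n) (the recursion depth still grows with n). The function name is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def find_fibonacci_seq_memo(n):
    """Recursive Fibonacci with memoization."""
    if n < 2:
        return n
    return find_fibonacci_seq_memo(n - 1) + find_fibonacci_seq_memo(n - 2)

print(find_fibonacci_seq_memo(10))  # 55
print(find_fibonacci_seq_memo(50))  # 12586269025
```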
Primes
The following program finds whether a number is a prime in three ways:
(a) brute force; (b) rejecting all the candidates up to the square root of the
number; and (c) using Fermat's little theorem with probabilistic tests:
[general_problems/numbers/finding_if_prime.py]
import math
import random

def finding_prime(number):
    num = abs(number)
    if num < 2:              # 0 and 1 are not prime
        return False
    if num < 4:              # 2 and 3 are prime
        return True
    for x in range(2, num):
        if num % x == 0:
            return False
    return True

def finding_prime_sqrt(number):
    num = abs(number)
    if num < 2:
        return False
    if num < 4:
        return True
    # a composite number has a divisor no larger than its square root
    for x in range(2, int(math.sqrt(num)) + 1):
        if num % x == 0:
            return False
    return True

def finding_prime_fermat(number):
    if number < 2:
        return False
    if number <= 102:
        # small numbers: test every base a
        for a in range(2, number):
            if pow(a, number - 1, number) != 1:
                return False
        return True
    else:
        # larger numbers: test 100 random bases (probabilistic)
        for i in range(100):
            a = random.randint(2, number - 1)
            if pow(a, number - 1, number) != 1:
                return False
        return True

def test_finding_prime():
    number1 = 17
    number2 = 20
    assert(finding_prime(number1) == True)
    assert(finding_prime(number2) == False)
    assert(finding_prime_sqrt(number1) == True)
    assert(finding_prime_sqrt(number2) == False)
    assert(finding_prime_fermat(number1) == True)
    assert(finding_prime_fermat(number2) == False)
    print("Tests passed!")

if __name__ == "__main__":
    test_finding_prime()
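The Fermat test is only probabilistic because some composite numbers pass it for every base coprime to them. The smallest such Carmichael number is 561 = 3 * 11 * 17, as the check below shows; finding_prime_fermat() rejects it only when it happens to draw a base that shares a factor with 561 (which, over 100 random draws, is very likely but not guaranteed).

```python
import math

n = 561  # 3 * 11 * 17, the smallest Carmichael number
coprime_bases = [a for a in range(2, n) if math.gcd(a, n) == 1]

# Every base coprime to 561 "lies": a**560 % 561 == 1
print(all(pow(a, n - 1, n) == 1 for a in coprime_bases))  # True
# Only bases sharing a factor with 561 reveal it as composite
print(pow(3, n - 1, n) != 1)                                # True
```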
import random
import sys
from finding_if_prime import finding_prime_sqrt

def generate_prime(number=3):
    """Return a random prime with `number` bits, trying odd candidates."""
    while True:
        p = random.randint(pow(2, number-2), pow(2, number-1)-1)
        p = 2 * p + 1   # odd, between 2**(number-1) + 1 and 2**number - 1
        if finding_prime_sqrt(p):
            return p

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: generate_prime.py number")
        sys.exit()
    else:
        number = int(sys.argv[1])
        print(generate_prime(number))
import numpy as np

def testing_numpy():
    """tests many features of numpy"""
    ax = np.array([1,2,3])
    ay = np.array([3,4,5])
    print(ax)
    print(ax*2)
    print(ax+10)
    print(np.sqrt(ax))
    print(np.cos(ax))
    print(ax-ay)
    print(np.where(ax<2, ax, 10))

if __name__ == "__main__":
    testing_numpy()
NumPy arrays are also much more efficient than Python's lists, as we
can see in the benchmark tests below:
[general_problems/numbers/testing_numpy_speed.py]
import numpy
import time

def trad_version():
    t1 = time.time()
    X = range(10000000)
    Y = range(10000000)
    Z = []
    # element-by-element addition in a Python loop
    for i in range(len(X)):
        Z.append(X[i] + Y[i])
    return time.time() - t1

def numpy_version():
    t1 = time.time()
    X = numpy.arange(10000000)
    Y = numpy.arange(10000000)
    # vectorized addition, performed in compiled code
    Z = X + Y
    return time.time() - t1

if __name__ == "__main__":
    print(trad_version())
    print(numpy_version())
Results (in seconds; the exact numbers depend on the machine):
3.23564291
0.0714290142059
Chapter 2
Built-in Sequence Types
The next step in our studies is learning how Python represents sequence
data types. A sequence type has the following properties:
• membership operator (for example, using in);
• a size function (given by len(seq));
• slicing properties (for example, seq[:-1]); and
• iterability (we can iterate the data in loops).
Python has five built-in sequence types: strings, tuples, lists, byte
arrays, and bytes:1
>>> l = []
>>> type(l)
<class 'list'>
>>> s = ""
>>> type(s)
<class 'str'>
>>> t = ()
>>> type(t)
<class 'tuple'>
>>> ba = bytearray(b"")
>>> type(ba)
<class 'bytearray'>
>>> b = bytes([])
>>> type(b)
<class 'bytes'>
1
A named tuple is also available in the standard library, in the collections
module.
Mutability
Another property that any data type holds is mutability. Numbers are
obviously immutable; however, when it comes to sequence types, we can have
mutable types too. For instance, tuples, strings, and bytes are immutable,
while lists and byte arrays are mutable. Immutable types can be more efficient
than mutable ones, and some collection data types2 can only work with immutable
data types (set elements and dictionary keys must be hashable).
Since any variable is an object reference in Python, copying mutable
objects can be tricky. When you say a = b, you are actually pointing a to
where b points. Therefore, to copy an object in Python you need to use
special procedures. The ones below make shallow copies (a new container whose
items are shared with the original); for a deep copy, use copy.deepcopy()
from the copy module.
To make a copy of a list:
>>> newList = myList[:]
>>> newList2 = list(myList2)
To make a copy of a set (we will see sets in the next chapter), use:
>>> people = {"Buffy", "Angel", "Giles"}
>>> slayers = people.copy()
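The difference between a shallow copy and a deep copy only shows up when the items are themselves mutable. In this sketch (the variable names are illustrative), a slice copy shares the inner lists with the original, while copy.deepcopy() duplicates them:

```python
import copy

my_list = [[1, 2], [3, 4]]
shallow = my_list[:]           # new outer list, same inner lists
deep = copy.deepcopy(my_list)  # inner lists are copied as well

my_list[0].append(99)
print(shallow[0])          # [1, 2, 99]: the inner list is shared
print(deep[0])             # [1, 2]: unaffected
print(shallow is my_list)  # False: the outer list is a new object
```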
2
Collection data types are the subject of the next chapter; they include, for example,
sets and dictionaries.
2.1 Strings
Python represents strings, i.e., sequences of characters, using the
immutable str type. In Python, all objects have two output forms: while
string forms are designed to be human-readable, representational forms are
designed to produce an output that, if fed to a Python interpreter,
reproduces the represented object. In the future, when we write our own
classes, it will be important to define the string representation of our objects.
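A minimal sketch of what that means in practice, using a hypothetical Point class: __repr__() returns the representational form (here, an expression that rebuilds the object) and __str__() returns the human-readable string form.

```python
class Point:
    """Hypothetical class defining both output forms."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # representational form: valid Python that recreates the object
        return "Point({!r}, {!r})".format(self.x, self.y)

    def __str__(self):
        # string form: meant for humans
        return "({}, {})".format(self.x, self.y)

p = Point(1, 2)
print(str(p))   # (1, 2)
print(repr(p))  # Point(1, 2)
```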
Unicode Strings
Python's Unicode support lets us include special characters in a string,
for example written as escape sequences. Starting from Python 3, all strings
are Unicode, not just plain bytes. The u prefix that Python 2 needed to create
a Unicode string is still accepted (since Python 3.3) but is optional:
>>> u'Goodbye\u0020World !'
'Goodbye World !'
In the example above, the escape sequence indicates the Unicode character
with the ordinal value 0x0020 (a space). It is also useful to remember that
ASCII characters fit in 7 bits (one byte), while Unicode code points need up
to 21 bits, so encodings such as UTF-8 use between one and four bytes per
character.
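The difference between characters and bytes becomes visible when a string is encoded. In this short sketch, one non-ASCII character takes two bytes in UTF-8:

```python
s = "caf\u00e9"                       # 'café', written with an escape sequence
print(len(s))                         # 4 code points
encoded = s.encode("utf-8")
print(len(encoded))                   # 5 bytes: U+00E9 takes two in UTF-8
print(hex(ord(s[-1])))                # 0xe9
print(encoded.decode("utf-8") == s)   # True
```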
The join() method joins all the strings in a list into one string. While we
could use + to concatenate these strings, when a large volume of data is
involved, + becomes much less efficient than join():
>>> slayer = ["Buffy", "Anne", "Summers"]
>>> " ".join(slayer)
'Buffy Anne Summers'
>>> "-<>-".join(slayer)
'Buffy-<>-Anne-<>-Summers'
>>> "".join(slayer)
'BuffyAnneSummers'
Some formatting (aligning) can be obtained with the methods rjust(), which
pads at the start (right-aligning the text), and ljust(), which pads at the
end (left-aligning it):
>>> name = "Agent Mulder"
>>> name.rjust(50, '-')
'--------------------------------------Agent Mulder'
From Python 3.1 it is possible to omit field names, in which case Python
will in effect put them in for us, using numbers starting from 0. For example:
>>> "{} {} {}".format("Python", "can", "count")
'Python can count'
(In a case this simple, concatenation with + would also work.) The format()
method also accepts three conversion specifiers: s to force string form, r to
force representational form, and a to force representational form but only
using ASCII characters:
>>> import decimal
>>> "{0} {0!s} {0!r} {0!a}".format(decimal.Decimal("99.9"))
"99.9 99.9 Decimal('99.9') Decimal('99.9')"
The splitlines() method returns the list of lines produced by splitting the
string on line terminators, stripping the terminators unless its keepends
argument is True:
>>> slayers = "Buffy\nFaith"
>>> slayers.splitlines()
['Buffy', 'Faith']
We can use split() to write our own method for erasing spaces from
strings:
>>> def erase_space_from_string(string):
...     s1 = string.split(" ")
...     s2 = "".join(s1)
...     return s2
The program below uses strip() to list every word, and the number of
times it occurs, in alphabetical order for the files given on the command
line:3
[general_problems/strings/count_unique_words.py]
import string
import sys

def count_unique_word():
    words = {}  # create an empty dictionary
    # characters to strip from both ends of each word
    strip = string.whitespace + string.punctuation + string.digits + "\""
    for filename in sys.argv[1:]:
        with open(filename) as file:
            for line in file:
                for word in line.lower().split():
                    word = word.strip(strip)
                    if len(word) > 2:
                        words[word] = words.get(word, 0) + 1
    for word in sorted(words):
        print("{0} occurs {1} times.".format(word, words[word]))

if __name__ == "__main__":
    count_unique_word()
3
A similar example is shown in the Default Dictionaries section.
Similar methods are lstrip(), which returns a copy of the string with
all whitespace at the beginning of the string stripped away, and rstrip(),
which returns a copy of the string with all whitespace at the end of the
string stripped away.
• capitalize() returns a copy of the string with only the first character
in uppercase;
• lower() returns a copy of the original string, but with all characters
in lowercase;
• upper() returns a copy of the original string, but with all characters
in uppercase.
2.2 Tuples
A tuple is an immutable sequence type consisting of values separated by
commas:
>>> t1 = 1234, 'hello!'
>>> t1[0]
1234
>>> t1
(1234, 'hello!')
>>> u = t1, (1, 2, 3, 4, 5)  # nested
>>> u
((1234, 'hello!'), (1, 2, 3, 4, 5))
>>> t = 1, 5, 7
>>> t.index(5)
1
Tuple Unpacking
In Python, any iterable can be unpacked using the sequence unpacking
operator, *. When used with two or more variables on the left-hand side of an
assignment, one of which is preceded by *, items are assigned to the variables,
with all those left over assigned to the starred variable:
>>> x, *y = (1, 2, 3, 4)
>>> x
1
>>> y
[2, 3, 4]
Named Tuples
Python's collections module4 contains a sequence data type called named
tuple. This behaves just like the built-in tuple, with the same performance
characteristics, but it also carries the ability to refer to items in the tuple
by name as well as by index position. This allows the creation of aggregates
of data items:
>>> import collections
>>> MonsterTuple = collections.namedtuple("Monsters", "name age power")
>>> monster = MonsterTuple("Vampire", 230, "immortal")
>>> monster
Monsters(name='Vampire', age=230, power='immortal')
[general_problems/tuples/namedtuple_example.py]
from collections import namedtuple

def namedtuple_example():
    """show an example for named tuples
    >>> namedtuple_example()
    slayer
    """
    sunnydale = namedtuple("name", ["job", "age"])
    buffy = sunnydale("slayer", 17)
    print(buffy.job)

if __name__ == "__main__":
    namedtuple_example()
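Named tuples are still tuples: they stay immutable and support indexing and unpacking. To "change" a field you build a new tuple with _replace(), and _asdict() converts one to a dictionary. The Monster type below is only an illustration.

```python
from collections import namedtuple

Monster = namedtuple("Monster", "name age power")
m = Monster("Vampire", 230, "immortal")

print(m.age, m[1])            # 230 230: access by name or by position
older = m._replace(age=231)   # returns a new tuple; m is unchanged
print(older.age, m.age)       # 231 230
name, age, power = m          # ordinary tuple unpacking still works
print(m._asdict()["power"])   # immortal
```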
2.3 Lists
In computer science, arrays are a very simple data structure where elements
are sequentially stored in contiguous memory, and linked lists are structures
where several separate nodes link to each other. Iterating over the contents
of the data structure is equally efficient for both kinds, but directly accessing
an element at a given index has O(1) (complexity) runtime5 in an array,
while it is O(n) in a linked list with n nodes (where you would have to
traverse the list from the beginning). Furthermore, in a linked list, once
you know where you want to insert something, insertion is O(1), no matter
how many elements the list has. For arrays, an insertion would have to move
all elements that are to the right of the insertion point, or move all the
elements to a larger array if needed, being then O(n).
In Python, the closest object to an array is a list, which is a dynamically
resizing array and has nothing to do with linked lists. Why mention linked
lists? Linked lists are a very important abstract data structure (we will
see more about them in a following chapter), and it is fundamental to
understand what makes them so different from arrays (or Python's lists) when
we need to select the right data structure for a specific problem.
5
The Big-O notation is a key to understanding algorithms! We will learn more about this
in the following chapters and use the concept extensively in our studies. For now, just keep
in mind that an O(1) operation scales better than an O(n) one, which scales better than an O(n^2) one, etc...
Lists perform best (O(1)) when items are added or removed at the end,
using the methods append() and pop(), respectively. The worst performance
(O(n)) occurs with operations that need to search for items in the list, for
example, using remove() or index(), or using in for membership testing.6
If fast searching or membership testing is required, a collection type such
as a set or a dictionary may be a more suitable choice (as we will see in the
next chapter). Alternatively, lists can provide fast searching if they are kept
in order by being sorted (we will see searching methods that perform in
O(log n) for sorted sequences, particularly binary search, in the following
chapters).
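As a preview of that idea, the standard bisect module searches a sorted list in O(log n) comparisons. The helper contains_sorted() below is a hypothetical sketch; note that bisect.insort() finds the position quickly, but the insertion itself still shifts elements, so it is O(n) overall.

```python
import bisect

def contains_sorted(seq, item):
    """Membership test for a sorted list using binary search."""
    i = bisect.bisect_left(seq, item)   # leftmost position where item fits
    return i < len(seq) and seq[i] == item

people = ["Angel", "Buffy", "Faith", "Giles", "Willow"]  # already sorted
print(contains_sorted(people, "Giles"))  # True
print(contains_sorted(people, "Spike"))  # False
bisect.insort(people, "Spike")           # keeps the list sorted
print(people)
```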
>>> people.remove("Buffy")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list
The pop() method removes the item at the given position in the list and then
returns it. If no index is specified, pop() returns the last item in the list:
>>> people = ["Buffy", "Faith"]
>>> people.pop()
'Faith'
>>> people
['Buffy']
The del statement deletes the object reference, not the content; i.e., it is a
way to remove an item from a list given its index instead of its value. This
can also be used to remove slices from a list:
>>> a = [-1, 4, 5, 7, 10]
>>> del a[0]
>>> a
[4, 5, 7, 10]
>>> del a[2:3]
>>> a
[4, 5, 10]
>>> del a # also used to delete entire variable
7
Garbage is memory occupied by objects that are no longer referenced, and garbage
collection is a form of automatic memory management that frees the memory occupied by
the garbage.
The index(x) method returns the index in the list of the first item whose value is x:
>>> people = ["Buffy", "Faith"]
>>> people.index("Buffy")
0
List Unpacking
Similar to tuple unpacking:
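For example (the names are only illustrative), a starred target collects the remaining items of a list, and it can appear in any position:

```python
first, *middle, last = ["Buffy", "Willow", "Xander", "Giles"]
print(first)   # Buffy
print(middle)  # ['Willow', 'Xander']
print(last)    # Giles
```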
Python also has a related concept called starred arguments, which can be
used to pass arguments to a function:
>>> def example_args(a, b, c):
...     return a * b * c  # here * is the multiplication operator
>>> L = [2, 3, 4]
>>> example_args(*L)
24
>>> example_args(2, *L[1:])
24
List Comprehensions
A list comprehension is an expression and loop (with an optional condition)
enclosed in brackets:8
[item for item in iterable]
[expression for item in iterable]
[expression for item in iterable if condition]
>>> d
[3.1, 3.14, 3.142, 3.1416, 3.14159]
>>> words = 'Buffy is awesome and a vampire slayer'.split()
>>> e = [[w.upper(), w.lower(), len(w)] for w in words]
>>> for i in e:
...     print(i)
...
['BUFFY', 'buffy', 5]
['IS', 'is', 2]
['AWESOME', 'awesome', 7]
['AND', 'and', 3]
['A', 'a', 1]
['VAMPIRE', 'vampire', 7]
['SLAYER', 'slayer', 6]
[Good]
for x in range(5):
    for y in range(5):
        if x != y:
            for z in range(5):
                if y != z:
                    yield (x, y, z)
[Bad]
result = [(x, y) for x in range(10) for y in range(5) if x * y > 10]
return ((x, y, z)
        for x in range(5)
        for y in range(5)
        if x != y
        for z in range(5)
        if y != z)
def test1():
    l = []
    for i in range(1000):
        l = l + [i]

def test2():
    l = []
    for i in range(1000):
        l.append(i)

def test3():
    l = [i for i in range(1000)]

def test4():
    l = list(range(1000))

if __name__ == "__main__":
    import timeit
    # timeit() returns the total seconds for 1000 calls,
    # which equals the average number of milliseconds per call
    t1 = timeit.Timer("test1()", "from __main__ import test1")
    print("concat ", t1.timeit(number=1000), "milliseconds")
    t2 = timeit.Timer("test2()", "from __main__ import test2")
    print("append ", t2.timeit(number=1000), "milliseconds")
    t3 = timeit.Timer("test3()", "from __main__ import test3")
    print("comprehension ", t3.timeit(number=1000), "milliseconds")
    t4 = timeit.Timer("test4()", "from __main__ import test4")
    print("list range ", t4.timeit(number=1000), "milliseconds")
Chapter 3
Collection Data Structures
Unlike the last chapter's sequence data structures, where the data
can be ordered or sliced, collection data structures are containers that
aggregate data without relating them. Collection data structures also have
some properties that sequence types have:
In Python, built-in collection data types are given by sets and dicts. In
addition, many useful collection data types are found in the collections
module, as we will discuss in the last part of this chapter.
3.1 Sets
In Python, a set is an unordered collection data type that is iterable,
mutable, and has no duplicate elements. Sets are used for membership testing
and eliminating duplicate entries. Sets have O(1) (average) insertion, so the
runtime of union is O(m + n). For intersection, it is only necessary to
traverse the smaller set, so the runtime is O(min(m, n)). 1
1
Python's standard library has no ordered set type; when insertion order matters, the keys
of a dict (which preserve insertion order since Python 3.7) or of a collections.OrderedDict
are a common substitute.
Frozen Sets
Frozen sets are immutable objects that only support methods and opera-
tors that produce a result without affecting the frozen set or sets to which
they are applied.
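A short sketch of that behavior: operators such as | return a new frozenset, mutating methods like add() do not exist, and, because frozen sets are hashable, they can be used as dictionary keys or set elements (the names below are illustrative).

```python
fs = frozenset({"Buffy", "Angel"})
bigger = fs | {"Giles"}              # a new frozenset; fs is untouched
print(type(bigger).__name__)         # frozenset
print(hasattr(fs, "add"))            # False: no in-place mutation
locations = {fs: "Sunnydale"}        # hashable, so it works as a key
print(locations[frozenset({"Angel", "Buffy"})])  # Sunnydale
```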
def difference(l1):
    """ return the list with duplicate elements removed """
    return list(set(l1))

def test_sets_operations_with_lists():
    l1 = [1,2,3,4,5,9,11,15]
    l2 = [4,5,6,7,8]
    l3 = []
    # sets are unordered, so sort the results before comparing with a list
    assert(sorted(difference(l1)) == [1, 2, 3, 4, 5, 9, 11, 15])
    assert(sorted(difference(l2)) == [4, 5, 6, 7, 8])
    assert(sorted(intersection(l1, l2)) == [4, 5])
    assert(sorted(union(l1, l2)) == [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 15])
    assert(difference(l3) == [])
    assert(intersection(l3, l2) == l3)
    assert(sorted(union(l3, l2)) == sorted(l2))
    print("Tests passed!")

if __name__ == "__main__":
    test_sets_operations_with_lists()
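The intersection() and union() helpers used in the test are not reproduced in this excerpt. Assuming they simply convert the lists to sets and back (an assumption consistent with difference() above), they could be written as:

```python
def intersection(l1, l2):
    """Elements present in both lists (order not guaranteed)."""
    return list(set(l1) & set(l2))

def union(l1, l2):
    """Elements present in either list, without duplicates (order not guaranteed)."""
    return list(set(l1) | set(l2))

print(sorted(intersection([1, 2, 3, 4, 5], [4, 5, 6])))  # [4, 5]
print(sorted(union([1, 2], [2, 3])))                     # [1, 2, 3]
```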
from collections import OrderedDict

def set_operations_with_dict():
    pairs = [('a', 1), ('b', 2), ('c', 3)]
    d1 = OrderedDict(pairs)
    print(d1)  # ('a', 1), ('b', 2), ('c', 3), in insertion order

if __name__ == "__main__":
    set_operations_with_dict()
2
Set operations can be used on the views returned by a dict's keys() and items()
methods (for items(), as long as the values are hashable); however, values()
views do not support set operations.
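Dictionary key views, and item views when the values are hashable, support set operators directly. A small illustration with made-up dictionaries (key results are sorted before printing, since set order is not guaranteed):

```python
d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 2, "c": 30, "d": 4}

common_keys = d1.keys() & d2.keys()    # keys present in both
same_pairs = d1.items() & d2.items()   # (key, value) pairs present in both
only_in_d1 = d1.keys() - d2.keys()     # keys only in d1
print(sorted(common_keys))   # ['b', 'c']
print(same_pairs)            # {('b', 2)}
print(only_in_d1)            # {'a'}
```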
3.2 Dictionaries
Dictionaries in Python are implemented using hash tables. Hashing functions
compute an integer value from an object (deterministic within a program run),
which can be used as an index into an array:
>>> hash(42)
42
>>> hash("hello")
355070280260770553
The hash of a string will differ from one interpreter session to another,
because Python randomizes string hashes by default.
def usual_dict(dict_data):
    newdata = {}
    for k, v in dict_data:
        if k in newdata:
            newdata[k].append(v)
        else:
            newdata[k] = [v]
    return newdata

def setdefault_dict(dict_data):
    newdata = {}
    for k, v in dict_data:
        # setdefault() inserts [] only when k is missing, then returns the list
        newdata.setdefault(k, []).append(v)
    return newdata
        ('key2', 'value4'),
        ('key2', 'value5'),)
    print(usual_dict(dict_data))
    print(setdefault_dict(dict_data))

if __name__ == "__main__":
    test_setdef()
import timeit
import random

for i in range(10000, 1000001, 20000):
    t = timeit.Timer("random.randrange(%d) in x" % i,
                     "from __main__ import random, x")
    x = list(range(i))
    lst_time = t.timeit(number=1000)
    x = {j: None for j in range(i)}
    d_time = t.timeit(number=1000)
    print("%d,%10.3f,%10.3f" % (i, lst_time, d_time))
So we can see the linear time for lists, and constant time for dicts!
can be reduced to
functions = dict(a=add_to_dict, e=edit_dict,...)
functions[actions](db)
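The dispatch-table idea above can be sketched as follows. The handlers add_to_dict() and edit_dict() appear in the original only by name, so their bodies here are hypothetical placeholders, as is run_action(); dict.get() supplies a fallback for unknown actions in place of a final else branch.

```python
def add_to_dict(db):
    """Hypothetical handler: count how many times 'add' ran."""
    db["added"] = db.get("added", 0) + 1
    return "added"

def edit_dict(db):
    """Hypothetical handler."""
    return "edited"

functions = dict(a=add_to_dict, e=edit_dict)

def run_action(action, db):
    # Look the handler up instead of walking an if/elif chain
    handler = functions.get(action)
    if handler is None:
        return "unknown action"
    return handler(db)

db = {}
print(run_action("a", db), db)   # added {'added': 1}
print(run_action("x", db))       # unknown action
```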
Default Dictionaries
Default dictionaries are an additional mapping type provided
by Python's collections.defaultdict. They have all the operators and
methods that a built-in dictionary provides, but they also gracefully handle
missing keys:
[general_examples/dicts/defaultdict_example.py]
from collections import defaultdict

def defaultdict_example():
    """show some examples for defaultdicts"""
    pairs = {('a', 1), ('b', 2), ('c', 3)}
    d1 = {}
    for key, value in pairs:
        if key not in d1:
            d1[key] = []
        d1[key].append(value)
    print(d1)
    d2 = defaultdict(list)
    for key, value in pairs:
        d2[key].append(value)  # a missing key starts as list()
    print(d2)

if __name__ == "__main__":
    defaultdict_example()
Ordered Dictionaries
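Ordered dictionaries are an ordered mapping type provided by Python's
collections.OrderedDict. They have all the methods and properties of a
built-in dict, but in addition they store items in insertion order (since
Python 3.7, plain dicts also preserve insertion order, but OrderedDict still
adds order-aware methods such as move_to_end()):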
[general_examples/dicts/OrderedDict_example.py]
    d2 = OrderedDict(pairs)
    for key in d2:
        print(key, d2[key])

if __name__ == "__main__":
    OrderedDict_example()
"""
a [1]
c [3]
b [2]
a 1
b 2
c 3
"""
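Two methods that plain dicts lack show where OrderedDict still earns its place: move_to_end() reorders a key, and popitem(last=False) removes from the front, which is handy for FIFO or LRU-style bookkeeping. A short sketch:

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
od.move_to_end("a")            # "a" becomes the last key
print(list(od))                # ['b', 'c', 'a']
print(od.popitem(last=False))  # ('b', 2): the oldest item is removed first
print(list(od))                # ['c', 'a']
```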
Counter Dictionaries
A specialised Counter type (a dict subclass for counting hashable objects) is
provided by Python's collections.Counter:
[general_examples/dicts/Counter_example.py]
from collections import Counter

def Counter_example():
    """show some examples for Counter
    it is a dictionary that maps the items to the number of
    occurrences
    """
    seq1 = [1, 2, 3, 5, 1, 2, 5, 5, 2, 5, 1, 4]
    seq_counts = Counter(seq1)
    print(seq_counts)
    seq3 = [1, 4, 3]
    for key in seq3:
        seq_counts[key] += 1
    print(seq_counts)

if __name__ == "__main__":
    Counter_example()
"""
Counter({5: 4, 1: 3, 2: 3, 3: 1, 4: 1})
Counter({1: 4, 2: 4, 5: 4, 3: 2, 4: 1})
Counter({1: 5, 2: 4, 5: 4, 3: 3, 4: 2})
Counter({1: 1, 3: 1, 4: 1})
Counter({1: 6, 2: 4, 3: 4, 5: 4, 4: 3})
Counter({1: 4, 2: 4, 5: 4, 3: 2, 4: 1})
"""
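Because a Counter returns 0 for missing items, the seq_counts[key] += 1 loop above works even for items not seen before. Two other commonly used methods are most_common() and update(); a short sketch (since Python 3.7, ties in most_common() keep the order in which items were first seen):

```python
from collections import Counter

c = Counter("abracadabra")
print(c["z"])            # 0: missing items count as zero (and are not added)
print(c.most_common(2))  # [('a', 5), ('b', 2)]
c.update("aaz")          # add counts from another iterable
print(c["a"], c["z"])    # 7 1
```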
if __name__ == "__main__":
    test_find_top_N_recurring_words()
import collections
import string
import sys

def count_unique_word():
    words = collections.defaultdict(int)
    strip = string.whitespace + string.punctuation + string.digits + "\""
Anagrams
The following program finds whether two words are anagrams. Since sets
do not count occurrences, and sorting a list is O(n log n), hash tables can
be the best solution in this case. The procedure we use is: we scan the
first string and add all the character occurrences. Then we scan the second
string, decreasing all the character occurrences. In the end, if all the entries
are zero, the strings are anagrams:
[general_problems/dicts/verify_two_strings_are_anagrams.py]
    for i in str1:
        ana_table[i] += 1
    for i in str2:
        ana_table[i] -= 1

def test_verify_two_strings_are_anagrams():
    str1 = 'marina'
    str2 = 'aniram'
    assert(verify_two_strings_are_anagrams(str1, str2) == True)
    str1 = 'google'
    str2 = 'gouglo'
    assert(verify_two_strings_are_anagrams(str1, str2) == False)
    print("Tests passed!")

if __name__ == "__main__":
    test_verify_two_strings_are_anagrams()
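Only the two counting loops of verify_two_strings_are_anagrams() appear above. A complete sketch built around them, assuming ana_table is a collections.defaultdict(int) (the table's actual type is not shown in this excerpt), could be:

```python
from collections import defaultdict

def verify_two_strings_are_anagrams(str1, str2):
    ana_table = defaultdict(int)   # character -> running count
    for i in str1:
        ana_table[i] += 1          # count characters of the first string
    for i in str2:
        ana_table[i] -= 1          # cancel them with the second string
    # anagrams leave every count at exactly zero
    return all(count == 0 for count in ana_table.values())

print(verify_two_strings_are_anagrams("marina", "aniram"))  # True
print(verify_two_strings_are_anagrams("google", "gouglo"))  # False
```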
Another way to find whether two words are anagrams is to use the properties
of hashing functions, where every different combination of characters should
give a different result. In the following program, ord() returns an integer
representing the Unicode code point of a one-character string (or the value
of the byte, for a bytes object of length one):
[general_problems/dicts/find_anagram_hash_function.py]
if __name__ == "__main__":
    test_find_anagram_hash_function()
Sums of Paths
The following program uses two different dictionary containers to determine
the number of ways two dice can sum to a certain value:
[general_problems/dicts/find_dice_probabilities.py]
    cdict = Counter()
    ddict = defaultdict(list)

if __name__ == "__main__":
    test_find_dice_probabilities()
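The body of the dice program is not reproduced here beyond its two containers. A sketch that uses a Counter for the number of ways to reach each sum and a defaultdict(list) for the actual pairs (the function name dice_sums and its return values are hypothetical) might look like this:

```python
from collections import Counter, defaultdict

def dice_sums(faces=6):
    """Map each possible sum of two dice to its count and to its pairs."""
    cdict = Counter()            # sum -> number of ways
    ddict = defaultdict(list)    # sum -> list of (die1, die2) pairs
    for d1 in range(1, faces + 1):
        for d2 in range(1, faces + 1):
            total = d1 + d2
            cdict[total] += 1
            ddict[total].append((d1, d2))
    return cdict, ddict

cdict, ddict = dice_sums()
print(cdict[7], ddict[7][:2])  # 6 [(1, 6), (2, 5)]
```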
Finding Duplicates
The program below uses dictionaries to find and delete all the duplicate
characters in a string, i.e., every character that appears more than once:
[general_problems/dicts/delete_duplicate_char_str.py]
import string

def delete_unique_word(str1):
    # count each lowercase letter (assumes str1 contains only a-z)
    table_c = {key: 0 for key in string.ascii_lowercase}
    for i in str1:
        table_c[i] += 1
    # remove every character that occurs more than once
    for key, value in table_c.items():
        if value > 1:
            str1 = str1.replace(key, "")
    return str1

def test_delete_unique_word():
    str1 = "google"
    assert(delete_unique_word(str1) == 'le')
    print("Tests passed!")

if __name__ == "__main__":
    test_delete_unique_word()
Chapter 4
Python's Structure and Modules
Activation Records
1. the actual parameters of the method are pushed onto the stack,
Whenever you create a module, remember that mutable objects should not
be used as default values in the function or method definition:
[Good]
def foo(a, b=None):
    if b is None:
        b = []
[Bad]
def foo(a, b=[]):
    ...
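The reason is that a default value is evaluated once, when the def statement runs, and is then shared by every call. A quick demonstration with hypothetical append_to() functions:

```python
def append_to(item, bucket=[]):           # the [Bad] pattern
    bucket.append(item)
    return bucket

def append_to_fixed(item, bucket=None):   # the [Good] pattern
    if bucket is None:
        bucket = []                       # a fresh list on every call
    bucket.append(item)
    return bucket

first = append_to(1)
second = append_to(2)
print(first is second, second)                 # True [1, 2]
print(append_to_fixed(1), append_to_fixed(2))  # [1] [2]
```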
In the simplest case, it can just be an empty file, but it can also execute
initialization code for the package or set the __all__ variable in __init__.py:
__all__ = ["file1", ...]
The statement from module import * means importing every object in the
module, except those whose names begin with _, or, if the module has a global
__all__ variable, only the names listed in it.
will not be executed. On the other hand, if we run the .py file directly,
Python sets __name__ to "__main__", and every instruction following the above
statement will be executed.
The variables sys.ps1 and sys.ps2 define the strings used as primary
and secondary prompts. The variable sys.argv allows us to use the arguments
passed in the command line inside our programs:
import sys

def main():
    """print command line arguments"""
    for arg in sys.argv[1:]:
        print(arg)

if __name__ == "__main__":
    main()
The built-in function dir() is used to find which names a module defines
(all types of names: variables, modules, functions). It returns a sorted list
of strings (abbreviated here; the exact list depends on the Python version):
>>> import sys
>>> dir(sys)
[..., '__name__', ..., 'argv', ..., 'exit', ..., 'path', ..., 'ps1', 'ps2',
..., 'stderr', 'stdin', ..., 'stdout', ..., 'version', ...]
It does not list the names of built-in functions and variables (those are
defined in the standard module builtins). dir() is also useful to find all
the methods or attributes of an object.
... elif x == 0:
...     print("Zero")
... elif x == 1:
...     print("Single")
... else:
...     print("More")
for
The for statement in Python differs from that of C or Pascal. Rather than
always iterating over an arithmetic progression of numbers (as in Pascal), or
giving the user the ability to define both the iteration step and halting
condition (as in C), Python's for statement iterates over the items of any
sequence (e.g., a list or a string), in the order that they appear in the
sequence:
>>> a = ["buffy", "willow", "xander", "giles"]
>>> for name in a:
...     print(name)
buffy
willow
xander
giles
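When the position is needed as well as the item, the built-in enumerate() yields (index, item) pairs, which avoids the range(len(a)) idiom. A short sketch:

```python
a = ["buffy", "willow", "xander", "giles"]

# enumerate() pairs each item with its index; start sets the first index.
for i, name in enumerate(a, start=1):
    print(i, name)
```

This prints 1 buffy, 2 willow, and so on, one pair per line.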
The Google Python Style guide sets the following rules for using implicit
False in Python:
• For sequences (strings, lists, tuples), use the fact that empty sequences are False, so if not seq: or if seq: is preferable to if len(seq): or if not len(seq):.
• When handling integers, implicit False may involve more risk than benefit, such as accidentally handling None as 0:
[Good]
if not users: print("no users")
if foo == 0: self.handle_zero()
if i % 10 == 0: self.handle_multiple_of_ten()
[Bad]
if len(users) == 0: print("no users")
if foo is not None and not foo: self.handle_zero()
if not i % 10: self.handle_multiple_of_ten()
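A small sketch of the None-versus-0 pitfall that the second rule warns about (the function names are hypothetical):

```python
def describe_implicit(count):
    # Implicit False: None and 0 are both falsy, so they are conflated.
    return "zero" if not count else "nonzero"


def describe_explicit(count):
    # Explicit comparisons: None is handled separately from the integer 0.
    if count is None:
        return "unknown"
    return "zero" if count == 0 else "nonzero"


print(describe_implicit(None), describe_explicit(None))  # zero unknown
```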
Generators are very robust and efficient, and they should be considered every time you deal with a function that returns a sequence or creates a loop. For example, the following program implements a Fibonacci sequence using a generator:
def fib_generator():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a+b

if __name__ == "__main__":
    fib = fib_generator()
    print(next(fib))
    print(next(fib))
    print(next(fib))
    print(next(fib))
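Because fib_generator() never stops on its own, a convenient way to take only the first few values is itertools.islice(), which consumes the generator lazily. A short sketch (the generator is repeated so the block runs on its own):

```python
from itertools import islice


def fib_generator():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b


# islice() stops after 10 items, so the infinite generator is safe to use.
first_ten = list(islice(fib_generator(), 10))
print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```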
>>> list(range(11))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(range(4, 10))
[4, 5, 6, 7, 8, 9]
>>> list(range(0, 10, 3))
[0, 3, 6, 9]
[general_problems/modules/grep_word_from_files.py]
import sys

def grep_word_from_files():
    word = sys.argv[1]
    for filename in sys.argv[2:]:
        with open(filename) as file:
            for lino, line in enumerate(file, start=1):
                if word in line:
                    # show at most 40 characters of the matching line
                    print("{0}:{1}:{2:.40}".format(filename, lino,
                                                   line.rstrip()))

if __name__ == "__main__":
    # we need the word and at least one file name
    if len(sys.argv) < 3:
        print("Usage: grep_word_from_files.py word infile1 [infile2...]")
        sys.exit()
    else:
        grep_word_from_files()
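import sys

def read_data(filename):
    """Return the non-blank lines of filename."""
    lines = []
    fh = None
    try:
        fh = open(filename)
        for line in fh:
            if line.strip():
                lines.append(line)
    except (IOError, OSError) as err:
        print(err)
    finally:
        if fh is not None:
            fh.close()
    return lines

def remove_blank_lines():
    if len(sys.argv) < 2:
        print("Usage: noblank.py infile1 [infile2...]")
        sys.exit()
    # overwrite each file with its non-blank lines only
    for filename in sys.argv[1:]:
        lines = read_data(filename)
        if lines:
            with open(filename, "w") as fh:
                fh.writelines(lines)

if __name__ == "__main__":
    remove_blank_lines()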
• 'w' for writing (an existing file with the same name will be erased),
of the file will be read and returned. If the end of the file has been reached, read() will return an empty string:
>>> f.read()
'This is the entire file.\n'
>>> f.read()
''
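A self-contained sketch of the same behavior, assuming a temporary file created with the tempfile module (the file content is made up for the example):

```python
import os
import tempfile

# Create a throwaway file to read back.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("This is the entire file.\n")

with open(path) as f:
    first = f.read()   # the whole content
    second = f.read()  # already at end of file, so an empty string
print(repr(first), repr(second))

os.remove(path)
```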
[general_problems/files/change_ext_file.py]
import os
import sys
import shutil

def change_file_ext():
    if len(sys.argv) < 3:
        print("Usage: change_ext.py filename.old_ext new_ext")
        sys.exit()
    # keep the base name and swap the extension
    name = os.path.splitext(sys.argv[1])[0] + "." + sys.argv[2]
    print(name)
    try:
        shutil.copyfile(sys.argv[1], name)
    except OSError as err:
        print(err)

if __name__ == "__main__":
    change_file_ext()
[general_problems/files/export_pickle.py]
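import pickle

def export_pickle(data, filename="test.dat"):
    """Serialize data into filename with pickle."""
    fh = None
    try:
        fh = open(filename, "wb")
        pickle.dump(data, fh)
    finally:
        if fh is not None:
            fh.close()

def test_export_pickle():
    mydict = {"a": 1, "b": 2, "c": 3}
    export_pickle(mydict)

if __name__ == "__main__":
    test_export_pickle()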
import pickle

def import_pickle(filename):
    fh = None
    try:
        fh = open(filename, "rb")
        mydict2 = pickle.load(fh)
        return mydict2
    finally:
        if fh is not None:
            fh.close()

def test_import_pickle():
    pkl_file = "test.dat"
    mydict = import_pickle(pkl_file)
    print(mydict)

if __name__ == "__main__":
    test_import_pickle()
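The same round trip can be checked in memory with pickle.dumps() and pickle.loads(), which work on bytes instead of files. A minimal sketch:

```python
import pickle

mydict = {"a": 1, "b": 2, "c": 3}

# dumps() returns a bytes object; loads() rebuilds an equal object from it.
payload = pickle.dumps(mydict)
restored = pickle.loads(payload)
print(type(payload).__name__, restored == mydict)  # bytes True
```

Keep in mind that the pickle documentation warns against unpickling data from untrusted sources.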
The queue.Queue class handles all the locking internally: we can rely on it to serialize accesses, meaning that only one thread at a time has access to the data, which is retrieved in FIFO order. The program will not terminate while it has any (non-daemon) threads running.
This can be a problem: once the worker threads have done their work, they are finished, but they are technically still running (typically blocked, waiting for more items). The solution is to turn the threads into daemons. In this case, the program will terminate as soon as no non-daemon threads are left running. The method queue.Queue.join() blocks until every item put into the queue has been retrieved and processed, that is, until task_done() has been called for each item.
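A minimal sketch of this pattern, with hypothetical worker() and results names: daemon worker threads take items from a queue.Queue and call task_done() for each one, while the main thread waits on join() before reading the results.

```python
import queue
import threading

tasks = queue.Queue()
results = []
results_lock = threading.Lock()


def worker():
    # Loop forever; as a daemon thread this does not keep the program alive.
    while True:
        n = tasks.get()
        with results_lock:
            results.append(n * n)
        tasks.task_done()  # tells tasks.join() that this item is finished


for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

for n in range(10):
    tasks.put(n)

tasks.join()  # blocks until task_done() has been called for every item
print(sorted(results))
```

The workers are still blocked in tasks.get() when the script ends, but since they are daemons the interpreter exits anyway.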
Handling Exceptions
When an exception is raised and not handled, Python outputs a traceback along with the exception's error message. A traceback (sometimes called a backtrace) is a list of all the calls made from the point where the unhandled exception occurred back to the top of the call stack.
We can handle predictable exceptions by using the try-except-finally paradigm:
try:
    try_suite
except exception1 as variable1:
    exception_suite1
...
except exceptionN as variableN:
    exception_suiteN
finally:
    finally_suite
If the statements in the try block's suite are all executed without raising an exception, the except blocks are skipped (the finally suite, if present, always runs). If an exception is raised inside the try block, control is immediately passed to the suite corresponding to the first matching exception. This means that any statements in the suite that follow the one that caused the exception will not be executed:
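while True:
    try:
        x = int(input("Please enter a number: "))
        break  # skipped if int() raises ValueError
    except ValueError:
        print("That was not a valid number. Try again...")
We can also define our own exceptions by subclassing Exception:
class Error(Exception):
    pass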
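The following sketch puts these pieces together: a hypothetical custom exception derived from Exception (in the style of class Error), an else suite that runs only when no exception was raised, and a finally suite that always runs. The log list only records the order in which the suites execute.

```python
class ParseError(Exception):
    """Hypothetical custom exception."""


def parse_positive(text, log):
    try:
        value = int(text)          # may raise ValueError
        if value <= 0:
            raise ParseError("not positive: {}".format(value))
        log.append("parsed")       # skipped if either exception is raised
    except ValueError:
        log.append("ValueError")
        return None
    except ParseError:
        log.append("ParseError")
        return None
    else:
        log.append("else")         # runs only if the try suite succeeded
        return value
    finally:
        log.append("finally")      # always runs, even after a return


log = []
print(parse_positive("42", log), log)  # 42 ['parsed', 'else', 'finally']
```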
pdb:
The debugger pdb can be run from the command line:
$ python3 -m pdb program.py
or started from inside the program, at the point we want to inspect:
import pdb
pdb.set_trace()
To perform the inspection, type s to step, p to print an expression, and c to continue.
Profiling
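If a program runs very slowly or consumes far more memory than we expect, the problem is most often due to our choice of algorithms or data structures, or to some inefficient implementation. Some performance verification is useful, though. For example, accumulating a string with += inside a loop is a common inefficiency:
[Bad]
employee_table = "<table>"
for last_name, first_name in employee_list:
    employee_table += "<tr><td>%s, %s</td></tr>" % (last_name, first_name)
employee_table += "</table>"
To find where a program spends its time, we can use the cProfile module (note that cProfile.run() takes the statement to execute as a string):
import cProfile
cProfile.run("main()")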
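The usual fix for the [Bad] pattern is to collect the pieces in a list and join them once at the end. A sketch, with a made-up employee_list:

```python
employee_list = [("Summers", "Buffy"), ("Rosenberg", "Willow")]

# Build the rows in a list, then join once: avoids repeated string copies.
items = ["<table>"]
for last_name, first_name in employee_list:
    items.append("<tr><td>%s, %s</td></tr>" % (last_name, first_name))
items.append("</table>")
employee_table = "".join(items)
print(employee_table)
```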
import time

def sumOfN2(n):
    """A simple example of how to time a function."""
    start = time.time()
    theSum = 0
    for i in range(1, n+1):
        theSum = theSum + i
    end = time.time()
    return theSum, end-start

if __name__ == "__main__":
    n = 5
    print("Sum is %d and required %10.7f seconds" % sumOfN2(n))
    n = 200
    print("Sum is %d and required %10.7f seconds" % sumOfN2(n))
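For short snippets, the standard timeit module is usually more reliable than calling time.time() by hand, because it repeats the statement many times and uses a high-resolution clock. A sketch comparing a loop with the built-in sum() (the name sum_loop is hypothetical, and the timings depend on the machine):

```python
import timeit


def sum_loop(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


# number= sets how many times each callable is executed.
t_loop = timeit.timeit(lambda: sum_loop(200), number=10000)
t_builtin = timeit.timeit(lambda: sum(range(1, 201)), number=10000)
print("loop: %.4f s, sum(): %.4f s" % (t_loop, t_builtin))
```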
doctest
Use it when writing the tests inside the docstrings of modules and functions. Then just add three lines at the end:
if __name__ == "__main__":
    import doctest
    doctest.testmod()
The doctests can also be collected into a unittest suite:
import doctest
import unittest

suite = unittest.TestSuite()
suite.addTest(doctest.DocTestSuite(module_to_be_tested))
runner = unittest.TextTestRunner()
print(runner.run(suite))
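A minimal sketch of a docstring test: the lines after >>> are executed and their output is compared with the lines that follow them. The function average() is hypothetical.

```python
def average(values):
    """Return the arithmetic mean of a non-empty sequence.

    >>> average([1, 2, 3, 4])
    2.5
    >>> average([5])
    5.0
    """
    return sum(values) / len(values)


if __name__ == "__main__":
    import doctest
    doctest.testmod()  # silent when all the examples pass
```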
Test Nomenclature
Test fixtures: the code necessary to set up a test (for example, creating an input file for testing and deleting it afterwards).
Object-Oriented Design
However, many things are missing here. First, there is no guarantee that anyone who uses our circle data will not enter an invalid input value, such as a negative number for the radius. Second, how could we also associate with our circle the operations that belong to it, such as computing its area or perimeter?
For the first problem, we can see that the inability to validate data when creating an object is a really bad aspect of taking a purely procedural approach to programming. Even if we decide to include plenty of exception handling for invalid inputs, we would still have a data container that is not intrinsically made and validated for its real purpose. Imagine now if we had chosen a list instead of the named tuple: how would we handle the fact that lists have sorting properties?
It is clear from the example above that we need to find a way to create an object that has only the properties that we expect it to have. In other words, we want to find a way to package data and restrict its methods. That is what object-oriented programming allows you to do: to create your own
Class Instantiation
Class instantiation uses function notation to create objects in a known initial state. The instantiation operation creates an empty object which has individuality. However, multiple names (in multiple scopes) can be bound to the same object (also known as aliasing). In Python, when an object is created, first the special method __new__() is called (the constructor) and then __init__() initializes it.
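A small sketch that makes the order visible: __new__() receives the class and must return the new instance, and __init__() then receives that instance. The class name Traced and the calls list are only for illustration.

```python
calls = []


class Traced:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        # object.__new__ only needs the class; the arguments go to __init__.
        return super().__new__(cls)

    def __init__(self, value):
        calls.append("__init__")
        self.value = value


t = Traced(7)
print(calls, t.value)  # ['__new__', '__init__'] 7

u = t  # aliasing: a second name bound to the same object
print(u is t)  # True
```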
Attributes
Objects have the attributes of their classes, which are methods and data. Method attributes are functions whose first argument is the instance on which they are called to operate (which in Python is conventionally called self). An attribute is any name following a dot. References to names in modules are attribute references: in the expression modname.funcname, modname is a module object and funcname is one of its attributes. Attributes may be read-only or writable. Writable attributes may be deleted with the del statement.
Namespaces
A namespace is a mapping from names to objects. Most namespaces are currently implemented as Python dictionaries. Examples of namespaces are: the set of built-in names, the global names in a module, and the local names in a function invocation. The statements executed by the top-level invocation of the interpreter, either reading from a script file or interactively, are considered part of a module called __main__, so they have their own global namespace.
Scope
A scope is a textual region of a Python program where a namespace is
directly accessible. Although scopes are determined statically, they are used
dynamically. Scopes are determined textually: the global scope of a function defined in a module is that module's namespace. When a class definition is entered, a new namespace is created and used as the local scope.
Polymorphism
Polymorphism (or dynamic method binding) is the principle that methods can be redefined inside subclasses. In other words, if we have an object of a subclass and we call a method that is also defined in the superclass, Python will use the method defined in the subclass. If we need to call the superclass's version of the method instead, we can do so with the built-in super().
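A short sketch of method overriding with a call back to the superclass through super(); the Animal and Dog classes are hypothetical:

```python
class Animal:
    def speak(self):
        return "..."

    def describe(self):
        # self.speak() is looked up on the actual object's class at runtime.
        return "It says " + self.speak()


class Dog(Animal):
    def speak(self):
        # Override, but reuse the superclass version through super().
        return "Woof " + super().speak()


print(Animal().describe())  # It says ...
print(Dog().describe())     # It says Woof ...
```

Note that describe() is defined only once, in Animal, yet it picks up Dog.speak() for a Dog instance.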
For example, all instances of a custom class are hashable by default in Python. This means that the __hash__() method can be called (for instance, through the built-in hash()), allowing
Aggregation
Aggregation (or composition) defines the process where a class includes one or more instance variables that are from other classes. It is a has-a relationship. In Python, every class uses inheritance (all classes ultimately inherit from the object base class), and most use aggregation, since most classes have instance variables of various types.
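import math

class Point:
    def __init__(self, x = 0, y = 0):
        self.x = x  # data attribute
        self.y = y

    def distance_from_origin(self):
        # used by Circle.edge_distance_from_origin() below
        return math.hypot(self.x, self.y)

    def __repr__(self):
        return "point ({0.x!r}, {0.y!r})".format(self)

    def __str__(self):
        return "({0.x!r}, {0.y!r})".format(self)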
¹ containers, which are generic data structures that permit the storage and retrieval of data items independent of content.
class Circle(Point):
    def __init__(self, radius, x=0, y=0):
        super().__init__(x, y)
        self.radius = radius

    def edge_distance_from_origin(self):
        return abs(self.distance_from_origin() - self.radius)

    def area(self):
        return math.pi*(self.radius**2)

    def circumference(self):
        return 2*math.pi*self.radius

    def __repr__(self):
        return "circle ({0.radius!r}, {0.x!r}, {0.y!r})".format(self)

    def __str__(self):
        return repr(self)
0.7639320225002102
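Circle is an is-a relationship (inheritance). For contrast, here is a minimal has-a sketch in which a hypothetical Segment class aggregates two Point objects instead of inheriting from them (a small Point is redefined so the block runs on its own):

```python
import math


class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


class Segment:
    """Has-a: a Segment holds two Point instances as instance variables."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def length(self):
        # Use the data of the aggregated Point objects.
        return math.hypot(self.end.x - self.start.x, self.end.y - self.start.y)


seg = Segment(Point(0, 0), Point(3, 4))
print(seg.length())  # 5.0
```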
Decorator Pattern
Decorators (written with the @ notation) are a tool to elegantly specify some transformation on functions and methods. The decorator pattern allows us to wrap an object that provides core functionality with other objects that alter that functionality. For example, the snippet below was taken from the Google Python Style Guide:
class C(object):
    def method(self):
        ...
    method = my_decorator(method)
can be written as
class C(object):
    @my_decorator
    def method(self):
        ...
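A minimal concrete decorator, sketched with the hypothetical names count_calls and square: the inner wrapper counts calls and forwards the arguments, and functools.wraps copies the wrapped function's __name__ and __doc__ onto the wrapper (without it, they would be the wrapper's own).

```python
import functools


def count_calls(func):
    @functools.wraps(func)  # keep func's __name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls  # same as: square = count_calls(square)
def square(x):
    """Return x squared."""
    return x * x


square(3)
square(4)
print(square.__name__, square.__doc__, square.calls)  # square Return x squared. 2
```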
import random
import time

def benchmark(func):
    def wrapper(*args, **kwargs):
        t = time.perf_counter()  # time.clock() was removed in Python 3.8
        res = func(*args, **kwargs)
        print("\t%s" % func.__name__, time.perf_counter()-t)
        return res
    return wrapper

@benchmark
def random_tree(n):
    temp = [n for n in range(n)]
    for i in range(n+1):
        temp[random.choice(temp)] = random.choice(temp)
    return temp

if __name__ == "__main__":
    random_tree(10000)

"""
python3 do_benchmark.py
random_tree 0.04999999999999999
"""
Observer Pattern
The observer pattern is useful when we want to have a core object that maintains certain values, and then have some observers create serialized copies of that object. This can be implemented by using the @property decorator, placed before our methods (before def). This will control attribute access, for example, to make an attribute read-only. Properties are used for getting or setting data instead of explicit getter and setter methods:
@property
def radius(self):
    return self.__radius
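Properties also address the validation problem raised at the start of this chapter. A minimal sketch, assuming a Circle class that keeps its radius in a private attribute and rejects negative values through a setter:

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius  # goes through the setter below

    @property
    def radius(self):
        return self.__radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self.__radius = value


c = Circle(2)
c.radius = 5         # allowed
try:
    c.radius = -1    # rejected by the setter
except ValueError as err:
    print(err)
print(c.radius)      # 5
```

Because __init__() assigns through the property, invalid values are rejected at creation time as well.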
Singleton Pattern
A class follows the singleton pattern if it allows exactly one instance of a
certain object to exist. Since Python does not have private constructors, we use the __new__() method to ensure that only one instance is ever created. When we override it, we first check whether our singleton instance was created. If not, we create it using a superclass call:
>>> class SinEx:
...     _sing = None
...     def __new__(cls, *args, **kwargs):
...         # create the instance only the first time
...         if cls._sing is None:
...             cls._sing = super().__new__(cls)
...         return cls._sing
...
>>> x = SinEx()
>>> x
<__main__.SinEx object at 0xb72d680c>
>>> y = SinEx()
>>> x == y
True
>>> y
<__main__.SinEx object at 0xb72d680c>
The two objects are equal and are in the same address, so they are the
same object.
class HashTable:
    def __init__(self):
        self.size = 11
        self.slots = [None] * self.size
        self.data = [None] * self.size

    def put(self, key, data):
        hashvalue = self.hashfunction(key, len(self.slots))
        if self.slots[hashvalue] is None:
            self.slots[hashvalue] = key
            self.data[hashvalue] = data
        else:
            if self.slots[hashvalue] == key:
                self.data[hashvalue] = data  # replace
            else:
                # linear probing: try the next slots until a free one
                # (or the same key) is found; assumes the table is not full
                nextslot = self.rehash(hashvalue, len(self.slots))
                while self.slots[nextslot] is not None and \
                        self.slots[nextslot] != key:
                    nextslot = self.rehash(nextslot, len(self.slots))
                if self.slots[nextslot] is None:
                    self.slots[nextslot] = key
                    self.data[nextslot] = data
                else:
                    self.data[nextslot] = data  # replace

    def hashfunction(self, key, size):
        return key % size

    def rehash(self, oldhash, size):
        return (oldhash + 1) % size

    def get(self, key):
        startslot = self.hashfunction(key, len(self.slots))
        data = None
        stop = False
        found = False
        position = startslot
        while self.slots[position] is not None and \
                not found and not stop:
            if self.slots[position] == key:
                found = True
                data = self.data[position]
            else:
                position = self.rehash(position, len(self.slots))
                if position == startslot:
                    stop = True
        return data

    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, data):
        self.put(key, data)

def test_HashTable(module_name="this module"):
    H = HashTable()
    H[54] = "buffy"
    H[26] = "xander"
    H[17] = "giles"
    print(H.slots)
    print(H.data)
    s = "Tests in {name} have {con}!"
    print(s.format(name=module_name, con="passed"))

if __name__ == "__main__":
    test_HashTable()
Part II
Chapter 6
Additional Abstract Data Structures
6.1 Stacks
A stack is a linear data structure that can be accessed only at one of its ends (which we will refer to as the top) for either storing or retrieving. In other words, access to the elements of a stack is restricted, and stacks are an example of a last-in-first-out (LIFO) structure. You can think of a stack as a huge pile of books on your desk. Stacks need to have the following operations running at O(1):
push Insert an item at the top of the stack.
pop Remove an item from the top of the stack.
top/peek Retrieve the item at the top of the stack without removing it.
empty/size Check whether the stack is empty or give its size.
Stacks in Python can be implemented with lists and the methods append()
and pop() (without an explicit index):
[adt/stacks/stack.py]
class Stack(object):
    """Define the stack class."""
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def push(self, value):
        self.items.append(value)

    def pop(self):
        if not self.isEmpty():
            return self.items.pop()
        else:
            raise Exception("Stack is empty!")

    def peek(self):
        return self.items[-1]

    def size(self):
        return len(self.items)

def main():
    stack = Stack()
    stack.push(1)
    stack.push(2)
    stack.push(3)
    print(stack.size())
    print(stack.peek())
    print(stack.pop())
    print(stack.peek())

if __name__ == "__main__":
    main()
class Node(object):
    def __init__(self, value=None):
        self.value = value
        self.next = None

class StackwithNodes(object):
    """Define a Stack with nodes."""
    def __init__(self):
        self.top = None

    def isEmpty(self):
        return self.top is None

    def push(self, item):
        # the new node points to the old top and becomes the top
        node = Node(item)
        node.next = self.top
        self.top = node

    def pop(self):
        node = self.top
        if node:
            self.top = node.next
            return node.value
        else:
            raise Exception("Stack is empty.")

    def size(self):
        num_nodes = 0
        node = self.top
        while node:
            num_nodes += 1
            node = node.next
        return num_nodes

    def peek(self):
        return self.top.value

def main():
    stack = StackwithNodes()
    stack.push(1)
    stack.push(2)
    stack.push(3)
    print(stack.size())
    print(stack.peek())
    print(stack.pop())
    print(stack.peek())

if __name__ == "__main__":
    main()

¹ We will use a similar Node class in many examples in the rest of these notes.
6.2 Queues
Unlike a stack, a queue is a structure where the first enqueued element (at the back) will be the first one to be dequeued (when it is at the front), i.e., a queue is a first-in-first-out (FIFO) structure. You can think of a queue as a line of people waiting for a roller-coaster ride. Access to the elements of a queue is also restricted, and queues should have the following operations running at O(1):
enqueue Insert an item at the back of the queue.
dequeue Remove an item from the front of the queue.
peek/front Retrieve an item at the front of the queue without removing
it.
empty/size Check whether the queue is empty or give its size.
class Queue(object):
    """A class for a queue."""
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def enqueue(self, item):
        # new items go in at index 0, so the front is the end of the list
        self.items.insert(0, item)

    def dequeue(self):
        return self.items.pop()

    def size(self):
        return len(self.items)

    def peek(self):
        if not self.isEmpty():
            return self.items[-1]
        else:
            raise Exception("Queue is empty.")

def main():
    queue = Queue()
    queue.enqueue(1)
    queue.enqueue(2)
    queue.enqueue(3)
    print(queue.size())
    print(queue.peek())
    print(queue.dequeue())
    print(queue.peek())

if __name__ == "__main__":
    main()
However, we have learned that the method insert() for lists in Python is very inefficient (remember, lists only achieve O(1) when we append or pop at their end, because otherwise all of the other elements have to be shifted in memory). We can be smarter than that and write an efficient queue using two stacks (two lists) instead of one:
[adt/queues/queue_from_two_stacks.py]
class Queue(object):
    """An example of a queue implemented from 2 stacks."""
    def __init__(self):
        self.in_stack = []
        self.out_stack = []

    def enqueue(self, item):
        self.in_stack.append(item)

    def dequeue(self):
        if self.out_stack:
            return self.out_stack.pop()
        # move everything over, reversing the order so the oldest is on top;
        # each item is moved at most once, so this is O(1) amortized
        while self.in_stack:
            self.out_stack.append(self.in_stack.pop())
        if not self.out_stack:
            raise Exception("Queue empty!")
        return self.out_stack.pop()

    def size(self):
        return len(self.in_stack) + len(self.out_stack)

    def peek(self):
        if self.out_stack:
            return self.out_stack[-1]
        while self.in_stack:
            self.out_stack.append(self.in_stack.pop())
        if self.out_stack:
            return self.out_stack[-1]
        else:
            return None

def main():
    queue = Queue()
    queue.enqueue(1)
    queue.enqueue(2)
    queue.enqueue(3)
    print(queue.size())
    print(queue.peek())
    print(queue.dequeue())
    print(queue.peek())

if __name__ == "__main__":
    main()
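In practice, the standard library already offers an efficient queue: collections.deque supports append() at the back and popleft() at the front, both O(1). A minimal sketch of the same main() sequence:

```python
from collections import deque

queue = deque()
for item in (1, 2, 3):
    queue.append(item)   # enqueue at the back
print(len(queue))        # 3
print(queue[0])          # 1, peek at the front
print(queue.popleft())   # 1, dequeue from the front
print(queue[0])          # 2
```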
class Node(object):
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedQueue(object):
    """Queue acts as a container for nodes (objects) that are
    inserted and removed according to FIFO."""
    def __init__(self):
        self.front = None
        self.back = None

    def isEmpty(self):
        return self.front is None

    def enqueue(self, value):
        node = Node(value)
        if self.back:
            self.back.next = node  # link after the current back
        else:
            self.front = node      # first node: front and back coincide
        self.back = node

    def dequeue(self):
        if self.front:
            value = self.front.value
            self.front = self.front.next
            if self.front is None:
                self.back = None   # the queue became empty
            return value
        raise Exception("Queue is empty, cannot dequeue.")

    def size(self):
        num_nodes = 0
        node = self.front
        while node:
            num_nodes += 1
            node = node.next
        return num_nodes

    def peek(self):
        return self.front.value

def main():
    queue = LinkedQueue()
    queue.enqueue(1)
    queue.enqueue(2)
    queue.enqueue(3)
    print(queue.size())
    print(queue.peek())
    print(queue.dequeue())
    print(queue.peek())

if __name__ == "__main__":
    main()
6.3 Deques
A deque is a double-ended queue, which can roughly be seen as a union of a stack and a queue:
[adt/queues/dequeue.py]
class Deque(object):
    """A class for a double-ended queue."""
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def addFront(self, item):
        # the front of the deque is the end of the list
        self.items.append(item)

    def addRear(self, item):
        self.items.insert(0, item)

    def removeFront(self):
        return self.items.pop()

    def removeRear(self):
        return self.items.pop(0)

    def size(self):
        return len(self.items)

    def __repr__(self):
        return "{}".format(self.items)

def main():
    dq = Deque()
    dq.addFront(1)
    dq.addFront(2)
    dq.addFront(3)
    dq.addRear(40)
    dq.addRear(50)
    print(dq.size())
    print(dq)

if __name__ == "__main__":
    main()
The standard library also provides collections.deque, for which we can specify a maximum size, for example q = collections.deque(maxlen=4); once it is full, adding an item at one end discards an item from the other end. Another interesting method for deques is rotate(n), which rotates the deque n steps to the right or, if n is negative, to the left.
Interestingly, in CPython collections.deque is based on a doubly linked list² (of fixed-size blocks), not on a dynamic array. This means that adding or removing items at either end is fast (O(1)), but accessing an arbitrary index can be slow (O(n)).
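A short sketch of both collections.deque features; the values are arbitrary:

```python
from collections import deque

q = deque(maxlen=4)
for i in range(6):
    q.append(i)          # once 4 items are stored, the leftmost is discarded
print(list(q))           # [2, 3, 4, 5]

d = deque([1, 2, 3, 4, 5])
d.rotate(2)              # two steps to the right
print(list(d))           # [4, 5, 1, 2, 3]
d.rotate(-2)             # two steps back to the left
print(list(d))           # [1, 2, 3, 4, 5]
```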
Heaps
Conceptually, a heap is a binary tree where each node is smaller (larger) than or equal to its children. We will learn about trees in the next chapters, but we should already keep in mind that when modifications are made to a balanced tree, we can repair its structure in O(log n) time. Heaps are generally useful for applications that repeatedly access the smallest (largest) element in the list. Moreover, a min-(max-)heap lets you find the smallest (largest) element in O(1) and extract/add/replace it in O(log n).
² Linked lists are another abstract data structure that we will learn about at the end of this chapter. Doubly here means that their nodes have links to the next and to the previous node.
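The standard heapq module implements a binary min-heap on top of a plain list. A minimal sketch of the operations mentioned above (for a max-heap, a common trick is to push negated values):

```python
import heapq

data = [3, 2, 5, 1, 7, 8, 2]
heapq.heapify(data)                # O(n): rearranges the list into a min-heap
print(data[0])                     # 1, the smallest item, read in O(1)
heapq.heappush(data, 0)            # O(log n)
print(heapq.heappop(data))         # 0, removed in O(log n)
print(heapq.heapreplace(data, 6))  # 1: pops the smallest, then pushes 6
print(data[0])                     # 2
```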
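class Heapify(object):
    def __init__(self, data=None):
        self.data = data or []
        # sift down every non-leaf node, from the last one up to the root
        for i in range(len(self.data)//2, -1, -1):
            self._max_heapify(i)

    def __repr__(self):
        return "{}".format(self.data)

    def _max_heapify(self, i):
        # move self.data[i] down until it is not smaller than its children
        largest = i
        left, right = 2*i + 1, 2*i + 2
        n = len(self.data)
        if left < n and self.data[left] > self.data[largest]:
            largest = left
        if right < n and self.data[right] > self.data[largest]:
            largest = right
        if largest != i:
            self.data[i], self.data[largest] = self.data[largest], self.data[i]
            self._max_heapify(largest)

    def extract_max(self):
        n = len(self.data)
        max_element = self.data[0]
        self.data[0] = self.data[n - 1]
        self.data = self.data[:n - 1]
        self._max_heapify(0)
        return max_element

def test_Heapify():
    l1 = [3, 2, 5, 1, 7, 8, 2]
    h = Heapify(l1)
    assert(h.extract_max() == 8)
    print("Tests Passed!")

if __name__ == "__main__":
    test_Heapify()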
import heapq

class PriorityQueue(object):
    """Implements a priority queue class."""
    def __init__(self):
        self._queue = []
        self._index = 0  # breaks ties between items of the same priority

    def push(self, item, priority):
        # heapq is a min-heap, so the priority is negated to pop the
        # highest priority first
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1

    def pop(self):
        return heapq.heappop(self._queue)[-1]

class Item:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Item({!r})".format(self.name)

def test_PriorityQueue():
    """push and pop are both O(log N)."""
    q = PriorityQueue()
    q.push(Item("test1"), 1)
    q.push(Item("test2"), 4)
    q.push(Item("test3"), 3)
    assert(str(q.pop()) == "Item('test2')")
    print("Tests passed!".center(20, "*"))

if __name__ == "__main__":
    test_PriorityQueue()
We can adapt this node class to accept some get and set methods:
class Node(object):
    def __init__(self, value):
        self.value = value
        self.next = None

    def getData(self):
        return self.value

    def getNext(self):
        return self.next

    def setData(self, value):
        self.value = value

    def setNext(self, next_node):
        self.next = next_node
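class Node(object):
    def __init__(self, value = None, next = None):
        self.value = value
        self.next = next

class LinkList(object):
    def __init__(self):
        self.head = None
        self.length = 0

    def addNode(self, value):
        # insert the new node at the head: O(1)
        self.head = Node(value, self.head)
        self.length += 1

    def deleteNode(self, value):
        # find the first node holding value, tracking its predecessor
        prev = None
        node = self.head
        while node and node.value != value:
            prev = node
            node = node.next
        if node is None:
            print("Node with value {} not found".format(value))
            return
        if prev is None:
            self.head = node.next
        else:
            prev.next = node.next
        self.length -= 1

    def printList(self):
        node = self.head
        while node:
            print(node.value)
            node = node.next

def main():
    ll = LinkList()
    print(ll.length)
    ll.addNode(1)
    ll.addNode(2)
    ll.addNode(3)
    print(ll.length)
    ll.printList()
    ll.deleteNode(4)
    ll.printList()
    print(ll.length)

if __name__ == "__main__":
    main()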
from collections import Counter

class Node(object):
    def __init__(self, value = None, next = None):
        self.value = value
        self.next = next

class LinkList(object):
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def addNode(self, value):
        # append at the tail: O(1) thanks to the tail pointer
        node = Node(value)
        if self.tail:
            self.tail.next = node
        else:
            self.head = node
        self.tail = node
        self.length += 1

    def printList(self):
        node = self.head
        while node:
            print(node.value)
            node = node.next

    def removeDupl(self):
        # one pass, remembering the values seen so far in a Counter
        prev = None
        node = self.head
        aux_dict = Counter()
        while node:
            value_here = node.value
            if aux_dict[value_here] == 0:
                aux_dict[value_here] = 1
                prev = node
            else:
                # duplicate: unlink it (prev is never None here, because
                # the head's value is always seen first)
                prev.next = node.next
                if node is self.tail:
                    self.tail = prev
                self.length -= 1
            node = node.next

    def removeDupl_no_buf(self):
        # no extra buffer: for each node, remove later nodes with its value
        node = self.head
        while node:
            pivot = node.value
            prev = node
            pointer = node.next
            while pointer:
                if pointer.value == pivot:
                    prev.next = pointer.next
                    if pointer is self.tail:
                        self.tail = prev
                    self.length -= 1
                else:
                    prev = pointer
                pointer = pointer.next
            node = node.next

def main():
    ll = LinkList()
    for i in range(1, 10):
        ll.addNode(i)
        ll.addNode(i+1)
    print("Linked List with duplicates:")
    ll.printList()
    print("Length before deleting duplicates is: ", ll.length)
    ll.removeDupl()
    ll.printList()
    print("Length after deleting duplicates is: ", ll.length)

    ll = LinkList()
    for i in range(1, 10):
        ll.addNode(i)
        ll.addNode(i+1)
    print("Linked List with duplicates:")
    ll.printList()
    print("Length before deleting duplicates is: ", ll.length)
    ll.removeDupl_no_buf()
    ll.printList()
    print("Length after deleting duplicates is: ", ll.length)

if __name__ == "__main__":
    main()
Linked lists have a dynamic size at runtime, and they are good for when you have an unknown number of items to store. Insertion at the head (or at the tail, when a tail pointer is kept) is O(1), but deletion and searching can be O(n), because locating an element in a linked list is slow and is done by a sequential search. Traversing a singly linked list backward, or sorting it with simple methods, is even worse, both being O(n²). A good trick to delete a node i in O(1), given a reference to it, is to copy the data from node i + 1 into node i and then delete node i + 1 (this does not work if i is the last node).
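A minimal sketch of that O(1) deletion trick, assuming a singly linked Node class like the ones above and a reference to a node that is not the last one (the helper names delete_node_in_place and to_list are hypothetical):

```python
class Node(object):
    def __init__(self, value=None, next=None):
        self.value = value
        self.next = next


def delete_node_in_place(node):
    """Delete node without access to its predecessor, in O(1)."""
    if node is None or node.next is None:
        raise ValueError("cannot delete the last node this way")
    # Copy the successor's data into this node, then unlink the successor.
    node.value = node.next.value
    node.next = node.next.next


def to_list(head):
    values = []
    while head:
        values.append(head.value)
        head = head.next
    return values


head = Node(1, Node(2, Node(3, Node(4))))
delete_node_in_place(head.next)  # removes the node holding 2
print(to_list(head))             # [1, 3, 4]
```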
import stack

def reverse_string_with_stack(str1):
    s = stack.Stack()  # the Stack class from adt/stacks/stack.py
    revStr = ""
    for c in str1:
        s.push(c)
    while not s.isEmpty():
        revStr += s.pop()
    return revStr

def test_reverse_string_with_stack():
    str1 = "Buffy is a Slayer!"
    assert(reverse_string_with_stack(str1) == "!reyalS a si yffuB")
    print("Tests passed!")

if __name__ == "__main__":
    test_reverse_string_with_stack()
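from stack import Stack  # the Stack class from adt/stacks/stack.py

def balance_par_str_with_stack(symbolString):
    """Return True if every "(" in symbolString has a matching ")"."""
    s = Stack()
    balanced = True
    index = 0
    while index < len(symbolString) and balanced:
        symbol = symbolString[index]
        if symbol == "(":
            s.push(symbol)
        else:
            # any other symbol is treated as a closing parenthesis
            if s.isEmpty():
                balanced = False
            else:
                s.pop()
        index = index + 1
    # unmatched "(" left on the stack also means unbalanced
    return balanced and s.isEmpty()

def test_balance_par_str_with_stack():
    # illustrative inputs
    assert(balance_par_str_with_stack("((()))"))
    assert(not balance_par_str_with_stack("(()"))
    print("Tests passed!")

if __name__ == "__main__":
    test_balance_par_str_with_stack()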
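from stack import Stack  # the Stack class from adt/stacks/stack.py

def dec2bin_with_stack(decnum):
    s = Stack()
    str_aux = ""
    # the remainders come out least significant first, so stack them
    while decnum > 0:
        dig = decnum % 2
        decnum = decnum//2
        s.push(dig)
    while not s.isEmpty():
        str_aux += str(s.pop())
    return str_aux

def test_dec2bin_with_stack(module_name="this module"):
    decnum = 9
    assert(dec2bin_with_stack(decnum) == "1001")
    s = "Tests in {name} have {con}!"
    print(s.format(name=module_name, con="passed"))

if __name__ == "__main__":
    test_dec2bin_with_stack()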
The following example implements a stack that has O(1) minimum lookup:
[adt/stacks/stack_with_min.py]
class Stack(list):
    def push(self, value):
        # each node stores the minimum of the stack up to and including itself
        if len(self) > 0:
            last = self[-1]
            minimum = min(value, last.minimum)
        else:
            minimum = value
        self.append(NodeWithMin(value, minimum))

    def min(self):
        # the top node always knows the current minimum, even after pops
        return self[-1].minimum

class NodeWithMin(object):
    def __init__(self, value, minimum):
        self.value = value
        self.minimum = minimum

    def __repr__(self):
        return str(self.value)

    def min(self):
        return self.minimum

def main():
    stack = Stack()
    stack.push(1)
    stack.push(2)
    stack.push(3)
    node = stack.pop()
    print(node.minimum)
    stack.push(0)
    stack.push(4)
    node = stack.pop()
    print(node.min())
    print(stack.min())
    print(stack)

if __name__ == "__main__":
    main()
class SetOfStacks(object):
    """A stack made of several stacks, each holding at most capacity items."""
    def __init__(self, capacity=4):
        self.stacks = []
        self.last_stack = []
        self.capacity = capacity
        self.stacks.append(self.last_stack)

    def __repr__(self):
        return str(self.stacks)

    def push(self, value):
        # start a new stack when the current one is full
        if len(self.last_stack) >= self.capacity:
            self.last_stack = []
            self.stacks.append(self.last_stack)
        self.last_stack.append(value)

    def pop(self):
        last_stack = self.last_stack
        value = last_stack.pop()
        # drop the emptied stack, but always keep at least one
        if len(last_stack) == 0 and len(self.stacks) > 1:
            self.stacks.pop()
            self.last_stack = self.stacks[-1]
        return value

def main():
    stack = SetOfStacks()
    stack.push(1)
    stack.push(2)
    stack.push(3)
    stack.push(4)
    stack.push(5)
    stack.push(6)
    print(stack)
    stack.pop()
    stack.pop()
    stack.pop()
    print(stack)

if __name__ == "__main__":
    main()
Queues
The example below uses the concept of a queue to rotate an array from right to left by a given number n:³
[adt/queues/rotating_array.py]
if __name__ == "__main__":
    test_rotating_array()

³ We could get the same effect using collections.deque with the method rotate(n).
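The body of rotating_array.py is not reproduced here. As a sketch of the idea, assuming the rotation moves the first n items to the end (a left rotation), a queue makes it direct: dequeue from the front and enqueue at the back, n times. The function name rotate_left_with_queue is hypothetical; collections.deque.rotate(-n) gives the same result.

```python
from collections import deque


def rotate_left_with_queue(seq, n):
    """Rotate seq left by n positions using queue operations."""
    q = deque(seq)
    steps = n % len(q) if q else 0  # rotating by len(seq) is a no-op
    for _ in range(steps):
        q.append(q.popleft())       # the front item goes to the back
    return list(q)


print(rotate_left_with_queue([1, 2, 3, 4, 5, 6, 7], 3))  # [4, 5, 6, 7, 1, 2, 3]

d = deque([1, 2, 3, 4, 5, 6, 7])
d.rotate(-3)                        # a negative n rotates to the left
print(list(d))                      # [4, 5, 6, 7, 1, 2, 3]
```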
Deques
A nice application for a double-ended queue is verifying whether a string is
a palindrome:
[adt/queues/palindrome_checker_with_deque.py]
import string
import collections

def palindrome_checker_with_deque(str1):
    d = collections.deque()
    eq = True
    # ignore whitespace and punctuation
    strip = string.whitespace + string.punctuation + "\""
    for s in str1.lower():
        if s not in strip: d.append(s)
    # compare the two ends, moving inward
    while len(d) > 1 and eq:
        last = d.pop()
        first = d.popleft()
        if first != last:
            eq = False
    return eq

def test_palindrome_checker_with_deque():
    str1 = "Madam Im Adam"
    str2 = "Buffy is a Slayer"
    assert(palindrome_checker_with_deque(str1) == True)
    assert(palindrome_checker_with_deque(str2) == False)
    print("Tests passed!")

if __name__ == "__main__":
    test_palindrome_checker_with_deque()
[adt/heap/find_N_largest_smallest_items_seq.py]
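import heapq

def find_N_largest_items_seq(seq, N):
    """Find the N largest items in a sequence (largest first)."""
    return heapq.nlargest(N, seq)

def find_N_largest_items_seq_sorted(seq, N):
    """Same, by sorting (returned in ascending order)."""
    return sorted(seq)[len(seq)-N:]

def find_N_smallest_items_seq(seq, N):
    """Find the N smallest items in a sequence."""
    return heapq.nsmallest(N, seq)

def find_N_smallest_items_seq_sorted(seq, N):
    """Same, by sorting."""
    return sorted(seq)[:N]

def find_smallest_items_seq_heap(seq):
    """Find the smallest item in a sequence using heapify first;
    heap[0] is always the smallest item."""
    heap = list(seq)  # copy, so the caller's list is not modified
    heapq.heapify(heap)
    return heapq.heappop(heap)

def find_smallest_items_seq(seq):
    """If it is only one item, min() is faster."""
    return min(seq)

def test_find_N_largest_smallest_items_seq(module_name="this module"):
    seq = [1, 3, 2, 8, 6, 10, 9]
    N = 3
    assert(find_N_largest_items_seq(seq, N) == [10, 9, 8])
    assert(find_N_largest_items_seq_sorted(seq, N) == [8, 9, 10])
    assert(find_N_smallest_items_seq(seq, N) == [1, 2, 3])
    assert(find_N_smallest_items_seq_sorted(seq, N) == [1, 2, 3])
    assert(find_smallest_items_seq(seq) == 1)
    assert(find_smallest_items_seq_heap(seq) == 1)
    s = "Tests in {name} have {con}!"
    print(s.format(name=module_name, con="passed"))

if __name__ == "__main__":
    test_find_N_largest_smallest_items_seq()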
import heapq

if __name__ == "__main__":
    test_merge_sorted_seq()
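The body of this example is not reproduced here. A minimal sketch of the idea, assuming the goal is to merge two already-sorted sequences into one sorted list: heapq.merge() does this lazily, in a single pass over both inputs, whereas simply concatenating the lists would not give a sorted result. The function name merge_sorted_seq is hypothetical.

```python
import heapq


def merge_sorted_seq(seq1, seq2):
    """Merge two sorted sequences into one sorted list."""
    # heapq.merge() returns an iterator; list() consumes it.
    return list(heapq.merge(seq1, seq2))


seq1 = [1, 2, 3, 8, 9, 10]
seq2 = [2, 3, 4, 5, 6, 7, 9]
print(merge_sorted_seq(seq1, seq2))
# [1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 9, 10]
```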
Linked List
The following example implements a linked list class from stack methods:
[adt/linked_lists/linked_list_from_stack.py]
class Node(object):
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

    def setnext(self, next):
        self.next = next

    def __str__(self):
        return "%s" % self.data

class LinkedListStack(object):
    def __init__(self, max=0):
        self.max = max  # assumption: 0 means no size limit
        self.head = None
        self.size = 0

    def push(self, data):
        if self.max and self.size >= self.max:
            raise Exception("Stack is full!")
        # the new node becomes the head and points to the old head
        self.head = Node(data, self.head)
        self.size += 1

    def pop(self):
        if self.isEmpty():
            raise Exception("Stack is empty!")
        node = self.head
        self.head = node.next
        self.size -= 1
        return node.data

    def isEmpty(self):
        return self.size == 0

    def __str__(self):
        d = ""
        if self.isEmpty(): return ""
        else:
            temp = self.head
            d += "%s\n" % temp
            while temp.next != None:
                temp = temp.next
                d += "%s\n" % temp
        return d

def test_ll_from_stack():
    ll = LinkedListStack(max = 20)
    ll.push("1")
    ll.push("2")
    ll.push("3")
    ll.push("4")
    print(ll)
    ll.pop()
    print(ll)

if __name__ == "__main__":
    test_ll_from_stack()

⁴ Note that the result would not be sorted if we just added both lists.
class OrderedList(object):
    # relies on a Node class with getData(), getNext() and setNext()
    def __init__(self):
        self.head = None

    def add(self, item):
        """This method is different from linked list: it keeps the items sorted."""
        current = self.head
        previous = None
        stop = False
        while current is not None and not stop:
            if current.getData() > item:
                stop = True
            else:
                previous = current
                current = current.getNext()
        temp = Node(item)
        if previous is None:
            temp.setNext(self.head)
            self.head = temp
        else:
            temp.setNext(current)
            previous.setNext(temp)

    def length(self):
        current = self.head
        count = 0
        while current is not None:
            count = count + 1
            current = current.getNext()
        return count

    def search(self, item):
        """This method is different from linked list: it can stop early."""
        current = self.head
        found = False
        stop = False
        while current is not None and not found and not stop:
            if current.getData() == item:
                found = True
            else:
                if current.getData() > item:
                    stop = True
                else:
                    current = current.getNext()
        return found

    def remove(self, item):
        current = self.head
        previous = None
        found = False
        while current is not None and not found:
            if current.getData() == item:
                found = True
            else:
                previous = current
                current = current.getNext()
        if not found:
            raise ValueError('item not in list')
        if previous is None:
            self.head = current.getNext()
        else:
            previous.setNext(current.getNext())


def test_OrderedList(module_name='this module'):
    # the beginning of this test is cut off in the listing; minimal checks:
    olist = OrderedList()
    for item in [31, 77, 17, 93, 26, 54]:
        olist.add(item)
    assert(olist.length() == 6)
    assert(olist.search(17) == True)
    assert(olist.search(18) == False)
    olist.remove(17)
    assert(olist.search(17) == False)
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))


if __name__ == '__main__':
    test_OrderedList()
Chapter 7
Asymptotic Analysis
P
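The complexity class of decision problems that can be solved on a deterministic Turing machine in polynomial time (in the worst case). Because P is defined for decision problems, other kinds of problems (such as optimization problems) are classified by first restating them as decision problems, for example, "is there a route of length at most k?".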
NP
The complexity class of decision problems that can be solved on a nondeterministic Turing machine (NTM) in polynomial time. In other words, it includes all decision problems whose yes instances can be solved in polynomial time with the NTM; equivalently, a yes answer comes with a certificate that a deterministic machine can verify in polynomial time. A problem is called complete for a class if it belongs to the class and every problem in the class can be reduced to it (for NP, by a polynomial-time reduction). Therefore, the subclass called NP-complete (NPC) contains the hardest problems in all of NP.
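To make the difference between solving and verifying concrete, here is a small sketch using subset sum, a classic NP-complete problem (the example and the names verify_subset_sum and solve_subset_sum are illustrative, not from the text). Checking a proposed certificate takes linear time, while the brute-force search below tries up to 2^n subsets:

```python
from itertools import combinations


def verify_subset_sum(numbers, target, certificate):
    # certificate: a list of distinct indices into numbers (the claimed
    # "yes" witness); checking it is linear in the input size
    if len(set(certificate)) != len(certificate):
        return False
    if any(i < 0 or i >= len(numbers) for i in certificate):
        return False
    return sum(numbers[i] for i in certificate) == target


def solve_subset_sum(numbers, target):
    # brute force over all index subsets: exponential time
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in indices) == target:
                return list(indices)
    return None
```

For example, solve_subset_sum([3, 34, 4, 12, 5, 2], 9) finds the indices [2, 4] (4 + 5), and verifying that answer takes only one pass over the certificate.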
Any problem that is at least as hard (determined by polynomial-time reduction) as any problem in NP, but that need not itself be in NP, is called NP-hard. For example, finding the shortest route that visits every node of a graph and returns to the start (the travelling salesman problem) is NP-hard.
P=NP?
The class co-NP is the class of the complements of NP problems: for every yes answer of an NP problem, the complementary problem answers no, and vice versa. If NP is truly asymmetric (if easily verifiable yes answers do not imply easily verifiable no answers), then these two classes are different. Even so, they overlap, because all of P lies in their intersection: both the yes and the no instances of a problem in P can be solved in polynomial time with an NTM.
What would happen if an NP-complete problem were found in the intersection of NP and co-NP? Since every problem in NP reduces to it, all of NP would be inside co-NP, so we would show NP = co-NP and the asymmetry would disappear. This would not by itself settle P = NP: P lies in the intersection, but NP = co-NP is not known to imply P = NP. In the other direction, if P = NP, we could efficiently solve any (decision) problem whose solutions can be efficiently verified.
However, it is (strongly) believed that NP and co-NP are different. For instance, no polynomial-time algorithm for factoring integers is known, even though (the decision version of) this problem is in both NP and co-NP.
7.2 Recursion
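The three laws of recursion are:
1. A recursive algorithm must have a base case.
2. A recursive algorithm must change its state and move toward the base case.
3. A recursive algorithm must call itself, recursively.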
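A minimal illustration of the three laws (recursive_sum is an illustrative name, not from the text): summing a sequence by recursion.

```python
def recursive_sum(seq):
    # law 1: the base case, an empty sequence sums to 0
    if not seq:
        return 0
    # law 2: the state changes (the sequence shrinks by one element),
    # moving toward the base case
    # law 3: the function calls itself
    return seq[0] + recursive_sum(seq[1:])
```

Note that each slice seq[1:] copies the rest of the sequence, so this sketch is O(n^2) in time and uses one stack frame per element; it is meant to show the structure, not to be efficient.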
For every recursive call, the recursive function has to allocate memory on the stack for arguments, return address, and local variables, costing time to push and pop these data onto the stack. Recursive algorithms therefore take at least O(n) space, where n is the maximum depth of the recursive calls.
Recursion is very costly when there are duplicated calculations, that is, when subproblems overlap. Deep recursion can also overflow the stack (in Python, exceeding the recursion limit raises a RecursionError). For this reason, where subproblems overlap, iterative solutions might be a better approach. For example, for the Fibonacci series, the iterative solution runs in O(n) time, while the naive recursive solution runs in exponential time.
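A sketch of the iterative Fibonacci solution mentioned above (the name find_fibonacci_seq_iter is illustrative). It keeps only the last two values, so it runs in O(n) time with O(1) extra space and no recursion depth concerns:

```python
def find_fibonacci_seq_iter(n):
    # invariant: after i iterations, a is F(i) and b is F(i + 1)
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```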
Recursive Relations
To describe the running time of recursive functions, we use recursive relations:

$$T(n) = a\,T(g(n)) + f(n),$$

where a represents the number of recursive calls, g(n) is the size of each subproblem to be solved recursively, and f(n) is any extra work done in the function. The following table shows examples of recursive relations:
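$$
\begin{array}{lll}
\text{Recurrence} & \text{Solution} & \text{Example} \\
T(n) = T(n-1) + 1 & O(n) & \text{Processing a sequence} \\
T(n) = T(n-1) + n & O(n^2) & \text{Handshake problem} \\
T(n) = 2T(n-1) + 1 & O(2^n) & \text{Towers of Hanoi} \\
T(n) = T(n/2) + 1 & O(\ln n) & \text{Binary search} \\
T(n) = T(n/2) + n & O(n) & \text{Randomized select} \\
T(n) = 2T(n/2) + 1 & O(n) & \text{Tree traversal} \\
T(n) = 2T(n/2) + n & O(n \ln n) & \text{Sort by divide and conquer}
\end{array}
$$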
For divide-and-conquer algorithms the relation takes the form $T(n) = a\,T(n/b) + f(n)$: we have a recursive calls, each on a fraction 1/b of the dataset, and on top of this the algorithm does f(n) of work. To reach the base case $T(1) = 1$ in the final instance (a leaf, as we will learn when we study trees), the height of the recursion tree is $h = \log_b n$, Fig. 7.2.
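The closed forms in the table can be sanity-checked numerically. The sketch below (function names are illustrative) evaluates three of the recurrences directly, with T(1) = 1 and n a power of two wherever the input is halved, and compares them with exact closed forms: log2(n) + 1 for binary search, n log2(n) + n for divide-and-conquer sorting, and 2^n - 1 for the Towers of Hanoi:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def t_halving(n):
    # T(n) = T(n/2) + 1, T(1) = 1   (binary search)
    return 1 if n == 1 else t_halving(n // 2) + 1


@lru_cache(maxsize=None)
def t_divide_and_conquer(n):
    # T(n) = 2T(n/2) + n, T(1) = 1  (merge sort)
    return 1 if n == 1 else 2 * t_divide_and_conquer(n // 2) + n


@lru_cache(maxsize=None)
def t_hanoi(n):
    # T(n) = 2T(n-1) + 1, T(1) = 1  (Towers of Hanoi)
    return 1 if n == 1 else 2 * t_hanoi(n - 1) + 1


for k in range(11):
    n = 2 ** k
    assert t_halving(n) == k + 1                 # log2(n) + 1
    assert t_divide_and_conquer(n) == n * k + n  # n log2(n) + n
for n in range(1, 30):
    assert t_hanoi(n) == 2 ** n - 1
```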
[general_poroblems/numbers/find_fibonacci_seq.py]
def find_fibonacci_seq_rec(n):
    # naive recursion: each call spawns two more calls
    if n < 2:
        return n
    return find_fibonacci_seq_rec(n - 1) + find_fibonacci_seq_rec(n - 2)
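Each call makes two recursive calls on smaller inputs, so its running time is bounded by

$$T(n) \le 2T(n-1) + 1.$$

Expanding the recursion k times gives

$$T(n) \le 2^2 T(n-2) + 2 + 1 \le \cdots \le 2^k T(n-k) + 2^{k-1} + \cdots + 2 + 1 = 2^k T(n-k) + 2^k - 1.$$

We need to make sure that the function is O(1) in the base case, where T(1) = 1; this means that n - k = 1, or k = n - 1. Plugging this back into the equation, we have:

$$T(n) \le 2^{n-1} + 2^{n-1} - 1 = 2^n - 1 = O(2^n).$$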
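The bound can be checked by instrumenting the naive function. In this sketch (the counter argument and count_calls are illustrative additions), every call increments a shared counter; the number of calls stays below 2^n - 1, as derived above, although it still grows exponentially:

```python
def fib_with_counter(n, counter):
    # same recursion as find_fibonacci_seq_rec, plus a call counter
    counter[0] += 1
    if n < 2:
        return n
    return fib_with_counter(n - 1, counter) + fib_with_counter(n - 2, counter)


def count_calls(n):
    counter = [0]
    fib_with_counter(n, counter)
    return counter[0]
```

For n = 10 this makes 177 calls, well under 2^10 - 1 = 1023: the actual growth rate is about 1.618^n (the golden ratio), so 2^n is an upper bound rather than a tight one.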
Chapter 8
Sorting
[sorting/insertion_sort.py]
def insertion_sort(seq):
    # move each element left until it reaches its sorted position
    for i in range(1, len(seq)):
        j = i
        while j > 0 and seq[j-1] > seq[j]:
            seq[j-1], seq[j] = seq[j], seq[j-1]
            j -= 1
    return seq


def test_insertion_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2, 5, 4, 1, 5, 3]
    assert(insertion_sort(seq) == sorted(seq))
    # assert(insertion_sort_rec(seq) == sorted(seq))  # recursive version not shown
    print('Tests passed!')


if __name__ == '__main__':
    test_insertion_sort()
Selection Sort
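Selection sort is based on finding the smallest (or largest) element in a list and swapping it into the first (or last) position, then finding the second smallest (or largest), and so on, until the end is reached. Even when the list is already sorted, it is O(n^2), and it is not stable. The version below repeatedly moves the largest remaining element to the end: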
[sorting/selection_sort.py]
def selection_sort(seq):
    # for i from n-1 down to 1, move the largest of seq[0..i] to position i
    for i in range(len(seq) - 1, 0, -1):
        max_j = i
        for j in range(i):
            if seq[j] > seq[max_j]:
                max_j = j
        seq[i], seq[max_j] = seq[max_j], seq[i]
    return seq


def test_selection_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(selection_sort(seq) == sorted(seq))
    print('Tests passed!')


if __name__ == '__main__':
    test_selection_sort()
Gnome Sort
Gnome sort works by moving forward to find a misplaced value and then
moving backward to place it in the right position:
[sorting/gnome_sort.py]
def gnome_sort(seq):
    i = 0
    while i < len(seq):
        if i == 0 or seq[i-1] <= seq[i]:
            i += 1                                # in order: step forward
        else:
            seq[i], seq[i-1] = seq[i-1], seq[i]   # misplaced: swap, step back
            i -= 1
    return seq


def test_gnome_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2, 5, 4, 1, 5, 3]
    assert(gnome_sort(seq) == sorted(seq))
    print('Tests passed!')


if __name__ == '__main__':
    test_gnome_sort()
Count sort sorts integers with a small value range, counting occurrences
and using the cumulative counts to directly place the numbers in the result,
updating the counts as it goes.
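There is a loglinear limit on how fast you can sort if all you know about your data is whether items are greater or less than each other: a comparison-based sort needs Ω(n ln n) comparisons in the worst case. However, if you can also count events (for instance, when the keys are integers in a small range of size k), sorting becomes linear in time, O(n + k):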
[sorting/count_sort.py]
from collections import defaultdict


def count_sort_dict(a):
    b, c = [], defaultdict(list)
    for x in a:
        c[x].append(x)          # group equal values, keeping their order
    for k in range(min(c), max(c) + 1):
        b.extend(c[k])          # absent keys give empty lists
    return b


def test_count_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2, 5, 4, 1, 5, 3]
    assert(count_sort_dict(seq) == sorted(seq))
    print('Tests passed!')


if __name__ == '__main__':
    test_count_sort()
If several values have the same key, they keep their original order with respect to each other, so the algorithm is stable.
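The listing above groups equal keys in a dictionary rather than using the cumulative counts described at the start of this section. A sketch of the cumulative-count version follows (count_sort and its key parameter are illustrative names; it assumes the keys are integers in range(k)). Placing the items in a forward pass is what keeps it stable:

```python
def count_sort(seq, k, key=lambda x: x):
    # 1. count how many items have each key
    counts = [0] * k
    for item in seq:
        counts[key(item)] += 1
    # 2. cumulative counts: counts[v] becomes the first output index for key v
    total = 0
    for v in range(k):
        counts[v], total = total, total + counts[v]
    # 3. place each item and advance the slot for its key; the forward
    #    pass keeps items with equal keys in their original order
    result = [None] * len(seq)
    for item in seq:
        result[counts[key(item)]] = item
        counts[key(item)] += 1
    return result
```

The key parameter lets you sort records by a small integer field, which is where stability matters.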
Merge Sort
Merge sort divides the list in half to create two unsorted lists. These two lists are sorted by recursively calling the merge-sort algorithm, until you get lists of size 1, and the sorted halves are then merged. The algorithm is stable, as well as fast for large data sets. However, since it is not in-place, it requires much more memory than many other algorithms. The space complexity is O(n) for arrays and O(ln n) for linked lists (see footnote 2). The best, average, and worst case times are all O(n ln n).
Merge sort is a good choice when the data set is too large to fit into the
memory. The subsets can be written to disk in separate files until they are
small enough to be sorted in memory. The merging is easy, and involves
just reading single elements at a time from each file and writing them to the
final file in the correct order:
[sorting/merge_sort.py]
# O(n ln n)
def merge_sort(seq):
    if len(seq) < 2:
        return seq
    mid = len(seq) // 2
    left = merge_sort(seq[:mid])
    right = merge_sort(seq[mid:])
    return merge_n(left, right)


# O(2n)
def merge_2n(left, right):
    if not left or not right:
        return left or right
    result = []
    # (the rest of this function is cut off in the listing)
1 Timsort is a hybrid sorting algorithm, derived from merge sort and insertion sort, and invented by Tim Peters for Python.
2 Never consider sorting a linked list, though: it is probably the worst option you have in terms of runtime complexity.
# O(n)
def merge_n(left, right):
    if not left or not right:
        return left or right
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        # <= keeps equal elements in their original order (stability)
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # at most one of the two lists still has elements left
    result += left[i:]
    result += right[j:]
    return result


def test_merge_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(merge_sort(seq) == sorted(seq))
    print('Tests passed!')


if __name__ == '__main__':
    test_merge_sort()
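The external-memory strategy described earlier (sorting pieces that fit in memory, writing them to disk, then merging) can be sketched with the standard library. Each sorted chunk is written to a temporary file, and heapq.merge then merges the runs lazily, holding only one value per run at a time. This is a minimal sketch assuming integer data stored one value per line; the names are illustrative. For simplicity it collects the merged output in a list, whereas a real implementation would stream it to an output file:

```python
import heapq
import os
import tempfile


def _write_run(sorted_chunk):
    # write one sorted run to a temporary file and return its path
    with tempfile.NamedTemporaryFile('w', delete=False, suffix='.run') as f:
        for x in sorted_chunk:
            f.write('%d\n' % x)
    return f.name


def _read_run(path):
    # stream a run back one integer at a time
    with open(path) as f:
        for line in f:
            yield int(line)


def external_merge_sort(values, chunk_size=1000):
    paths, chunk = [], []
    for x in values:
        chunk.append(x)
        if len(chunk) == chunk_size:
            paths.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        paths.append(_write_run(sorted(chunk)))
    try:
        # heapq.merge keeps only the current head of each run in memory
        return list(heapq.merge(*[_read_run(p) for p in paths]))
    finally:
        for p in paths:
            os.remove(p)
```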
Quick Sort
Quick sort works by choosing a pivot and partitioning the array so that the elements that are smaller than the pivot go to its left and the larger ones go to its right. Then, it recursively sorts the left and right parts.
The choice of the pivot value is key to the performance. It can be shown that always choosing the value in the middle of the set is the best choice for already-sorted data and no worse than most other choices for random unsorted data.
The worst case is O(n^2), in the rare cases when partitioning keeps producing a region of n - 1 elements (when the pivot is the minimum or the maximum value). The best case produces two n/2-sized lists. This and the average case are both O(n ln n). The algorithm is not stable.
[sorting/quick_sort.py]
def quick_sort(seq):
    if len(seq) < 2:
        return seq
    mid = len(seq) // 2
    pi = seq[mid]                     # the pivot is the middle element
    seq = seq[:mid] + seq[mid+1:]     # everything except the pivot
    lo = [x for x in seq if x <= pi]
    hi = [x for x in seq if x > pi]
    return quick_sort(lo) + [pi] + quick_sort(hi)


def test_quick_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(quick_sort(seq) == sorted(seq))
    # assert(quick_sort_divided(seq) == sorted(seq))  # version not shown
    print('Tests passed!')


if __name__ == '__main__':
    test_quick_sort()
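The version above builds new lists at every level of the recursion. Quick sort is more commonly written in place; the sketch below (quick_sort_inplace is an illustrative name) uses the Lomuto partition scheme, with the middle element as the pivot to match the choice discussed above:

```python
def quick_sort_inplace(seq, lo=0, hi=None):
    # sorts seq[lo..hi] (inclusive) in place and returns seq
    if hi is None:
        hi = len(seq) - 1
    if lo >= hi:
        return seq
    mid = (lo + hi) // 2
    seq[mid], seq[hi] = seq[hi], seq[mid]   # park the middle pivot at the end
    pivot = seq[hi]
    i = lo                                  # seq[lo:i] holds values < pivot
    for j in range(lo, hi):
        if seq[j] < pivot:
            seq[i], seq[j] = seq[j], seq[i]
            i += 1
    seq[i], seq[hi] = seq[hi], seq[i]       # pivot goes to its final index i
    quick_sort_inplace(seq, lo, i - 1)
    quick_sort_inplace(seq, i + 1, hi)
    return seq
```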
Heap Sort
Heap sort is similar to a selection sort, except that the unsorted region is a
heap, so finding the largest element n times gives a loglinear runtime.
In a min-heap (max-heap), for every node other than the root, the value of the node is at least (at most) the value of its parent. Thus, the smallest (largest) element is stored at the root, and the subtrees rooted at a node contain values no smaller (no larger) than the node itself.
Although adding an element at the end of the heap is only O(1), restoring the heap order afterward is O(ln n). Searching for an arbitrary value (traversing) is O(n). In Python, a heap sort can be implemented by pushing all values onto a heap and then popping off the smallest values one at a time:
[sorting/heap_sort1.py]
import heapq


def heap_sort1(seq):
    """Heap sort with Python's heapq."""
    h = []
    for value in seq:
        heapq.heappush(h, value)
    return [heapq.heappop(h) for i in range(len(h))]


def test_heap_sort1():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(heap_sort1(seq) == sorted(seq))
    print('Tests passed!')


if __name__ == '__main__':
    test_heap_sort1()
If we decide to use the heap class that we have from the last chapters,
we can write a heap sort simply by:
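[sorting/heap_sort2.py]
# Heap is the heap class from the previous chapters (not repeated here)
def heap_sort2(seq):
    heap = Heap(seq)
    res = []
    for i in range(len(seq)):
        # extract_max() returns the largest remaining value
        res.append(heap.extract_max())
    res.reverse()       # ascending order, without O(n) insert(0, ...) calls
    return res


def test_heap_sort2():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(heap_sort2(seq) == sorted(seq))
    print('Tests passed!')


if __name__ == '__main__':
    test_heap_sort2()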
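def siftdown(seq, start, end):
    # helper used by heap_sort3 (not shown in the listing): sift seq[start]
    # down the max-heap stored in seq[start..end], end inclusive
    root = start
    while root * 2 + 1 <= end:
        child = root * 2 + 1
        if child + 1 <= end and seq[child] < seq[child + 1]:
            child += 1
        if seq[root] >= seq[child]:
            break
        seq[root], seq[child] = seq[child], seq[root]
        root = child


def heap_sort3(seq):
    # build a max-heap in place, then repeatedly move the max to the end
    for start in range((len(seq) - 2) // 2, -1, -1):
        siftdown(seq, start, len(seq) - 1)
    for end in range(len(seq) - 1, 0, -1):
        seq[end], seq[0] = seq[0], seq[end]
        siftdown(seq, 0, end - 1)
    return seq


def test_heap_sort3():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(heap_sort3(seq) == sorted(seq))
    print('Tests passed!')


if __name__ == '__main__':
    test_heap_sort3()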
Quadratic Sort
The following program implements a bubble sort, a very inefficient sorting
algorithm:
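[searching/bubble_sort.py]
def bubble_sort(seq):
    size = len(seq) - 1
    for num in range(size, 0, -1):
        # each pass bubbles the largest remaining value up to position num
        for i in range(num):
            if seq[i] > seq[i+1]:
                seq[i], seq[i+1] = seq[i+1], seq[i]
    return seq


def test_bubble_sort():
    # (this test is not shown in the listing; a minimal version)
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(bubble_sort(seq) == sorted(seq))
    print('Tests passed!')


if __name__ == '__main__':
    test_bubble_sort()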
Linear Sort
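The example below shows a simple count sort for people's ages:
def counting_sort_age(A):
    oldestAge = 100
    timesOfAge = [0] * oldestAge
    ageCountSet = set()
    B = []
    for i in A:
        timesOfAge[i] += 1
        ageCountSet.add(i)
    # sets are unordered, so visit the distinct ages in increasing order
    for j in sorted(ageCountSet):
        count = timesOfAge[j]
        # (the listing is cut off here; output age j, count times)
        B.extend([j] * count)
    return B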
The example below uses the partitioning idea of quick sort (quickselect) to find the k largest elements in a sequence:
[sorting/find_k_largest_seq_quicksort.py]
import random


def test_find_k_largest_seq_quickselect():
    seq = [3, 10, 4, 5, 1, 8, 9, 11, 5]
    k = 2
    assert(find_k_largest_seq_quickselect(seq, k) == [10, 11])


if __name__ == '__main__':
    test_find_k_largest_seq_quickselect()
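The function under test is not shown in the listing. Here is a hypothetical version consistent with the test (it returns the k largest values in ascending order): a random-pivot quickselect finds the value of rank n - k in average O(n) time, and everything at or above it is collected:

```python
import random


def quickselect(seq, k):
    # returns the k-th smallest element of seq (k is 0-based), average O(n)
    pivot = random.choice(seq)
    lo = [x for x in seq if x < pivot]
    eq = [x for x in seq if x == pivot]
    hi = [x for x in seq if x > pivot]
    if k < len(lo):
        return quickselect(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return quickselect(hi, k - len(lo) - len(eq))


def find_k_largest_seq_quickselect(seq, k):
    # assumes 0 <= k <= len(seq)
    if k <= 0:
        return []
    kth = quickselect(seq, len(seq) - k)      # smallest of the k largest
    larger = [x for x in seq if x > kth]
    ties = [x for x in seq if x == kth]
    return sorted(larger + ties[:k - len(larger)])
```

Splitting into three lists (less, equal, greater) keeps the recursion from looping forever when many values equal the pivot.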
Chapter 9
Searching
The most common searching algorithms are the sequential search and the
binary search. If an input array is not sorted, or the input elements are
accommodated by dynamic containers (such as linked lists), the search has
to be sequential. If the input is a sorted array, the binary search algorithm is the best choice. If we are allowed to use auxiliary memory, a hash table can help the search: with it, a value can be located by its key in O(1) time on average.
if __name__ == '__main__':
    test_sequential_search()
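Now, if we sort the sequence first, we can stop as soon as we reach an element larger than the item. This improves the case when the item is not present, which then has about the same runtime as when the item is present (on average about n/2 comparisons instead of n):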
[searching/ordered_sequential_search.py]
if __name__ == '__main__':
    test_ordered_sequential_search()
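The two search functions themselves are not shown in the listings above. A minimal sketch of both (the signatures, a sequence and an item returning True or False, are assumptions) makes the difference visible: the ordered version returns False as soon as it passes the place where the item would be:

```python
def sequential_search(seq, n):
    # O(n): check every element until n is found
    for item in seq:
        if item == n:
            return True
    return False


def ordered_sequential_search(seq, n):
    # seq must be sorted in increasing order
    for item in seq:
        if item == n:
            return True
        if item > n:
            return False      # we passed the position where n would be
    return False
```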
def test_binary_search():
    seq = [1, 2, 5, 6, 7, 10, 12, 12, 14, 15]
    key = 6
    assert(binary_search_iter(seq, key) == 3)
    assert(binary_search_rec(seq, key) == 3)
    print('Tests passed!')


if __name__ == '__main__':
    test_binary_search()
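The functions exercised by this test are not shown. A sketch of iterative and recursive binary searches that satisfy it follows; returning None for a missing key is an assumption:

```python
def binary_search_iter(seq, key):
    # seq must be sorted; returns an index of key, or None, in O(ln n)
    lo, hi = 0, len(seq) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if seq[mid] == key:
            return mid
        elif seq[mid] < key:
            lo = mid + 1          # key can only be in the right half
        else:
            hi = mid - 1          # key can only be in the left half
    return None


def binary_search_rec(seq, key, lo=0, hi=None):
    if hi is None:
        hi = len(seq) - 1
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    if seq[mid] == key:
        return mid
    if seq[mid] < key:
        return binary_search_rec(seq, key, mid + 1, hi)
    return binary_search_rec(seq, key, lo, mid - 1)
```

With duplicated keys (such as the two 12s above), either function may return any matching index.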
Note that the bisect module's bisect() function returns the index after any entries equal to the key, which is where you should place the new value. Other available functions are bisect_right (equivalent to bisect()) and bisect_left.
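A short illustration of the bisect functions on the same sorted list used in the binary search test. With duplicates, bisect_left and bisect_right bracket the run of equal keys, and insort inserts while keeping the list sorted:

```python
import bisect

seq = [1, 2, 5, 6, 7, 10, 12, 12, 14, 15]
# bisect() (same as bisect_right) returns the position after existing 12s
assert bisect.bisect(seq, 12) == 8
assert bisect.bisect_right(seq, 12) == 8
# bisect_left returns the position of the first 12
assert bisect.bisect_left(seq, 12) == 6
# insort places the new value where bisect() says it belongs
bisect.insort(seq, 11)
assert seq == [1, 2, 5, 6, 7, 10, 11, 12, 12, 14, 15]
```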
if __name__ == '__main__':
    test_find_elem_matrix_bool()
[searching/searching_in_a_matrix.py]
import numpy


def test_searching_in_a_matrix():
    a = [[1, 3, 5], [7, 9, 11], [13, 15, 17]]
    b = numpy.array([(1, 2), (3, 4)])
    assert(searching_in_a_matrix(a, 13) == True)
    assert(searching_in_a_matrix(a, 14) == False)
    assert(searching_in_a_matrix(b, 3) == True)
    assert(searching_in_a_matrix(b, 5) == False)
    print('Tests passed!')


if __name__ == '__main__':
    test_searching_in_a_matrix()
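The searching_in_a_matrix function is not shown. One sketch that satisfies the test, assuming the matrix is sorted in row-major order (each row sorted, and each row starting after the previous one ends, as in both test matrices), treats it as a flat sorted array and binary searches the flat index:

```python
def searching_in_a_matrix(m, value):
    # m: row-major sorted matrix (list of lists or a 2-D numpy array)
    rows = len(m)
    cols = len(m[0]) if rows else 0
    lo, hi = 0, rows * cols - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        item = m[mid // cols][mid % cols]   # flat index -> (row, column)
        if item == value:
            return True
        if item < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return False
```

This runs in O(ln(rows * cols)). If only the rows and columns are sorted separately, a different strategy (such as walking from the top-right corner) would be needed.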
Unimodal Arrays
An array is unimodal if it consists of an increasing sequence followed by a decreasing sequence. The example below shows how to find the maximum (the peak) of a unimodal array using binary search:
[searching/find_max_unimodal_array.py]
def find_max_unimodal_array(A):
    if len(A) <= 2:
        return None
    left = 0
    right = len(A) - 1
    while right > left + 1:
        mid = (left + right) // 2
        if A[mid] > A[mid-1] and A[mid] > A[mid+1]:
            return A[mid]             # mid is the peak
        elif A[mid] > A[mid-1] and A[mid] < A[mid+1]:
            left = mid                # still increasing: peak is to the right
        else:
            right = mid               # decreasing: peak is to the left
    return None


def test_find_max_unimodal_array():
    seq = [1, 2, 5, 6, 7, 10, 12, 9, 8, 7, 6]
    assert(find_max_unimodal_array(seq) == 12)
    print('Tests passed!')


if __name__ == '__main__':
    test_find_max_unimodal_array()
def test_ind_sqrt_bin_search():
    number = 9
    assert(find_sqrt_bin_search(number) == 3)
    print('Tests passed!')


if __name__ == '__main__':
    test_ind_sqrt_bin_search()
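The find_sqrt_bin_search function is not shown. A sketch that satisfies the test, assuming an integer square root is wanted (the largest r with r * r <= n, for n >= 0), binary searches over the candidate roots:

```python
def find_sqrt_bin_search(n):
    # integer square root of n >= 0 by binary search, O(ln n)
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so the range always shrinks
        if mid * mid <= n:
            lo = mid                  # mid is a valid root; try larger
        else:
            hi = mid - 1              # mid is too big
    return lo
```

A floating-point variant would instead stop when the interval is smaller than a chosen tolerance.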
def test_find_time_occurrence_list():
    seq = [1, 2, 2, 2, 2, 2, 2, 5, 6, 6, 7, 8, 9]
    k = 2
    assert(find_time_occurrence_list(seq, k) == 6)
    print('Tests passed!')


if __name__ == '__main__':
    test_find_time_occurrence_list()
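The function counting occurrences in a sorted list is not shown. One sketch that satisfies the test uses two binary searches from the bisect module: the count is the width of the run of equal keys, found in O(ln n) instead of scanning:

```python
from bisect import bisect_left, bisect_right


def find_time_occurrence_list(seq, k):
    # seq must be sorted; the run of k spans [bisect_left, bisect_right)
    return bisect_right(seq, k) - bisect_left(seq, k)
```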
Intersection of Arrays
The snippet below shows three ways to compute the intersection of two sorted arrays. The simplest way is to use sets; however, this will not preserve the ordering. The second example uses an adaptation of the merge sort. The third example is suitable when one of the arrays is much larger than the other. In this case, binary search over the larger array is the best option:
[searching/intersection_two_arrays.py]
    # (end of test_intersection_two_arrays(); the beginning, where seq1,
    # seq2 and module_name are defined, is not shown)
    assert(intersection_two_arrays_bs(seq1, seq2) == [3, 5])
    assert(intersection_two_arrays_ms(seq1, seq2) == [3, 5])
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))


if __name__ == '__main__':
    test_intersection_two_arrays()
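The three intersection functions are not shown in the listing. A sketch of each approach follows; the _ms and _bs names come from the test, while intersection_two_arrays_sets is an illustrative name:

```python
from bisect import bisect_left


def intersection_two_arrays_sets(seq1, seq2):
    # simplest: set intersection (order is lost; sorted here for display)
    return sorted(set(seq1) & set(seq2))


def intersection_two_arrays_ms(seq1, seq2):
    # merge-style walk over two sorted arrays, O(n + m)
    res, i, j = [], 0, 0
    while i < len(seq1) and j < len(seq2):
        if seq1[i] == seq2[j]:
            res.append(seq1[i])
            i += 1
            j += 1
        elif seq1[i] < seq2[j]:
            i += 1
        else:
            j += 1
    return res


def intersection_two_arrays_bs(seq1, seq2):
    # binary-search each element of the shorter array in the longer one,
    # O(m ln n) with m = len(shorter) and n = len(longer)
    small, large = sorted((seq1, seq2), key=len)
    res = []
    for x in small:
        idx = bisect_left(large, x)
        if idx < len(large) and large[idx] == x:
            res.append(x)
    return res
```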
Chapter 10
Dynamic Programming
10.1 Memoization
Dynamically Solving the Fibonacci Series
High-level languages such as Python can implement the recursive formulation directly, caching return values. Memoization is a method where, if a call is made more than once with the same arguments, the result is returned directly from the cache instead of being recomputed.
For example, we can turn the exponential recursive Fibonacci solution into a linear-time one by using a memo function: a decorator that uses nested scopes (a closure over a cache dictionary) to give the wrapped function memory:
[dynamic_programming/memo.py]
from functools import wraps


def memo(func):
    cache = {}                  # shared by every call of the wrapped function

    @wraps(func)                # keep func's name and docstring
    def wrap(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrap
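Applying memo to the naive recursive Fibonacci function shows the effect. The decorator is repeated so the sketch is self-contained; fib is an illustrative name. Each value is computed once and then served from the cache, so fib(100) returns immediately instead of making an astronomical number of calls:

```python
from functools import wraps


def memo(func):
    cache = {}

    @wraps(func)
    def wrap(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrap


@memo
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)   # recursive calls go through the cache
```

The standard library's functools.lru_cache(maxsize=None) decorator provides the same kind of unbounded cache.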
from bisect import bisect
from itertools import combinations
# memo is the decorator from the previous listing
# (all four versions allow equal values, i.e. non-decreasing subsequences)


def naive_longest_inc_subseq(seq):
    """Exponential solution to the longest increasing subsequence
    problem."""
    for length in range(len(seq), 0, -1):
        for sub in combinations(seq, length):
            if list(sub) == sorted(sub):
                return len(sub)


def longest_inc_subseq1(seq):
    """Iterative solution for the longest increasing subsequence
    problem, O(n ln n)."""
    end = []    # end[i]: smallest tail of a subsequence of length i + 1
    for val in seq:
        idx = bisect(end, val)
        if idx == len(end):
            end.append(val)
        else:
            end[idx] = val
    return len(end)


def longest_inc_subseq2(seq):
    """Another iterative algorithm for the longest increasing
    subsequence problem, O(n^2)."""
    L = [1] * len(seq)    # L[cur]: longest subsequence ending at cur
    for cur, val in enumerate(seq):
        for pre in range(cur):
            if seq[pre] <= val:
                L[cur] = max(L[cur], 1 + L[pre])
    return max(L)


def memoized_longest_inc_subseq(seq):
    """Memoized recursive solution to the longest increasing
    subsequence problem."""
    @memo
    def L(cur):
        res = 1
        for pre in range(cur):
            if seq[pre] <= seq[cur]:
                res = max(res, 1 + L(pre))
        return res
    return max(L(i) for i in range(len(seq)))

1 See other versions of this problem at the end of the chapter about lists in Python.
# benchmark is a timing decorator defined elsewhere (not shown here)
@benchmark
def test_naive_longest_inc_subseq():
    print(naive_longest_inc_subseq(s1))


@benchmark
def test_longest_inc_subseq1():
    print(longest_inc_subseq1(s1))


@benchmark
def test_longest_inc_subseq2():
    print(longest_inc_subseq2(s1))


@benchmark
def test_memoized_longest_inc_subseq():
    print(memoized_longest_inc_subseq(s1))


if __name__ == '__main__':
    from random import randrange
    # the naive version is exponential, so keep the input short
    s1 = [randrange(100) for i in range(15)]
    print(s1)
    test_naive_longest_inc_subseq()
    test_longest_inc_subseq1()
    test_longest_inc_subseq2()
    test_memoized_longest_inc_subseq()
Part III
Chapter 11
Introduction to Graphs
Direction of a Graph
If a graph has no direction, it is referred to as undirected. In this case, nodes with an edge between them are adjacent, and adjacent nodes are neighbors.
If the edges have a direction, the graph is directed (digraph). In this
case, the graph has leaves. The edges are no longer unordered: an edge
between nodes u and v is now either an edge (u, v) from u to v, or an edge
(v, u) from v to u. We can say that in a digraph G, the function E(G) is a
relation over V (G).
Subgraphs
A subgraph of G consists of a subset of V and a subset of E (where every chosen edge joins chosen nodes). A spanning subgraph contains all the nodes of the original graph.
Completeness of a Graph
If all the nodes in a graph are pairwise adjacent, the graph is called complete.
Degree in a Node
Length of a Path
The length of a path or walk is the value given by its edge count.
Weight of an Edge
Associating weights with each edge in G gives us a weighted graph. The
weight of a path or cycle is the sum of its edge weights. So, for unweighted
graphs, it is simply the number of edges.
Planar Graphs
A graph that can be drawn on the plane without crossing edges is called planar. Such a drawing has regions (faces), which are areas bounded by the edges, including the unbounded outer region. Euler's formula for connected planar graphs says that $V - E + F = 2$, where V, E, F are the number of nodes, edges, and regions, respectively.
Graph Traversal
Adjacency Lists
For each node in an adjacency list, we have access to a list (or set or container or iterable) of its neighbors. Supposing we have n nodes, numbered 0 to n - 1, each adjacency (or neighbor) list is just a list of such numbers. We place the lists into a main list of size n, indexable by the node numbers, where the order of each neighbor list is usually arbitrary.
Deleting objects from the middle of a Python list is O(n), but deleting from the end is only O(1). If the order of neighbors is not important, you can delete an arbitrary neighbor in O(1) time (once you know its position) by swapping it with the last item in the list and then calling pop().
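A small sketch of this trick (remove_neighbor is an illustrative name). Finding v with index() is still O(degree); the removal itself is O(1) because only the last slot of the list is dropped:

```python
def remove_neighbor(adj, u, v):
    # delete v from u's neighbor list without shifting the other items;
    # the order of the neighbors is not preserved
    neighbors = adj[u]
    i = neighbors.index(v)          # locate v: O(degree)
    neighbors[i] = neighbors[-1]    # overwrite v with the last neighbor
    neighbors.pop()                 # drop the now-duplicated last item: O(1)


a, b, c, d = range(4)               # nodes
adj = [[b, c, d], [a, d], [a, d], [a, b, c]]
remove_neighbor(adj, a, b)
```

After the call, adj[a] is [d, c]: node d took b's place, and the list shrank from the end.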
Adjacency Matrices
In adjacency matrices, instead of listing all the neighbors for each node, we have one row with one position for each possible neighbor, filled with True and False values (or 1 and 0, as below). The simplest implementation of adjacency matrices is given by nested lists. Note that the diagonal is always False (0) unless the graph has self-loops. The matrix below is not symmetric, so it describes a directed graph:
>>> a,b,c,d,e,f = range(6) # nodes
>>> N = [[0,1,1,1,0,1], [1,0,0,1,0,1], [1,1,0,1,1,0], [1,0,0,0,1,0],
...      [1,1,1,0,0,0], [0,1,1,1,1,0]]
>>> N[a][b] # membership
1
>>> N[a][e]
0
>>> sum(N[f]) # (out-)degree
4
Representing Trees
The simplest way of representing a tree is by nested lists:
>>> T = ['a', ['b', ['d', 'f']], ['c', ['e', 'g']]]
>>> T[0]
'a'
>>> T[1][0]
'b'
>>> T[1][1][0]
'd'
>>> T[1][1][1]
'f'
>>> T[2][0]
'c'
>>> T[2][1][1]
'g'
class SimpleTree(object):
    def __init__(self, value=None, children=None):
        self.children = children or []
        self.value = value


def main():
    """
    a
        b
            d
            e
        c
            h
            g
    """
    st = SimpleTree('a', [SimpleTree('b', [SimpleTree('d'),
                                           SimpleTree('e')]),
                          SimpleTree('c', [SimpleTree('h'),
                                           SimpleTree('g')])])
    print(st)


if __name__ == '__main__':
    main()
In the next chapter we will learn how to improve this class, including many features and methods that a tree can hold. For now, it is useful to keep in mind that when we are prototyping data structures such as trees, we should always be able to come up with a flexible class to specify arbitrary attributes in the constructor. The following program implements what is referred to as a bunch class, a generic tool that is a specialization of Python's dict class and that lets you create and set arbitrary attributes on the fly:
[trees/simple_trees/bunchclass.py]
class BunchClass(dict):
    def __init__(self, *args, **kwds):
        super(BunchClass, self).__init__(*args, **kwds)
        self.__dict__ = self    # attributes and dict keys share one storage


def main():
    """
    Prints (Python 3.7+ keeps the keyword order):
    {'left': {'left': 'Buffy', 'right': 'Angel'},
     'right': {'left': 'Willow', 'right': 'Xander'}}
    """
    bc = BunchClass  # notice the absence of ()
    tree = bc(left=bc(left="Buffy", right="Angel"),
              right=bc(left="Willow", right="Xander"))
    print(tree)


if __name__ == '__main__':
    main()
In the example above, the function's arguments *args and **kwds can hold an arbitrary number of positional arguments and an arbitrary number of keyword arguments, respectively.
Chapter 12
Binary Trees
$2m = n + m - 1 \;\Rightarrow\; m = n - 1,$
Figure 12.1: The height (h) and width (number of leaves) of a (perfectly
balanced) binary tree.
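[trees/binary_trees/BT_lists.py]
def BinaryTreeList(r):
    return [r, [], []]


def insertLeft(root, newBranch):
    # (not shown in the listing; reconstructed to match the output below)
    # the old left subtree becomes the left child of the new node
    t = root.pop(1)
    if len(t) > 1:
        root.insert(1, [newBranch, t, []])
    else:
        root.insert(1, [newBranch, [], []])
    return root


def insertRight(root, newBranch):
    # (reconstructed) the old right subtree becomes the new node's right child
    t = root.pop(2)
    if len(t) > 1:
        root.insert(2, [newBranch, [], t])
    else:
        root.insert(2, [newBranch, [], []])
    return root


def getRootVal(root):
    return root[0]


def setRootVal(root, newVal):
    root[0] = newVal


def getLeftChild(root):
    return root[1]


def getRightChild(root):
    return root[2]


def main():
    """
    3
    [5, [4, [], []], []]
    [7, [], [6, [], []]]
    """
    r = BinaryTreeList(3)
    insertLeft(r, 4)
    insertLeft(r, 5)
    insertRight(r, 6)
    insertRight(r, 7)
    print(getRootVal(r))
    print(getLeftChild(r))
    print(getRightChild(r))


if __name__ == '__main__':
    main()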
However, this method is not very practical when we have many branches (or at least it needs many improvements, for example, in how it manages the creation of new lists and how it displays or searches for new elements).
A more natural way to handle binary trees is (again) by representing
it as a collection of nodes. A simple node in a binary tree should carry
attributes for value and for left and right children, and it can have a method
to identify leaves:
[trees/binary_trees/BT.py]
class BT(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def insert_left(self, value):
        # (not shown in the listing) attach a new left child
        self.left = BT(value)
        return self.left

    def insert_right(self, value):
        # (not shown in the listing) attach a new right child
        self.right = BT(value)
        return self.right

    def is_leaf(self):
        return self.left is None and self.right is None

    def __repr__(self):
        return '{}'.format(self.value)


def tests_BT():
    """
          1
       2     3
      4 5   6 7
    """
    tree = BT(1)
    tree.insert_left(2)
    tree.insert_right(3)
    # left, right and value are attributes here, not methods
    tree.left.insert_left(4)
    tree.left.insert_right(5)
    tree.right.insert_left(6)
    tree.right.insert_right(7)
    print(tree.right.right)
    tree.right.right.value = 8
    print(tree.right.right)
    assert(tree.right.is_leaf() == False)
    assert(tree.right.right.is_leaf() == True)
    print("Tests Passed!")


if __name__ == '__main__':
    tests_BT()
12.3 Binary Search Trees
1. The left subtree of a node contains only nodes with keys less than the node's key.
2. The right subtree of a node contains only nodes with keys greater than the node's key.
3. Both the left and right subtrees must also be binary search trees.
If the binary search tree is balanced, the following operations are O(ln n):
(i) finding a node with a given value (lookup), (ii) finding a node with
maximum or minimum value, and (iii) insertion or deletion of a node.
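from BT import BT


class BST(BT):
    def __init__(self, value=None):
        self.value = value
        self.left = None
        self.right = None

    # insert(), find(), get_left() and get_right() are not shown in the
    # listing; the versions below are a minimal sketch of the BST rules
    def insert(self, value):
        if self.value is None:
            self.value = value
        elif value < self.value:
            if self.left is None:
                self.left = BST(value)
            else:
                self.left.insert(value)
        else:
            if self.right is None:
                self.right = BST(value)
            else:
                self.right.insert(value)

    def find(self, value):
        # return the node holding value, or None if it is absent
        node = self
        while node is not None and node.value is not None:
            if value == node.value:
                return node
            node = node.left if value < node.value else node.right
        return None

    def get_left(self):
        return self.left

    def get_right(self):
        return self.right


def main():
    """
          4
       2     6
      1 3   5 7
    """
    tree = BST()
    for value in [4, 2, 6, 1, 3, 7, 5]:
        tree.insert(value)
    print(tree.get_right())
    print(tree.get_right().get_left())
    print(tree.get_right().get_right())
    print(tree.get_left())
    print(tree.get_left().get_left())
    print(tree.get_left().get_right())
    assert(tree.find(30) == None)


if __name__ == '__main__':
    main()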
There are many other ways that a tree can be created. We could, for instance, think of two classes, one simply for nodes, and a second one that controls these nodes. This is not much different from the previous example (and at the end of this chapter we will see a third, hybrid example of these two):
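[trees/binary_trees/BST_with_Nodes.py]
class Node(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def __repr__(self):
        return '{}'.format(self.value)


class BSTwithNodes(object):
    def __init__(self):
        self.root = None

    def insert(self, value):
        # (most of this method is cut off in the listing; reconstructed:
        # walk down from the root and attach a new node at an empty spot)
        if self.root is None:
            self.root = Node(value)
            return
        current = self.root
        while True:
            if value < current.value:
                if current.left is None:
                    current.left = Node(value)
                    break
                current = current.left
            elif value > current.value:
                if current.right is None:
                    current.right = Node(value)
                    break
                current = current.right
            else:
                break               # the value is already in the tree


def main():
    """
    BST
          4
       2     6
      1 3   5 7
    """
    tree = BSTwithNodes()
    l1 = [4, 2, 6, 1, 3, 7, 5]
    for i in l1:
        tree.insert(i)
    print(tree.root)
    print(tree.root.right)
    print(tree.root.right.left)
    print(tree.root.right.right)
    print(tree.root.left)
    print(tree.root.left.left)
    print(tree.root.left.right)


if __name__ == '__main__':
    main()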
• Node splitting (and merging): nodes are not allowed to have more than two children, so when a node becomes overfull it splits into two subnodes.
AVL Trees
An AVL tree is a binary search tree with a self-balancing condition where
the difference between the height of the left and right subtrees cannot be
more than one.
To implement an AVL tree, we can start by adding a self-balancing method to our BST classes, called every time we add a new node to the tree. The method works by continuously checking the height of each node, which is stored as a new node attribute (an empty subtree has height -1, so a leaf has height 0):
def height(node):
    if node is None:
        return -1
    else:
        return node.height


def update_height(node):
    node.height = max(height(node.left), height(node.right)) + 1
Now we can go ahead and implement the rebalancing method for our tree. The method will check whether the difference between the new heights of the right and left subtrees is at most 1. If this is not true, the method will perform the rotations:
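def rebalance(self, node):
    # walk up from the new node to the root, fixing any imbalance
    # (the middle of this method is cut off in the listing; it needs
    # nodes with parent pointers)
    while node is not None:
        update_height(node)
        if height(node.left) >= 2 + height(node.right):
            if height(node.left.left) >= height(node.left.right):
                self.right_rotate(node)
            else:
                self.left_rotate(node.left)
                self.right_rotate(node)
        elif height(node.right) >= 2 + height(node.left):
            if height(node.right.right) >= height(node.right.left):
                self.left_rotate(node)
            else:
                self.right_rotate(node.right)
                self.left_rotate(node)
        node = node.parent      # move up to the parent and repeat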
We are now ready to write the entire AVL tree class! In the following code we have used our old BST class as a superclass, together with the methods we have described above. In addition, two methods for traversals were used, and we will explain them better in the next chapter. For now, it is good to keep the example in mind and to note that this AVL tree supports insert, find, and delete-min operations in O(ln n) time:
[trees/binary_trees/avl.py]
class AVL(BSTwithNodes):
    def __init__(self):
        self.root = None

    # ... (the listing is cut off here: the insertion, rotation, and
    # rebalance methods are not shown; the rotations end with
    #     update_height(x)
    #     update_height(y)
    # and rebalance() ends, as in the method above, with
    #     self.left_rotate(node)
    #     node = node.parent)


def height(node):
    if node is None:
        return -1
    else:
        return node.height


def update_height(node):
    node.height = max(height(node.left), height(node.right)) + 1


def main():
    tree = AVL()
    tree.insert(4)
    tree.insert(2)
    tree.insert(6)
    tree.insert(1)
    tree.insert(3)
    tree.insert(7)
    tree.insert(5)
    print('Inorder Traversal:')
    tree.inorder(tree.root)


if __name__ == '__main__':
    main()
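Because the listing above is incomplete, here is a compact, self-contained sketch of AVL insertion. It uses a different technique from the class above: recursive insertion that returns the (possibly new) root of each subtree, so no parent pointers are needed. The height and update_height helpers match the ones above; the other names are illustrative:

```python
class AVLNode(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.height = 0


def height(node):
    return -1 if node is None else node.height


def update_height(node):
    node.height = max(height(node.left), height(node.right)) + 1


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update_height(y)
    update_height(x)
    return x                     # x is the new subtree root


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update_height(x)
    update_height(y)
    return y


def rebalance(node):
    update_height(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                                       # left heavy
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)            # left-right case
        return rotate_right(node)
    if balance < -1:                                      # right heavy
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)         # right-left case
        return rotate_left(node)
    return node


def insert(node, value):
    # returns the root of the subtree after inserting and rebalancing
    if node is None:
        return AVLNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    return rebalance(node)


def inorder(node):
    return inorder(node.left) + [node.value] + inorder(node.right) if node else []
```

Inserting 1, 2, ..., 100 in order, which would degrade a plain BST into a linked list, keeps this tree's height logarithmic.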
Red-black Trees
Red-black trees are an evolution of binary search trees that aim to keep the tree balanced without affecting the complexity of the primitive operations. This is done by coloring each node in the tree either red or black and preserving a set of properties that guarantee that the longest path from the root to a leaf is no more than twice as long as the shortest one.
Red-black trees have the following properties:
• All leaf (nil) nodes are colored black; if a node's child is missing, we assume that it has a nil child in that place, and this nil child is always colored black.
• Every path from a node n to a descendant leaf has the same number of black nodes (not counting node n). We call this number the black height of n.
Binary Heaps
Binary heaps are complete binary trees, and therefore balanced. The heap property makes it easier to maintain the structure, i.e., the balance of the tree. There is no need to modify the structure of the tree by splitting or rotating nodes in a heap: the only operation will be swapping parent and child nodes.
In a binary heap stored in a list h, the root (the smallest or largest element) is always found in h[0]. Considering a node at index i:
• the parent index is $\lfloor (i-1)/2 \rfloor$,
• the left child index is $2i + 1$,
• the right child index is $2i + 2$.
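These index formulas can be checked against Python's heapq module, which stores a min-heap in a plain list using the same 0-based layout. A small sketch (the helper names are illustrative):

```python
import heapq


def parent(i):
    return (i - 1) // 2


def left_child(i):
    return 2 * i + 1


def right_child(i):
    return 2 * i + 2


def is_min_heap(h):
    # every non-root node is at least as large as its parent
    return all(h[parent(i)] <= h[i] for i in range(1, len(h)))


h = [5, 3, 8, 1, 9, 2]
heapq.heapify(h)            # rearranges h in place into a min-heap
assert h[0] == min(h)
assert is_min_heap(h)
```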
[trees/binary_trees/binary_tree.py]
#            1            ---> level 0
#          2   3          ---> level 1
#        4   5            ---> level 2
#      6   7              ---> level 3
#    8   9                ---> level 4


class NodeBT(object):
    # ... (constructor not shown in this listing)

    # METHODS TO MODIFY NODES
    # ... (not shown)

    # METHODS TO PRINT/SHOW NODES ATTRIBUTES

    def __repr__(self):
        """Private method for this class string representation"""
        return '{}'.format(self.item)

    def _searchForNode(self, value):
        # (the first lines of this method are cut off in the listing)
        if self.item == value:
            return self
        found = None
        if self.left:
            found = self.left._searchForNode(value)
        if self.right:
            found = found or self.right._searchForNode(value)
        return found

    def _isLeaf(self):
        """Return True if the node is a leaf"""
        return not self.right and not self.left

    def _isBalanced(self):
        """Find whether the tree is balanced, by calculating heights
        first, O(n^2)"""
        if self._getMaxHeight() - self._getMinHeight() >= 2:
            return False
        else:
            if self._isLeaf():
                return True
            elif self.left and self.right:
                return self.left._isBalanced() and self.right._isBalanced()
            elif not self.left and self.right:
                return self.right._isBalanced()
            elif not self.right and self.left:
                return self.left._isBalanced()

    def _isBST(self):
        """Find whether the tree is a BST (each node is compared with its
        children)"""
        if self.item is None:
            raise Exception('Tree is empty')
        if self.left:
            if self.left.item >= self.item or not self.left._isBST():
                return False
        if self.right:
            if self.right.item <= self.item or not self.right._isBST():
                return False
        return True
class BinaryTree(object):
>>> bt = BinaryTree()
>>> for i in range(1, 10): bt.addNode(i)
>>> bt.hasNode(7)
True
>>> bt.hasNode(12)
False
>>> bt.printTree()
[1, 2, 4, 6, 8, 9, 7, 5, 3]
>>> bt.printTree(pre)
[1, 2, 4, 6, 8, 9, 7, 5, 3]
>>> bt.printTree(bft)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> bt.printTree(post)
[8, 9, 6, 7, 4, 5, 2, 3, 1]
>>> bt.printTree(in)
[8, 6, 9, 4, 7, 2, 5, 1, 3]
198 CHAPTER 12. BINARY TREES
>>> bt.hasNode(9)
True
>>> bt.hasNode(11)
False
>>> bt.isLeaf(8)
True
>>> bt.getNodeLevel(1)
0
>>> bt.getNodeLevel(8)
4
>>> bt.getSizeTree()
9
>>> bt.isRoot(10)
False
>>> bt.isRoot(1)
True
>>> bt.getHeight()
4
>>> bt.isBST()
False
>>> bt.isBalanced()
False
>>> bt.isBalanced(2)
False
>>> bt.getAncestor(8, 5)
2
>>> bt.getAncestor(8, 5, pre-post)
2
>>> bt.getAncestor(8, 5, post-in)
2
    def __init__(self):
        '''Constructor for the Binary Tree, which is a container of
        Nodes'''
        self.root = None
    # METHODS TO MODIFY THE TREE

    # METHODS TO PRINT/SHOW TREE ATTRIBUTES
    def __repr__(self):
        '''Private method for this class string representation'''
        # The tree itself has no item; represent it through its root node
        return '{}'.format(self.root)
    def getSizeTree(self):
        '''Return the number of nodes in the tree, O(n)'''
        return len(self.root._printDFTpreOrder(self.root))
    def getHeight(self):
        '''Return the height/depth of the tree, best/worst O(n)'''
        return self.root._getMaxHeight()
if __name__ == '__main__':
    import doctest
    doctest.testmod()
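Two alternative checks may help put _isBalanced and _isBST in context. This is a minimal sketch on a hypothetical standalone Node class (not NodeBT). Here is_balanced uses the common height-balanced (AVL) criterion, which is a different definition from the max/min height comparison in _isBalanced. It computes every height in a single postorder pass, so it runs in O(n) rather than O(n^2). The is_bst check relies on the fact that a binary tree is a BST exactly when its inorder listing is strictly increasing.

```python
class Node:
    """Hypothetical standalone node for this sketch (not NodeBT)."""
    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right


def balanced_height(node):
    """Return the height (in nodes) of node's subtree if it is
    height-balanced, or -1 as soon as some node is not."""
    if node is None:
        return 0
    left = balanced_height(node.left)
    if left < 0:
        return -1
    right = balanced_height(node.right)
    if right < 0 or abs(left - right) > 1:
        return -1
    return 1 + max(left, right)


def is_balanced(node):
    # One postorder pass computes every height once: O(n) overall
    return balanced_height(node) >= 0


def inorder_items(node):
    """Yield the items in inorder (left root right)."""
    if node is not None:
        yield from inorder_items(node.left)
        yield node.item
        yield from inorder_items(node.right)


def is_bst(node):
    # A binary tree is a BST exactly when its inorder listing is strictly increasing
    items = list(inorder_items(node))
    return all(a < b for a, b in zip(items, items[1:]))


perfect = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))
swapped = Node(1, Node(2), Node(3))
print(is_balanced(perfect), is_bst(perfect))  # True True
print(is_balanced(chain), is_bst(chain))      # False True
print(is_balanced(swapped), is_bst(swapped))  # True False
```

The inorder check stores the whole listing. A bounds-passing recursion gives the same answer in O(h) extra space, where h is the height.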
[trees/binary_trees/binary_search_tree.py]
            7            ---> level 0
        4       9        ---> level 1
      2   5   8   10     ---> level 2
     1     6             ---> level 3
class NodeBST(NodeBT):
class BinarySearchTree(BinaryTree):
    '''
    >>> bst = BinarySearchTree()
    >>> l1 = [7, 4, 5, 9, 2, 8, 1, 6, 10]
    >>> for i in l1: bst.addNode(i)
    >>> bst.hasNode(3)
    False
    >>> bst.hasNode(10)
    True
    >>> bst.printTree('pre')
    [7, 4, 2, 1, 5, 6, 9, 8, 10]
    >>> bst.printTree('post')
    [1, 2, 6, 5, 4, 8, 10, 9, 7]
    >>> bst.printTree('in')
    [1, 2, 4, 5, 6, 7, 8, 9, 10]
    >>> bst.printTree('bft')
    [7, 4, 9, 2, 5, 8, 10, 1, 6]
    >>> bst.getHeight()
    3
    >>> bst.isBST()
    True
    >>> bst.isBalanced()
    False
    >>> bst.isBalanced(2)
    False
    >>> bst.getAncestor(2, 9)
    7
    >>> bst.getAncestor(2, 9, 'bst')
    7
    >>> bst.getAncestor(2, 9, 'pre-post')
    7
    >>> bst.getAncestor(2, 9, 'post-in')
    7
    '''
if __name__ == '__main__':
    import doctest
    doctest.testmod()
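The level diagram above can be reproduced by inserting the same keys into a plain BST and grouping the nodes by level with a breadth-first pass. This is a minimal sketch. The Node class and the bst_insert and levels helpers are hypothetical and are not the NodeBST/BinarySearchTree API. Equal keys are sent to the right subtree, which is an assumption.

```python
from collections import deque


class Node:
    """Hypothetical BST node for this sketch (not NodeBST)."""
    def __init__(self, item):
        self.item = item
        self.left = None
        self.right = None


def bst_insert(root, item):
    """Insert item into the BST rooted at root and return the root.
    Smaller items go left; larger (or equal) items go right."""
    if root is None:
        return Node(item)
    if item < root.item:
        root.left = bst_insert(root.left, item)
    else:
        root.right = bst_insert(root.right, item)
    return root


def levels(root):
    """Return the items grouped by level, root level first."""
    result = []
    queue = deque([(root, 0)]) if root else deque()
    while queue:
        node, level = queue.popleft()
        if level == len(result):      # first node seen on a new level
            result.append([])
        result[level].append(node.item)
        for child in (node.left, node.right):
            if child:
                queue.append((child, level + 1))
    return result


root = None
for item in [7, 4, 5, 9, 2, 8, 1, 6, 10]:
    root = bst_insert(root, item)
print(levels(root))  # [[7], [4, 9], [2, 5, 8, 10], [1, 6]]
```

Flattening the levels gives [7, 4, 9, 2, 5, 8, 10, 1, 6], the same sequence as the 'bft' doctest above.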
Chapter 13
Traversals and Problems on Graphs and Trees
Traversals are algorithms used to visit the objects (nodes) in some connected
structure, such as a tree or a graph. A traversal problem may require visiting
every node or only some specific nodes.
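Searching a binary search tree is an example of the second kind of traversal, where only some nodes are visited. The search follows a single root-to-leaf path and skips every other subtree. This is a minimal sketch with a hypothetical Node class and search_path helper, which return the values of the visited nodes.

```python
class Node:
    """Hypothetical BST node, for this sketch only."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def search_path(root, key):
    """Search a BST for key and return the values of the nodes visited.
    Only one root-to-leaf path is explored, not the whole tree."""
    visited = []
    node = root
    while node is not None:
        visited.append(node.value)
        if key == node.value:
            break
        # Smaller keys live in the left subtree, larger ones in the right
        node = node.left if key < node.value else node.right
    return visited


tree = Node(10, Node(5, Node(1), Node(6)), Node(15, Node(11), Node(50)))
print(search_path(tree, 6))   # [10, 5, 6]
print(search_path(tree, 12))  # [10, 15, 11]  (12 is absent)
```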
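Depth-first search (DFS) explores each branch as deeply as possible before
backtracking. Every node is visited once and every outgoing edge is examined
once, so the total runtime is O(number of nodes + total number of outgoing
edges from these nodes) = O(V + E).
DFS is usually implemented with a LIFO structure, such as a stack, that keeps
track of the discovered nodes. On trees it comes in three strategies. Preorder
visits a node before traversing its subtrees (root left right); the other two
are: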
Postorder: Visit a node after traversing both subtrees (left right root):
def postorder(root):
    if root is not None:
        # yield from re-emits every value produced by the recursive call
        yield from postorder(root.left)
        yield from postorder(root.right)
        yield root.value
Inorder: Visit a node after traversing its left subtree but before the right
subtree (left root right):
def inorder(root):
    if root is not None:
        yield from inorder(root.left)
        yield root.value
        yield from inorder(root.right)
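The preorder strategy (root left right) can be written in the same generator style. Because DFS is described as stack-based, the same order can also be produced with an explicit LIFO stack instead of recursion. This is a minimal sketch. The Node class and both function names are hypothetical.

```python
class Node:
    """Hypothetical tree node, for this sketch only."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(root):
    """Visit a node before traversing its subtrees (root left right)."""
    if root is not None:
        yield root.value
        yield from preorder(root.left)
        yield from preorder(root.right)


def preorder_iterative(root):
    """Same order, using an explicit LIFO stack instead of recursion."""
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        yield node.value
        # Push right first so the left subtree is popped (visited) first
        if node.right is not None:
            stack.append(node.right)
        if node.left is not None:
            stack.append(node.left)


tree = Node(10, Node(5, Node(1), Node(6)), Node(15, Node(11), Node(50)))
print(list(preorder(tree)))            # [10, 5, 1, 6, 15, 11, 50]
print(list(preorder_iterative(tree)))  # [10, 5, 1, 6, 15, 11, 50]
```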
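Breadth-first search (BFS) explores the nodes level by level, visiting all
neighbors of a node before moving farther from the starting point.
Traditionally, BFS is implemented using a list to store the values of the
visited nodes and a FIFO queue to store the nodes that have yet to be
visited. Its total runtime is also O(V + E).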
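The only difference between the two searches on a general graph is the container: a LIFO stack for DFS and a FIFO queue for BFS. This is a minimal sketch that assumes the graph is a plain adjacency-list dict. The dfs and bfs names and the sample graph are hypothetical. Each node is visited once and each edge is examined a constant number of times, which gives the O(V + E) bound.

```python
from collections import deque


def dfs(graph, start):
    """Iterative DFS: a LIFO stack holds discovered-but-unvisited nodes."""
    visited, order = set(), []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Reverse so neighbors are popped in their listed order
        for neighbor in reversed(graph[node]):
            if neighbor not in visited:
                stack.append(neighbor)
    return order


def bfs(graph, start):
    """BFS: a FIFO queue holds nodes waiting to be visited."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order


graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'E'], 'D': ['E'], 'E': []}
print(dfs(graph, 'A'))  # ['A', 'B', 'D', 'E', 'C']
print(bfs(graph, 'A'))  # ['A', 'B', 'C', 'D', 'E']
```

Using collections.deque keeps each dequeue O(1). A plain list with pop(0) would shift every remaining element.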
from collections import deque

class BSTTraversal(BSTwithNodes):
    def __init__(self):
        self.root = None
        self.nodes_BFS = []
        self.nodes_DFS_pre = []
        self.nodes_DFS_post = []
        self.nodes_DFS_in = []

    def BFS(self):
        self.root.level = 0
        queue = deque([self.root])   # FIFO queue of nodes still to visit
        current_level = self.root.level
        while queue:
            current_node = queue.popleft()
            current_level = current_node.level
            # Assumes nodes store their key in .value, as in the listings below
            self.nodes_BFS.append(current_node.value)
            if current_node.left:
                current_node.left.level = current_level + 1
                queue.append(current_node.left)
            if current_node.right:
                current_node.right.level = current_level + 1
                queue.append(current_node.right)
        return self.nodes_BFS
def main():
    tree = BSTTraversal()
    l1 = [10, 5, 15, 1, 6, 11, 50]
    for i in l1: tree.insert(i)

if __name__ == '__main__':
    main()
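BSTTraversal keeps nodes_DFS_pre, nodes_DFS_post and nodes_DFS_in lists, but the methods that fill them are not shown. One recursive walk can fill all three lists, because the three orders differ only in when the current node is recorded. This is a minimal sketch. The Node class and the dfs_orders function are hypothetical.

```python
class Node:
    """Hypothetical tree node, for this sketch only."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def dfs_orders(node, pre, ino, post):
    """Fill three lists in one recursive pass; the orders differ only in
    when the current node is recorded relative to its subtrees."""
    if node is None:
        return
    pre.append(node.value)               # before both subtrees
    dfs_orders(node.left, pre, ino, post)
    ino.append(node.value)               # between the subtrees
    dfs_orders(node.right, pre, ino, post)
    post.append(node.value)              # after both subtrees


tree = Node(10, Node(5, Node(1), Node(6)), Node(15, Node(11), Node(50)))
pre, ino, post = [], [], []
dfs_orders(tree, pre, ino, post)
print(pre)   # [10, 5, 1, 6, 15, 11, 50]
print(ino)   # [1, 5, 6, 10, 11, 15, 50]
print(post)  # [1, 6, 5, 11, 50, 15, 10]
```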
class TranversalBST(object):
    def __init__(self):
        self.bst = BST(None)
        self.nodes = []

    def inorder(self):
        current = self.bst
        self.nodes = []
        stack = []
        while len(stack) > 0 or current is not None:
            if current is not None:
                # Go as far left as possible, remembering the path
                stack.append(current)
                current = current.left
            else:
                # Visit the deepest unvisited node, then its right subtree
                current = stack.pop()
                self.nodes.append(current.value)
                current = current.right
        return self.nodes
    def preorder(self):
        self.nodes = []
        stack = [self.bst]
        while len(stack) > 0:
            curr = stack.pop()
            if curr is not None:
                self.nodes.append(curr.value)
                # Right is pushed first so that left is popped first
                stack.append(curr.right)
                stack.append(curr.left)
        return self.nodes
    def preorder2(self):
        self.nodes = []
        current = self.bst
        stack = []
        while len(stack) > 0 or current is not None:
            if current is not None:
                # Record the node on the way down, then go left
                self.nodes.append(current.value)
                stack.append(current)
                current = current.left
            else:
                current = stack.pop()
                current = current.right
        return self.nodes
def main():
    """
            10
         5      15
       1   6  11   50
    """
    t = TranversalBST()
    t.insert(10)
    t.insert(5)
    t.insert(15)
    t.insert(1)
    t.insert(6)
    t.insert(11)
    t.insert(50)
    print(t.preorder())
    print(t.preorder2())
    print(t.inorder())

if __name__ == '__main__':
    main()
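TranversalBST has iterative inorder and preorder methods but no iterative postorder. A common approach uses two stacks. The first produces root-right-left order, and reversing that sequence gives left-right-root. This is a minimal sketch in which the output list plays the role of the second stack. The Node class and postorder_iterative are hypothetical.

```python
class Node:
    """Hypothetical tree node, for this sketch only."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def postorder_iterative(root):
    """Postorder (left right root) without recursion."""
    if root is None:
        return []
    stack, output = [root], []
    while stack:
        node = stack.pop()
        output.append(node.value)        # collected as root, right, left
        if node.left is not None:
            stack.append(node.left)
        if node.right is not None:
            stack.append(node.right)     # pushed last, so popped first
    return output[::-1]                  # reverse to left, right, root


tree = Node(10, Node(5, Node(1), Node(6)), Node(15, Node(11), Node(50)))
print(postorder_iterative(tree))  # [1, 6, 5, 11, 50, 15, 10]
```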
In the following example we extend the class from the previous example with
methods that find the maximum and minimum depths, check whether the tree is
balanced, and find the position of a key in the preorder and inorder
traversals:
[trees/traversals/BST_extra_methods.py]
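class BSTwithExtra(TranversalBST):
    def __init__(self):
        self.bst = BST(None)
        self.nodes = []

    def get_max_depth(self, node=None):
        '''Return the maximum depth (in edges) below node, or below the root'''
        # Signature reconstructed from the call t.get_max_depth() in main()
        if node is None:
            node = self.bst
        # Follow both subtrees and keep the deeper one
        left = 1 + self.get_max_depth(node.left) if node.left else 0
        right = 1 + self.get_max_depth(node.right) if node.right else 0
        return max(left, right)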
def main():
    """
            10
         5      15
       1   6  11   50
                     60
                       70
                         80
    """
    t = BSTwithExtra()
    l1 = [10, 5, 15, 1, 6, 11, 50, 60, 70, 80]
    for i in l1: t.insert(i)
    print(t.inorder())
    print(t.preorder())
    assert t.get_max_depth() == 5
    assert t.get_min_depth() == 2
    assert t.is_balanced() == 3
    assert t.get_inorder(10) == 3
    assert t.get_preorder(10) == 0

    """
          1
        2   3
      4  5 6  7
    """
    t2 = BSTwithExtra()
    l2 = [1, 2, 3, 4, 5, 6, 7, 8]
    for i in l2: t2.insert(i)
    print(t2.inorder())
    print(t2.preorder())
    assert t2.is_balanced() == 0
    print("Tests Passed!")

if __name__ == '__main__':
    main()
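The bodies of get_min_depth, is_balanced, get_inorder and get_preorder are not shown. The assertions suggest how they behave. For the first tree, 5 - 2 == 3, so is_balanced appears to return the difference between the maximum and minimum depths. get_inorder and get_preorder appear to return the position of a key in the corresponding traversal. This is a minimal sketch under that reading. It uses a hypothetical standalone Node class and hypothetical helpers (bst_insert, max_depth, min_depth, preorder, inorder), not the BSTwithExtra methods.

```python
class Node:
    """Hypothetical BST node, for this sketch only."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def bst_insert(root, value):
    """Insert value into the BST rooted at root and return the root."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = bst_insert(root.left, value)
    else:
        root.right = bst_insert(root.right, value)
    return root


def max_depth(node):
    """Longest root-to-leaf path, counted in edges (0 for a lone node)."""
    children = [c for c in (node.left, node.right) if c is not None]
    return 1 + max(max_depth(c) for c in children) if children else 0


def min_depth(node):
    """Shortest root-to-leaf path, counted in edges; a node with a single
    child is not a leaf, so only existing children are followed."""
    children = [c for c in (node.left, node.right) if c is not None]
    return 1 + min(min_depth(c) for c in children) if children else 0


def preorder(node):
    if node is not None:
        yield node.value
        yield from preorder(node.left)
        yield from preorder(node.right)


def inorder(node):
    if node is not None:
        yield from inorder(node.left)
        yield node.value
        yield from inorder(node.right)


root = None
for value in [10, 5, 15, 1, 6, 11, 50, 60, 70, 80]:
    root = bst_insert(root, value)
print(max_depth(root), min_depth(root))     # 5 2
print(max_depth(root) - min_depth(root))    # 3
print(list(inorder(root)).index(10))        # 3
print(list(preorder(root)).index(10))       # 0

chain = None
for value in [1, 2, 3, 4, 5, 6, 7, 8]:
    chain = bst_insert(chain, value)
print(max_depth(chain) - min_depth(chain))  # 0
```

Under this reading, a difference of 0 only means every leaf sits at the same depth. A sorted insertion produces a single chain, which satisfies that condition even though the tree is far from height-balanced.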
Ancestor in a BST
The example below finds the lowest common ancestor of two nodes in a binary
search tree:
[trees/traversals/BST_ancestor.py]
def test_find_ancestor():
    """
            10
         5      15
       1   6  11   50
    """
    t = TranversalBST()
    l1 = [10, 5, 15, 1, 6, 11, 50]
    for i in l1: t.insert(i)
    path = t.preorder()
    assert find_ancestor(path, 1, 6) == 5
    assert find_ancestor(path, 1, 11) == 10
    assert find_ancestor(path, 11, 50) == 15
    assert find_ancestor(path, 5, 15) == 10
    print("Tests passed!")

if __name__ == '__main__':
    test_find_ancestor()
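The listing shows only the test; the body of find_ancestor is not included. This is a minimal sketch that assumes path is the preorder listing of a BST and that both keys are present. In preorder every ancestor appears before its descendants. Each node listed before the lowest common ancestor lies entirely below or entirely above the two keys, so the first value that falls between them (inclusive) is the answer. The function takes linear time in the length of path.

```python
def find_ancestor(path, a, b):
    """Lowest common ancestor of a and b, given the preorder listing of a
    BST that contains both keys (an assumption of this sketch)."""
    low, high = min(a, b), max(a, b)
    for value in path:
        # The first value lying between the two keys splits them
        if low <= value <= high:
            return value
    return None


path = [10, 5, 1, 6, 15, 11, 50]   # preorder of the tree in the docstring
print(find_ancestor(path, 1, 6))   # 5
print(find_ancestor(path, 11, 50)) # 15
```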