Functional Python for Learning Data Science
Joel Grus
@joelgrus
[email protected]
(video of talk at https://www.youtube.com/watch?v=ThS4juptJjQ)
STUPID ITERTOOLS TRICKS
About Me
● SWE at Google
● Previously data science at
VoloMetrix, Decide, Farecast
● Wrote a book ------>
● Functional programming
zealot (ask me about Haskell!)
What is Functional Programming?
● Use Functions
● Avoid Side-Effects
● First-Class Functions
● Laziness
● Immutability
Functional Programming in Python
Functional Programming in Python 3
from operator import add
functools
from functools import partial
partial function application ("currying")
def add1(x): return add(1, x)
could be written as
add1 = partial(add, 1)
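a quick check (not from the slides) that both definitions behave the same:
from operator import add
from functools import partial

def add1_explicit(x): return add(1, x)
add1 = partial(add, 1)

add1_explicit(10)  # 11
add1(10)           # 11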
from functools import reduce
So now reduce(). This is actually the one I've always hated most, because, apart
from a few examples involving + or *, almost every time I see a reduce() call with a
non-trivial function argument, I need to grab pen and paper to diagram what's
actually being fed into that function before I understand what the reduce() is
supposed to do. So in my mind, the applicability of reduce() is pretty much limited
to associative operators, and in all other cases it's better to write out the
accumulation loop explicitly.
Guido van Rossum, 2005
n.b. this criticism also applies to just about everything we'll do today!
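for reference (not from the slides), the simple associative cases Guido is okay with:
from functools import reduce
from operator import add, mul

reduce(add, [1, 2, 3, 4])  # 10
reduce(mul, [1, 2, 3, 4])  # 24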
iterators
In [1]: xs = [1, 2, 3]

# get an iterator
In [2]: it = iter(xs)

# take its values with next
In [3]: next(it)
Out[3]: 1

In [4]: next(it)
Out[4]: 2

In [5]: next(it)
Out[5]: 3

# get a StopIteration exception when no values are left
In [6]: next(it)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-121-5c05586d40e8> in <module>()
----> 1 next(it)
StopIteration:
iterators
Serving up values one at a time with next means you can generate them on demand (laziness).
This allows us to create lazy infinite sequences.
generators
# a function with yield creates a generator
def lazy_integers(n=0):
    while True:
        yield n
        n += 1

xs = lazy_integers()  # an infinite sequence!

[next(xs) for _ in range(10)]
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# maintains state
[next(xs) for _ in range(10)]
# [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
generator comprehensions
# computes nothing until next or for
squares = (x**2 for x in lazy_integers())
doubles = (2*x for x in lazy_integers())
next(squares) # 0
next(squares) # 1
next(squares) # 4
next(squares) # 9
# don't do this!!! (it tries to build an infinite list and never finishes):
bad_squares = [x**2 for x in lazy_integers()]
generators and pipelines
$ cat euler.hs | grep -i prime | wc -l
63
with open("euler.hs", "r") as f:
    lines = (line for line in f)
    prime_lines = filter(lambda line: "prime" in line.lower(), lines)
    # make sure to force evaluation before f goes out of scope!
    # or else ValueError: I/O operation on closed file
    line_count = len(list(prime_lines))
itertools
from itertools import count
count([start=0], [step=1])
Gives the infinite sequence:
start, start + step, start + 2 * step, ...
from itertools import islice
islice(seq, [start=0], stop, [step=1])
Returns a "lazy slice" out of seq
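for example (my own illustration), a lazy slice of an infinite count:
from itertools import count, islice

list(islice(count(0, 2), 5))   # [0, 2, 4, 6, 8]
list(islice(count(10), 3, 6))  # [13, 14, 15]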
from itertools import tee
tee(it, [n=2])
splits an iterator into two or more memoized
copies
huge efficiency gains if you have to iterate through
expensive computations multiple times
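a small illustration (mine): each tee'd copy serves up the same values independently:
from itertools import tee, count, islice

squares = (x**2 for x in count())
a, b = tee(squares)
list(islice(a, 4))  # [0, 1, 4, 9]
list(islice(b, 4))  # [0, 1, 4, 9]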
from itertools import repeat
repeat(elem, [n=forever])
repeats elem n times (or forever if no n)
from itertools import cycle
cycle(p)
repeats the elements of p over and over and over
again forever
from itertools import chain
chain(p, q, …)
iterates first through the elements of p, then the
elements of q, and so on
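quick examples (mine) of repeat, cycle, and chain:
from itertools import repeat, cycle, chain, islice

list(repeat("a", 3))               # ['a', 'a', 'a']
list(islice(cycle([1, 2, 3]), 7))  # [1, 2, 3, 1, 2, 3, 1]
list(chain([1, 2], [3, 4], [5]))   # [1, 2, 3, 4, 5]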
from itertools import accumulate
accumulate(p, [func=add])
returns the sequence a, where
a[0] = p[0]
a[1] = func(a[0], p[1])
a[2] = func(a[1], p[2])
...
with the default func=add this is a "running total"; however, we will use it for way more than that
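for example (mine):
from itertools import accumulate
from operator import mul

list(accumulate([1, 2, 3, 4]))       # [1, 3, 6, 10]   running totals
list(accumulate([1, 2, 3, 4], mul))  # [1, 2, 6, 24]   running products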
Down the rabbit hole!
We Need Some itertools of Our Own
# force the first n values of a sequence
def take(n, it):
    return [x for x in islice(it, n)]

# new sequence with all but the first n values of a sequence
def drop(n, it):
    return islice(it, n, None)

# force the first value of a sequence
head = next

# new sequence with all but the first value of a sequence
tail = partial(drop, 1)
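a few quick uses (mine), using count from itertools:
from itertools import count

take(5, count())        # [0, 1, 2, 3, 4]
head(drop(3, count()))  # 3
take(3, tail(count()))  # [1, 2, 3]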
We're also missing iterate
iterate(f, x)
should be the sequence x, f(x), f(f(x)), ...
missing iterate
def iterate(f, x):
    """will blow the stack eventually"""
    yield x
    yield from iterate(f, f(x))
yield from is what sold me on Python 3
missing iterate
def iterate(f, x):
    """will not blow the stack eventually"""
    while True:
        yield x
        x = f(x)

but look at that awful mutation!
missing iterate
def iterate(f, x):
    """crazy functional version"""
    return accumulate(repeat(x), lambda fx, _: f(fx))

a too-clever trick: it ignores every element of the input sequence except the first, and just applies f to the previously "accumulated" value
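a quick check (mine) that it behaves like the other versions:
take(5, iterate(lambda x: 2 * x, 1))  # [1, 2, 4, 8, 16]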
using iterate
def lazy_integers():
    return iterate(add1, 0)
take(10, lazy_integers())
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
fibonacci numbers
# super inefficient
def fib(n):
    if n == 0: return 1
    if n == 1: return 1
    return fib(n-1) + fib(n-2)

[fib(i) for i in range(10)]
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
fibonacci numbers
# efficient, but look at all that terrible mutation!
def fibs():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

take(10, fibs())
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
fibonacci numbers
"Haskellic" version
def fibs():
    yield 1
    yield 1
    yield from map(add, fibs(), tail(fibs()))

take(10, fibs())
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
# but it is regenerating the sequence over and over again

%time take(30, fibs())
CPU times: user 7.62 s, sys: 392 ms, total: 8.01 s
Wall time: 8.02 s
fibonacci numbers
# create an efficient memoized version
def fibs():
    yield 1
    yield 1
    fibs1, fibs2 = tee(fibs())
    yield from map(add, fibs1, tail(fibs2))

%time take(30, fibs())
CPU times: user 186 µs, sys: 11 µs, total: 197 µs
Wall time: 200 µs
fibonacci numbers
# use a pure function: now we're getting functional!
def next_fib(pair):
    x, y = pair
    return (y, x + y)

def fibs():
    return (y for x, y in iterate(next_fib, (0, 1)))

%time take(30, fibs())
CPU times: user 31 µs, sys: 4 µs, total: 35 µs
Wall time: 37.2 µs
prime numbers (just for fun)
def filter_primes(it):
    """will blow the stack"""
    p = next(it)
    yield p
    yield from filter_primes(filter(lambda x: x % p > 0, it))

def all_primes():
    return filter_primes(count(2))

take(10, all_primes())
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
What does any of this have
to do with data science?
k-means clustering
● have some points
● want to group them into k clusters
● want clusters to be "small"
k-means clustering
● have some points
● want to group them into k clusters
● want clusters to be "small"
k-means clustering
● have some points
● want to group them into k clusters
● want clusters to be "small"
● iterative approach:
○ choose k means
○ assign each point to cluster of "closest" mean
○ compute new means
○ repeat
● implementation intended to be expository, not efficient
Not-functional approach
class KMeans:
    def __init__(self, k):
        self.k = k
        self.means = [None for _ in range(k)]

    def predict(self, point):
        """return index of closest mean"""
        d_min = float('inf')
        for j, m in enumerate(self.means):
            d = sum((m_i - p_i)**2
                    for m_i, p_i in zip(m, point))
            if d < d_min:
                prediction = j
                d_min = d
        return prediction

    def fit(self, points, num_iters=10):
        """find the k means"""
        assignments = [None for _ in points]
        self.means = random.sample(list(points), self.k)
        for _ in range(num_iters):
            # assign each point to its closest mean
            for i, point in enumerate(points):
                assignments[i] = self.predict(point)
            # compute new means
            for j in range(self.k):
                cluster = [p for p, c in zip(points, assignments)
                           if c == j]
                self.means[j] = list(
                    map(lambda x: x / len(cluster),
                        reduce(partial(map, add), cluster)))
# 100 random points in the unit square, 5 clusters
points = np.random.random((100, 2))
model = KMeans(5)
model.fit(points)
assignments = [model.predict(point) for point in points]

# now plot the means and the clusters
for x, y in model.means:
    plt.plot(x, y, marker='*', markersize=10, color='black')
for j, color in zip(range(5), ['r', 'g', 'b', 'm', 'c']):
    cluster = [p
               for p, c in zip(points, assignments)
               if j == c]
    xs, ys = zip(*cluster)
    plt.scatter(xs, ys, color=color)
plt.show()
Let's do it functional!
def k_means(points, k, num_iters=10):
    means = random.sample(points, k)
    # pull the work into a new_means function
    for _ in range(num_iters):
        means = new_means(points, means)
    return means

better, but look at all that terrible mutation!
k_means
k_meanses
Let's do it crazy functional!
def k_meanses(points, k):
    """returns the infinite sequence of meanses"""
    initial_means = random.sample(points, k)
    return iterate(partial(new_means, points),
                   initial_means)
HUH?
partial(new_means, points) is the (curried) function that
maps
prev_means -> next_means
iterate produces the series x, f(x), f(f(x)), ....
so this results in the (lazy, infinite) sequence:
● initial_means
● new_means(points, initial_means)
● new_means(points, new_means(points, initial_means))
● ...
# 10 iterations
meanses = take(10, k_meanses(points, 5))
# until convergence
meanses = until_convergence(k_meanses(points, 5))
Why is this interesting? By generating a series of "meanses", we can observe how they converge. Previously we just saw the end result.
Let's do it functional!
def until_convergence(it):
    prev = None
    while True:
        value = next(it)
        if value == prev:
            return  # (raising StopIteration inside a generator is an error as of Python 3.7)
        yield value
        prev = value

but look at all that terrible mutation!
Let's do it crazy functional!
def until_convergence(it):
    return accumulate(it, no_repeat)
def no_repeat(prev, curr):
    if prev == curr:
        raise StopIteration
    else:
        return curr
Let's do it crazy functional!
def until_nearly_convergence(it, tolerance=0.001):
    return accumulate(it, partial(within_tolerance, tolerance))

def within_tolerance(tol, prev, curr):
    if abs(prev - curr) < tol:
        raise StopIteration
    else:
        return curr
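for example (mine, on a scalar sequence rather than a sequence of meanses):
# x -> x / 2 + 1 converges to 2; stop once successive values differ by < 0.001
list(until_nearly_convergence(iterate(lambda x: x / 2 + 1, 0.0)))
# [0.0, 1.0, 1.5, 1.75, ..., 1.998046875]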
Meanwhile, we still need new_means
def new_means(points, old_means):
    k = len(old_means)
    assignments = [closest_index(point, old_means)
                   for point in points]
    clusters = [[point
                 for point, c in zip(points, assignments)
                 if c == j] for j in range(k)]
    return [cluster_mean(cluster) for cluster in clusters]
Which means we need closest_index
def closest_index(point, means):
    min_dist = float('inf')
    for j, mean in enumerate(means):
        dist = squared_distance(point, mean)
        if dist < min_dist:
            min_dist = dist
            closest = j
    return closest

but look at all that terrible mutation!
Let's be really functional
def closest_index(point, means):
    distances = [squared_distance(point, mean)
                 for mean in means]
    return min(enumerate(distances),
               key=lambda pair: pair[1])[0]
We still need squared_distance
def squared_distance(p, q):
    return sum((p_i - q_i)**2
               for p_i, q_i in zip(p, q))
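quick checks (mine):
squared_distance((0, 0), (3, 4))                   # 25
closest_index((0, 0), [(3, 4), (1, 1), (10, 10)])  # 1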
And finally cluster_mean
def cluster_mean(points):
    num_points = len(points)
    dim = len(points[0]) if points else 0
    sum_points = [sum(point[j] for point in points)
                  for j in range(dim)]
    return [s / num_points for s in sum_points]
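for example (mine):
cluster_mean([(1, 2), (3, 4), (5, 6)])  # [3.0, 4.0]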
Aside: matplotlib animation
from matplotlib import animation

def animation_frame(nframe):
    plt.cla()
    x, y = get_data_for(nframe)
    plt.plot(x, y)

fig = plt.figure(figsize=(5, 4))
anim = animation.FuncAnimation(fig, animation_frame,
                               frames=num_frames)
anim.save('animation.gif', writer='imagemagick', fps=4)
data = [(random.random(), random.random()) for _ in range(500)]
meanses = [means for means in until_convergence(k_meanses(data, k))]
colors = ['r', 'g', 'b', 'c', 'm']

def animation_frame(nframe):
    means = meanses[nframe]
    plt.cla()
    assignments = [closest_index(point, means)
                   for point in data]
    clusters = [[point
                 for point, c in zip(data, assignments)
                 if c == j] for j in range(k)]
    for cluster, color, mean in zip(clusters, colors, means):
        x, y = zip(*cluster)
        plt.scatter(x, y, color=color)
        plt.plot(*mean, color=color, marker='*', markersize=20)
data = [(random.choice([0, 1, 2, 4, 5]) + random.random(),
         random.normalvariate(0, 1)) for _ in range(500)]
meanses = [mean for mean in until_convergence(k_meanses(data, 5))]
Gradient Descent
Minimize a function by computing the gradient
and taking small steps in the opposite direction
For example, say we want to find a minimum of
def f(x_i):
    return sum(x_ij**2 for x_ij in x_i)
Gradient Descent
def f(x_i):
    return sum(x_ij**2 for x_ij in x_i)

its gradient is
def df(x_i):
    return [2 * x_ij for x_ij in x_i]

def gradient_step(df, alpha, x_i):
    return [x_ij + alpha * df_j
            for x_ij, df_j in zip(x_i, df(x_i))]
gradient step is a pure function
if we curry df and alpha then it maps
point -> next_point
which means we can just use iterate
def gradient_descent(df, x_0, alpha=0.1):
    return iterate(partial(gradient_step, df, -alpha),
                   x_0)
THIS IS (basically) A WORKING IMPLEMENTATION!
take(100, gradient_descent(df, [random.random(),
                                random.random()]))[::20]
# [[0.3580493746949883, 0.8916606598206824],
#  [0.004128028237968867, 0.010280147495191952],
#  [4.7592925271786166e-05, 0.00011852203117737018],
#  [5.487090701298894e-07, 1.3664659851407321e-06],
#  [6.326184867255761e-09, 1.57542801958253e-08]]
# run gradient descent on x^2 + y^2 from 50 random points
def random_point():
    return (2 * random.random() - 1, 2 * random.random() - 1)

colors = [color for color in matplotlib.colors.cnames]

# get a length 25 "path" for each of 50 points
paths = [take(25, gradient_descent(df, random_point()))
         for _ in range(50)]

# the nth frame draws the nth point in every path
def animation_frame(nframe):
    points = [path[nframe] for path in paths]
    for color, point in zip(colors, points):
        markersize = 10 - 10 * nframe / 25
        plt.plot(*point, color=color, marker='*', markersize=markersize)
# let's try a more complex function
def random_point():
    return (3 * random.random() - 1, 3 * random.random() - 1)

def f(x):
    """f(x, y) = -exp(-x^3 / 3 + x - y^2)
    has min at (1,0), saddle point at (-1,0)"""
    return -math.exp(x[0]**3 / -3 + x[0] - x[1]**2)

def df(x):
    """just the gradient"""
    return ((1 - x[0]**2) * f(x), -2 * x[1] * f(x))
Stochastic Gradient Descent
In the previous example, we just minimized a function of a single point.
When working with data, we often want to choose a parameter (beta) that minimizes an (additive) error function across all the points.
We could use gradient descent on the "sum of errors", but that can be very slow if there are lots of points.
Stochastic Gradient Descent
Instead, compute the error (and error gradient) for one point at a time.
Take "single-point-gradient" steps.
Treat x and y as fixed (i.e. curry!), and look for the optimal value of beta.
def sgd_step(df, alpha, prev_beta, xy_i):
    """df is a function of x_i, y_i, beta"""
    x_i, y_i = xy_i  # deal with one point at a time by zip-ing x and y together
    return [beta_j + alpha * df_j
            for beta_j, df_j in zip(prev_beta, df(x_i, y_i, prev_beta))]

start with prev_beta
compute the gradient (for the given x_i, y_i)
take a small step in that direction
def sgd(df, x, y, beta_0, alpha=0.1):
    xys = chain([beta_0], cycle(zip(x, y)))
    return accumulate(xys, partial(sgd_step, df, -alpha))
what in the
name of all
that is holy
xys is the sequence: beta_0, (x0, y0), (x1, y1), (x2, y2), …, (x0, y0), (x1, y1), (x2, y2), ….
after currying, accumulate gets the function:
(beta, (x_i, y_i)) -> next_beta
Linear Regression: y = x β + ε
x = [(1, random.randrange(100)) for _ in range(100)]
y = [-5 * x_i[0] + 10 * x_i[1] + random.random() for x_i in x]
def predict(x_i, beta): return x_i[0] * beta[0] + x_i[1] * beta[1]
least squares estimate for beta
def error(x_i, y_i, beta): return predict(x_i, beta) - y_i
def sqerror(x_i, y_i, beta): return error(x_i, y_i, beta) ** 2

def sqerror_gradient(x_i, y_i, beta):
    return (2 * x_i[0] * error(x_i, y_i, beta),
            2 * x_i[1] * error(x_i, y_i, beta))
SGD for Linear Regression
x = [(1, random.random()) for _ in range(100)]
y = [-5 * x_i[0] + 10 * x_i[1] + random.random() for x_i in x]

# start with a random beta_0
beta_0 = (random.random(), random.random())

# run the process for a fixed number of steps
results = [x for x in take(steps, sgd(sqerror_gradient, x, y, beta_0, 0.01))]

# take every show_every-th result and animate them
subresults = results[::show_every]
nframes = len(subresults)

def animation_frame(nframe):
    a, b = subresults[nframe]
    # regression line goes through (0, a) and (1, a + b)
    plt.plot([0, 1], [a, a + b])
SGD for linear regression
x = [(1, random.random()) for _ in range(100)]
y = [-5 * x_i[0] + 10 * x_i[1] + random.random() for x_i in x]
# start with random beta_0
beta_0 = (random.random(), random.random())
# run the process for a fixed number of steps
results = [x for x in take(steps, sgd(sqerror_gradient, x, y, beta_0, 0.01))]
# take every show_every-th results and animate them
subresults = results[::show_every]
nframes = len(subresults)
def animation_frame(nframe):
a, b = subresults[nframe]
# regression line goes through (0, a) and (1, a + b)
plt.plot([0, 1], [a, a+b])
Moral of the story
● itertools is awesome
● laziness is awesome
● infinite sequences are awesome
● matplotlib animation is awesome
Thanks!
follow me on twitter: @joelgrus
check out my book ----->
(use code "AUTHD" for 50% off!)
(only works if you buy at oreilly.com)
code is at
https://github.com/joelgrus/stupid-itertools-tricks-pydata
build cool stuff and tell me about it!
[email protected]