Machine Learning Exercises in Python, Part 1: Curious Insight
This post is part of a series covering the exercises from Andrew Ng's machine learning class on Coursera. The original code, exercise text, and data files for all of the exercises are available in the Github repo mentioned below.

I'd heard plenty about Coursera and the "MOOC" phenomenon but had not had the time to dive in and take a class. Earlier this year I finally pulled the trigger and signed up for Andrew Ng's Machine Learning class. I completed the whole thing from start to finish, including the programming assignments. The experience opened my eyes to the power of this type of education platform, and I've been hooked ever since.
This blog post will be the first in a series covering the programming exercises from Andrew's class. One aspect of the course that I didn't particularly care for was the use of Octave for assignments. Although Octave is a fine tool for learning, most real-world data analysis is done in either R or Python (certainly there are other languages and tools being used, but these two are unquestionably at the top of the list). Since I'm a Python user, I decided to re-implement the exercises in Python; the full source code is in my IPython repo on Github. You'll also find the data used in these exercises and the original exercise PDFs in sub-folders off the root directory if you're interested.
While I can explain some of the concepts involved in this exercise along the way, it's impossible for me to convey all the information you might need to fully comprehend it. If you're interested in machine learning but haven't been exposed to it yet, I encourage you to check out the class on Coursera.
In the first part of exercise 1, we're tasked with implementing simple linear regression to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities. You'd like to figure out what the expected profit of a new food truck might be given only the population of the city it would operate in. Let's start by importing some libraries.
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Now let's get things rolling. We can use pandas to load the data into a data frame and display the first few rows using the "head" function.
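A minimal sketch of that step, assuming the file ex1data1.txt sits in a "data" sub-folder of the working directory (as in the repo):

path = os.path.join(os.getcwd(), 'data', 'ex1data1.txt')
# the exercise file has no header row, so supply the column names
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])
data.head()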
Population Profit
0 6.1101 17.5920
1 5.5277 9.1302
2 8.5186 13.6620
3 7.0032 11.8540
4 5.8598 6.8233
Another useful function available in pandas is "describe", which computes basic summary statistics for each column. This is helpful to get a "feel" for the data during the exploratory analysis stage of a project.
data.describe()
[output of data.describe(): count, mean, std, min, 25%, 50%, 75%, and max for Population and Profit]
Examining stats about your data can be helpful, but sometimes you need to find ways to visualize it too. Fortunately this data set only has one explanatory variable and one target variable, so we can plot it in two dimensions to get a good sense of what it looks like. We can use the "plot" function provided by pandas for this.
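A one-liner does the job (the figsize is an assumption, chosen to match the later plots):

data.plot(kind='scatter', x='Population', y='Profit', figsize=(12,8))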
It really helps to actually look at what's going on, doesn't it? We can clearly see that there's a cluster of values around cities with smaller populations, and a somewhat linear trend of increasing profit as the size of the city increases. Now let's get to the fun part - implementing a linear regression algorithm in Python!
In case you're not familiar with it, linear regression is an approach to modeling the relationship between a dependent variable and one or more independent variables (if there's one independent variable then it's called simple linear regression, and if there's more than one independent variable then it's called multiple linear regression). There are lots of different types and variants of linear regression that are outside the scope of this discussion so I won't go into that here, but to put it simply - we're trying to create a linear model of the data X, using some number of parameters theta, that describes the variance of the data such that given a new data point that's not in X, we could accurately predict what the outcome would be without actually knowing the answer ahead of time.
In this implementation we're going to use an optimization technique called gradient descent to find the parameters theta. If you're familiar with linear algebra, you may be aware that there's another way to find the optimal parameters for a linear model called the "normal equation", which solves the problem in one shot with a series of matrix operations. However, the issue with this approach is that it doesn't scale very well for large data sets. In contrast, we can use variants of gradient descent and other optimization methods on data sets of virtually any size, so for machine learning problems this approach is more practical.
Okay, that's enough theory. Let's write some code. The first thing we need is a cost function. The cost function evaluates the quality of our model by calculating the error between our model's prediction for a data point, using the model parameters, and the actual data point. For example, if the actual profit for a given city is 4 and we predicted that it was 7, our error is (7 - 4)^2 = 3^2 = 9 (assuming a squared-error loss function). We do this for each data point in X and sum the result to get the cost. Here's the function:
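def computeCost(X, y, theta):
    # squared error between the predictions (X * theta.T) and the targets y,
    # summed over the data set and scaled by the conventional factor of 2m
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))

(This sketch assumes X, y, and theta are numpy matrices, which is how we'll set them up below.)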
Notice that there are no loops. We're taking advantage of numpy's linear algebra capabilities to compute the result as a series of matrix operations. This is far more computationally efficient than an unoptimized "for" loop.

In order to make this cost function work seamlessly with the pandas data frame we created above, we need to do a couple of things. First, we need to insert a column of 1s at the beginning of the data frame to make the matrix operations work correctly (I won't go into detail on why this is needed, but it's in the exercise text if you're interested - basically it accounts for the intercept term in the linear equation). Second, we need to separate our data into the independent variables X and our dependent variable y.
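Those two steps might look like this (a sketch; the column slicing assumes Population and Profit are the only columns, as above):

# add a column of ones for the intercept term
data.insert(0, 'Ones', 1)

# set X (training data) and y (target variable)
cols = data.shape[1]
X = data.iloc[:, 0:cols-1]
y = data.iloc[:, cols-1:cols]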
Finally, we're going to convert our data frames to numpy matrices and instantiate a parameter matrix theta. One useful debugging trick is to look at the shape of the matrices you're dealing with. It's also helpful to remember when walking through the steps in your head that matrix multiplication requires the inner dimensions to agree: an (i x j) matrix times a (j x k) matrix yields an (i x k) matrix.
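For example (theta starts as a 1 x 2 row of zeros so that X * theta.T produces a column of predictions):

X = np.matrix(X.values)
y = np.matrix(y.values)
theta = np.matrix(np.array([0, 0]))

# sanity-check: the shapes should be (rows, 2), (1, 2), and (rows, 1)
X.shape, theta.shape, y.shape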
Okay, so now we can try out our cost function. Remember the parameters
were initialized to 0 so the solution isn't optimal yet, but we can see if it
works.
computeCost(X, y, theta)
32.072733877455676
So far so good. Next we need to define a function to perform gradient descent on the parameters theta using the update rules defined in the exercise text.
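For reference, the update rule being implemented is the standard one from the course, where alpha is the learning rate and m is the number of training samples:

$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$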
def gradientDescent(X, y, theta, alpha, iters):
    temp = np.matrix(np.zeros(theta.shape))  # scratch space for the updated parameters
    parameters = int(theta.ravel().shape[1])
    cost = np.zeros(iters)  # cost history, one entry per iteration
    for i in range(iters):
        error = (X * theta.T) - y
        for j in range(parameters):
            term = np.multiply(error, X[:,j])
            temp[0,j] = theta[0,j] - ((alpha / len(X)) * np.sum(term))
        theta = temp
        cost[i] = computeCost(X, y, theta)
    return theta, cost
The idea with gradient descent is that for each iteration, we compute the gradient of the error term in order to figure out the appropriate direction to move our parameter vector. In other words, we're calculating the changes to make to our parameters in order to reduce the error, thus bringing our solution closer to the optimal solution (i.e. the best fit).
This is a fairly complex topic and I could easily devote a whole blog post just to gradient descent. If you want to learn more, I would recommend starting with this article and branching out from there.
Once again we're relying on numpy and linear algebra for our solution. You may notice that my implementation is not 100% optimal - in particular, there's a way to get rid of that inner loop and update all of the parameters at once. I'll leave it up to the reader to figure it out for now (I'll cover it in a later post).
Now that we've got a way to evaluate solutions, and a way to find a good solution, it's time to apply it to our data set.
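The run itself is a couple of lines (alpha = 0.01 and 1000 iterations are assumptions that work well for this exercise):

# initialize the learning rate and iteration count
alpha = 0.01
iters = 1000

# perform gradient descent to "fit" the model parameters
g, cost = gradientDescent(X, y, theta, alpha, iters)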
Note that we've initialized a few new variables here. If you look closely at the gradient descent function, it has parameters called alpha and iters. Alpha is the learning rate - it's a factor in the update rule for the parameters that helps determine how quickly the algorithm will converge to the optimal solution. Iters is just the number of iterations. There is no hard and fast rule for how to initialize these parameters, and typically some trial and error is involved. We now have a parameter vector describing what we believe is the optimal linear model for our data set. One quick way to evaluate just how good our regression model is might be to look at the total error of our new solution on the data set:
computeCost(X, y, g)
4.5159555030789118
That's certainly a lot better than 32, but it's not a very intuitive way to look at it. Fortunately we have some other techniques at our disposal. Remember the scatter plot from before? Let's overlay a line representing our model on top of a scatter plot of the data to see how well it fits. We can use numpy's "linspace" function to create an evenly-spaced series of points within the range of our data, and then "evaluate" those points using our model to see what the expected profit would be. We can then turn it into a line graph and plot it:
x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = g[0, 0] + (g[0, 1] * x)

fig, ax = plt.subplots(figsize=(12,8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
Not bad! Our solution looks like an optimal linear model of the data set. Since the gradient descent function also outputs a vector with the cost at each training iteration, we can plot that as well:
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iters), cost, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
Notice that the cost always decreases - this is an example of what's called a convex optimization problem. If you were to plot the entire solution space for the problem (i.e. plot the cost as a function of the model parameters for every possible value of the parameters), you would see that it looks like a "bowl" shape with a "basin" representing the optimal solution.
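As one last sanity check, we can plug a population into the fitted line to get a point estimate of profit. The values below are a hypothetical example, assuming (per the exercise text) that populations and profits are both expressed in units of 10,000:

# hypothetical: predicted profit (in $10,000s) for a city of 35,000 people
predicted_profit = g[0, 0] + g[0, 1] * 3.5
print(predicted_profit)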
That's all for now! In part 2 we'll finish off the first exercise by extending this example to more than 1 variable. I'll also show how the above solution can be reached using a popular machine learning library called scikit-learn.
AUTHOR
John Wittenauer