Chapter 1
Recommending a product that a client may be interested in,
based on past purchases.
This is a recommender system. One approach is to feed
past purchases (and other information about the client) to
an artificial neural network and get it to output the most
likely next purchase. This neural net would typically be
trained on past sequences of purchases across all clients.

Building an intelligent bot for a game
This is often tackled using Reinforcement Learning, which
is a branch of Machine Learning that trains agents (such
as bots) to pick the actions that will maximize their
rewards over time (e.g., a bot may get a reward every
time the player loses some life points), within a given
environment (such as the game). The famous AlphaGo
program that beat the world champion at the game of Go
was built using RL.

This list could go on and on, but hopefully it gives you a sense
of the incredible breadth and complexity of the tasks that
Machine Learning can tackle, and the types of techniques that
you would use for each task.

Types of Machine Learning Systems

There are so many different types of Machine Learning
systems that it is useful to classify them in broad categories,
based on the following criteria:

Supervised/Unsupervised Learning

Machine Learning systems can be classified according to the
amount and type of supervision they get during training.
There are four major categories: supervised learning,
unsupervised learning, semisupervised learning, and
Reinforcement Learning.

Supervised learning

In supervised learning, the training set you feed to the
algorithm includes the desired solutions, called labels (Figure
1-5).

Figure 1-5. A labeled training set for spam classification (an
example of supervised learning)

Another typical task is to predict a target numeric value, such
as the price of a car, given a set of features (mileage, age,
brand, etc.) called predictors. This sort of task is called
regression (Figure 1-6). To train the system, you need to give
it many examples of cars, including both their predictors and
their labels (i.e., their prices).

Unsupervised learning

In unsupervised learning, as you might guess, the training data
is unlabeled (Figure 1-7). The system tries to learn without a
teacher.

Figure 1-7. An unlabeled training set for unsupervised learning

Semisupervised learning

Since labeling data is usually time-consuming and costly, you
will often have plenty of unlabeled instances, and few labeled
instances. Some algorithms can deal with data that’s partially
labeled. This is called semisupervised learning (Figure 1-11).

Figure 1-11. Semisupervised learning with two classes
(triangles and squares): the unlabeled examples (circles) help
classify a new instance (the cross) into the triangle class
rather than the square class, even though it is closer to the
labeled squares

Instance-Based Versus Model-Based Learning

One more way to categorize Machine Learning systems is by
how they generalize. Most Machine Learning tasks are about
making predictions. This means that given a few training
examples, the system needs to be able to make good
predictions for (generalize to) examples it has never seen
before. Having a good performance measure on the training
data is good, but insufficient; the true goal is to perform well
on new instances.
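To illustrate why a good score on the training data is not enough, here is a minimal sketch in plain Python. It is not from the text: the data is synthetic, the 20% label noise is an arbitrary assumption, and the helper names (`make_data`, `nearest_label`, `accuracy`) are hypothetical. A 1-nearest-neighbor rule memorizes the training set, so it scores perfectly there, while its accuracy on held-out instances is typically lower.

```python
import random

def nearest_label(train, x):
    # 1-nearest-neighbor: return the label of the closest training point
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(train, data):
    # Fraction of instances in `data` whose predicted label is correct
    hits = sum(nearest_label(train, x) == y for x, y in data)
    return hits / len(data)

def make_data(n, rng):
    # Synthetic 1-D task (assumption): label is 1 when x > 0.5,
    # with roughly 20% of the labels flipped to simulate noise
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(42)
train_set = make_data(50, rng)    # examples the system learns from
test_set = make_data(200, rng)    # new instances it has never seen

train_acc = accuracy(train_set, train_set)
test_acc = accuracy(train_set, test_set)
print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

The training accuracy is perfect because each training point is its own nearest neighbor. Only the held-out figure says anything about how well the system generalizes.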
For example, suppose you want to know if money makes
people happy, so you download the Better Life Index data
from the OECD’s website and stats about gross domestic
product (GDP) per capita from the IMF’s website. Then you
join the tables and sort by GDP per capita. Table 1-1 shows an
excerpt of what you get.

Table 1-1. Does money make people happier?

Let’s plot the data for these countries (Figure 1-17).

Figure 1-18. A few possible linear models

Before you can use your model, you need to define the
parameter values θ0 and θ1. How can you know which values
will make your model perform best? To answer this question,
you need to specify a performance measure. You can either
define a utility function (or fitness function) that measures
how good your model is, or you can define a cost function
that measures how bad it is. For Linear Regression problems,
people typically use a cost function that measures the
distance between the linear model’s predictions and the
training examples; the objective is to minimize this distance.

This is where the Linear Regression algorithm comes in: you
feed it your training examples, and it finds the parameters
that make the linear model fit best to your data. This is called
training the model. In our case, the algorithm finds that the
optimal parameter values are θ0 = 4.85 and θ1 = 4.91 × 10⁻⁵.

Now the model fits the training data as closely as possible (for
a linear model), as you can see in Figure 1-19.
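The idea of choosing θ0 and θ1 by minimizing a cost function can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions. The model form prediction = θ0 + θ1 × x is assumed, the cost is taken to be the mean squared error, the fit uses the standard closed-form least-squares solution, and the data points are made up (they are not the OECD or IMF figures). The names `fit_line` and `mse` are hypothetical.

```python
def fit_line(xs, ys):
    # Closed-form least squares for the assumed model y = theta0 + theta1 * x
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1

def mse(theta0, theta1, xs, ys):
    # Cost function: mean squared distance between predictions and examples
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up training examples, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

theta0, theta1 = fit_line(xs, ys)
print(f"theta0 = {theta0:.3f}, theta1 = {theta1:.3f}")
print(f"cost at the fitted parameters: {mse(theta0, theta1, xs, ys):.4f}")
print(f"cost with theta0 shifted by 0.5: {mse(theta0 + 0.5, theta1, xs, ys):.4f}")
```

Any other choice of parameters, such as the shifted θ0 in the last line, gives a cost at least as large as the fitted one. That is what "fits the training data as closely as possible" means for a linear model under this cost.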
Figure 1-17
If you train a linear model on this data, you get the solid line,
while the old model is represented by the dotted line. As you
can see, not only does adding a few missing countries
significantly alter the model, but it makes it clear that such a
simple linear model is probably never going to work well. It
seems that very rich countries are not happier than
moderately rich countries (in fact, they seem unhappier), and
conversely some poor countries seem happier than many rich
countries.
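The effect of a few missing instances on a fitted line can be reproduced with a small sketch. The numbers below are invented purely for illustration and do not come from the Better Life Index or the IMF data. The helper `fit_line` is a hypothetical name, and it assumes a straight-line model fitted by ordinary least squares.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = theta0 + theta1 * x (assumed model form)
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = sxy / sxx
    return y_mean - theta1 * x_mean, theta1

# Invented "partial" sample: only mid-range x values, with a clean upward trend
partial_x = [2.0, 3.0, 4.0, 5.0, 6.0]
partial_y = [5.0, 5.5, 6.0, 6.5, 7.0]

# Invented "missing" instances: one low-x point and two high-x points
# that do not follow the mid-range trend
missing_x = [1.0, 9.0, 10.0]
missing_y = [6.5, 5.5, 5.0]

_, partial_slope = fit_line(partial_x, partial_y)
_, full_slope = fit_line(partial_x + missing_x, partial_y + missing_y)

print(f"slope on the partial sample: {partial_slope:.3f}")
print(f"slope on the full sample:    {full_slope:.3f}")
```

With these made-up values the slope changes noticeably once the extra points are included, which mirrors the point above: a model trained on an unrepresentative sample can look convincing and still describe the wider population poorly.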