PyTorch Neural Network Classification
| Problem type | What is it? | Example |
| --- | --- | --- |
| Binary classification | Target can be one of two options, e.g. yes or no | Predict whether or not someone has heart disease based on their health parameters. |
| Multi-class classification | Target can be one of more than two options | Decide whether a photo is of food, a person or a dog. |
| Multi-label classification | Target can be assigned more than one option | Predict what categories should be assigned to a Wikipedia article (e.g. mathematics, science & philosophy). |
Classification, along with regression (predicting a number, covered in notebook 01), is one of the most common types of machine learning problems.
In this notebook, we're going to work through a couple of different classification problems with PyTorch.
In other words, taking a set of inputs and predicting which class that set of inputs belongs to.
Except instead of trying to predict a straight line (predicting a number, also called a regression problem), we'll be working on a classification problem.
| Topic | Contents |
| --- | --- |
| 0. Architecture of a classification neural network | Neural networks can come in almost any shape or size, but they typically follow a similar floor plan. |
| 1. Getting binary classification data ready | Data can be almost anything but to get started we're going to create a simple binary classification dataset. |
| 2. Building a PyTorch classification model | Here we'll create a model to learn patterns in the data, we'll also choose a loss function, optimizer and build a training loop specific to classification. |
| 3. Fitting the model to data (training) | We've got data and a model, now let's let the model (try to) find patterns in the (training) data. |
| 4. Making predictions and evaluating a model (inference) | Our model's found patterns in the data, let's compare its findings to the actual (testing) data. |
| 5. Improving a model (from a model perspective) | We've trained and evaluated a model but it's not working, let's try a few things to improve it. |
| 6. Non-linearity | So far our model has only had the ability to model straight lines, what about non-linear (non-straight) lines? |
| 7. Replicating non-linear functions | We used non-linear functions to help model non-linear data, but what do these look like? |
| 8. Putting it all together with multi-class classification | Let's put everything we've done so far for binary classification together with a multi-class classification problem. |
And if you run into trouble, you can ask a question on the Discussions page there too.
There's also the PyTorch developer forums, a very helpful place for all things PyTorch.
| Hyperparameter | Binary classification | Multi-class classification |
| --- | --- | --- |
| Input layer shape ( in_features ) | Same as number of features (e.g. 5 for age, sex, height, weight, smoking status in heart disease prediction) | Same as binary classification |
| Hidden layer(s) | Problem specific, minimum = 1, maximum = unlimited | Same as binary classification |
| Output layer shape ( out_features ) | 1 (one class or the other) | 1 per class (e.g. 3 for food, person or dog photo) |
| Hidden layer activation | Usually ReLU (rectified linear unit) but can be many others | Same as binary classification |
| Optimizer | SGD (stochastic gradient descent), Adam (see torch.optim for more options) | Same as binary classification |
Of course, this ingredient list of classification neural network components will vary depending on the problem you're working on.
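To make the table concrete, here's a minimal sketch of those ingredients assembled in PyTorch (the layer sizes and learning rate here are illustrative assumptions, not fixed values):

```python
import torch
from torch import nn

# A binary classification network following the ingredients above:
# input layer shape -> hidden layer with ReLU activation -> output layer shape
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=5), # in_features = number of features in the data
    nn.ReLU(),                                # hidden layer activation
    nn.Linear(in_features=5, out_features=1)  # out_features = 1 for binary classification
)

# SGD optimizer from the table (lr=0.1 is an assumed example value)
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)
```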
We're going to get hands-on with this setup throughout this notebook.
We'll use the make_circles() method from Scikit-Learn to generate two circles with different coloured
dots.
```python
from sklearn.datasets import make_circles

# Make 1000 samples (matches the 500 + 500 label counts shown later)
n_samples = 1000

# Create circles
X, y = make_circles(n_samples,
                    noise=0.03, # a little bit of noise to the dots
                    random_state=42) # keep random state so we get the same values
```
First 5 X features:
[[ 0.75424625 0.23148074]
[-0.75615888 0.15325888]
[-0.81539193 0.17328203]
[-0.39373073 0.69288277]
[ 0.44220765 -0.89672343]]
First 5 y labels:
[1 1 1 1 0]
Let's keep following the data explorer's motto of visualize, visualize, visualize and put them into a pandas
DataFrame.
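A minimal sketch of how that might look (assuming X and y are the arrays created by make_circles() above):

```python
import pandas as pd

# Put the features and labels into a DataFrame for easier inspection
circles = pd.DataFrame({"X1": X[:, 0],
                        "X2": X[:, 1],
                        "label": y})
circles.head(10)
```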
Out[3]:
X1 X2 label
0 0.754246 0.231481 1
1 -0.756159 0.153259 1
2 -0.815392 0.173282 1
3 -0.393731 0.692883 1
4 0.442208 -0.896723 0
5 -0.479646 0.676435 1
6 -0.013648 0.803349 1
7 0.771513 0.147760 1
8 -0.169322 -0.793456 1
9 -0.121486 1.021509 0
It looks like each pair of X features ( X1 and X2 ) has a label ( y ) value of either 0 or 1.
This tells us that our problem is binary classification since there are only two options (0 or 1).
Out[4]:
1    500
0    500
Name: label, dtype: int64
Let's find out how we could build a PyTorch neural network to classify dots into red (0) or blue (1).
Note: This dataset is often considered a toy problem (a problem that's used to try and test things out on) in machine learning.
But it represents the major key of classification: you have some kind of data represented as numerical values and you'd like to build a model that's able to classify it, in our case, separate it into red or blue dots.
Mismatched shapes of tensors and tensor operations will result in errors in your models.
And there's no surefire way of making sure they won't happen, they will.
What you can do instead is continually familiarize yourself with the shape of the data you're working with.
Ask yourself: what are the shapes of my inputs and what are the shapes of my outputs?
It often helps to view the values and shapes of a single sample (features and labels).
Doing so will help you understand what input and output shapes you'd be expecting from your model.
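For example, inspecting the first sample (assuming the NumPy arrays X and y from earlier):

```python
# View the first example of features and labels
X_sample = X[0]
y_sample = y[0]
print(f"Values for one sample of X: {X_sample} and the same for y: {y_sample}")
print(f"Shapes for one sample of X: {X_sample.shape} and the same for y: {y_sample.shape}")
```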
Values for one sample of X: [0.75424625 0.23148074] and the same for y: 1
Shapes for one sample of X: (2,) and the same for y: ()
This tells us the second dimension of X means it has two features (a vector), whereas y has a single feature (a scalar).
1.2 Turn data into tensors and create train and test splits
We've investigated the input and output shapes of our data, now let's prepare it for being used with PyTorch
and for modelling.
1. Turn our data into tensors (right now our data is in NumPy arrays and PyTorch prefers to work with
PyTorch tensors).
2. Split our data into training and test sets (we'll train a model on the training set to learn the patterns
between X and y and then evaluate those learned patterns on the test dataset).
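A minimal sketch of step 1 (assuming X and y are the NumPy arrays from make_circles() above):

```python
import torch

# Turn NumPy arrays into PyTorch tensors (PyTorch's default dtype is float32)
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)
```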
Now our data is in tensor format, let's split it into training and test sets.
We'll use test_size=0.2 (80% training, 20% testing) and because the split happens randomly across the
data, let's use random_state=42 so the split is reproducible.
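And step 2 might look like this with Scikit-Learn's train_test_split():

```python
from sklearn.model_selection import train_test_split

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2, # 20% test, 80% train
                                                    random_state=42) # make the random split reproducible

len(X_train), len(X_test), len(y_train), len(y_test)
```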
Nice! Looks like we've now got 800 training samples and 200 testing samples.
2. Building a model
We've got some data ready, now it's time to build a model.
1. Setting up device agnostic code (so our model can run on CPU or GPU if it's available).
The good news is we've been through all of the above steps before in notebook 01.
Except now we'll be adjusting them so they work with a classification dataset.
Let's start by importing PyTorch and torch.nn as well as setting up device agnostic code.
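A standard device-agnostic setup looks something like this:

```python
# Import PyTorch and nn
import torch
from torch import nn

# Make device agnostic code: use the GPU if it's available, otherwise the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
device
```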
Out[10]: 'cuda'
Excellent, now that device is set up, we can use it for any data or models we create and PyTorch will handle running them on the CPU (default) or GPU if it's available.
We'll want a model capable of handling our X data as inputs and producing something in the shape of our y data as outputs.
This setup, where you have features and labels, is referred to as supervised learning, because your data is telling your model what the outputs should be given a certain input.
To create such a model it'll need to handle the input and output shapes of X and y .
Remember how I said input and output shapes are important? Here we'll see why.
Let's create a model that:
1. Subclasses nn.Module (almost all PyTorch models are subclasses of nn.Module ).
2. Creates 2 nn.Linear layers in the constructor capable of handling the input and output shapes of X and y .
3. Defines a forward() method containing the forward pass computation of the model (a sketch follows this list).
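A sketch of such a model is below; the layer shapes and names ( CircleModelV0 , layer_1 , layer_2 ) are taken from the printout that follows:

```python
from torch import nn

# 1. Construct a model class that subclasses nn.Module
class CircleModelV0(nn.Module):
    def __init__(self):
        super().__init__()
        # 2. Create 2 nn.Linear layers capable of handling the shapes of X and y
        self.layer_1 = nn.Linear(in_features=2, out_features=5) # takes in 2 features, produces 5
        self.layer_2 = nn.Linear(in_features=5, out_features=1) # takes in 5 features, produces 1

    # 3. Define a forward() method containing the forward pass computation
    def forward(self, x):
        return self.layer_2(self.layer_1(x)) # x -> layer_1 -> layer_2 -> output

# Create an instance of the model and send it to the target device
model_0 = CircleModelV0().to(device)
model_0
```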
Out[11]: CircleModelV0(
(layer_1): Linear(in_features=2, out_features=5, bias=True)
(layer_2): Linear(in_features=5, out_features=1, bias=True)
)
The only major change is what's happening between self.layer_1 and self.layer_2 .
self.layer_1 takes 2 input features in_features=2 and produces 5 output features out_features=5 .
This layer turns the input data from having 2 features to 5 features.
Why do this?
This allows the model to learn patterns from 5 numbers rather than just 2 numbers, potentially leading to
better outputs.
The number of hidden units you can use in neural network layers is a hyperparameter (a value you can set
yourself) and there's no set in stone value you have to use.
Generally more is better but there's also such a thing as too much. The amount you choose will depend on
your model type and dataset you're working with.
The only rule with hidden units is that the next layer, in our case self.layer_2 , has to take the same in_features as the previous layer's out_features .
That's why self.layer_2 has in_features=5 : it takes the out_features=5 from self.layer_1 and performs a linear computation on them, turning them into out_features=1 (the same shape as y ).
A visual example of what a similar classification neural network to the one we've just built looks like. Try creating one of your own on the TensorFlow Playground website.
nn.Sequential performs a forward pass computation of the input data through the layers in the order
they appear.
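Judging by the printout below, a minimal sketch of the same model built with nn.Sequential (layer sizes taken from Out[12], device from the setup earlier) might look like this:

```python
# Replicate CircleModelV0 with nn.Sequential
model_0 = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1)
).to(device)
```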
```python
model_0
```
Out[12]: Sequential(
(0): Linear(in_features=2, out_features=5, bias=True)
(1): Linear(in_features=5, out_features=1, bias=True)
)
Woah, that looks much simpler than subclassing nn.Module , why not just always use nn.Sequential ?
nn.Sequential is fantastic for straightforward computations, however, as the name suggests, it always runs in sequential order.
So if you'd like something else to happen (rather than just straightforward sequential computation) you'll want to define your own custom nn.Module subclass.
Now we've got a model, let's see what happens when we pass some data through it.
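A sketch of that forward pass (assuming X_test from the split earlier, moved to the same device as the model):

```python
# Pass the test data through the (untrained) model
untrained_preds = model_0(X_test.to(device))
print(f"First 10 predictions:\n{untrained_preds[:10]}")
```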
First 10 predictions:
tensor([[-0.4279],
[-0.3417],
[-0.5975],
[-0.3801],
[-0.5078],
[-0.4559],
[-0.2842],
[-0.3107],
[-0.6010],
[-0.3350]], device='cuda:0', grad_fn=<SliceBackward0>)
Hmm, it seems there are the same number of predictions as there are test labels, but the predictions don't look like they're in the same form or shape as the test labels.
We've got a couple of steps we can do to fix this, we'll see these later on.
We've setup a loss (also called a criterion or cost function) and optimizer before in notebook 01.
For example, for a regression problem (predicting a number) you might use mean absolute error (MAE) loss.
And for a binary classification problem (like ours), you'll often use binary cross entropy as the loss function.
However, the same optimizer function can often be used across different problem spaces.
For example, the stochastic gradient descent optimizer (SGD, torch.optim.SGD() ) can be used for a range of problems, and so can the Adam optimizer ( torch.optim.Adam() ).
Table of various loss functions and optimizers; there are more, but these are some common ones you'll see.
Since we're working with a binary classification problem, let's use a binary cross entropy loss function.
Note: Recall a loss function is what measures how wrong your model predictions are, the higher the loss,
the worse your model.
Also, PyTorch documentation often refers to loss functions as "loss criterion" or "criterion", these are all
different ways of describing the same thing.
1. torch.nn.BCELoss() - Creates a loss function that measures the binary cross entropy between the target (label) and input (features).
2. torch.nn.BCEWithLogitsLoss() - The same as above except it has a sigmoid layer ( nn.Sigmoid ) built in.
The documentation for torch.nn.BCEWithLogitsLoss() states that it's more numerically stable than
using torch.nn.BCELoss() after a nn.Sigmoid layer.
So generally, implementation 2 is a better option. However for advanced usage, you may want to separate
the combination of nn.Sigmoid and torch.nn.BCELoss() but that is beyond the scope of this notebook.
For the optimizer we'll use torch.optim.SGD() to optimize the model parameters with learning rate 0.1.
Note: There's a discussion on the PyTorch forums about the use of nn.BCELoss vs. nn.BCEWithLogitsLoss . It can be confusing at first but as with many things, it becomes easier with practice.
```python
# Create a loss function (BCEWithLogitsLoss has the sigmoid activation built in)
loss_fn = nn.BCEWithLogitsLoss()

# Create an optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.1)
```
An evaluation metric can be used to offer another perspective on how your model is going.
If a loss function measures how wrong your model is, I like to think of evaluation metrics as measuring how
right it is.
Of course, you could argue both of these are doing the same thing but evaluation metrics offer a different
perspective.
After all, when evaluating your models it's good to look at things from multiple points of view.
There are several evaluation metrics that can be used for classification problems but let's start out with accuracy.
Accuracy can be measured by dividing the total number of correct predictions by the total number of predictions overall.
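As a sketch, a simple accuracy function in plain PyTorch might look like this (assuming y_true and y_pred are tensors of class labels with the same shape):

```python
import torch

# Calculate accuracy: what percentage of predictions match the true labels?
def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item() # count where predictions equal labels
    acc = (correct / len(y_pred)) * 100 # convert to a percentage
    return acc
```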