
Zero to Mastery Learn PyTorch for Deep Learning

Open in Colab

View Source Code | View Slides | Watch Video Walkthrough

02. PyTorch Neural Network Classification

What is a classification problem?


A classification problem involves predicting whether something is one thing or another.

For example, you might want to:

Problem type | What is it? | Example
Binary classification | Target can be one of two options, e.g. yes or no | Predict whether or not someone has heart disease based on their health parameters.
Multi-class classification | Target can be one of more than two options | Decide whether a photo is of food, a person or a dog.
Multi-label classification | Target can be assigned more than one option | Predict what categories should be assigned to a Wikipedia article (e.g. mathematics, science & philosophy).

Classification, along with regression (predicting a number, covered in notebook 01), is one of the most
common types of machine learning problems.

In this notebook, we're going to work through a couple of different classification problems with PyTorch.

In other words, taking a set of inputs and predicting what class those inputs belong to.

What we're going to cover


In this notebook we're going to reiterate over the PyTorch workflow we covered in 01. PyTorch Workflow.

Except instead of trying to predict a straight line (predicting a number, also called a regression problem),
we'll be working on a classification problem.

Specifically, we're going to cover:

Topic | Contents
0. Architecture of a classification neural network | Neural networks can come in almost any shape or size, but they typically follow a similar floor plan.
1. Getting binary classification data ready | Data can be almost anything but to get started we're going to create a simple binary classification dataset.
2. Building a PyTorch classification model | Here we'll create a model to learn patterns in the data, we'll also choose a loss function, optimizer and build a training loop specific to classification.
3. Fitting the model to data (training) | We've got data and a model, now let's let the model (try to) find patterns in the (training) data.
4. Making predictions and evaluating a model (inference) | Our model's found patterns in the data, let's compare its findings to the actual (testing) data.
5. Improving a model (from a model perspective) | We've trained and evaluated a model but it's not working, let's try a few things to improve it.
6. Non-linearity | So far our model has only had the ability to model straight lines, what about non-linear (non-straight) lines?
7. Replicating non-linear functions | We used non-linear functions to help model non-linear data, but what do these look like?
8. Putting it all together with multi-class classification | Let's put everything we've done so far for binary classification together with a multi-class classification problem.

Where can you get help?


All of the materials for this course live on GitHub.

And if you run into trouble, you can ask a question on the Discussions page there too.

There's also the PyTorch developer forums, a very helpful place for all things PyTorch.

0. Architecture of a classification neural network


Before we get into writing code, let's look at the general architecture of a classification neural network.

Hyperparameter | Binary classification | Multiclass classification
Input layer shape ( in_features ) | Same as number of features (e.g. 5 for age, sex, height, weight, smoking status in heart disease prediction) | Same as binary classification
Hidden layer(s) | Problem specific, minimum = 1, maximum = unlimited | Same as binary classification
Neurons per hidden layer | Problem specific, generally 10 to 512 | Same as binary classification
Output layer shape ( out_features ) | 1 (one class or the other) | 1 per class (e.g. 3 for food, person or dog photo)
Hidden layer activation | Usually ReLU (rectified linear unit) but can be many others | Same as binary classification
Output activation | Sigmoid ( torch.sigmoid in PyTorch) | Softmax ( torch.softmax in PyTorch)
Loss function | Binary cross entropy ( torch.nn.BCELoss in PyTorch) | Cross entropy ( torch.nn.CrossEntropyLoss in PyTorch)
Optimizer | SGD (stochastic gradient descent), Adam (see torch.optim for more options) | Same as binary classification

Of course, this ingredient list of classification neural network components will vary depending on the
problem you're working on.

But it's more than enough to get started.

We're going to get hands-on with this setup throughout this notebook.
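To see how these ingredients translate into code, here's a minimal sketch (an illustrative example, not a cell from this notebook) of a binary classifier assembled from the table above, with hypothetical layer sizes:

import torch
from torch import nn

# Illustrative binary classification "ingredient list" in code:
# 5 input features -> 1 hidden layer with 10 neurons (ReLU) -> 1 output logit
sketch_model = nn.Sequential(
    nn.Linear(in_features=5, out_features=10),  # input layer shape = number of features
    nn.ReLU(),                                  # hidden layer activation
    nn.Linear(in_features=10, out_features=1)   # output layer shape = 1 for binary classification
)

sketch_loss_fn = nn.BCEWithLogitsLoss()  # binary cross entropy (sigmoid built-in)
sketch_optimizer = torch.optim.SGD(params=sketch_model.parameters(), lr=0.1)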

1. Make classification data and get it ready


Let's begin by making some data.

We'll use the make_circles() method from Scikit-Learn to generate two circles with different coloured
dots.

In [1]: from sklearn.datasets import make_circles

# Make 1000 samples


n_samples = 1000

# Create circles
X, y = make_circles(n_samples,
                    noise=0.03, # a little bit of noise to the dots
                    random_state=42) # keep random state so we get the same values

Alright, now let's view the first 5 X and y values.

In [2]: print(f"First 5 X features:\n{X[:5]}")


print(f"\nFirst 5 y labels:\n{y[:5]}")

First 5 X features:
[[ 0.75424625 0.23148074]
[-0.75615888 0.15325888]
[-0.81539193 0.17328203]
[-0.39373073 0.69288277]
[ 0.44220765 -0.89672343]]

First 5 y labels:
[1 1 1 1 0]

Looks like there's two X values per one y value.

Let's keep following the data explorer's motto of visualize, visualize, visualize and put them into a pandas
DataFrame.

In [3]: # Make DataFrame of circle data


import pandas as pd
circles = pd.DataFrame({"X1": X[:, 0],
                        "X2": X[:, 1],
                        "label": y
})
circles.head(10)

Out[3]:
X1 X2 label

0 0.754246 0.231481 1

1 -0.756159 0.153259 1

2 -0.815392 0.173282 1

3 -0.393731 0.692883 1

4 0.442208 -0.896723 0

5 -0.479646 0.676435 1

6 -0.013648 0.803349 1

7 0.771513 0.147760 1

8 -0.169322 -0.793456 1

9 -0.121486 1.021509 0

It looks like each pair of X features ( X1 and X2 ) has a label ( y ) value of either 0 or 1.

This tells us that our problem is binary classification since there's only two options (0 or 1).

How many values of each class are there?

In [4]: # Check different labels


circles.label.value_counts()

Out[4]: 1 500
0 500
Name: label, dtype: int64

500 each, nice and balanced.

Let's plot them.

In [5]: # Visualize with a plot


import matplotlib.pyplot as plt
plt.scatter(x=X[:, 0],
            y=X[:, 1],
            c=y,
            cmap=plt.cm.RdYlBu);

Alrighty, looks like we've got a problem to solve.

Let's find out how we could build a PyTorch neural network to classify dots into red (0) or blue (1).

Note: This dataset is often what's considered a toy problem (a problem that's used to try and test things
out on) in machine learning.

But it captures the key idea of classification: you have some kind of data represented as numerical
values and you'd like to build a model that's able to classify it, in our case, separate it into red or blue
dots.

1.1 Input and output shapes

One of the most common errors in deep learning is shape errors.

Mismatching the shapes of tensors and tensor operations will result in errors in your models.

We're going to see plenty of these throughout the course.

And there's no surefire way of making sure they won't happen; they will.

What you can do instead is continually familiarize yourself with the shape of the data you're working with.

I like referring to it as input and output shapes.

Ask yourself:

"What shapes are my inputs and what shapes are my outputs?"

Let's find out.

In [6]: # Check the shapes of our features and labels


X.shape, y.shape

Out[6]: ((1000, 2), (1000,))

Looks like we've got a match on the first dimension of each.

There's 1000 X and 1000 y .

But what's the second dimension on X ?

It often helps to view the values and shapes of a single sample (features and labels).

Doing so will help you understand what input and output shapes you'd be expecting from your model.

In [7]: # View the first example of features and labels


X_sample = X[0]
y_sample = y[0]
print(f"Values for one sample of X: {X_sample} and the same for y: {y_sample}")
print(f"Shapes for one sample of X: {X_sample.shape} and the same for y: {y_sample.shape}"

Values for one sample of X: [0.75424625 0.23148074] and the same for y: 1
Shapes for one sample of X: (2,) and the same for y: ()

This tells us the second dimension for X means it has two features (vector) whereas y has a single
feature (scalar).

We have two inputs for one output.

1.2 Turn data into tensors and create train and test splits

We've investigated the input and output shapes of our data, now let's prepare it for being used with PyTorch
and for modelling.

Specifically, we'll need to:

1. Turn our data into tensors (right now our data is in NumPy arrays and PyTorch prefers to work with
PyTorch tensors).

2. Split our data into training and test sets (we'll train a model on the training set to learn the patterns
between X and y and then evaluate those learned patterns on the test dataset).

In [8]: # Turn data into tensors


# Otherwise this causes issues with computations later on
import torch
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

# View the first five samples


X[:5], y[:5]

Out[8]: (tensor([[ 0.7542, 0.2315],


[-0.7562, 0.1533],
[-0.8154, 0.1733],
[-0.3937, 0.6929],
[ 0.4422, -0.8967]]),
tensor([1., 1., 1., 1., 0.]))

Now our data is in tensor format, let's split it into training and test sets.

To do so, let's use the helpful function train_test_split() from Scikit-Learn.

We'll use test_size=0.2 (80% training, 20% testing) and because the split happens randomly across the
data, let's use random_state=42 so the split is reproducible.

In [9]: # Split data into train and test sets


from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,
                                                     y,
                                                     test_size=0.2, # 20% test, 80% train
                                                     random_state=42) # make the random split reproducible

len(X_train), len(X_test), len(y_train), len(y_test)

Out[9]: (800, 200, 800, 200)

Nice! Looks like we've now got 800 training samples and 200 testing samples.

2. Building a model

We've got some data ready, now it's time to build a model.

We'll break it down into a few parts.

1. Setting up device agnostic code (so our model can run on CPU or GPU if it's available).

2. Constructing a model by subclassing nn.Module .

3. Defining a loss function and optimizer.

4. Creating a training loop (this'll be in the next section).

The good news is we've been through all of the above steps before in notebook 01.

Except now we'll be adjusting them so they work with a classification dataset.

Let's start by importing PyTorch and torch.nn as well as setting up device agnostic code.

In [10]: # Standard PyTorch imports


import torch
from torch import nn

# Make device agnostic code


device = "cuda" if torch.cuda.is_available() else "cpu"
device

Out[10]: 'cuda'

Excellent, now device is set up, we can use it for any data or models we create and PyTorch will handle it
on the CPU (default) or GPU if it's available.
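For instance (an illustrative line, not one of the notebook's cells), any tensor or model can be moved to the target device with .to(device) :

some_tensor = torch.tensor([1.0, 2.0, 3.0]).to(device)  # lives on the GPU if one is available, otherwise the CPU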

How about we create a model?

We'll want a model capable of handling our X data as inputs and producing something in the shape of our
y data as outputs.

In other words, given X (features) we want our model to predict y (label).

This setup where you have features and labels is referred to as supervised learning, because your data is
telling your model what the outputs should be given a certain input.

To create such a model it'll need to handle the input and output shapes of X and y .

Remember how I said input and output shapes are important? Here we'll see why.

Let's create a model class that:

1. Subclasses nn.Module (almost all PyTorch models are subclasses of nn.Module ).

2. Creates 2 nn.Linear layers in the constructor capable of handling the input and output shapes of X
and y .

3. Defines a forward() method containing the forward pass computation of the model.

4. Instantiates the model class and sends it to the target device .

In [11]: # 1. Construct a model class that subclasses nn.Module
class CircleModelV0(nn.Module):
    def __init__(self):
        super().__init__()
        # 2. Create 2 nn.Linear layers capable of handling X and y input and output shapes
        self.layer_1 = nn.Linear(in_features=2, out_features=5) # takes in 2 features (X), produces 5 features
        self.layer_2 = nn.Linear(in_features=5, out_features=1) # takes in 5 features, produces 1 feature (y)

    # 3. Define a forward method containing the forward pass computation
    def forward(self, x):
        # Return the output of layer_2, a single feature, the same shape as y
        return self.layer_2(self.layer_1(x)) # computation goes through layer_1 first then layer_2

# 4. Create an instance of the model and send it to target device
model_0 = CircleModelV0().to(device)
model_0

Out[11]: CircleModelV0(
(layer_1): Linear(in_features=2, out_features=5, bias=True)
(layer_2): Linear(in_features=5, out_features=1, bias=True)
)

What's going on here?

We've seen a few of these steps before.

The only major change is what's happening between self.layer_1 and self.layer_2 .

self.layer_1 takes 2 input features in_features=2 and produces 5 output features out_features=5 .

This is known as having 5 hidden units or neurons.

This layer turns the input data from having 2 features to 5 features.

Why do this?

This allows the model to learn patterns from 5 numbers rather than just 2 numbers, potentially leading to
better outputs.

I say potentially because sometimes it doesn't work.

The number of hidden units you can use in neural network layers is a hyperparameter (a value you can set
yourself) and there's no set in stone value you have to use.

Generally more is better but there's also such a thing as too much. The amount you choose will depend on
your model type and dataset you're working with.

Since our dataset is small and simple, we'll keep it small.

The only rule with hidden units is that the next layer, in our case, self.layer_2 , has to take the same
in_features as the previous layer's out_features .

That's why self.layer_2 has in_features=5 , it takes the out_features=5 from self.layer_1 and
performs a linear computation on them, turning them into out_features=1 (the same shape as y ).

A visual example of what a similar classification neural network to the one we've just built looks like. Try
creating one of your own on the TensorFlow Playground website.
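As a quick illustrative check (not a cell from the original notebook), we can pass a dummy batch through each layer and watch the feature dimension change from 2 to 5 to 1:

# Illustrative: watch the feature dimension change as data flows through the layers
dummy_X = torch.randn(8, 2).to(device)    # a fake batch of 8 samples with 2 features each
hidden = model_0.layer_1(dummy_X)         # torch.Size([8, 5]) -> 5 hidden units
output = model_0.layer_2(hidden)          # torch.Size([8, 1]) -> same shape as a y label
print(dummy_X.shape, hidden.shape, output.shape)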

You can also do the same as above using nn.Sequential .

nn.Sequential performs a forward pass computation of the input data through the layers in the order
they appear.

In [12]: # Replicate CircleModelV0 with nn.Sequential


model_0 = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1)
).to(device)

model_0

Out[12]: Sequential(
(0): Linear(in_features=2, out_features=5, bias=True)
(1): Linear(in_features=5, out_features=1, bias=True)
)

Woah, that looks much simpler than subclassing nn.Module , why not just always use nn.Sequential ?

nn.Sequential is fantastic for straightforward computations, however, as the name suggests, it always
runs in sequential order.

So if you'd like something else to happen (rather than just straightforward sequential computation), you'll want
to define your own custom nn.Module subclass.
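For example (a hypothetical sketch, not something we need for this dataset), a model with a skip connection can't be expressed by simply chaining layers, so it needs its own forward() method:

from torch import nn

# Hypothetical example: a forward pass that isn't purely sequential
class SkipConnectionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=2)
        self.layer_2 = nn.Linear(in_features=2, out_features=1)

    def forward(self, x):
        # The input is added back onto layer_1's output before layer_2,
        # something nn.Sequential alone can't express
        return self.layer_2(x + self.layer_1(x))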

Now we've got a model, let's see what happens when we pass some data through it.

In [13]: # Make predictions with the model


untrained_preds = model_0(X_test.to(device))
print(f"Length of predictions: {len(untrained_preds)}, Shape: {untrained_preds.shape}")
print(f"Length of test samples: {len(y_test)}, Shape: {y_test.shape}")
print(f"\nFirst 10 predictions:\n{untrained_preds[:10]}")
print(f"\nFirst 10 test labels:\n{y_test[:10]}")

Length of predictions: 200, Shape: torch.Size([200, 1])


Length of test samples: 200, Shape: torch.Size([200])

First 10 predictions:
tensor([[-0.4279],
[-0.3417],
[-0.5975],
[-0.3801],
[-0.5078],
[-0.4559],
[-0.2842],
[-0.3107],
[-0.6010],
[-0.3350]], device='cuda:0', grad_fn=<SliceBackward0>)

First 10 test labels:


tensor([1., 0., 1., 0., 1., 1., 0., 0., 1., 0.])

Hmm, it seems there are the same number of predictions as there are test labels but the predictions don't look
like they're in the same form or shape as the test labels.

We've got a couple of steps we can take to fix this; we'll see these later on.
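As a preview of those steps (we'll cover them properly later), one common approach is to push the raw outputs (logits) through a sigmoid and round the result so they match the form of y_test . A rough sketch:

# Sketch (done properly later): logits -> prediction probabilities -> prediction labels
with torch.inference_mode():
    y_logits = model_0(X_test.to(device))     # raw logits, shape [200, 1]
y_pred_probs = torch.sigmoid(y_logits)        # squash logits into the range [0, 1]
y_preds = torch.round(y_pred_probs).squeeze() # >= 0.5 -> 1.0, < 0.5 -> 0.0, shape [200]
print(y_preds[:10])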

2.1 Setup loss function and optimizer

We've setup a loss (also called a criterion or cost function) and optimizer before in notebook 01.

But different problem types require different loss functions.

For example, for a regression problem (predicting a number) you might use mean absolute error (MAE)
loss.

And for a binary classification problem (like ours), you'll often use binary cross entropy as the loss function.

However, the same optimizer function can often be used across different problem spaces.

For example, the stochastic gradient descent optimizer (SGD, torch.optim.SGD() ) can be used for a
range of problems, as can the Adam optimizer ( torch.optim.Adam() ).

Loss function/Optimizer | Problem type | PyTorch code
Stochastic Gradient Descent (SGD) optimizer | Classification, regression, many others. | torch.optim.SGD()
Adam optimizer | Classification, regression, many others. | torch.optim.Adam()
Binary cross entropy loss | Binary classification | torch.nn.BCEWithLogitsLoss or torch.nn.BCELoss
Cross entropy loss | Multi-class classification | torch.nn.CrossEntropyLoss
Mean absolute error (MAE) or L1 loss | Regression | torch.nn.L1Loss
Mean squared error (MSE) or L2 loss | Regression | torch.nn.MSELoss

Table of various loss functions and optimizers; there are more, but these are some common ones you'll see.

Since we're working with a binary classification problem, let's use a binary cross entropy loss function.

Note: Recall a loss function is what measures how wrong your model predictions are, the higher the loss,
the worse your model.

Also, PyTorch documentation often refers to loss functions as "loss criterion" or "criterion", these are all
different ways of describing the same thing.

PyTorch has two binary cross entropy implementations:

1. torch.nn.BCELoss() - Creates a loss function that measures the binary cross entropy between the
target (label) and input (features).

2. torch.nn.BCEWithLogitsLoss() - This is the same as above except it has a sigmoid layer
( nn.Sigmoid ) built-in (we'll see what this means soon).

Which one should you use?

The documentation for torch.nn.BCEWithLogitsLoss() states that it's more numerically stable than
using torch.nn.BCELoss() after a nn.Sigmoid layer.

So generally, implementation 2 is a better option. However for advanced usage, you may want to separate
the combination of nn.Sigmoid and torch.nn.BCELoss() but that is beyond the scope of this notebook.
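To make the difference concrete, here's a small illustrative check (not part of the original notebook) showing that nn.BCEWithLogitsLoss() on raw logits matches nn.BCELoss() applied after a sigmoid:

# Illustrative: the two implementations give the same loss value
logits = torch.tensor([1.5, -0.3, 0.8])
targets = torch.tensor([1., 0., 1.])

loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)         # sigmoid applied internally
loss_after_sigmoid = nn.BCELoss()(torch.sigmoid(logits), targets)  # sigmoid applied manually
print(loss_with_logits, loss_after_sigmoid)  # identical up to floating point precision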

Knowing this, let's create a loss function and an optimizer.

For the optimizer we'll use torch.optim.SGD() to optimize the model parameters with learning rate 0.1.

Note: There's a discussion on the PyTorch forums about the use of nn.BCELoss vs.
nn.BCEWithLogitsLoss . It can be confusing at first but as with many things, it becomes easier with
practice.

In [14]: # Create a loss function


# loss_fn = nn.BCELoss() # BCELoss = no sigmoid built-in
loss_fn = nn.BCEWithLogitsLoss() # BCEWithLogitsLoss = sigmoid built-in

# Create an optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.1)

Now let's also create an evaluation metric.

An evaluation metric can be used to offer another perspective on how your model is going.

If a loss function measures how wrong your model is, I like to think of evaluation metrics as measuring how
right it is.

Of course, you could argue both of these are doing the same thing but evaluation metrics offer a different
perspective.

After all, when evaluating your models it's good to look at things from multiple points of view.

There are several evaluation metrics that can be used for classification problems but let's start out with
accuracy.

Accuracy can be measured by dividing the total number of correct predictions by the total number of predictions overall.
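A minimal sketch of such an accuracy function (comparing predicted labels to true labels and expressing the result as a percentage):

# Sketch: accuracy = (correct predictions / total predictions) * 100
def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()  # count where prediction equals label
    acc = (correct / len(y_pred)) * 100
    return acc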
