1-Introduction to Deep Learning with Keras


1.2. Classification
1.2.1. Binary Classification
You're now ready to learn about binary classification, so let's dive in. You will use binary
classification when you want to solve problems where you predict whether an
observation belongs to one of two possible classes. A simple binary classification
problem could be learning the boundaries to separate blue from red circles as shown in
the image.

Our dataset
The dataset for this problem is very simple. The coordinates are pairs of values
corresponding to the X and Y coordinates of each circle in the graph. The labels are 1
for red circles and 0 for the blue circles.

Pair plots
We can make use of seaborn's pairplot function to explore a small dataset and get a sense of
whether our classification problem will be easily separable. We can get an intuition for
this if we see that the classes separate well enough along several variables. In this
case, for the circles dataset, there is a very clear boundary: the red circles concentrate
at the center while the blue circles lie outside. It should be easy for our network to find a way
to separate them based on the x and y coordinates alone.

The NN architecture
This is the neural network we will build to classify the red and blue dots in our graph. We
have two neurons as an input layer, one for the x coordinate and another for the y
coordinate of each of the red and blue circles in the graph. Then we have one hidden
layer with four neurons. Four is a good enough number to learn the separation of
classes in this dataset; this was found by experimentation. We finally end up with a
single output neuron which makes use of the sigmoid activation function. It's important
to note that, regardless of the activation functions used for the previous layers, we do
need the sigmoid activation function for this last output node.


The sigmoid function


The sigmoid activation function squashes the value the output neuron receives from the
second-to-last layer into a floating-point number between 0 and 1.
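
In code, the sigmoid function is just one line; a minimal NumPy sketch:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real-valued input into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))
```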

You can consider the output of the sigmoid function as the probability of a pair of
coordinates being in one class or another. So we can set a threshold and say
everything below 0.5 will be a blue circle and everything above a red one.

Let's build it
So let's build our model in Keras. We start by importing the Sequential model and the
Dense layer. We then instantiate a Sequential model. We add a hidden layer of 4
neurons and define an input shape, which consists of 2 neurons. We use tanh as
the activation function for this hidden layer; activation functions are covered later in the
course, so don't worry about this choice for now. We finally add an output layer which
contains a single neuron, making use of the sigmoid activation function so that we
achieve the behavior we expect from this network: obtaining a value between 0
and 1. Our model is now ready to be trained.
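
A sketch of what this build could look like, following the description above (2 input features, a 4-neuron tanh hidden layer, and a single sigmoid output):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# Hidden layer: 4 neurons, fed by the 2 input coordinates (x, y)
model.add(Dense(4, input_shape=(2,), activation='tanh'))
# Output layer: a single sigmoid neuron producing a value between 0 and 1
model.add(Dense(1, activation='sigmoid'))
```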

Compiling, training, predicting


Just as before, we need to compile our model before training. We will use stochastic
gradient descent as the optimizer and binary cross-entropy as the loss function. Binary
cross-entropy is the loss we use when our output neuron uses sigmoid as its
activation function. We train our model for 20 epochs, passing our coordinates and
labels as parameters. Then we obtain the predicted labels by calling predict on the
coordinates.
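
Roughly, and assuming coordinates and labels are the arrays described above:

```python
# Stochastic gradient descent + binary cross-entropy, as described
model.compile(optimizer='sgd', loss='binary_crossentropy')

# Train for 20 epochs on the circle coordinates and their 0/1 labels
model.fit(coordinates, labels, epochs=20)

# Predicted probabilities; values above 0.5 map to the "red" class
preds = model.predict(coordinates)
```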

Results
These are the boundaries our model learned to classify the circles. It looks like our model
did pretty well!


Exercise: Exploring dollar bills


You will practice building classification models in Keras with the Banknote
Authentication dataset.
Your goal is to distinguish between real and fake dollar bills. To do this, the
dataset comes with 4 features: variance, skewness, kurtosis, and entropy. These features
are calculated by applying mathematical operations over the dollar bill images. The
labels are found in the dataframe's class column.
A pandas DataFrame named banknotes is ready to use, so let's do some data exploration!
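
A plausible first exploration step, assuming the banknotes DataFrame and its class column are as described:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Color each scatter point by the label to see how separable the classes are
sns.pairplot(banknotes, hue='class')
plt.show()
```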

Exercise: A binary classification model


Now that you know what the Banknote Authentication dataset looks like, we'll build a
simple model to distinguish between real and fake bills.
You will perform binary classification by using a single neuron as an output. The input
layer will have 4 neurons since we have 4 features in our dataset. The model's output
will be a value constrained between 0 and 1.
We will interpret this output number as the probability of our input variables coming from
a fake dollar bill, with 1 meaning we are certain it's a fake bill.
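
A sketch of such a model; the simplest version matching this description is a single sigmoid neuron fed directly by the 4 features:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# One sigmoid neuron taking the 4 banknote features as input
model.add(Dense(1, input_shape=(4,), activation='sigmoid'))
model.compile(optimizer='sgd', loss='binary_crossentropy',
              metrics=['accuracy'])
```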


Is this dollar bill fake?


You are now ready to train your model and check how well it performs when classifying
new bills! The dataset has already been partitioned into features: X_train & X_test, and
labels: y_train & y_test.
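
Training and evaluation could then look like this (the epoch count here is an arbitrary example):

```python
# Fit on the training split, then measure accuracy on unseen bills
model.fit(X_train, y_train, epochs=20)
loss, accuracy = model.evaluate(X_test, y_test)
print('Accuracy:', accuracy)
```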

1.2.2. Multi-class classification


What about when we have more than two classes to classify? We run into a multi-class
classification problem, but don't worry, we just have to make a minor tweak to our neural
network architecture.

Throwing darts: Identifying who threw which dart in a game of darts is a good
example of a multi-class classification problem. Each dart can only be thrown by one
competitor, which means our classes are mutually exclusive: no dart can be
thrown by two different competitors simultaneously.

The dataset: The darts dataset consists of dart throws by different competitors. The
coordinate pairs xCoord and yCoord show where each dart landed. Based on the
landing positions of previously thrown darts, we should be able to distinguish between
throwers if there's enough variation between their throws. In our pairplot we can see that
different players tend to aim at specific regions of the board.


The architecture
The model for this dataset has two neurons as inputs, since our predictors are xCoord
and yCoord. We will define them using the input_shape argument, just as we've done
before. In between there is a series of hidden layers; we are using 3 Dense layers
of 128, 64, and 32 neurons, respectively. As outputs we have 4 neurons, one per competitor.
Let's look closer at the output layer now.

The output layer


We have 4 outputs, each linked to a possible competitor. Each competitor has a
probability of having thrown a given dart, so we must make sure the total sum of
probabilities for the output neurons equals one. We achieve this with the softmax
activation function. Once we have a probability per output neuron we then choose as
our prediction the competitor whose associated output has the highest probability.

Multi-class model
You can build this model as we did in the previous lesson: instantiate a sequential
model, add a hidden layer while defining the input layer with the input_shape
parameter, and finish by adding the remaining hidden layers and an output layer with
softmax activation. You will do all this yourself in the exercises.
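
A sketch of the architecture just described; the hidden-layer activation isn't specified in the lesson, so relu here is an assumption:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# 2 input features (xCoord, yCoord) feeding hidden layers of 128, 64, 32
model.add(Dense(128, input_shape=(2,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
# 4 output neurons, one per competitor; softmax makes them sum to 1
model.add(Dense(4, activation='softmax'))
```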

Categorical cross-entropy
When compiling your model, instead of binary cross-entropy as we used before, we now
use categorical cross-entropy or log loss. Categorical cross-entropy measures the
difference between the predicted probabilities and the true label of the class we should
have predicted. So if we should have predicted 1 for a given class, taking a look at the
graph we see we would get high loss values for predicting close to 0 (since we'd be very
wrong) and low loss values for predicting closer to 1 (the true label).
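
The compile step changes only in its loss argument; the optimizer choice below is an assumption:

```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```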


Preparing a dataset
Since our outputs are vectors containing the probabilities of each class, our neural
network must also be trained with vectors representing this concept. To achieve that we
make use of the to_categorical function from tensorflow.keras.utils. We first turn our
response variable into a categorical variable with pandas' Categorical; this allows us to
redefine the column using the categorical codes (cat codes) of the different categories.
Now that our categories are each represented by a unique integer, we can use the
to_categorical function to turn them into one-hot encoded vectors, where each
component is 0 except for the one corresponding to the labeled category.
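
Putting the two steps together, assuming the darts DataFrame from the lesson:

```python
import pandas as pd
from tensorflow.keras.utils import to_categorical

# Step 1: map competitor names to integer codes
darts.competitor = pd.Categorical(darts.competitor)
darts.competitor = darts.competitor.cat.codes

# Step 2: expand each integer code into a one-hot encoded vector
competitors = to_categorical(darts.competitor)
```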

One-hot encoding
Keras' to_categorical essentially performs the process described in the picture above.
Label-encoded Apple, Chicken, and Broccoli turn into vectors of 3 components: a 1 is
placed to represent the presence of the class and a 0 to indicate its absence.

Exercise: A multi-class model


You're going to build a model that predicts who threw which dart based only on where
that dart landed (that is, the dart's x and y coordinates on the board).
This is a multi-class classification problem since each dart can only be thrown
by one of 4 competitors, so the classes/labels are mutually exclusive. We can therefore
build an output layer with as many neurons as competitors and use the softmax activation
function to achieve a total sum of probabilities of 1 over all competitors.

Prepare your dataset


In the console you can check that your labels, darts.competitor, are not yet in a format
your network can understand: they contain the names of the competitors as strings.
You will first turn these competitors into unique numbers, then use
the to_categorical() function from keras.utils to turn these numbers into their one-hot
encoded representation.
This is useful for multi-class classification problems, since there are as many output
neurons as classes and, for every observation in our dataset, we just want one of the
neurons to be activated.
The darts dataset is loaded as darts, and pandas is imported as pd. Let's prepare this
dataset!


Training on dart throwers


Your model is now ready, and so is your dataset. It's time to train!
The coordinate features and competitor labels you just transformed have been
partitioned into coord_train, coord_test and competitors_train, competitors_test.
Your model is also loaded. Feel free to visualize your training data or model.summary() in
the console.
Let's find out who threw which dart just by looking at the board!

Softmax predictions
Your recently trained model is loaded for you. This model generalizes well; that's why
you got a high accuracy on the test set.
Since you used the softmax activation function, for every input of 2 coordinates provided
to your model there's an output vector of 4 numbers. Each of these numbers encodes
the probability of a given dart being thrown by one of the 4 possible competitors.
When computing accuracy with the model's .evaluate() method, your model takes the
class with the highest probability as the prediction. np.argmax() can help you do this
since it returns the index of the highest value in an array.
Use the collection of test throws stored in coords_small_test and np.argmax() to check
this out!
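
A sketch of that check, assuming the trained model and coords_small_test from the exercise:

```python
import numpy as np

# One row of 4 softmax probabilities per test throw
preds = model.predict(coords_small_test)

# The index of the largest probability is the predicted competitor
predicted_competitors = np.argmax(preds, axis=1)
print(predicted_competitors)
```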

1.2.3. Multi-label classification

Now that you know how multi-class classification works, we can take a look at multi-
label classification. They both deal with predicting classes, but in multi-label
classification, a single input can be assigned to more than one class.

Real world examples


We could use multi-label classification, for instance, to tag a series' genres based on its plot
summary. Making models that deal with text or images is not covered in depth in this
course; we will learn more about it in Chapter 4.

Multi-class vs multi-label
Imagine we had three classes: sun, moon, and clouds. In multi-class problems, if we took
a sample of our observations, each individual in the sample would belong to a unique class.
In a multi-label problem, however, each individual in the sample can have all, none, or a
subset of the available classes. As you can see in the image, multi-label vectors are
also one-hot encoded: there's a 1 or a 0 representing the presence or absence of each
class.

The architecture
Making a multi-label model for this problem is not that different from what you did when
building your multi-class model. We first instantiate a sequential model. For the sake of
this example, we will assume that to differentiate between these 3 classes we need just
one input and 2 hidden neurons. The biggest changes happen in the output layer and
its activation function: we use as many output neurons as there are classes,
but this time with sigmoid activation.
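
A sketch of this toy architecture (the hidden layer's activation is left at its default here, since the lesson doesn't specify one):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# Toy example: 1 input feature, 2 hidden neurons
model.add(Dense(2, input_shape=(1,)))
# 3 output neurons (sun, moon, clouds), each squashed independently to (0, 1)
model.add(Dense(3, activation='sigmoid'))
```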

Sigmoid outputs
We use sigmoid outputs because we no longer care about the sum of probabilities. We
want each output neuron to be able to take a value between 0 and 1 individually. This
can be achieved with the sigmoid activation because it constrains each neuron's output
to the range 0-1. That's what we did in binary classification, though we only had one output
neuron there.

Compile and train


Binary cross-entropy is now used as the loss function when compiling the model. You
can look at it as if you were performing several binary classification problems: for each
output we are deciding whether or not its corresponding label is present given the
current input. When training our model we can use the validation_split argument to print
validation loss and accuracy as it trains. By using validation_split, a percentage of
training data is left out for testing at each epoch.
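
In code, this could look as follows; the optimizer, epoch count, and split fraction are example choices:

```python
# Each output is an independent yes/no decision, hence binary cross-entropy
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Hold out 20% of the training data to report validation metrics per epoch
model.fit(X_train, y_train, epochs=100, validation_split=0.2)
```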


An advantage
You can see how neural networks can perform multi-label classification
with only minor tweaks to the model architecture. If we were to use a classical machine
learning approach to solve multi-label problems, we would need more complex methods.
One way to do so consists of training several classifiers to distinguish each particular
class from the rest; this is called one-versus-rest classification.

An irrigation machine
Let's tackle a new problem. A farm field has an array of 20 sensors distributed along 3
crop fields. These sensors measure, among other things, the humidity of the soil,
radiation of the sun, etc. Your task is to use the combination of measurements from
these sensors to decide which parcels to water, given each parcel has different
environmental requirements.

Each sensor measures an integer value between 0 and 13 volts. Parcels can be
represented as one-hot encoded vectors of length 3, where each index is one of the
parcels. Parcels can be watered simultaneously.

Exercise: An irrigation machine

You're going to automate the watering of farm parcels by making an intelligent irrigation
machine. Multi-label classification problems differ from multi-class problems in that each
observation can be labeled with zero or more classes. So classes/labels are not
mutually exclusive, you could water all, none or any combination of farm parcels based
on the inputs.
To account for this behavior, we use an output layer with as many neurons
as classes, but this time, unlike in multi-class problems, each output neuron has
a sigmoid activation function. This makes each neuron in the output layer able to output
a number between 0 and 1 independently.
The Sequential() model and Dense() layers are ready to be used. It's time to build an
intelligent irrigation machine!

Training with multiple labels


An output of your multi-label model could look like this: [0.76, 0.99, 0.66]. If we round
probabilities higher than 0.5 up to 1, this observation will be classified as containing all 3
possible labels: [1, 1, 1]. For this particular problem, this would mean that watering all 3
parcels in your farm is the right thing to do, according to the network, given the input
sensor measurements.
You will now train and predict with the model you just
built. sensors_train, parcels_train, sensors_test and parcels_test are already loaded for
you to use.
Let's see how well your intelligent machine performs!
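
Training and prediction might look like this (epoch count and split fraction are example values):

```python
import numpy as np

model.fit(sensors_train, parcels_train, epochs=100, validation_split=0.2)

# Probabilities per parcel; anything above 0.5 rounds to "water this parcel"
preds = model.predict(sensors_test)
preds_rounded = np.round(preds)
print(preds_rounded[:5])
```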

1.2.4. Keras callbacks


By now you've trained a lot of models. It's time to learn more about how to better control
and supervise model training by using callbacks.

What is a callback: A callback is a function that is executed after some other function,
event, or task has finished. For instance, when you touch your phone screen, a block of
code that identifies the type of gesture will be triggered. Since this block of code has
been called after the touching event occurred, it's a callback.

Callbacks in Keras
In the same way, a Keras callback is a block of code that gets executed after each
epoch during training or after the training is finished. Callbacks are useful to store metrics as
the model trains and to make decisions as the training goes on.


A callback you've been missing


Every time you call the fit method on a Keras model, a callback object is returned
after the model finishes training: the history object. By accessing its
history attribute, which is a Python dictionary, we can check the metrics saved
during training as arrays of numbers.

To get the most out of the history object we should use the validation_data parameter in
our fit method, passing X_test and Y_test as a tuple. The validation_split parameter can
be used instead, specifying a percentage of the training data that will be left out for
testing purposes. That way we not only have the training metrics but also the validation
metrics.
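
For example (the epoch count is an arbitrary example):

```python
# fit() returns a History callback; validation_data adds val_* metrics
h_callback = model.fit(X_train, y_train, epochs=50,
                       validation_data=(X_test, y_test))

print(h_callback.history['loss'])      # training loss, one value per epoch
print(h_callback.history['val_loss'])  # validation loss, one value per epoch
```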

History plots
You can compare training and validation metrics with a few matplotlib commands:
define a figure, plot the values of the history attribute for the training
accuracy and the validation accuracy, and then make the graph prettier by adding a
title, axis labels, and a legend.
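
A sketch of such a plot; note that older Keras versions store accuracy under 'acc'/'val_acc' rather than 'accuracy'/'val_accuracy':

```python
import matplotlib.pyplot as plt

plt.figure()
plt.plot(h_callback.history['accuracy'])
plt.plot(h_callback.history['val_accuracy'])
plt.title('Model accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()
```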

We can see that our model's accuracy increases for both training and test sets until it reaches
epoch 25. Then accuracy flattens for the test set whilst training accuracy keeps improving.
Overfitting is taking place, since the training accuracy keeps improving whilst
the test accuracy decreases. More on this in the next chapter.

Early stopping
Early stopping can solve the overfitting problem, since it stops a model's training when
it no longer improves. This is extremely useful because deep neural models can take a
long time to train and we don't know beforehand how many epochs will be needed.
Early stopping, like other Keras callbacks, can be imported from
tensorflow.keras.callbacks. We then need to instantiate it. The early stopping callback
can monitor several metrics, like validation accuracy, validation loss, etc.; these can be
specified in the monitor parameter. It's also important to define a patience argument,
that is, the number of epochs to wait for the model to improve before stopping its
training. There are no rules for deciding which patience number works best at all times;
this depends on the implementation. It's good to avoid low values, so that your model has
a chance to improve at a later epoch. The callback is passed as a list to the callbacks
parameter in the model's fit method.
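
For instance (the monitored metric and patience value below are example choices):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop if validation accuracy hasn't improved for 5 consecutive epochs
early_stopping = EarlyStopping(monitor='val_accuracy', patience=5)

model.fit(X_train, y_train, epochs=1000,
          validation_data=(X_test, y_test),
          callbacks=[early_stopping])
```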


Model checkpoint
The model checkpoint callback can also be imported from Keras callbacks. This callback
allows us to save our model as it trains. We specify the model filename with a name and
the .hdf5 extension. You can also decide what to monitor to determine which model is
best with the monitor parameter; by default, validation loss is monitored. Setting the
save_best_only parameter to True guarantees that the latest best model, according to
the quantity monitored, will not be overwritten.
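
A sketch, with a hypothetical filename:

```python
from tensorflow.keras.callbacks import ModelCheckpoint

# Save the best model seen so far; monitor defaults to validation loss
checkpoint = ModelCheckpoint('best_model.hdf5', save_best_only=True)

model.fit(X_train, y_train, epochs=100,
          validation_data=(X_test, y_test),
          callbacks=[checkpoint])
```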

Exercise: The history callback


The history callback is returned by default every time you train a model with the .fit()
method. To access its metrics, you can access the history dictionary attribute
inside the returned h_callback object with the corresponding keys.
The irrigation machine model you built in the previous lesson is loaded for you to train,
along with its features and labels, now loaded as X_train, y_train, X_test, y_test. This
time you will store the model's history callback and use the validation_data parameter as
it trains.
You will plot the results stored in history with plot_accuracy() and plot_loss(), two simple
matplotlib functions. You can check their code in the console by pasting
show_code(plot_loss).
Let's see what goes on behind the scenes of our training!

Early stopping your model


The early stopping callback is useful since it allows you to stop the model's training if it
no longer improves after a given number of epochs. To make use of this functionality
you need to pass the callback inside a list to the model's callbacks parameter in
the .fit() method.
The model you built to detect fake dollar bills is loaded for you to train, this time with
early stopping. X_train, y_train, X_test and y_test are also available for your use.

A combination of callbacks
Deep learning models can take a long time to train, especially when you move to
deeper architectures and bigger datasets. Saving your model every time it improves as
well as stopping it when it no longer does allows you to worry less about choosing the
number of epochs to train for. You can also restore a saved model anytime and resume
training where you left it.
The model training and validation data are available in your workspace
as X_train, X_test, y_train, and y_test.
Use the EarlyStopping() and the ModelCheckpoint() callbacks so that you can go eat a
jar of cookies while you leave your computer to work!
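
Combining the two could look like this; the patience value and filename are example choices, and the oversized epoch count simply lets early stopping decide when to finish:

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    EarlyStopping(monitor='val_accuracy', patience=3),
    ModelCheckpoint('best_banknote_model.hdf5', save_best_only=True),
]

# Set epochs far higher than needed; early stopping ends training for us
model.fit(X_train, y_train, epochs=10000,
          validation_data=(X_test, y_test), callbacks=callbacks)
```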
