Keras 1-2 Classification
1.2 Classification
21 April 2024 Sunday 10:08
Our dataset
The dataset for this problem is very simple. The coordinates are pairs of values
corresponding to the X and Y coordinates of each circle in the graph. The labels are 1
for red circles and 0 for the blue circles.
Pair plots
We can make use of seaborn's pairplot function to explore a small dataset and identify
whether our classification problem will be easily separable. We can get an intuition for
this if we see that the classes separate well enough along several variables. In this
case, for the circles dataset, there is a very clear boundary: the red circles concentrate
at the center while the blue are outside. It should be easy for our network to find a way
to separate them just based on x and y coordinates.
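A minimal sketch of such a pair plot, using a synthetic stand-in for the circles dataset (the course provides its own data, so the column names below are just illustrative):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles

# Build a small stand-in circles dataset; the real one comes with the course
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)
circles = pd.DataFrame({'x': X[:, 0], 'y': X[:, 1], 'target': y})

# Colour each point by its label to see how well the classes separate
sns.pairplot(circles, hue='target')
plt.show()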
The NN architecture
This is the neural network we will build to classify the red and blue dots in our graph. We
have two neurons as an input layer, one for the x coordinate and another for the y
coordinate of each of the red and blue circles in the graph. Then we have one hidden
layer with four neurons. Four is a good enough number to learn the separation of
classes in this dataset; this was found by experimentation. We finally end up with a
single output neuron which makes use of the sigmoid activation function. It's important
to note that, regardless of the activation functions used for the previous layers, we do
need the sigmoid activation function for this last output node.
You can consider the output of the sigmoid function as the probability of a pair of
coordinates being in one class or another. So we can set a threshold and say
everything below 0.5 will be a blue circle and everything above a red one.
Let's build it
So let's build our model in Keras. We start by importing the Sequential model and the
Dense layer. We then instantiate a sequential model. We add a hidden layer of 4
neurons and define an input shape of 2, one per input coordinate. We use tanh as
the activation function for this hidden layer. Activation functions are covered later in the
course, so don't worry about this choice for now. We finally add an output layer
containing a single neuron with the sigmoid activation function, so that the network
behaves the way we expect, that is, it outputs a value between 0
and 1. Our model is now ready to be trained.
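A sketch of this model, with the layer sizes and activations described above (the optimizer in compile is an assumption, not specified in the lesson):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# Hidden layer: 4 neurons, tanh activation, input shape of 2 (x and y coordinates)
model.add(Dense(4, input_shape=(2,), activation='tanh'))
# Output layer: a single sigmoid neuron giving a probability between 0 and 1
model.add(Dense(1, activation='sigmoid'))

# Binary cross-entropy is the usual loss for a single sigmoid output;
# 'sgd' is just an example optimizer choice
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])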
Results
These are the boundaries the model learned to classify our circles. It looks like our model
did pretty well!
Throwing darts: Identifying who threw which dart in a game of darts is a good
example of a multi-class classification problem. Each dart can only be thrown by one
competitor. And that means our classes are mutually exclusive since no dart can be
thrown by two different competitors simultaneously.
The dataset: The darts dataset consists of dart throws by different competitors. The
coordinate pairs xCoord and yCoord show where each dart landed. Based on the
landing position of previously thrown darts we should be able to distinguish between
throwers if there's enough variation between their throws. In our pairplot we can see that
different players tend to aim at specific regions of the board.
The architecture
The model for this dataset has two neurons as inputs, since our predictors are xCoord
and yCoord. We will define them using the input_shape argument, just as we've done
before. In between there is a series of hidden layers; we are using 3 Dense layers
of 128, 64 and 32 neurons, respectively. As outputs we have 4 neurons, one per competitor.
Let's look closer at the output layer now.
Multi-class model
You can build this model as we did in the previous lesson; instantiate a sequential
model, add a hidden layer, also defining an input layer with the input_shape
parameter, and finish by adding the remaining hidden layers and an output layer with
softmax activation. You will do all this yourself in the exercises.
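A sketch of that architecture (the hidden-layer activation, relu here, is an assumption; the lesson only specifies the layer sizes and the softmax output):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# The input layer is defined via input_shape: 2 predictors (xCoord, yCoord)
model.add(Dense(128, input_shape=(2,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
# One output neuron per competitor; softmax makes the 4 probabilities sum to 1
model.add(Dense(4, activation='softmax'))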
Categorical cross-entropy
When compiling your model, instead of binary cross-entropy as we used before, we now
use categorical cross-entropy or log loss. Categorical cross-entropy measures the
difference between the predicted probabilities and the true label of the class we should
have predicted. So if we should have predicted 1 for a given class, taking a look at the
graph we see we would get high loss values for predicting close to 0 (since we'd be very
wrong) and low loss values for predicting closer to 1 (the true label).
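Compiling the darts model with this loss could look like the following (the optimizer is an assumption):

# Categorical cross-entropy is the loss for mutually exclusive, one-hot encoded classes
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])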
Preparing a dataset
Since our outputs are vectors containing the probabilities of each class, our neural
network must also be trained with vectors representing this concept. To achieve that we
make use of the to_categorical function from tensorflow.keras.utils. We first turn our
response variable into a categorical variable with pandas Categorical; this allows us to
redefine the column using the categorical codes (cat.codes) of the different categories.
Now that our categories are each represented by a unique integer, we can use the
to_categorical function to turn them into one-hot encoded vectors, where each
component is 0 except for the one corresponding to the labeled category.
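A small sketch of this preparation step; the darts DataFrame's competitor column and the names in it are illustrative assumptions:

import pandas as pd
from tensorflow.keras.utils import to_categorical

# Hypothetical response variable with one row per dart throw
darts = pd.DataFrame({'competitor': ['Kate', 'Steve', 'Susan', 'Michael', 'Kate']})

# Turn the response into a pandas categorical and replace it with its integer codes
darts['competitor'] = pd.Categorical(darts['competitor'])
darts['competitor'] = darts['competitor'].cat.codes

# One-hot encode the integer labels: each row becomes a vector with a single 1
labels = to_categorical(darts['competitor'])
print(labels)   # e.g. the first throw becomes [1., 0., 0., 0.]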
One-hot encoding
Keras's to_categorical essentially performs the process described in the picture above.
Label-encoded Apple, Chicken and Broccoli turn into vectors of 3 components. A 1 is
placed to represent the presence of the class and a 0 to indicate its absence.
Softmax predictions
Your recently trained model is loaded for you. This model is generalizing well, which is
why you got a high accuracy on the test set.
Since you used the softmax activation function, for every input of 2 coordinates provided
to your model there's an output vector of 4 numbers. Each of these numbers encodes
the probability of a given dart being thrown by one of the 4 possible competitors.
When computing accuracy with the model's .evaluate() method, your model takes the
class with the highest probability as the prediction. np.argmax() can help you do this
since it returns the index with the highest value in an array.
Use the collection of test throws stored in coords_small_test and np.argmax() to check
this out!
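A quick way to check this, assuming the trained model and coords_small_test are already in the workspace:

import numpy as np

# Each prediction is a vector of 4 probabilities, one per competitor
preds = model.predict(coords_small_test)

# The predicted competitor for each throw is the index of the highest probability
predicted_competitors = np.argmax(preds, axis=1)
print(predicted_competitors)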
Multi-class vs multi-label
Imagine we had three classes: sun, moon and clouds. In multi-class problems, if we took
a sample of our observations, each individual in the sample would belong to a unique class.
However, in a multi-label problem each individual in the sample can have all, none or a
subset of the available classes. As you can see in the image, multi-label vectors are
also one-hot encoded, there's a 1 or a 0 representing the presence or absence of each
class.
The architecture
Making a multi-label model for this problem is not that different from what you did when
building your multi-class model. We first instantiate a sequential model. For the sake of
this example, we will assume that to differentiate between these 3 classes, we need just
one input and 2 hidden neurons. The biggest changes happen in the output layer and in
its activation function. In the output layer, we use as many neurons as there are possible
classes, but this time we use the sigmoid activation.
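A sketch of such a multi-label model (the hidden activation and optimizer are assumptions; binary cross-entropy is the usual loss when each output is an independent sigmoid):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# One input feature feeding a small hidden layer of 2 neurons
model.add(Dense(2, input_shape=(1,), activation='relu'))
# One output neuron per class (sun, moon, clouds), each with its own sigmoid
model.add(Dense(3, activation='sigmoid'))

# Each output is treated as an independent yes/no decision
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])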
Sigmoid outputs
We use sigmoid outputs because we no longer care about the sum of probabilities. We
want each output neuron to be able to individually take a value between 0 and 1. This
can be achieved with the sigmoid activation because it constrains our neuron output in
the range 0-1. That's what we did in binary classification, though we only had one output
neuron there.
An advantage
You can see how multi-label classification can be performed with neural networks using
only minor tweaks to our model architecture. If we were to use a classical machine
learning approach to solve multi-label problems we would need more complex methods.
One way to do so consists of training several classifiers to distinguish each particular
class from the rest. This is called one versus rest classification.
An irrigation machine
Let's tackle a new problem. A farm field has an array of 20 sensors distributed along 3
crop fields. These sensors measure, among other things, the humidity of the soil,
radiation of the sun, etc. Your task is to use the combination of measurements from
these sensors to decide which parcels to water, given each parcel has different
environmental requirements.
Each sensor measures an integer value between 0 and 13 volts. Parcels can be
represented as one-hot encoded vectors of length 3, where each index is one of the
parcels. Parcels can be watered simultaneously.
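A possible model for this problem, as a rough sketch (the hidden-layer size and activations are assumptions; only the 20 inputs and the 3 sigmoid outputs follow from the problem description):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# 20 sensor readings as inputs; the hidden layer size of 64 is an arbitrary choice
model.add(Dense(64, input_shape=(20,), activation='relu'))
# 3 sigmoid outputs, one per parcel, since several parcels can be watered at once
model.add(Dense(3, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])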
What is a callback: A callback is a function that is executed after some other function,
event, or task has finished. For instance, when you touch your phone screen, a block of
code that identifies the type of gesture will be triggered. Since this block of code has
been called after the touching event occurred, it's a callback.
Callbacks in Keras
In the same way, a keras callback is a block of code that gets executed after each
epoch during training or after training is finished. They are useful for storing metrics as
the model trains and for making decisions as training progresses. The fit method itself
returns one such object: the history, which records the metrics for every epoch.
To get the most out of the history object we should use the validation_data parameter in
our fit method, passing X_test and y_test as a tuple. The validation_split parameter can
be used instead, specifying a percentage of the training data that will be left out for
testing purposes. That way we not only have the training metrics but also the validation
metrics.
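For instance (the epoch count is arbitrary, and X_train/y_train are assumed to hold the training data):

# Store the metrics recorded during training; validation_data adds per-epoch
# metrics computed on the held-out test set
history = model.fit(X_train, y_train,
                    epochs=100,
                    validation_data=(X_test, y_test))

# history.history is a dictionary of per-epoch metrics,
# e.g. 'loss', 'accuracy', 'val_loss', 'val_accuracy'
print(history.history.keys())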
History plots
You can compare training and validation metrics with a few matplotlib commands. We
just need to define a figure and plot the values of the history attribute for the training
accuracy and the validation accuracy. We can then make our graph prettier by adding a
title, axis labels and a legend.
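A sketch of those commands (recent TensorFlow versions name the metrics 'accuracy' and 'val_accuracy'; older Keras versions used 'acc' and 'val_acc'):

import matplotlib.pyplot as plt

plt.figure()
# Training vs validation accuracy per epoch, taken from the history callback
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()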
We can see our model accuracy increases for both training and test sets until it reaches
epoch 25. Then accuracy flattens for the test set whilst the training keeps improving.
Overfitting is taking place, since the training accuracy keeps improving whilst the test
accuracy flattens and eventually decreases. More on this in the next chapter.
Early stopping
Early stopping a model can solve the overfitting problem, since training stops when the
model no longer improves. This is extremely useful since deep neural models can take a
long time to train and we don't know beforehand how many epochs will be needed.
Early stopping, like other Keras callbacks, can be imported from
tensorflow.keras.callbacks. We then need to instantiate it. The early stopping callback
can monitor several metrics, like validation accuracy, validation loss, etc. These can be
specified in the monitor parameter. It's also important to define a patience argument,
that is, the number of epochs to wait for the model to improve before stopping its
training. There are no rules to decide which patience value works best in all cases; this
depends on the implementation. It's good to avoid low values, so that your model has
a chance to improve at a later epoch. The callback is passed as a list to the callbacks
parameter in the model's fit method.
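A sketch of that workflow, assuming the model was compiled with an accuracy metric and the data is split into X_train, y_train, X_test and y_test (the monitored metric, patience and epoch count are example choices):

from tensorflow.keras.callbacks import EarlyStopping

# Stop training if validation accuracy has not improved for 5 consecutive epochs
early_stopping = EarlyStopping(monitor='val_accuracy', patience=5)

model.fit(X_train, y_train,
          epochs=1000,
          validation_data=(X_test, y_test),
          callbacks=[early_stopping])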
Model checkpoint
The model checkpoint callback can also be imported from tensorflow.keras.callbacks. This callback
allows us to save our model as it trains. We specify the model filename with a name and
the .hdf5 extension. You can also decide what to monitor to determine which model is
best with the monitor parameter, by default validation loss is monitored. Setting the
save_best_only parameter to True guarantees that the latest best model according to
the quantity monitored will not be overwritten.
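For example (the filename is arbitrary, and the same workspace variables are assumed as above):

from tensorflow.keras.callbacks import ModelCheckpoint

# Save the model to best_model.hdf5 whenever the monitored quantity
# (validation loss by default) improves
model_checkpoint = ModelCheckpoint('best_model.hdf5', save_best_only=True)

model.fit(X_train, y_train,
          epochs=1000,
          validation_data=(X_test, y_test),
          callbacks=[model_checkpoint])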
A combination of callbacks
Deep learning models can take a long time to train, especially when you move to
deeper architectures and bigger datasets. Saving your model every time it improves as
well as stopping it when it no longer does allows you to worry less about choosing the
number of epochs to train for. You can also restore a saved model anytime and resume
training where you left it.
The model training and validation data are available in your workspace
as X_train, X_test, y_train, and y_test.
Use the EarlyStopping() and the ModelCheckpoint() callbacks so that you can go eat a
jar of cookies while you leave your computer to work!
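A sketch of that combination, assuming the model is already compiled with an accuracy metric (the monitored metric, patience, filename and epoch count are example choices):

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop once validation accuracy stops improving, and keep the best model on disk
callbacks = [EarlyStopping(monitor='val_accuracy', patience=5),
             ModelCheckpoint('best_model.hdf5', save_best_only=True)]

# The epoch count can be large: early stopping will end training long before
history = model.fit(X_train, y_train,
                    epochs=1000,
                    validation_data=(X_test, y_test),
                    callbacks=callbacks)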