09-Neural Networks (2)
Implementing MLP with Keras
Introduction
• Keras is a high-level Deep Learning API that allows you to easily build,
train, evaluate and execute all sorts of neural networks.
• TensorFlow itself now comes bundled with its own Keras
implementation called tf.keras. It only supports TensorFlow as the
backend.
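• Throughout these slides, keras refers to tf.keras. A minimal import sketch (assuming TensorFlow 2.x is installed):

import tensorflow as tf        # TensorFlow itself
from tensorflow import keras   # the Keras API bundled with TensorFlow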
Installing TensorFlow and Keras
• The pip show command can provide information about a single, globally
installed package, including its location: pip show <Package-Name>
• To install TensorFlow:
• Launch Anaconda
• Activate your environment
• Open a command window and run:
• python3 -m pip install --upgrade tensorflow
• To test your installation
• import tensorflow as tf
• print(tf.__version__)
Recognizing Handwritten Numbers
• Most people effortlessly recognize a sequence of handwritten digits such as 504192.
• The difficulty of visual pattern recognition becomes apparent if you
attempt to write a computer program to recognize such digits.
• What seems easy when we do it ourselves suddenly becomes extremely difficult.
Simple intuitions about how we recognize shapes - "a 9 has a loop at the top, and a
vertical stroke in the bottom right" - turn out to be not so simple to express
algorithmically. When you try to make such rules precise, you quickly get lost in
exceptions and special cases. It seems hopeless.
• Neural networks approach the problem in a different way. The idea is to
take a large number of handwritten digits, known as training examples and
then develop a system which can learn from those training examples.
• In other words, the neural network uses the examples to automatically infer rules for
recognizing handwritten digits. Furthermore, by increasing the number of training
examples, the network can learn more about handwriting, and so improve its
accuracy.
… MNIST
• The MNIST database contains 60,000 training images and 10,000 testing images.
• It is a labeled dataset that pairs images of handwritten numerals with the respective numeral (the label).
• Each image is normalized to fit into a 28x28 pixel bounding box.
This set has been studied so much that it is often called the “Hello World” of Machine Learning: whenever
people come up with a new classification algorithm, they are interested to see how it will perform on MNIST.
… The Architecture
To recognize individual digits we will use a neural network with two hidden layers:
• The input layer of the network contains neurons encoding the values of the input
pixels.
• Our training data for the network will consist of many 28x28 images of scanned handwritten
digits, and so the input layer contains 784 neurons.
• The input pixels are grey-scale, with a value of 0 representing white, a value of 255
representing black, and in between values representing gradually darkening shades of grey.
• The output layer of the network contains 10 neurons. We number the output
neurons from 0 through 9 and figure out which neuron has the highest activation
value. If that neuron is, say, neuron number 6, then our network will guess that
the input digit was a 6. And so on for the other output neurons.
• The hidden layers: two hidden layers (300 and 100 neurons respectively); you
can experiment with other values.
Using Keras to Load the Dataset
• Keras provides some utility functions to fetch and load common datasets, including
MNIST
• mnist = keras.datasets.mnist
• (X_train_full, y_train_full), (X_test, y_test) = mnist.load_data()
• When loading MNIST using Keras rather than Scikit-Learn, one important difference is
that every image is represented as a 28×28 array rather than a 1D array of size 784.
Moreover, the pixel intensities are represented as integers (from 0 to 255) rather than
floats (from 0.0 to 255.0).
• Here are the shape and data type of the training set:
• X_train_full.shape ➔ (60000, 28, 28)
• X_train_full.dtype ➔ dtype('uint8')
…
• Let’s take a peek at one digit from the dataset. All you need to do is
grab an instance’s image (no need to reshape it, since it is already a 28×28
array) and display it using Matplotlib’s imshow() function:
• import matplotlib as mpl
• import matplotlib.pyplot as plt
• some_digit = X_train_full[0]
• plt.imshow(some_digit, cmap = mpl.cm.binary,
interpolation="nearest")
• plt.axis("off")
• plt.show()
• y_train_full[0] ➔ 5
…
• Note that the dataset is already split into a training set and a test set, but
there is no validation set, so let’s create one. Moreover, since we are going
to train the neural network using Gradient Descent, we must scale the
input features. For simplicity, we just scale the pixel intensities down to the
0-1 range by dividing them by 255.0 (this also converts them to floats):
• X_valid, X_train = X_train_full[:5000] / 255.0,
X_train_full[5000:] / 255.0
• y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
• Remember to scale the test set the same way (divide by 255.0) before evaluating the model or making predictions with it.
• With MNIST, when the label is equal to 5, it means that the image represents the handwritten digit 5.
…
• Next we add a Dense hidden layer with 300 neurons.
• It will use the ReLU activation function.
• Each Dense layer manages its own weight matrix, containing all the connection
weights between the neurons and their inputs.
• It also manages a vector of bias terms (one per neuron).
• When it receives some input data, it computes the equation: hW,b(X) = φ(XW + b), where φ is the layer’s activation function.
• Next we add a second Dense hidden layer with 100 neurons, also using the
ReLU activation function.
• Finally, we add a Dense output layer with 10 neurons (one per class), using
the softmax activation function (because the classes are exclusive).
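• For reference, here is a minimal sketch of the layer-by-layer construction described above, using the Sequential model’s add() method (it includes the model creation and Flatten input layer from an earlier step not reproduced on these slides):

from tensorflow import keras

model = keras.models.Sequential()
# Flatten converts each 28x28 input image into a 1D array of 784 values
model.add(keras.layers.Flatten(input_shape=[28, 28]))
# First hidden layer: 300 neurons with the ReLU activation function
model.add(keras.layers.Dense(300, activation="relu"))
# Second hidden layer: 100 neurons, also with ReLU
model.add(keras.layers.Dense(100, activation="relu"))
# Output layer: 10 neurons (one per class), softmax because the classes are exclusive
model.add(keras.layers.Dense(10, activation="softmax"))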
…
• Instead of adding the layers one by one as we just did, you can pass a
list of layers when creating the Sequential model:
• model = keras.models.Sequential([
• keras.layers.Flatten(input_shape=[28, 28]),
• keras.layers.Dense(300, activation="relu"),
• keras.layers.Dense(100, activation="relu"),
• keras.layers.Dense(10, activation="softmax")
• ])
…
• The model’s summary() method displays all the model’s layers, including each
layer’s name (which is automatically generated unless you set it when creating the
layer), its output shape (None means the batch size can be anything), and its number
of parameters. The summary ends with the total number of parameters, including
trainable and non-trainable parameters. Here we only have trainable parameters
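• For this architecture, model.summary() produces output along these lines (an illustrative sketch; the exact layer names are auto-generated and may differ):

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
dense (Dense)                (None, 300)               235500
dense_1 (Dense)              (None, 100)               30100
dense_2 (Dense)              (None, 10)                1010
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0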
…
• Note that Dense layers often have a lot of parameters.
• For example, the first hidden layer has 784 × 300 connection weights, plus 300
bias terms, which adds up to 235,500 parameters! This gives the model quite a lot
of flexibility to fit the training data, but it also means that the model runs the risk
of overfitting, especially when you do not have a lot of training data.
• You can easily get the model’s list of layers, fetch a layer by its index, or
fetch it by name:
• model.layers[1].name
• All the parameters of a layer can be accessed using its get_weights() and
set_weights() methods. For a Dense layer, this includes both the connection
weights and the bias terms:
• hidden1 = model.layers[1]
• weights, biases = hidden1.get_weights()
• print(weights.shape, biases.shape) ➔ (784, 300) (300,)
…
• Notice that the Dense layer initialized the connection weights
randomly (which is needed to break symmetry, as we discussed
earlier), and the biases were just initialized to zeros, which is fine.
• If you ever want to use a different initialization method, you can set
kernel_initializer (kernel is another name for the matrix of connection
weights) or bias_initializer when creating the layer.
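• As an illustration (this exact line is not on the slides), a Dense layer using He initialization for the connection weights and zeros for the biases could be created like this:

from tensorflow import keras

layer = keras.layers.Dense(300, activation="relu",
                           kernel_initializer="he_normal",  # initializer for the weight matrix (kernel)
                           bias_initializer="zeros")        # initializer for the bias vector (the default)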
Compiling the Model
• After a model is created, you must call its compile() method to specify the loss function
and the optimizer to use.
• Optionally, you can also specify a list of extra metrics to compute during training and evaluation:
• model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd",
metrics=["accuracy"])
…
• First, we use the sparse_categorical_crossentropy loss because the labels are
sparse (for each instance there is just a target class index, from 0 to 9) and the
classes are exclusive.
• Second, regarding the optimizer, "sgd" simply means that we will
train the model using simple Stochastic Gradient Descent.
• In other words, Keras will perform the backpropagation algorithm described
earlier.
• Finally, since this is a classifier, it’s useful to measure its "accuracy"
during training and evaluation.
Training and Evaluating the Model
• Now the model is ready to be trained. For this we simply need to call its fit() method.
We pass it
• the input features (X_train)
• the target classes (y_train),
• the number of epochs to train (or else it would default to just 1, which would definitely not
be enough to converge to a good solution).
• a validation set (this is optional): Keras will measure the loss and the extra metrics on this set
at the end of each epoch, which is very useful to see how well the model really performs.
Instead of passing a validation set using the validation_data argument, you could instead set
validation_split to the ratio of the training set that you want Keras to use for validation (e.g., 0.1).
• In Keras, the batch size is specified using the batch_size argument of the fit() method. It
accepts an integer or None; when None or unspecified, it defaults to 32.
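• A minimal sketch of the training call described above (30 epochs matches the results discussed on the next slide):

history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))
# Alternative: let Keras carve the validation set out of the training set
# history = model.fit(X_train, y_train, epochs=30, validation_split=0.1)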
…
• And that’s it! The neural network is trained.
• At each epoch during training, Keras displays the number of instances
processed so far (along with a progress bar), the mean training time
per sample, the loss and accuracy (or any other extra metrics you
asked for), both on the training set and the validation set.
• You can see that the training loss went down, which is a good sign,
and the validation accuracy reached 98% after 30 epochs, not too far
from the training accuracy, so there does not seem to be much
overfitting going on.
…
• The fit() method returns a History object containing the
• training parameters (history.params),
• the list of epochs it went through (history.epoch),
• a dictionary (history.history) containing the loss and extra metrics it measured
at the end of each epoch on the training set and on the validation set (if any).
• If you create a Pandas DataFrame using this dictionary and call its
plot() method, you get the learning curves
• import pandas as pd
• pd.DataFrame(history.history).plot(figsize=(8, 5))
• plt.grid(True)
• plt.gca().set_ylim(0, 1) # set the vertical range to [0-1]
• plt.show()
…
• You can see that both the training and
validation accuracy steadily increase
during training, while the training and
validation loss decrease. Good!
• Moreover, the validation curves are quite
close to the training curves, which means
that there is not too much overfitting.
…
• If you are not satisfied with the performance of your model, you
should go back and tune the model’s hyperparameters, for example
• the number of layers, the number of neurons per layer, the types of activation
functions we use for each hidden layer, the number of training epochs, the
batch size (it can be set in the fit() method using the batch_size argument,
which defaults to 32).
• Once you are satisfied with your model’s validation accuracy, you
should evaluate it on the test set to estimate the generalization error
before you deploy the model to production.
• You can easily do this using the evaluate() method (it also supports
several other arguments, such as batch_size or sample_weight,
please check the documentation for more details):
• model.evaluate(X_test, y_test)
• As we saw, it is common to get slightly lower performance on the test
set than on the validation set, because the hyperparameters are
tuned on the validation set, not the test set.
Using the Model to Make Predictions
• Next, we can use the model’s predict() method to make predictions on new instances.
• Since we don’t have actual new instances, we will just use the first 3 instances of the test
set:
• X_new = X_test[:3]
• y_proba = model.predict(X_new)
• y_proba.round(2)
➔ array([[0. , 0. , 0. , 0. , 0. , 0.09, 0. , 0.12, 0. , 0.79],
[0. , 0. , 0.94, 0. , 0.02, 0. , 0.04, 0. , 0. , 0. ],
[0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]], dtype=float32)
• As you can see, for each instance the model estimates one probability per class, from
class 0 to class 9. For example, for the first image it estimates that the probability of class
9 is 79%, the probability of class 7 is 12%, the probability of class 5 is 9%, and the other
classes are negligible. In other words, it “believes” the image is a 9, possibly a 7, but it is not
entirely sure.
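• If you only care about the most likely class for each instance, you can take the argmax of the estimated probabilities (a small sketch; np.argmax is not shown on the slides):

import numpy as np

y_pred = np.argmax(y_proba, axis=1)  # index of the highest probability in each row
print(y_pred)  # ➔ [9 2 1] for the probabilities shown above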
Building a Regression MLP
Using the Sequential API
…
• Let’s switch to the California housing problem and tackle it using a
regression neural network.
• For simplicity, we will use Scikit-Learn’s fetch_california_housing()
function to load the data:
• this dataset is simpler than the one we used before, since it contains only
numerical features (there is no ocean_proximity feature), and there is no
missing value.
• After loading the data, we split it into a training set, a validation set
and a test set, and we scale all the features:
…
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)
…
• Building, training, evaluating and using a regression MLP using the
Sequential API to make predictions is quite similar to what we did for
classification.
• The main differences are the fact that
• the output layer has a single neuron (since we only want to predict a single
value) and
• uses no activation function, and
• the loss function is the mean squared error.
• Since the dataset is quite noisy, we just use a single hidden layer with
fewer neurons than before, to avoid overfitting:
…
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=X_train_scaled.shape[1:]),
    keras.layers.Dense(1)])  # single output neuron, no activation function
model.compile(loss="mean_squared_error", optimizer="sgd")
# Train and evaluate on the scaled features prepared on the previous slide
history = model.fit(X_train_scaled, y_train, epochs=20,
                    validation_data=(X_valid_scaled, y_valid))
mse_test = model.evaluate(X_test_scaled, y_test)
X_new = X_test_scaled[:3]  # pretend these are new instances
y_pred = model.predict(X_new)