
Class Notes DL Unit 2

The document provides an overview of neural networks, including their anatomy, types, and the training process. It introduces Keras as a high-level deep learning framework built on top of TensorFlow, Theano, and CNTK, detailing its features, benefits, and setup requirements. Additionally, it outlines the necessary hardware and software for deep learning projects and offers guidance on setting up a deep learning workstation using Jupyter and Anaconda.

Uploaded by

madhu

UNIT III: Neural Networks: Anatomy of Neural Network, Introduction to Keras: Keras,
TensorFlow, Theano and CNTK, Setting up Deep Learning Workstation, Classifying Movie
Reviews: Binary Classification, Classifying newswires: Multiclass Classification.

Anatomy of Neural networks:


Neural networks are a type of machine learning model inspired by the structure and
function of the human brain. They consist of layers of interconnected nodes, called neurons,
which process and transmit information.

An artificial neural network (ANN) is therefore composed of multiple nodes that imitate the
biological neurons of the human brain. The neurons are connected by links and interact with
each other. Each node takes input data, performs a simple operation on it, and passes the
result to other neurons. The output at each node is called its activation or node value.

The basic anatomy of a neural network includes:

• Input layer: This is the first layer of the network, where the input data is received. Each
neuron in this layer corresponds to one input feature.
• Hidden layer(s): These are the layers that come between the input and output layers.
They process the input data and learn to recognize patterns in the data. A neural
network can have one or more hidden layers.
• Output layer: This is the final layer of the network, where the output data is produced.
Each neuron in this layer corresponds to one output value, for example the score of one class.
• Neurons: These are the basic units of a neural network. They receive input from other
neurons and perform a mathematical operation on the input to produce an output.
• Weights: These are the values that determine the strength of the connections between
neurons. They are adjusted during training to optimize the performance of the network.
• Bias: Bias is an additional learnable parameter that is added to the weighted sum of the
inputs of each neuron. It shifts the point at which the neuron activates, much like a threshold.
• Activation function: This is a mathematical function that is applied to the weighted sum
(plus bias) of each neuron. It allows the neuron to produce a non-linear output that can
model complex relationships between input and output data.
• Backpropagation: This is the algorithm that computes how much each weight and bias
contributed to the difference between the predicted output and the actual output. During
training, these gradients are used to adjust the weights and biases step by step, and the
process is repeated until the network produces sufficiently accurate predictions.

Each link between neurons is associated with a weight, and the network learns by altering
these weight values. The following illustration shows a simple ANN.

Figure 1: Neural network structure
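The roles of weights, bias, and activation function listed above can be made concrete with a small sketch in plain Python. The function names (sigmoid, neuron_output) and all numbers are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Illustrative values (assumed): three input features feeding one neuron.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

activation = neuron_output(inputs, weights, bias)
print(activation)
```

Training would change only the weights and the bias; the structure of the computation stays the same.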

Types of Artificial Neural Networks


Generally, there are two types of ANN: feedforward and feedback.

Feedforward ANN: In this network the flow of information is unidirectional. A unit sends
information to another unit from which it does not receive any information, so no feedback
loops are present. Feedforward networks have fixed inputs and outputs and are used, for
example, in pattern recognition.
Feedback ANN: This type of network allows feedback loops. Feedback networks are used, for
example, in content-addressable memories.

Training a neural network revolves around the following objects:


Layers, which are combined into a network (or model)
The input data and corresponding targets
The loss function, which defines the feedback signal used for learning
The optimizer, which determines how learning proceeds

As illustrated in figure 3.1: the network, composed of layers that are chained together, maps
the input data to predictions. The loss function then compares these predictions to the targets,
producing a loss value: a measure of how well the network’s predictions match what was
expected. The optimizer uses this loss value to update the network’s weights.
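The interplay of network, loss function, and optimizer can be sketched with a one-weight model in plain Python. This is a minimal illustration under assumed toy data (targets equal to twice the input) and an assumed learning rate, not the way Keras implements training.

```python
# Toy data (assumed): targets follow y = 2 * x, so the ideal weight is 2.0.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]

weight = 0.0          # the single trainable parameter of this "network"
learning_rate = 0.01  # step size used by the optimizer (assumed value)

def predict(x, w):
    """The 'network': one layer with one weight and no bias."""
    return w * x

def mse_loss(preds, ys):
    """Loss function: mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

losses = []
for step in range(200):
    preds = [predict(x, weight) for x in inputs]
    losses.append(mse_loss(preds, targets))
    # Gradient of the loss with respect to the weight.
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, targets, inputs)) / len(inputs)
    # Optimizer step: plain gradient descent.
    weight -= learning_rate * grad

print(round(weight, 3), losses[0], losses[-1])
```

Each pass through the loop follows the cycle described above: predictions, loss value, weight update.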

Layers: the building blocks of deep learning


The fundamental data structure in neural networks is the layer. A layer is a data-processing
module that takes as input one or more tensors and that outputs one or more tensors. Some
layers are stateless, but more frequently layers have a state: the layer’s weights, one or several
tensors learned with stochastic gradient descent, which together contain the network’s
knowledge.

For instance,
Simple vector data, stored in 2D tensors of shape (samples, features), is often processed by
densely connected layers, also called fully connected or dense layers (the Dense class in
Keras).
Sequence data, stored in 3D tensors of shape (samples, timesteps, features), is typically
processed by recurrent layers such as an LSTM layer.
Image data, stored in 4D tensors, is usually processed by 2D convolution layers (Conv2D).

Building deep-learning models in Keras is done by clipping together compatible layers to
form useful data-transformation pipelines. The notion of layer compatibility here refers
specifically to the fact that every layer will only accept input tensors of a certain shape and
will return output tensors of a certain shape.
Consider the following example:

We’re creating a layer that will only accept as input 2D tensors where the first dimension is
784 (axis 0, the batch dimension, is unspecified, and thus any value would be accepted).
This layer will return a tensor where the first dimension has been transformed to be 32. Thus
this layer can only be connected to a downstream layer that expects 32-dimensional vectors
as its input. When using Keras, you don’t have to worry about compatibility, because the
layers you add to your models are dynamically built to match the shape of the incoming layer.

For instance, suppose you write the following:

The second layer didn’t receive an input shape argument—instead, it automatically inferred its
input shape as being the output shape of the layer that came before.
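The code for the two examples above did not survive in these notes. Judging from the description, they were probably Keras calls along the lines of Dense(32, input_shape=(784,)) followed by Dense(32), but that is an assumption. The shape rule itself can be illustrated without Keras; the helper names make_dense and dense_forward below are hypothetical.

```python
import random

def make_dense(input_dim, units):
    """Create weights (input_dim x units) and biases (units) for a dense layer."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(units)]
               for _ in range(input_dim)]
    biases = [0.0] * units
    return weights, biases

def dense_forward(batch, layer):
    """Apply the layer to every sample in the batch: output = x . W + b."""
    weights, biases = layer
    input_dim = len(weights)
    outputs = []
    for x in batch:
        if len(x) != input_dim:
            raise ValueError(f"expected {input_dim} features, got {len(x)}")
        outputs.append([
            sum(x[i] * weights[i][j] for i in range(input_dim)) + biases[j]
            for j in range(len(biases))
        ])
    return outputs

first = make_dense(784, 32)                 # accepts 784-dimensional vectors
batch = [[0.0] * 784 for _ in range(4)]     # the batch size is free to vary
hidden = dense_forward(batch, first)

# The second layer's input size is taken from the first layer's output size,
# which mirrors the shape inference Keras performs automatically.
second = make_dense(len(hidden[0]), 32)
result = dense_forward(hidden, second)
print(len(result), len(result[0]))
```

Feeding the first layer a vector that does not have 784 features raises an error, which is the incompatibility the text refers to.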

Models: networks of layers


A deep-learning model is a directed, acyclic graph of layers. The most common
instance is a linear stack of layers, mapping a single input to a single output.
But as you move forward, you’ll be exposed to a much broader variety of network
topologies.
Some common ones include the following:
• Two-branch networks
• Multihead networks
• Inception blocks
Picking the right network architecture is more an art than a science; and although
there are some best practices and principles you can rely on, only practice can help
you become a proper neural-network architect.

Loss functions and optimizers: keys to configuring the learning process


Once the network architecture is defined, you still have to choose two more things:
• Loss function (objective function): The quantity that will be minimized during
training. It represents a measure of success for the task at hand.
• Optimizer: Determines how the network will be updated based on the loss
function. It implements a specific variant of stochastic gradient descent (SGD).

Introduction to Keras
Keras is a deep-learning framework for Python that provides a convenient way to
define and train almost any kind of deep-learning model. Keras was initially
developed for researchers, with the aim of enabling fast experimentation.
Keras is a powerful and easy-to-use Python library for creating deep learning models. It is
built on top of popular deep learning libraries such as TensorFlow and Theano.

Overview of Keras:
Keras is a model-level library, providing high-level building blocks for developing
deep-learning models. It doesn’t handle low-level operations such as tensor
manipulation and differentiation. Instead, it relies on a specialized, well-optimized
tensor library to do so, serving as the backend engine of Keras. Rather than choosing
a single tensor library and tying the implementation of Keras to that library, Keras
handles the problem in a modular way (see figure 3.3); thus several different backend
engines can be plugged seamlessly into Keras. Currently, the three existing backend
implementations are the TensorFlow backend, the Theano backend, and the
Microsoft Cognitive Toolkit (CNTK) backend. In the future, it’s likely that Keras will
be extended to work with even more deep-learning execution engines.
Keras runs on top of open-source machine learning libraries such as TensorFlow, Theano, or
the Cognitive Toolkit (CNTK). Theano is a Python library used for fast numerical computation.
TensorFlow is a widely used symbolic math library for creating neural networks and deep
learning models; it is very flexible, and one of its main benefits is support for distributed
computing. CNTK is a deep learning framework developed by Microsoft that can be used as a
library from Python, C#, or C++, or as a standalone machine learning toolkit. Theano and
TensorFlow are very powerful libraries, but they are difficult to use directly for creating
neural networks. Keras has a minimal structure that provides a clean and easy way to create
deep learning models on top of TensorFlow or Theano, and it is designed for defining such
models quickly. This makes Keras a good choice for deep learning applications.

Features:
Keras leverages various optimization techniques to make its high-level neural network API
easier to use and more performant.
It supports the following features:
• Consistent, simple and extensible API.
• Minimal structure: easy to achieve the result without any frills.
• Support for multiple platforms and backends.
• A user-friendly framework that runs on both CPU and GPU.
• High scalability of computation.

Benefits:

Keras is a powerful and dynamic framework and comes with the following advantages:
• Large community support.
• Easy to test.
• Keras neural networks are written in Python, which makes things simpler.
• Keras supports both convolutional and recurrent networks.
Figure 3.2 Google web search interest for different deep-learning frameworks over time

TensorFlow:

TensorFlow is an open-source machine learning library for numerical computation, developed
by Google. Keras is a high-level API built on top of TensorFlow or Theano. If TensorFlow is
not already installed, you can install it with pip using the command below:

pip install tensorflow

Once Keras has been run for the first time, its configuration file can be found in your home
directory at .keras/keras.json.

keras.json:

{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}

Here,

• image_data_format sets the layout of image tensors (channels_last or channels_first).
• epsilon is a small numeric constant. It is used to avoid division-by-zero errors.
• floatx is the default floating-point data type, float32. You can also change it to
float16 or float64 using the set_floatx() function.
• backend denotes the current backend.
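Because keras.json is ordinary JSON, its settings can be inspected with Python's standard json module. The sketch below parses the configuration listed above from a string rather than from the real file, and uses a hypothetical helper safe_divide to show how a small epsilon guards a division. It is an illustration of the idea, not Keras's own code.

```python
import json

# Contents of keras.json as listed above.
config_text = """
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
"""
config = json.loads(config_text)

def safe_divide(numerator, denominator, epsilon=config["epsilon"]):
    """Add the small epsilon constant so a zero denominator cannot raise an error."""
    return numerator / (denominator + epsilon)

print(config["backend"], config["floatx"])
print(safe_divide(1.0, 0.0))
```

To read the real file, json.load could be applied to the file opened from the .keras directory in your home directory.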

Theano:

Theano is an open-source numerical computation library that allows you to define and evaluate
mathematical expressions on multidimensional arrays efficiently. It can be installed using the
command below:

pip install Theano

By default, Keras uses the TensorFlow backend. To change the backend from TensorFlow to
Theano, set "backend" to "theano" in the keras.json file, as shown below:

keras.json

{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}

Now save the file, restart your terminal, and start Keras; the backend will have changed.

>>> import keras as k
Using Theano backend.


CNTK
CNTK (Microsoft Cognitive Toolkit) is an open-source deep learning framework developed
by Microsoft that allows users to build, train, and deploy deep neural networks. It supports a
variety of machine learning algorithms and architectures, including convolutional neural
networks (CNNs), recurrent neural networks (RNNs), and deep belief networks (DBNs).

Here are some key features of CNTK:

• Distributed training: CNTK supports distributed training across multiple GPUs
and multiple machines, which allows users to train large-scale models more
efficiently.
• Python and C++ APIs: CNTK provides APIs in both Python and C++, which makes
it easy to integrate with other programming languages and tools.
• GPU acceleration: CNTK supports GPU acceleration for training and inference,
which allows users to train models much faster than on CPU-only systems.
• Pre-trained models: CNTK includes a number of pre-trained models that can be used
for tasks such as image classification and speech recognition. These models can be
fine-tuned on a user's own dataset for improved performance.
• Customizable neural network architecture: CNTK allows users to define and
customize their own neural network architecture, for example with its network
description language BrainScript.
• Cross-platform support: CNTK runs on Windows and Linux, which makes
it possible to deploy models on either platform.
• Training monitoring: CNTK can log training progress, for example to TensorBoard,
which allows users to monitor the training process and make adjustments to the
model as needed.

Overall, CNTK is a powerful and flexible deep learning framework that is particularly well-
suited for large-scale distributed training. Its support for multiple programming languages and
platforms makes it easy to integrate with other tools and systems, and its pre-trained models
and customizable neural network architecture make it a great choice for a wide variety of
machine learning tasks.
Setting up Deep learning workstation
PC Hardware Setup:

First of all, to perform machine learning and deep learning on any dataset, the software
requires a computer system powerful enough to supply the necessary computing power.
The following is required:

• Central Processing Unit (CPU): Intel Core i5 6th Generation processor or higher.
An equivalent AMD processor is also suitable.
• RAM: 8 GB minimum; 16 GB or higher is recommended.
• Graphics Processing Unit (GPU): NVIDIA GeForce GTX 960 or higher. AMD GPUs
are not supported by the CUDA-based GPU builds of these frameworks. For more
information on NVIDIA GPUs for deep learning, please visit
https://developer.nvidia.com/cuda-gpus.
• Operating System: Ubuntu or Microsoft Windows 10. I recommend
updating Windows 10 to the latest version before proceeding.

Note: In the case of laptops, the ideal option would be to purchase a gaming laptop from any
suitable vendor, such as Alienware, ASUS, Lenovo Legion, or Acer Predator.

Note that in order to use Keras, you need to install TensorFlow or CNTK or Theano (or all of
them, if you want to be able to switch back and forth among the three backends).

Software Requirements:

Jupyter notebooks: the preferred way to run deep-learning experiments. We recommend
using Jupyter notebooks to get started with Keras, although that isn’t a requirement: you can
also run standalone Python scripts or run code from within an IDE such as PyCharm.
Deep Learning with Keras

1. Deep Learning with Keras - Setting up

With this background, let us now start creating the project.

Setting Up Project
We will use Jupyter through Anaconda Navigator for our project. As our project uses
TensorFlow and Keras, you will need to install both in your Anaconda setup. To install
TensorFlow, run the following command in your console window:

>conda install -c anaconda tensorflow

To install Keras, use the following command:

>conda install -c anaconda keras

You are now ready to start Jupyter.

Starting Jupyter
When you start the Anaconda navigator, you would see the following opening screen.

Click ‘Jupyter’ to start it. The screen will show the existing projects, if any, on your
drive.

Starting a New Project


Start a new Python 3 project in Anaconda by selecting the following menu option:

File | New Notebook | Python 3

The screenshot of the menu selection is shown for your quick reference:

A new blank project will show up on your screen as shown below:

Change the project name to DeepLearningDigitRecognition by clicking and editing
the default name “UntitledXX”.

2. Deep Learning with Keras - Importing Libraries

We first import the various libraries required by the code in our project.

Array Handling and Plotting

As usual, we use NumPy for array handling and Matplotlib for plotting. These libraries
are imported in our project using the following import statements:

import numpy as np
import matplotlib
import matplotlib.pyplot as plot

Suppressing Warnings
Both TensorFlow and Keras are revised frequently, and if the versions used in your project
are not in sync, you will see plenty of warnings at runtime. As they distract from
learning, we suppress all warnings in this project. This is done with the following lines
of code:

# silence all warnings
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import warnings
warnings.filterwarnings('ignore')
from tensorflow.python.util import deprecation
deprecation._PRINT_DEPRECATION_WARNINGS = False

Keras
We use Keras libraries to import dataset. We will use the mnist dataset for handwritten
digits. We import the required package using the following statement:

from keras.datasets import mnist

We will define our deep learning neural network using Keras. We import the Sequential,
Dense, Dropout and Activation classes for defining the network architecture. We import
the load_model function for reloading a saved model, and np_utils for a few utilities
that we need in our project. These imports are done with the following program statements:

from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Activation
from keras.utils import np_utils

When you run this code, you will see a message on the console that says that Keras uses
TensorFlow at the backend. The screenshot at this stage is shown here:

Now, as we have all the imports required by our project, we will proceed to define the
architecture for our Deep Learning network.

3. Deep Learning with Keras - Creating Deep Learning Model

Our neural network model will consist of a linear stack of layers. To define such a model,
we create an instance of the Sequential class:

model = Sequential()

Input Layer
We define the input layer, which is the first layer in our network using the following
program statement:

model.add(Dense(512, input_shape=(784,)))

This creates a fully connected layer of 512 nodes (neurons) that receives 784 input values.
This is depicted in the figure below:

Note that all the input nodes are fully connected to Layer 1; that is, each input node is
connected to all 512 nodes of Layer 1.
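Full connectivity determines how many trainable parameters this layer has. A small worked calculation in plain Python, where dense_param_count is a hypothetical helper name and the count assumes the default of one bias per node:

```python
def dense_param_count(n_inputs, n_units):
    """One weight per input-to-unit connection plus one bias per unit."""
    return n_inputs * n_units + n_units

first_layer_params = dense_param_count(784, 512)
print(first_layer_params)  # 401920
```

The same formula applies to any later fully connected layer, with the number of inputs equal to the number of nodes in the layer before it.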

Next, we need to add the activation function for the output of Layer 1. We will use ReLU
as our activation. The activation function is added using the following program statement:

model.add(Activation('relu'))

Next, we add a Dropout of 20% using the statement below. Dropout randomly sets a fraction of
the layer's outputs (here 20%) to zero during training, a technique used to reduce overfitting.

model.add(Dropout(0.2))
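What a dropout layer does to its inputs can be sketched in plain Python. The function below is an illustrative assumption of the usual "inverted dropout" scheme (zero some values, scale up the survivors), not the Keras source code; the name dropout and the seed are chosen for this example.

```python
import random

def dropout(values, rate, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate` during training
    and scale the survivors by 1 / (1 - rate); do nothing at inference time."""
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)              # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, 0.2, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print(zero_fraction)
```

With a rate of 0.2, roughly one value in five is zeroed on each training pass, and nothing is dropped when the model is used for prediction.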

At this point, our input layer is fully defined. Next, we will add a hidden layer.

Hidden Layer
Our hidden layer will consist of 512 nodes. The input to the hidden layer comes from our
previously defined input layer. All the nodes are fully connected as in the earlier case. The
output of the hidden layer will go to the next layer in the network, which is going to
be our final and output layer. We will use the same ReLU activation as for the previous
layer and a dropout of 20%. The code for adding this layer is given here:

model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))

The network at this stage can be visualized as follows:

Next, we will add the final layer to our network, which is the output layer. Note that you
may add any number of hidden layers using code similar to that used here. Adding more
layers makes the network more complex and slower to train, but in many cases, though not
all, it gives better results.

Output Layer
The output layer consists of just 10 nodes, as we want to classify the given images into 10
distinct digits. We add this layer using the following statement:

model.add(Dense(10))

As we want to classify the output into 10 distinct classes, we use the softmax activation,
which turns the 10 outputs into probabilities that sum to 1. ReLU would not do this: its
outputs are unbounded non-negative values. We add the activation using the following statement:

model.add(Activation('softmax'))
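The softmax computation itself is short. The sketch below uses plain Python and assumed raw scores for the 10 output nodes; it illustrates the formula and is not the Keras implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    shift = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed raw outputs of the 10 output nodes for one image.
scores = [1.0, 0.5, 0.2, 3.0, 0.1, 0.0, -1.0, 0.3, 0.7, 0.4]
probs = softmax(scores)
predicted_digit = probs.index(max(probs))
print(predicted_digit, round(sum(probs), 6))
```

The node with the largest raw score receives the largest probability, so the predicted digit is the index of the maximum.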

At this point, our network can be visualized as shown in the below diagram:

At this point, our network model is fully defined in the software. Run the code cell and if
there are no errors, you will get a confirmation message on the screen as shown in the
screenshot below:

Next, we need to compile the model.

4. Deep Learning with Keras - Compiling the Model

The model is compiled with a single call to the compile method.

model.compile(loss='categorical_crossentropy', metrics=['accuracy'],
optimizer='adam')

The compile method takes several parameters. The loss parameter is set to
'categorical_crossentropy', the metrics parameter to 'accuracy', and the optimizer
parameter to 'adam', the optimizer used for training the network. The output at this
stage is shown below:
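To give a feel for what the categorical cross-entropy loss measures, here is a plain-Python sketch for a single sample. The clipping constant eps and the two example predictions are assumptions made for illustration.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs, eps=1e-7):
    """Cross-entropy for one sample: -sum(target * log(prediction))."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, predicted_probs))

target = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]        # one-hot label for digit 5
confident = [0.01] * 5 + [0.91] + [0.01] * 4   # most probability on digit 5
unsure = [0.1] * 10                            # uniform guess over all digits

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(loss_confident, loss_unsure)
```

A confident correct prediction gives a loss close to zero, while a uniform guess over ten classes gives a loss of about 2.3, the natural logarithm of 10.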

Now, we are ready to feed in the data to our network.

Loading Data
As said earlier, we will use the mnist dataset provided by Keras. When we load the data
into our system, it is split into training and test data. The data is loaded by calling
the load_data method as follows:

(X_train, y_train), (X_test, y_test) = mnist.load_data()

The output at this stage looks like the following:

Now, we shall learn the structure of the loaded dataset.

Examining Data Points


The data that is provided to us are the graphic images of size 28 x 28 pixels, each
containing a single digit between 0 and 9. We will display the first ten images on the
console. The code for doing so is given below:

# printing first 10 images
for i in range(10):
    plot.subplot(3,5,i+1)
    plot.tight_layout()
    plot.imshow(X_train[i], cmap='gray', interpolation='none')
    plot.title("Digit: {}".format(y_train[i]))
    plot.xticks([])
    plot.yticks([])
In a loop of 10 iterations, we create a subplot on each iteration and show an image
from the X_train vector in it. We title each image with the corresponding y_train value.
Note that the y_train vector contains the actual digit for the corresponding image in the
X_train vector. We remove the x and y axis markings by calling the two methods xticks
and yticks with an empty list as argument. When you run the code, you will see the following
output:

Next, we will prepare data for feeding it into our network.

5. Deep Learning with Keras - Preparing Data

Before we feed the data to our network, it must be converted into the format required by
the network. This is called preparing data for the network. It generally consists of
converting a multi-dimensional input to a single-dimension vector and normalizing the
data points.

Reshaping Input Vector


The images in our dataset consist of 28 x 28 pixels. This must be converted into a single
dimensional vector of size 28 * 28 = 784 for feeding it into our network. We do so by
calling the reshape method on the vector.

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
Now, our training vector will consist of 60000 data points, each consisting of a single
dimension vector of size 784. Similarly, our test vector will consist of 10000 data points
of a single-dimension vector of size 784.

Normalizing Data
The input vector currently holds integer values between 0 and 255, the gray scale
levels. Normalizing these pixel values to the range 0 to 1 helps in speeding
up the training. As we are going to train with a gradient-descent-based optimizer,
normalizing the data can also help to reduce the chance of getting stuck in local optima.

To normalize the data, we represent it as float type and divide it by 255 as shown in the
following code snippet:

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
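The reshaping and scaling steps above can be followed on a single image in plain Python. The stand-in image below uses made-up gray levels, and the helper names flatten and normalize are hypothetical; NumPy performs the same operations on all 60000 images at once.

```python
def flatten(image):
    """Concatenate the rows of a 2D image into one flat list."""
    return [pixel for row in image for pixel in row]

def normalize(pixels, max_value=255):
    """Scale integer gray levels in [0, max_value] to floats in [0, 1]."""
    return [p / max_value for p in pixels]

# Stand-in for one MNIST image (assumed values): 28 rows of 28 gray levels.
image = [[(r * 28 + c) % 256 for c in range(28)] for r in range(28)]

vector = normalize(flatten(image))
print(len(vector), min(vector), max(vector))
```

The result is one vector of 784 values between 0 and 1, which is the input format the first Dense layer expects.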

Let us now see what the normalized data looks like.

Examining Normalized Data
To view the normalized data, we will call the histogram function as shown here:

plot.hist(X_train[0])
plot.title("Digit: {}".format(y_train[0]))

Here, we plot the histogram of the first element of the X_train vector. We also print the
digit represented by this data point. The output of running the above code is shown here:

You will notice a high density of points with values close to zero. These are the black
points in the image, which make up the major portion of the image. The rest of the
gray scale points, which are close to white, represent the digit. You may check out
the distribution of pixels for another digit. The code below plots the histogram of the
digit at index 2 in the training dataset.

plot.hist(X_train[2])
plot.title("Digit: {}".format(y_train[2]))

The output of running the above code is shown below:

Comparing the above two figures, you will notice that the distribution of the white
pixels differs between the two images, reflecting the different digits they represent:
“5” and “4”.

Next, we will examine the distribution of data in our full training dataset.

Examining Data Distribution


Before we train our machine learning model on our dataset, we should know the
distribution of unique digits in our dataset. Our images represent 10 distinct digits ranging
from 0 to 9. We would like to know the number of digits 0, 1, etc., in our dataset. We can
get this information by using the unique method of Numpy.

Use the following command to print the number of unique values and the number of
occurrences of each one:

print(np.unique(y_train, return_counts=True))

When you run the above command, you will see the following output:

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8), array([5923, 6742, 5958,
6131, 5842, 5421, 5918, 6265, 5851, 5949]))
It shows that there are 10 distinct values — 0 through 9. There are 5923 occurrences of
digit 0, 6742 occurrences of digit 1, and so on. The screenshot of the output is shown
here:
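The counts above also show how evenly the digits are represented. A short plain-Python sketch, using the counts copied from the output above, computes the share of each digit:

```python
# Per-digit counts copied from the np.unique output above.
counts = [5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]

total = sum(counts)
shares = [c / total for c in counts]

most_common = counts.index(max(counts))
least_common = counts.index(min(counts))

for digit, share in enumerate(shares):
    print(digit, f"{share:.2%}")
```

The counts add up to the 60000 training images. Digit 1 is the most frequent and digit 5 the least frequent, but every digit accounts for roughly a tenth of the data.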

As a final step in data preparation, we need to encode our data.

Encoding Data
We have ten categories in our dataset. We will thus encode our output in these ten
categories using one-hot encoding. We use the to_categorical function from the Keras
np_utils module to perform the encoding. After the output data is encoded, each data point
is converted into a single-dimensional vector of size 10. For example, digit 5 will now be
represented as [0,0,0,0,0,1,0,0,0,0].

Encode the data using the following piece of code:

n_classes = 10
Y_train = np_utils.to_categorical(y_train, n_classes)
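One-hot encoding is simple enough to write by hand, which can help in understanding what to_categorical returns. The function one_hot below is a hypothetical plain-Python stand-in, and the labels are sample values assumed for the example.

```python
def one_hot(label, n_classes):
    """Return a list of n_classes zeros with a single 1.0 at position `label`."""
    vector = [0.0] * n_classes
    vector[label] = 1.0
    return vector

labels = [5, 0, 4]   # sample labels (assumed) standing in for y_train
encoded = [one_hot(label, 10) for label in labels]
print(encoded[0])
```

Each encoded vector contains exactly one 1, located at the index of the digit it represents.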

You may check out the result of encoding by printing the first 5 elements of the categorized
Y_train vector.

Use the following code to print the first 5 vectors:

for i in range(5):
    print(Y_train[i])

You will see the following output:

[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]

The first element represents digit 5, the second represents digit 0, and so on.

Finally, you will have to categorize the test data too, which is done using the following
statement:

Y_test = np_utils.to_categorical(y_test, n_classes)

At this stage, your data is fully prepared for feeding into the network.

Next, comes the most important part and that is training our network model.

6. Deep Learning with Keras - Training the Model

The model is trained with a single call to the fit method, which takes a few parameters,
as seen in the code below:

history = model.fit(X_train, Y_train,
                    batch_size=128, epochs=20,
                    verbose=2,
                    validation_data=(X_test, Y_test))
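The first two parameters to the fit method specify the features and the target outputs of
the training dataset. The batch_size of 128 is the number of samples processed before each
weight update.

The epochs parameter is set to 20, an epoch being one full pass over the training data; we
assume that training will converge within at most 20 epochs. After each epoch, the model is
validated on the test data specified in the last parameter.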
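The batch size and the number of epochs together determine how many weight updates are performed. A worked calculation in plain Python, assuming the final, smaller batch of each epoch is also used:

```python
import math

n_samples = 60000   # size of the training set
batch_size = 128
epochs = 20

updates_per_epoch = math.ceil(n_samples / batch_size)   # the last batch is smaller
last_batch_size = n_samples - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, last_batch_size, total_updates)
```

With these settings there are 469 updates per epoch, the last of them on a batch of 96 samples, and 9380 updates over the 20 epochs.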

The partial output of running the above command is shown here:

Train on 60000 samples, validate on 10000 samples


Epoch 1/20
- 9s - loss: 0.2488 - acc: 0.9252 - val_loss: 0.1059 - val_acc: 0.9665
Epoch 2/20
- 9s - loss: 0.1004 - acc: 0.9688 - val_loss: 0.0850 - val_acc: 0.9715
Epoch 3/20
- 9s - loss: 0.0723 - acc: 0.9773 - val_loss: 0.0717 - val_acc: 0.9765
Epoch 4/20
- 9s - loss: 0.0532 - acc: 0.9826 - val_loss: 0.0665 - val_acc: 0.9795
Epoch 5/20
- 9s - loss: 0.0457 - acc: 0.9856 - val_loss: 0.0695 - val_acc: 0.9792

The screenshot of the output is given below for your quick reference:


Now, as the model is trained on our training data, we will evaluate its performance.

7. Deep Learning with Keras - Evaluating Model Performance

To evaluate the model performance, we call the evaluate method as follows:

loss_and_metrics = model.evaluate(X_test, Y_test, verbose=2)

We will print the loss and accuracy using the following two statements:

print("Test Loss", loss_and_metrics[0])


print("Test Accuracy", loss_and_metrics[1])

When you run the above statements, you would see the following output:

Test Loss 0.08041584826191042


Test Accuracy 0.9837

This shows a test accuracy of about 98%, which is acceptable for our purposes. It means
that in about 2% of the cases, the handwritten digits are not classified correctly. We will
also plot the accuracy and loss metrics to see how the model performs on the test data.

Plotting Accuracy Metrics
We use the recorded history during our training to get a plot of accuracy metrics. The
following code will plot the accuracy on each epoch. We pick up the training data accuracy
(“acc”) and the validation data accuracy (“val_acc”) for plotting.

plot.subplot(2,1,1)
plot.plot(history.history['acc'])
plot.plot(history.history['val_acc'])
plot.title('model accuracy')
plot.ylabel('accuracy')
plot.xlabel('epoch')
plot.legend(['train', 'test'], loc='lower right')

The output plot is shown below:

17
Deep Learning with Keras

As you can see in the diagram, the accuracy increases rapidly in the first two epochs,
indicating that the network is learning fast. Afterwards, the curve flattens, indicating that
additional epochs bring little further improvement. Generally, if the training data
accuracy (“acc”) keeps improving while the validation data accuracy (“val_acc”) gets
worse, you are encountering overfitting. It indicates that the model is starting to memorize
the training data.
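
One simple way to act on this observation is to look for the epoch with the lowest validation loss. The plain-Python sketch below does that; the four val_loss values are copied from the training log for epochs 2 to 5, and the helper name best_epoch is made up for illustration.

```python
# Validation loss per epoch, copied from the training log (epochs 2 to 5).
val_loss_by_epoch = {2: 0.0850, 3: 0.0717, 4: 0.0665, 5: 0.0695}

def best_epoch(val_losses):
    """Return the epoch number with the lowest validation loss."""
    return min(val_losses, key=val_losses.get)

epoch = best_epoch(val_loss_by_epoch)
print("Lowest validation loss at epoch", epoch)
```

If the validation loss keeps rising after that epoch while the training loss keeps falling, the model is probably starting to overfit.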

We will also plot the loss metrics to check our model’s performance.

Plotting Loss Metrics
Again, we plot the loss on both the training (“loss”) and test (“val_loss”) data. This is done
using the following code:

plot.subplot(2,1,2)
plot.plot(history.history['loss'])
plot.plot(history.history['val_loss'])
plot.title('model loss')
plot.ylabel('loss')
plot.xlabel('epoch')
plot.legend(['train', 'test'], loc='upper right')

The output of this code is shown below:


As you can see in the diagram, the loss on the training set decreases rapidly for the first
two epochs. For the test set, the loss does not decrease at the same rate as the training
set, but remains almost flat for multiple epochs. This means our model is generalizing well
to unseen data.
Now, we will use our trained model to predict the digits in our test data.

8. Deep Learning with Keras ― Predicting on Test Data

Predicting the digits in unseen data is very easy. You simply need to call the
predict_classes method of the model, passing it a vector consisting of your
unknown data points.

predictions = model.predict_classes(X_test)

The method call returns the predictions in a vector that can be compared element by
element against the actual values. This is done using the following two statements:

correct_predictions = np.nonzero(predictions == y_test)[0]
incorrect_predictions = np.nonzero(predictions != y_test)[0]

Finally, we will print the count of correct and incorrect predictions using the following two
program statements:

print(len(correct_predictions)," classified correctly")


print(len(incorrect_predictions)," classified incorrectly")

When you run the code, you will get the following output:

9837 classified correctly


163 classified incorrectly
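
The predict_classes method is not available on Sequential models in recent TensorFlow/Keras releases. The usual replacement is to take the argmax of the probabilities returned by predict. The NumPy sketch below shows the idea on a small made-up array; probabilities and y_true are illustrative stand-ins, not MNIST results.

```python
import numpy as np

# Stand-in for model.predict(X_test): one row of class probabilities per sample.
probabilities = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
])
y_true = np.array([1, 0, 1, 0])  # made-up ground-truth labels

# The predicted class is the index of the largest probability in each row.
predictions = np.argmax(probabilities, axis=1)

# Indices of the samples that were classified correctly and incorrectly.
correct = np.nonzero(predictions == y_true)[0]
incorrect = np.nonzero(predictions != y_true)[0]
print(len(correct), "classified correctly")
print(len(incorrect), "classified incorrectly")
```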

Now, as you have satisfactorily trained the model, we will save it for future use.

11. Deep Learning with Keras ― Saving Model

We will save the trained model in our local drive in the models folder in our current
working directory. To save the model, run the following code:

import os

directory = "./models/"
name = 'handwrittendigitrecognition.h5'
# Build the path from the directory defined above.
path = os.path.join(directory, name)
model.save(path)
print('Saved trained model at %s ' % path)

The output after running the code is shown below:

Now, as you have saved a trained model, you may use it later on for processing your
unknown data.
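
Saving may fail if the models folder does not exist yet. A minimal sketch of preparing the path with the standard library is shown below. The model.save call is left as a comment because it needs the trained model from the earlier sections.

```python
import os

directory = "./models/"
name = "handwrittendigitrecognition.h5"

# Create the folder if it is missing; exist_ok avoids an error if it already exists.
os.makedirs(directory, exist_ok=True)
path = os.path.join(directory, name)

# model.save(path)  # requires the trained model from the previous sections
print("Model will be saved at %s" % path)
```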

12. Deep Learning with Keras ― Loading Model for Predictions

To predict on unseen data, you first need to load the trained model into memory. This
is done using the following command:

model = load_model('./models/handwrittendigitrecognition.h5')

Note that we are simply loading the .h5 file into memory. This sets up the entire neural
network in memory along with the weights assigned to each layer.

Now, to make predictions on unseen data, load the data, whether one item or many,
into memory. Preprocess it to meet the input requirements of the model, in the same
way as you did for the training and test data above. After preprocessing, feed it to your
network. The model will output its prediction.
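
As a rough sketch of that preprocessing step, the function below flattens 28x28 grayscale images and scales pixel values to the range 0 to 1. It assumes the model was trained on 784-element vectors divided by 255, which is common for MNIST; adjust it to match whatever was applied to the training data. The name preprocess_images is hypothetical.

```python
import numpy as np

def preprocess_images(images):
    """Flatten 28x28 images to 784-vectors and scale pixels to [0, 1]."""
    images = np.asarray(images, dtype="float32")
    flat = images.reshape(len(images), 28 * 28)
    return flat / 255.0

# Two made-up images standing in for new, unseen data.
new_images = np.random.randint(0, 256, size=(2, 28, 28))
x_new = preprocess_images(new_images)
print(x_new.shape)

# predictions = model.predict(x_new)  # requires the loaded model
```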

13. Deep Learning with Keras ― Conclusion

Keras provides a high-level API for creating deep neural networks. In this tutorial, you
learned to create a deep neural network that was trained to recognize the digits in
handwritten text. A multi-layer network was created for this purpose. Keras allows you to
define an activation function of your choice at each layer. Using gradient descent, the
network was trained on the training data. The accuracy of the trained network in predicting
unseen data was measured on the test data. You learned to plot the accuracy and loss
metrics. After the network was fully trained, you saved the network model for future use.

Classifying movie reviews: a binary classification example
Two-class classification, or binary classification, may be the most widely applied kind of machine-learning
problem. In this example, you’ll learn to classify movie reviews as positive or negative, based on the text
content of the reviews.

The IMDB dataset


You’ll work with the IMDB dataset: a set of 50,000 highly polarized reviews from the Internet Movie
Database. They’re split into 25,000 reviews for training and 25,000 reviews for testing, each set consisting of
50% negative and 50% positive reviews.
Just like the MNIST dataset, the IMDB dataset comes packaged with Keras. It has already been
preprocessed: the reviews (sequences of words) have been turned into sequences of integers, where each
integer stands for a specific word in a dictionary.
The following code will load the dataset (when you run it the first time, about 80 MB of data will be
downloaded to your machine).
Loading the IMDB dataset
from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

The argument num_words=10000 means you’ll only keep the top 10,000 most frequently occurring
words in the training data. Rare words will be discarded.
This allows you to work with vector data of manageable size. The variables train_data and test_data are
lists of reviews; each review is a list of word indices (encoding a sequence of words). train_labels and
test_labels are lists of 0s and 1s, where 0 stands for negative and 1 stands for positive:
>>> train_data[0]
[1, 14, 22, 16, ... 178, 32]
>>> train_labels[0]
1

Preparing the data


You can’t feed lists of integers into a neural network. You have to turn your lists into tensors. There are
two ways to do that:
• Pad your lists so that they all have the same length, turn them into an integer tensor of shape
(samples, word_indices), and then use as the first layer in your network a layer capable of
handling such integer tensors.
• One-hot encode your lists to turn them into vectors of 0s and 1s. This would mean, for instance,
turning the sequence [3, 5] into a 10,000-dimensional vector that would be all 0s except for
indices 3 and 5, which would be 1s. Then you could use as the first layer in your network a Dense
layer, capable of handling floating-point vector data.
Let’s go with the latter solution to vectorize the data, which you’ll do manually for maximum clarity.

Encoding the integer sequences into a binary matrix
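
A minimal NumPy sketch of this encoding step is given below. The function name vectorize_sequences and the small dimension used in the demo are illustrative choices; with the IMDB data loaded above you would keep dimension=10000.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    """Turn lists of word indices into a binary matrix of shape (samples, dimension)."""
    results = np.zeros((len(sequences), dimension), dtype="float32")
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0  # set the columns named by the word indices to 1
    return results

# Tiny demo with a 10-word vocabulary instead of 10,000.
demo = vectorize_sequences([[3, 5], [0, 3, 3]], dimension=10)
print(demo[0])
```

With the real data, the same function would be applied to train_data and test_data, and the label lists would be converted to float arrays with np.asarray(train_labels).astype('float32').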


Building your network
The input data is vectors, and the labels are scalars (1s and 0s): this is the easiest setup you’ll ever
encounter. A type of network that performs well on such a problem is a simple stack of fully connected
(Dense) layers with relu activations:
Dense(16, activation='relu').

The argument being passed to each Dense layer (16) is the number of hidden units of the layer. A hidden
unit is a dimension in the representation space of the layer.
Each such Dense layer with a relu activation implements the following chain of tensor operations:
output = relu(dot(input, W) + b)

Having 16 hidden units means the weight matrix W will have shape (input_dimension, 16): the dot
product with W will project the input data onto a 16-dimensional representation space (and then you’ll
add the bias vector b and apply the relu operation).
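
To make these tensor operations concrete, here is a small NumPy sketch of one Dense layer with a relu activation. With W of shape (input_dimension, 16) and samples stored as rows, the product is computed as input times W. The random weights and the tiny input dimension are placeholders; Keras initializes and learns W and b itself.

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)

def dense_relu(inputs, W, b):
    """Forward pass of a Dense layer: relu(inputs . W + b)."""
    return relu(np.dot(inputs, W) + b)

rng = np.random.default_rng(0)
input_dimension, units = 8, 16                  # 8 stands in for 10,000
W = rng.normal(size=(input_dimension, units))   # weight matrix
b = np.zeros(units)                             # bias vector

batch = rng.normal(size=(4, input_dimension))   # 4 samples
output = dense_relu(batch, W, b)
print(output.shape)
```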
You can intuitively understand the dimensionality of your representation space as “how much freedom
you’re allowing the network to have when learning internal representations.” Having more hidden units (a
higher-dimensional representation space) allows your network to learn more-complex representations, but
it makes the network more computationally expensive and may lead to learning unwanted patterns
(patterns that will improve performance on the training data but not on the test data).

The network uses the following architecture:
• Two intermediate layers with 16 hidden units each
• A third layer that will output the scalar prediction regarding the sentiment of the current review
The intermediate layers will use relu as their activation function, and the final layer will use a sigmoid
activation so as to output a probability (a score between 0 and 1, indicating how likely the sample is to
have the target “1”: how likely the review is to be positive).
A relu (rectified linear unit) is a function meant to zero out negative values (see figure 3.4), whereas a
sigmoid “squashes” arbitrary values into the [0, 1] interval (see figure 3.5), outputting something that can
be interpreted as a probability.
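
Both activation functions are simple enough to write out directly. A plain-Python sketch, for single scalar values:

```python
import math

def relu(x):
    """Zero out negative values, pass positive values through."""
    return max(0.0, x)

def sigmoid(x):
    """Squash any real value into the interval between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), round(sigmoid(value), 3))
```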
Figure 3.6 shows what the network looks like.
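
A sketch of this three-layer architecture in Keras is shown below. It assumes the one-hot encoded inputs are 10,000-dimensional, as described above, and that Keras is importable as keras. Passing input_shape to the first layer is the older style; newer Keras versions may warn and prefer an explicit Input layer.

```python
from keras import models, layers

model = models.Sequential()
# Two intermediate layers with 16 hidden units each.
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
# Final layer: a single sigmoid unit that outputs the probability of class "1".
model.add(layers.Dense(1, activation='sigmoid'))

model.summary()
```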
Finally, you need to choose a loss function and an optimizer. Because you’re facing a binary
classification problem and the output of your network is a probability (you end your network with a
single-unit layer with a sigmoid activation), it’s best to use the binary_crossentropy loss.
It isn’t the only viable choice: you could use, for instance, mean_squared_error. But crossentropy is
usually the best choice when you’re dealing with models that output probabilities.
Crossentropy is a quantity from the field of Information Theory that measures the distance between
probability distributions or, in this case, between the ground-truth distribution and your predictions.
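
For a single sample with true label y (0 or 1) and predicted probability p, binary crossentropy is -(y*log(p) + (1-y)*log(1-p)). The plain-Python sketch below averages this over a few made-up predictions. The clipping constant eps is an assumption added here to avoid taking log(0).

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary crossentropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1]
confident = [0.9, 0.1, 0.8]   # mostly right, with high confidence
hesitant = [0.6, 0.4, 0.5]    # closer to a coin flip
print(binary_crossentropy(labels, confident))
print(binary_crossentropy(labels, hesitant))
```

The confident, mostly correct predictions give the lower loss, which is the behaviour the optimizer exploits during training.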
Here’s the step where you configure the model with the rmsprop optimizer and the binary_crossentropy
loss function. Note that you’ll also monitor accuracy during training.

You’re passing your optimizer, loss function, and metrics as strings, which is possible because rmsprop,
binary_crossentropy, and accuracy are packaged as part of Keras.

Sometimes you may want to configure the parameters of your optimizer or pass a custom loss function or
metric function. The former can be done by passing an optimizer class instance as the optimizer
argument, as shown in listing 3.5; the latter can be done by passing function objects as the loss and/or
metrics arguments, as shown in listing 3.6.
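
A sketch of both configuration styles follows: one using strings, the other using an optimizer instance plus loss and metric function objects. The helper build_model and the learning rate of 0.001 are illustrative assumptions, and very old Keras versions name the learning-rate argument lr instead of learning_rate.

```python
from keras import models, layers, optimizers, losses, metrics

def build_model():
    """Three-layer stack as described above (hypothetical helper)."""
    model = models.Sequential()
    model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
    model.add(layers.Dense(16, activation='relu'))
    model.add(layers.Dense(1, activation='sigmoid'))
    return model

# Style 1: optimizer, loss and metric passed as strings.
model_a = build_model()
model_a.compile(optimizer='rmsprop',
                loss='binary_crossentropy',
                metrics=['accuracy'])

# Style 2: an optimizer instance with an explicit learning rate,
# and function objects for the loss and the metric.
model_b = build_model()
model_b.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
                loss=losses.binary_crossentropy,
                metrics=[metrics.binary_accuracy])
```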
Validating your approach
In order to monitor during training the accuracy of the model on data it has never seen before, you’ll
create a validation set by setting apart 10,000 samples from the original training data:
x_val = x_train[:10000]
partial_x_train = x_train[10000:]

y_val = y_train[:10000]
partial_y_train = y_train[10000:]

You’ll now train the model for 20 epochs (20 iterations over all samples in the x_train and y_train
tensors), in mini-batches of 512 samples. At the same time, you’ll monitor loss and accuracy on the
10,000 samples that you set apart. You do so by passing the validation data as the validation_data
argument.
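
The training call could look like the sketch below. To keep it self-contained and quick to run, it uses small random arrays in place of the vectorized IMDB data (100 features instead of 10,000, and 2 epochs instead of 20). The variable names mirror the ones above; all sizes are illustrative.

```python
import numpy as np
from keras import models, layers

# Random stand-ins for the vectorized reviews and their 0/1 labels.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 2, size=(600, 100)).astype('float32')
y_train = rng.integers(0, 2, size=(600,)).astype('float32')

# Hold out the first 200 samples for validation (10,000 in the real example).
x_val, partial_x_train = x_train[:200], x_train[200:]
y_val, partial_y_train = y_train[:200], y_train[200:]

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(100,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

history = model.fit(partial_x_train, partial_y_train,
                    epochs=2,                         # 20 in the real example
                    batch_size=512,
                    validation_data=(x_val, y_val),   # evaluated after every epoch
                    verbose=0)
print(sorted(history.history.keys()))
```

The exact key names in history.history (for example acc versus accuracy) depend on the Keras version.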

On CPU, this will take less than 2 seconds per epoch—training is over in 20 seconds. At the end of every
epoch, there is a slight pause as the model computes its loss and accuracy on the 10,000 samples of the
validation data.
Note that the call to model.fit() returns a History object. This object has a member history, which is a
dictionary containing data about everything that happened during training.
Let’s look at it:
>>> history_dict = history.history
>>> history_dict.keys()
[u'acc', u'loss', u'val_acc', u'val_loss']

The dictionary contains four entries: one per metric that was being monitored during training and during
validation. In the following two listings, let’s use Matplotlib to plot the training and validation loss side by
side (see figure 3.7), as well as the training and validation accuracy (see figure 3.8). Note that your own
results may vary slightly due to a different random initialization of your network.
As you can see, the training loss decreases with every epoch, and the training accuracy increases with
every epoch. That’s what you would expect when running gradient-descent optimization: the quantity
you’re trying to minimize should be less with every iteration. But that isn’t the case for the validation loss
and accuracy: they seem to peak at the fourth epoch. This is an example of what we warned against
earlier: a model that performs better on the training data isn’t necessarily a model that will do better on
data it has never seen before. In precise terms, what you’re seeing is overfitting: after the second epoch,
you’re overoptimizing on the training data, and you end up learning representations that are specific to the
training data and don’t generalize to data outside of the training set. In this case, to prevent overfitting,
you could stop training after three epochs. In general, you can use a range of techniques to mitigate
overfitting.
Let’s train a new network from scratch for four epochs and then evaluate it on the test data.

This fairly naive approach achieves an accuracy of 88%. With state-of-the-art approaches, you should be
able to get close to 95%.

Using a trained network to generate predictions on new data


After having trained a network, you’ll want to use it in a practical setting. You can generate the likelihood
of reviews being positive by using the predict method.
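
The predict method returns one probability per review. A common way to turn these into class labels is to threshold at 0.5, as in this NumPy sketch. The probabilities array is made up and only stands in for the output of predict.

```python
import numpy as np

# Stand-in for model.predict(x_test): shape (samples, 1), values between 0 and 1.
probabilities = np.array([[0.92], [0.12], [0.51], [0.03]])

# 1 = positive review, 0 = negative review.
predicted_labels = (probabilities > 0.5).astype('int32').ravel()
print(predicted_labels)
```

Values close to 0.5 indicate reviews the network is unsure about, so a different threshold may suit applications where one kind of error is more costly.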
Classifying newswires: a multiclass classification example
In this section, you’ll build a network to classify Reuters newswires into 46 mutually exclusive topics. Because you have
many classes, this problem is an instance of multiclass classification; and because each data point should
be classified into only one category, the problem is more specifically an instance of single-label,
multiclass classification. If each data point could belong to multiple categories (in this case, topics), you’d
be facing a multilabel, multiclass classification problem.
The Reuters dataset
You’ll work with the Reuters dataset, a set of short newswires and their topics, published by Reuters in
1986. It’s a simple, widely used toy dataset for text classification. There are 46 different topics; some
topics are more represented than others, but each topic has at least 10 examples in the training set. Like
IMDB and MNIST, the Reuters dataset comes packaged as part of Keras. Let’s take a look.

To vectorize the labels, there are two possibilities: you can cast the label list as an integer tensor, or you
can use one-hot encoding. One-hot encoding is a widely used format for categorical data, also called
categorical encoding. For a more detailed explanation of one-hot encoding, see section 6.1. In this case,
one-hot encoding of the labels consists of embedding each label as an all-zero vector with a 1 in the place
of the label index. Here’s an example.
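
A minimal NumPy sketch of this label encoding is given below. The function name to_one_hot and the sample labels are illustrative; 46 is the number of Reuters topics.

```python
import numpy as np

def to_one_hot(labels, dimension=46):
    """Encode integer labels as rows with a single 1 at the label index."""
    results = np.zeros((len(labels), dimension), dtype='float32')
    for i, label in enumerate(labels):
        results[i, label] = 1.0
    return results

one_hot = to_one_hot([3, 0, 45])
print(one_hot.shape)
```

Keras also ships a built-in helper for this, to_categorical in keras.utils.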
Building your network
This topic-classification problem looks similar to the previous movie-review classification problem: in
both cases, you’re trying to classify short snippets of text. But there is a new constraint here: the number
of output classes has gone from 2 to 46. The dimensionality of the output space is much larger. In a stack
of Dense layers like the one you’ve been using, each layer can only access information present in the output
of the previous layer. If one layer drops some information relevant to the classification problem, this
information can never be recovered by later layers: each layer can potentially become an information
bottleneck. In the previous example, you used 16-dimensional intermediate layers, but a 16-dimensional
space may be too limited to learn to separate 46 different classes: such small layers may act as
information bottlenecks, permanently dropping relevant information. For this reason you’ll use larger
layers. Let’s go with 64 units.
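
A sketch of such a network in Keras is shown below. It assumes the newswires are one-hot encoded into 10,000-dimensional vectors in the same way as the movie reviews, which is an assumption about the preprocessing rather than something stated here.

```python
from keras import models, layers

model = models.Sequential()
# Two intermediate layers of 64 units, wide enough to limit information bottlenecks.
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
# One output unit per topic; softmax turns the 46 scores into probabilities.
model.add(layers.Dense(46, activation='softmax'))

model.summary()
```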

There are two other things you should note about this architecture:
• You end the network with a Dense layer of size 46. This means for each input sample, the network will
output a 46-dimensional vector. Each entry in this vector (each dimension) will encode a different output
class.
• The last layer uses a softmax activation. You saw this pattern in the MNIST example. It means the
network will output a probability distribution over the 46 different output classes: for every input
sample, the network will produce a 46-dimensional output vector, where output[i] is the probability that
the sample belongs to class i. The 46 scores will sum to 1.
The best loss function to use in this case is categorical_crossentropy. It measures the distance between
two probability distributions: here, between the probability distribution output by the network and the
true distribution of the labels. By minimizing the distance between these two distributions you train the
network to output something as close as possible to the true labels.
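
The plain-Python sketch below illustrates softmax and categorical crossentropy on three classes instead of 46. The raw scores are made up, and the eps guard against log(0) is an assumption added for safety.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]   # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_label, probabilities, eps=1e-7):
    """Crossentropy between a one-hot label and a predicted distribution."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_label, probabilities))

probs = softmax([2.0, 1.0, 0.1])                  # 3 classes instead of 46
loss_if_class_0 = categorical_crossentropy([1, 0, 0], probs)
loss_if_class_2 = categorical_crossentropy([0, 0, 1], probs)
print(probs, loss_if_class_0, loss_if_class_2)
```

The loss is small when the true class receives a high probability and large when it receives a low one.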

Validating your approach


Let’s set apart 1,000 samples in the training data to use as a validation set.
The network begins to overfit after nine epochs. Let’s train a new network from scratch for nine epochs
and then evaluate it on the test set.

This approach reaches an accuracy of ~80%. With a balanced binary classification problem, the accuracy
reached by a purely random classifier would be 50%. But in this case it’s closer to 19%, so the results
seem pretty good, at least when compared to a random baseline.
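
One way to estimate such a random baseline is to shuffle the labels and compare them with the originals. With imbalanced classes, the resulting accuracy is roughly the sum of the squared class frequencies. The NumPy sketch below illustrates this with made-up labels for four classes; it does not use the Reuters data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up imbalanced labels: 4 classes with frequencies 50%, 30%, 15%, 5%.
class_frequencies = np.array([0.5, 0.3, 0.15, 0.05])
labels = rng.choice(len(class_frequencies), size=20000, p=class_frequencies)

# A "random classifier": predict a shuffled copy of the true labels.
shuffled = rng.permutation(labels)
random_accuracy = float(np.mean(labels == shuffled))

# In expectation this is close to the sum of squared class frequencies.
expected = float(np.sum(class_frequencies ** 2))
print(random_accuracy, expected)
```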
Generating predictions on new data
You can verify that the predict method of the model instance returns a
probability distribution over all 46 topics. Let’s generate topic predictions for all of the test data.
