Class Notes DL Unit 2
TensorFlow, Theano and CNTK, Setting up Deep Learning Workstation, Classifying Movie
Reviews: Binary Classification, Classifying newswires: Multiclass Classification.
Artificial neural networks (ANNs) are composed of multiple nodes that imitate the biological
neurons of the human brain. The neurons are connected by links and interact with each other.
Each node takes input data and performs a simple operation on it, and the result is passed on
to other neurons. The output at each node is called its activation or node value.
Neurons: These are the basic units of a neural network. They receive input from other
neurons and perform a mathematical operation on the input to produce an output.
Weights: These are the values that determine the strength of the connections between
neurons. They are adjusted during training to optimize the performance of the network.
Bias: Bias is an additional parameter that is added to the input of each neuron. It
allows the neuron to adjust the output based on a specific threshold.
Backpropagation: This is the algorithm that works out how much each weight and bias
contributed to the difference between the predicted output and the actual output, so
that the weights and biases can be adjusted to reduce that difference. It is an iterative
process that continues until the network produces sufficiently accurate predictions.
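The roles of the inputs, weights and bias described above can be sketched for a single neuron. This is a minimal illustration, not taken from the notes: the input values, the weights, the bias and the choice of a sigmoid as the neuron's operation are all assumptions.

```python
import numpy as np

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through a sigmoid."""
    z = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-z))

# Made-up values, chosen only for illustration.
inputs = np.array([0.5, -1.0, 2.0])    # values received from other neurons
weights = np.array([0.4, 0.3, -0.2])   # strength of each connection
bias = 0.1                             # shifts the threshold of the neuron

activation = neuron_output(inputs, weights, bias)
print(activation)  # the neuron's activation (node value), between 0 and 1
```

Changing a weight changes how strongly the corresponding input influences the activation, which is what training adjusts.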
Each link is associated with a weight. ANNs are capable of learning, which takes place by
altering the weight values. The following illustration shows a simple ANN.
Feedforward ANN: In this network, the flow of information is unidirectional. A unit sends
information to another unit from which it does not receive any information, so no feedback
loops are present. Feedforward networks are used in pattern recognition and have fixed
inputs and outputs.
Feedback ANN: This type of artificial neural network allows feedback loops. It is used in
content-addressable memories.
As illustrated in figure 3.1: the network, composed of layers that are chained together, maps
the input data to predictions. The loss function then compares these predictions to the targets,
producing a loss value: a measure of how well the network’s predictions match what was
expected. The optimizer uses this loss value to update the network’s weights.
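The loop of predictions, loss and weight update described above can be sketched with plain NumPy. This is only an illustration under stated assumptions: the "network" is a single linear unit, the data is made up, the loss is mean squared error and the optimizer is plain gradient descent with an arbitrary learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
targets = 3.0 * x + 1.0           # made-up targets the model should learn

w, b = 0.0, 0.0                   # the weights of our one-unit "network"
learning_rate = 0.1               # arbitrary choice for this sketch
losses = []

for step in range(200):
    predictions = w * x + b                        # network maps input to predictions
    error = predictions - targets
    loss = np.mean(error ** 2)                     # loss function compares them to targets
    losses.append(loss)
    grad_w = np.mean(2 * error * x)                # gradient of the loss w.r.t. w
    grad_b = np.mean(2 * error)                    # gradient of the loss w.r.t. b
    w -= learning_rate * grad_w                    # optimizer updates the weights
    b -= learning_rate * grad_b

print(float(w), float(b), losses[0], losses[-1])
```

The loss value shrinks from step to step as the weights approach the values used to generate the targets.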
For instance,
Simple vector data, stored in 2D tensors of shape (samples, features), is often processed by
densely connected layers, also called fully connected or dense layers (the Dense class in
Keras).
Sequence data, stored in 3D tensors of shape (samples, timesteps, features), is typically
processed by recurrent layers such as an LSTM layer.
Image data, stored in 4D tensors, is usually processed by 2D convolution layers (Conv2D).
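To make the tensor shapes above concrete, here is a small NumPy sketch. The sizes are arbitrary, and the (samples, height, width, channels) layout for images is an assumption; the notes only say that image data is 4D.

```python
import numpy as np

# All sizes below are arbitrary, chosen only to show the number of axes.
vector_data = np.zeros((1000, 20))         # (samples, features)
sequence_data = np.zeros((1000, 50, 20))   # (samples, timesteps, features)
image_data = np.zeros((100, 28, 28, 3))    # assumed (samples, height, width, channels)

for name, tensor in [("vector", vector_data),
                     ("sequence", sequence_data),
                     ("image", image_data)]:
    print(name, "rank:", tensor.ndim, "shape:", tensor.shape)
```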
We’re creating a layer that will only accept as input 2D tensors where the first dimension is
784 (axis 0, the batch dimension, is unspecified, and thus any value would be accepted).
This layer will return a tensor where the first dimension has been transformed to be 32. Thus
this layer can only be connected to a downstream layer that expects 32-dimensional vectors
as its input. When using Keras, you don’t have to worry about compatibility, because the
layers you add to your models are dynamically built to match the shape of the incoming layer.
The second layer didn’t receive an input shape argument—instead, it automatically inferred its
input shape as being the output shape of the layer that came before.
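The idea of a layer inferring its input shape from the layer before it can be sketched in NumPy. This is not how Keras is implemented; `LazyDense` is a hypothetical class written for this illustration, which simply delays creating its weights until it sees its first input.

```python
import numpy as np

class LazyDense:
    """Hypothetical dense layer that builds its weights on first use."""

    def __init__(self, units):
        self.units = units
        self.W = None
        self.b = None

    def __call__(self, x):
        if self.W is None:
            # Infer the input size from the incoming data.
            input_dim = x.shape[-1]
            rng = np.random.default_rng(0)
            self.W = rng.normal(scale=0.01, size=(input_dim, self.units))
            self.b = np.zeros(self.units)
        return np.maximum(0.0, x @ self.W + self.b)

first = LazyDense(32)    # will see 784-dimensional inputs
second = LazyDense(10)   # never told its input size

batch = np.zeros((4, 784))          # 4 samples; the batch size is free
out = second(first(batch))
print(first.W.shape, second.W.shape, out.shape)
```

The second layer ends up with a weight matrix whose first dimension is 32, the output size of the first layer, without that number being passed to it.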
Introduction to Keras
Keras is a deep-learning framework for Python that provides a convenient way to
define and train almost any kind of deep-learning model. Keras was initially
developed for researchers, with the aim of enabling fast experimentation.
Keras is one of the most powerful and easy-to-use Python libraries for creating deep learning
models. It is built on top of popular deep learning libraries such as TensorFlow and Theano.
Overview of Keras:
Keras is a model-level library, providing high-level building blocks for developing
deep-learning models. It doesn’t handle low-level operations such as tensor
manipulation and differentiation. Instead, it relies on a specialized, well-optimized
tensor library to do so, serving as the backend engine of Keras. Rather than choosing
a single tensor library and tying the implementation of Keras to that library, Keras
handles the problem in a modular way (see figure 3.3); thus several different backend
engines can be plugged seamlessly into Keras. Currently, the three existing backend
implementations are the TensorFlow backend, the Theano backend, and the
Microsoft Cognitive Toolkit (CNTK) backend. In the future, it’s likely that Keras will
be extended to work with even more deep-learning execution engines.
Keras runs on top of open-source machine learning libraries such as TensorFlow, Theano, or the
Cognitive Toolkit (CNTK). Theano is a Python library used for fast numerical computation.
TensorFlow is a widely used symbolic math library for creating neural networks and deep
learning models; it is very flexible, and a primary benefit is its support for distributed
computing. CNTK is a deep learning framework developed by Microsoft. It can be used as a
library from Python, C#, or C++, or as a standalone machine learning toolkit. Theano and
TensorFlow are very powerful libraries, but they are difficult to use directly for creating
neural networks. Keras has a minimal structure that provides a clean and easy way to create
deep learning models on top of TensorFlow or Theano, and it is designed for defining such
models quickly. This makes Keras a strong choice for deep learning applications.
Features:
Keras leverages various optimization techniques to make high level neural network API
easier and more performant.
It supports the following features:
Consistent, simple and extensible API.
Minimal structure - easy to achieve the result without any frills.
It supports multiple platforms and backends.
It is a user-friendly framework that runs on both CPU and GPU.
High scalability of computation.
Benefits:
Keras is a highly powerful and dynamic framework and comes with the following advantages:
Larger community support.
Easy to test.
Keras neural networks are written in Python which makes things simpler.
Keras supports both convolution and recurrent networks.
Figure 3.2 Google web search interest for different deep-learning frameworks over time
TensorFlow:
TensorFlow is an open-source machine learning library for numerical computation, developed
by Google. Keras is a high-level API built on top of TensorFlow or Theano. We already know
how to install TensorFlow using pip. If it is not installed, you can install it using
the command below:
Once Keras has been run for the first time, its configuration file is located in your home
directory at .keras/keras.json.
keras.json:
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
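As a hedged sketch of how this configuration can be inspected and edited, the snippet below parses the same settings with Python's standard json module. It works on a string rather than on the real file in your home directory, so that running it does not touch your actual Keras setup; switching the backend by rewriting the "backend" entry is shown on that in-memory copy only.

```python
import json

# Same settings as in the keras.json shown above, held in a string for safety.
config_text = """
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
"""

config = json.loads(config_text)
print("current backend:", config["backend"])

# Switching the backend amounts to changing this one entry and saving the file.
config["backend"] = "theano"
print(json.dumps(config, indent=4))
```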
Theano:
Theano is an open-source deep learning library that allows you to evaluate multidimensional
arrays effectively. We can easily install using the below command:
keras.json:
"image_data_format": "channels_last",
"epsilon": 1e-07,
Now save your file, restart your terminal, and start Keras; your backend will be changed.
Overall, CNTK is a powerful and flexible deep learning framework that is particularly well-
suited for large-scale distributed training. Its support for multiple programming languages and
platforms makes it easy to integrate with other tools and systems, and its pre-trained models
and customizable neural network architecture make it a great choice for a wide variety of
machine learning tasks.
Setting up Deep learning workstation
PC Hardware Setup:
To perform machine learning and deep learning on any dataset, the software requires a
computer system powerful enough to handle the necessary computation.
The following is required:
Central Processing Unit (CPU): Intel Core i5 6th Generation processor or higher.
An equivalent AMD processor is also suitable.
RAM: 8 GB minimum; 16 GB or higher is recommended.
Graphics Processing Unit (GPU): NVIDIA GeForce GTX 960 or higher. AMD GPUs do not
support CUDA, which the standard GPU builds of these frameworks rely on. For more
information on NVIDIA GPUs for deep learning, please visit
https://developer.nvidia.com/cuda-gpus.
Operating System: Ubuntu or Microsoft Windows 10. I recommend
updating Windows 10 to the latest version before proceeding.
Note: In the case of laptops, the ideal option would be to purchase a gaming laptop from a
suitable vendor such as Alienware, ASUS, Lenovo Legion, Acer Predator, etc.
Note that in order to use Keras, you need to install TensorFlow or CNTK or Theano (or all of
them, if you want to be able to switch back and forth among the three backends).
Software Requirements:
Jupyter notebooks: the preferred way to run deep-learning experiments. We recommend
using Jupyter notebooks to get started with Keras, although that isn’t a requirement: you can
also run standalone Python scripts or run code from within an IDE such as PyCharm.
Deep Learning with Keras
Setting Up Project
We will use Jupyter through Anaconda Navigator for our project. As our project uses
TensorFlow and Keras, you will need to install those in your Anaconda setup. To install
TensorFlow, run the following command in your console window:
Starting Jupyter
When you start the Anaconda navigator, you would see the following opening screen.
Click ‘Jupyter’ to start it. The screen will show the existing projects, if any, on your
drive.
The screenshot of the menu selection is shown for your quick reference:
2. Deep Learning with Keras: Importing Libraries
We first import the various libraries required by the code in our project.
import numpy as np
import matplotlib
import matplotlib.pyplot as plot
Suppressing Warnings
Both TensorFlow and Keras are revised frequently. If the versions used in the project are
not kept in sync, you will see plenty of warnings at runtime. As they distract from
learning, we suppress all warnings in this project. This is done with the following lines
of code:
Keras
We use the Keras libraries to import the dataset. We will use the MNIST dataset of
handwritten digits. We import the required package using the following statement:
We will define our deep learning neural network using Keras. We import the Sequential
model class and the Dense, Dropout and Activation layer classes for defining the
network architecture. We use the load_model function for retrieving a saved model.
We also use np_utils for a few utilities that we need in our project. These imports are
done with the following program statements:
When you run this code, you will see a message on the console that says that Keras uses
TensorFlow at the backend. The screenshot at this stage is shown here:
Now, as we have all the imports required by our project, we will proceed to define the
architecture for our Deep Learning network.
3. Deep Learning with Keras: Creating Deep Learning Model
Our neural network model will consist of a linear stack of layers. To define such a model,
we create an instance of the Sequential class:
model = Sequential()
Input Layer
We define the input layer, which is the first layer in our network using the following
program statement:
model.add(Dense(512, input_shape=(784,)))
This creates a layer with 512 nodes (neurons) with 784 input nodes. This is depicted in
the figure below:
Note that all the input nodes are fully connected to the Layer 1, that is each input node is
connected to all 512 nodes of Layer 1.
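Because every one of the 784 inputs is connected to every one of the 512 nodes, the size of this layer can be worked out by hand. The arithmetic below is a sketch; it assumes one weight per connection and one bias per node, which is the usual setup for a dense layer.

```python
input_nodes = 784    # size of each flattened input vector
layer_nodes = 512    # nodes in Layer 1

weights = input_nodes * layer_nodes   # one weight per connection
biases = layer_nodes                  # one bias per node (assumed)
total_parameters = weights + biases

print(weights, biases, total_parameters)
```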
Next, we need to add the activation function for the output of Layer 1. We will use ReLU
as our activation. The activation function is added using the following program statement:
model.add(Activation('relu'))
Next, we add Dropout of 20% using the statement below. Dropout is a technique used to
prevent the model from overfitting.
model.add(Dropout(0.2))
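What ReLU and a 20% dropout do to a layer's output can be sketched in NumPy. This is an illustration rather than the Keras implementation: the `dropout` helper is hypothetical and uses the common "inverted dropout" convention (rescaling the kept values), which is an assumption here. Dropout of this kind is applied during training only.

```python
import numpy as np

def relu(x):
    """Replace negative values with zero."""
    return np.maximum(0.0, x)

def dropout(x, rate, rng):
    """Zero out roughly `rate` of the entries and rescale the rest."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)

layer_output = relu(np.array([[-1.5, 0.0, 2.0, 3.0, -0.2]]))
print(layer_output)             # negatives are clipped to zero

dropped = dropout(np.ones((1000, 512)), rate=0.2, rng=rng)
dropped_fraction = (dropped == 0).mean()
print(dropped_fraction)         # close to 0.2
```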
At this point, our input layer is fully defined. Next, we will add a hidden layer.
Hidden Layer
Our hidden layer will consist of 512 nodes. The input to the hidden layer comes from our
previously defined input layer. All the nodes are fully connected as in the earlier case. The
output of the hidden layer will go to the next layer in the network, which is going to
be our final and output layer. We will use the same ReLU activation as for the previous
layer and a dropout of 20%. The code for adding this layer is given here:
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
Next, we will add the final layer to our network, which is the output layer. Note that you
may add any number of hidden layers using code similar to that used here. Adding more
layers makes the network more complex to train; however, it gives better results in many
cases, though not all.
Output Layer
The output layer consists of just 10 nodes as we want to classify the given images in 10
distinct digits. We add this layer, using the following statement:
model.add(Dense(10))
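As we want to classify the output into 10 distinct classes, we use the softmax activation,
which turns the 10 outputs into a probability distribution over the digits. ReLU is not
suitable here: its outputs are unbounded non-negative values that do not sum to 1. We add
the activation using the following statement: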
model.add(Activation('softmax'))
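The difference between softmax and ReLU on the output layer can be seen in a small NumPy sketch. The ten raw scores below are made up; the `softmax` function is a standard formulation written out for illustration.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Made-up raw outputs of the 10 output nodes for one image.
scores = np.array([2.0, -1.0, 0.5, 0.0, 3.0, 1.0, -2.0, 0.2, 0.1, 1.5])

probabilities = softmax(scores)
predicted_digit = int(np.argmax(probabilities))
print(probabilities.sum())   # sums to 1 (up to rounding)
print(predicted_digit)       # index of the largest score

relu_scores = np.maximum(0.0, scores)
print(relu_scores.sum())     # not a probability distribution
```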
At this point, our network can be visualized as shown in the below diagram:
At this point, our network model is fully defined in the software. Run the code cell and if
there are no errors, you will get a confirmation message on the screen as shown in the
screenshot below:
Next, we need to compile the model.
4. Deep Learning with Keras: Compiling the Model
The compilation is performed using a single call to the compile method.
model.compile(loss='categorical_crossentropy', metrics=['accuracy'],
optimizer='adam')
The compile method requires several parameters. The loss parameter is specified to have
type 'categorical_crossentropy'. The metrics parameter is set to 'accuracy' and finally
we use the adam optimizer for training the network. The output at this stage is shown
below:
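To give a feel for what the loss and metric named above measure, here is a NumPy sketch of categorical cross-entropy and accuracy. It is written from the standard definitions, not copied from Keras; the three-class labels and predictions are made up to keep the example short.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Made-up one-hot targets and two sets of predictions.
y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)
confident = np.array([[0.05, 0.05, 0.90], [0.80, 0.10, 0.10]])
uncertain = np.array([[0.40, 0.30, 0.30], [0.30, 0.40, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident).mean()
loss_uncertain = categorical_crossentropy(y_true, uncertain).mean()
accuracy = np.mean(np.argmax(confident, axis=1) == np.argmax(y_true, axis=1))

print(loss_confident, loss_uncertain, accuracy)
```

Predictions that put more probability on the correct class give a lower loss, which is what the optimizer works toward during training.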
Loading Data
As said earlier, we will use the mnist dataset provided by Keras. When we load the data
into our system, we will split it in the training and test data. The data is loaded by calling
the load_data method as follows:
for i in range(10):  # loop over the first 10 training images
    plot.subplot(3, 5, i + 1)
    plot.tight_layout()
    plot.imshow(X_train[i], cmap='gray', interpolation='none')
    plot.title("Digit: {}".format(y_train[i]))
    plot.xticks([])
    plot.yticks([])
In a loop of 10 iterations, we create a subplot on each iteration and show an image
from the X_train vector in it. We title each image with the corresponding value from the
y_train vector. Note that the y_train vector contains the actual digit for the
corresponding image in the X_train vector. We remove the x and y axis markings by calling
the two methods xticks and yticks with an empty list. When you run the code, you would see
the following output:
5. Deep Learning with Keras: Preparing Data
Before we feed the data to our network, it must be converted into the format required by
the network. This is called preparing data for the network. It generally consists of
converting a multi-dimensional input to a single-dimension vector and normalizing the
data points.
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
Now, our training vector will consist of 60000 data points, each consisting of a single
dimension vector of size 784. Similarly, our test vector will consist of 10000 data points
of a single-dimension vector of size 784.
Normalizing Data
The data that the input vector contains currently has a discrete value between 0 and 255
- the gray scale levels. Normalizing these pixel values between 0 and 1 helps in speeding
up the training. As we are going to use stochastic gradient descent, normalizing data will
also help in reducing the chance of getting stuck in local optima.
To normalize the data, we represent it as float type and divide it by 255 as shown in the
following code snippet:
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
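The effect of the reshaping and normalizing steps can be checked on a small stand-in array. The sketch below does not load MNIST; it uses six random 28x28 images of made-up pixel values so that it runs without any dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for image data: 6 random 28x28 images with pixel values 0..255.
images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)

flat = images.reshape(len(images), 28 * 28)   # each image becomes a 784-vector
scaled = flat.astype('float32') / 255         # pixel values now lie in [0, 1]

print(flat.shape, scaled.dtype, scaled.min(), scaled.max())
```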
Examining Normalized Data
To view the normalized data, we will call the histogram function as shown here:
plot.hist(X_train[0])
plot.title("Digit: {}".format(y_train[0]))
Here, we plot the histogram of the first element of the X_train vector. We also print the
digit represented by this data point. The output of running the above code is shown here:
You will notice a high density of points with values close to zero. These are the black
pixels in the image, which make up most of it. The remaining grayscale points, which are
closer to white, represent the digit. You may check out the distribution of pixels for
another digit. The code below plots the histogram of the digit at index 2 in the training
dataset.
plot.hist(X_train[2])
plot.title("Digit: {}".format(y_train[2]))
Comparing the two figures above, you will notice that the distribution of the white
pixels in the two images differs, indicating that they represent different digits: “5” and
“4” respectively.
Next, we will examine the distribution of data in our full training dataset.
Use the following command to print the number of unique values and the number of
occurrences of each one:
print(np.unique(y_train, return_counts=True))
When you run the above command, you will see the following output:
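The behaviour of np.unique with return_counts=True is easiest to see on a short array. The ten labels below are illustrative values, not the output of the tutorial's command.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4])   # illustrative labels only

values, counts = np.unique(labels, return_counts=True)
for value, count in zip(values, counts):
    print("digit", value, "occurs", count, "time(s)")
```

The first returned array holds the distinct values in sorted order and the second holds how often each one occurs, which makes it easy to see whether the classes are roughly balanced.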
Encoding Data
We have ten categories in our dataset. We will thus encode our output in these ten
categories using one-hot encoding. We use the to_categorical method of the Keras np_utils
module to perform the encoding. After the output data is encoded, each data point is
converted into a one-dimensional vector of size 10. For example, digit 5 will now be
represented as [0,0,0,0,0,1,0,0,0,0].
n_classes = 10
Y_train = np_utils.to_categorical(y_train, n_classes)
You may check out the result of encoding by printing the first 5 elements of the categorized
Y_train vector.
for i in range(5):
    print(Y_train[i])
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
The first element represents digit 5, the second represents digit 0, and so on.
Finally, you will have to categorize the test data too, which is done using the following
statement:
At this stage, your data is fully prepared for feeding into the network.
Next comes the most important part: training our network model.
6. Deep Learning with Keras: Training the Model
Model training is done in a single call to the fit method, which takes a few parameters,
as seen in the code below:
The number of epochs is set to 20; we assume that training will converge within at most
20 epochs (full passes over the training data). The trained model is validated on the test
data, as specified in the last parameter.
The screenshot of the output is given below for your quick reference:
Now, as the model is trained on our training data, we will evaluate its performance.
7. Deep Learning with Keras: Evaluating Model Performance
We will print the loss and accuracy using the following two statements:
When you run the above statements, you would see the following output:
This shows a test accuracy of 98%, which should be acceptable to us. It means that in 2%
of cases, the handwritten digits would not be classified correctly. We will also plot the
accuracy and loss metrics to see how the model performs on the test data.
Plotting Accuracy Metrics
We use the recorded history during our training to get a plot of accuracy metrics. The
following code will plot the accuracy on each epoch. We pick up the training data accuracy
(“acc”) and the validation data accuracy (“val_acc”) for plotting.
plot.subplot(2,1,1)
plot.plot(history.history['acc'])
plot.plot(history.history['val_acc'])
plot.title('model accuracy')
plot.ylabel('accuracy')
plot.xlabel('epoch')
plot.legend(['train', 'test'], loc='lower right')
As you can see in the diagram, the accuracy increases rapidly in the first two epochs,
indicating that the network is learning fast. Afterwards, the curve flattens indicating that
not too many epochs are required to train the model further. Generally, if the training data
accuracy (“acc”) keeps improving while the validation data accuracy (“val_acc”) gets
worse, you are encountering overfitting. It indicates that the model is starting to memorize
the data.
We will also plot the loss metrics to check our model’s performance.
Plotting Loss Metrics
Again, we plot the loss on both the training (“loss”) and test (“val_loss”) data. This is done
using the following code:
plot.subplot(2,1,2)
plot.plot(history.history['loss'])
plot.plot(history.history['val_loss'])
plot.title('model loss')
plot.ylabel('loss')
plot.xlabel('epoch')
plot.legend(['train', 'test'], loc='upper right')
As you can see in the diagram, the loss on the training set decreases rapidly for the first
two epochs. For the test set, the loss does not decrease at the same rate as the training
set, but remains almost flat for multiple epochs. This means our model is generalizing well
to unseen data.
Now, we will use our trained model to predict the digits in our test data.
8. Deep Learning with Keras: Predicting on Test Data
Predicting the digits in unseen data is very easy. You simply call the
predict_classes method of the model, passing it a vector consisting of your
unknown data points.
predictions = model.predict_classes(X_test)
The method call returns the predicted classes in a vector, which can be compared element
by element against the actual values to find the correctly and incorrectly classified
samples. This is done using the following two statements:
When you run the code, you will get the following output:
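The comparison of predictions against actual values can be sketched in NumPy without a trained model. The probabilities and labels below are made up, and the variable names are chosen for this illustration; the class with the highest probability is taken as the prediction.

```python
import numpy as np

# Made-up class probabilities for 4 samples and 3 classes.
probabilities = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
y_true = np.array([1, 0, 2, 1])   # made-up actual labels

predicted_classes = np.argmax(probabilities, axis=1)
correct = np.nonzero(predicted_classes == y_true)[0]      # indices of hits
incorrect = np.nonzero(predicted_classes != y_true)[0]    # indices of misses

print(len(correct), "classified correctly")
print(len(incorrect), "classified incorrectly")
```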
Now, as you have satisfactorily trained the model, we will save it for future use.
11. Deep Learning with Keras: Saving Model
We will save the trained model in our local drive in the models folder in our current
working directory. To save the model, run the following code:
import os

save_dir = "./models/"
name = 'handwrittendigitrecognition.h5'
os.makedirs(save_dir, exist_ok=True)  # create the folder if it does not exist yet
path = os.path.join(save_dir, name)
model.save(path)
print('Saved trained model at %s ' % path)
Now, as you have saved a trained model, you may use it later on for processing your
unknown data.
12. Deep Learning with Keras: Loading Model for Predictions
To predict the unseen data, you first need to load the trained model into the memory. This
is done using the following command:
Note that we are simply loading the .h5 file into memory. This sets up the entire neural
network in memory along with the weights assigned to each layer.
Now, to make predictions on unseen data, load the data, whether one or more items,
into memory. Preprocess the data to meet the input requirements of our model, as
you did for your training and test data above. After preprocessing, feed it to your
network. The model will output its prediction.
13. Deep Learning with Keras: Conclusion
Keras provides a high-level API for creating deep neural networks. In this tutorial, you
learned to create a deep neural network that was trained to recognize the digits in
handwritten text. A multi-layer network was created for this purpose. Keras allows you to
define an activation function of your choice at each layer. Using gradient descent, the
network was trained on the training data. The accuracy of the trained network in predicting
unseen data was tested on the test data. You learned to plot the accuracy and error
metrics. After the network was fully trained, you saved the network model for future use.
Classifying movie reviews: binary classification example
Two-class classification, or binary classification, may be the most widely applied kind of machine-learning
problem. In this example, you’ll learn to classify movie reviews as positive or negative, based on the text
content of the reviews.
The argument num_words=10000 means you’ll only keep the top 10,000 most frequently occurring
words in the training data. Rare words will be discarded.
This allows you to work with vector data of manageable size. The variables train_data and test_data are
lists of reviews; each review is a list of word indices (encoding a sequence of words). train_labels and
test_labels are lists of 0s and 1s, where 0 stands for negative and 1 stands for positive:
>>> train_data[0]
[1, 14, 22, 16, ... 178, 32]
>>> train_labels[0]
1
The argument being passed to each Dense layer (16) is the number of hidden units of the layer. A hidden
unit is a dimension in the representation space of the layer.
A Dense layer with a relu activation implements the following chain of tensor operations:
output = relu(dot(W, input) + b)
Having 16 hidden units means the weight matrix W will have shape (input_dimension, 16): the dot
product with W will project the input data onto a 16-dimensional representation space (and then you’ll
add the bias vector b and apply the relu operation).
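The chain of operations above can be sketched in NumPy. Two things here are assumptions made for the illustration: the input dimension of 10,000 (chosen to match the 10,000-word vocabulary mentioned earlier) and the batch-first layout, in which the dot product is written as dot(inputs, W) so that W has shape (input_dimension, 16).

```python
import numpy as np

def dense_relu(inputs, W, b):
    """Project the inputs with W, add the bias b, then apply relu."""
    return np.maximum(0.0, np.dot(inputs, W) + b)

rng = np.random.default_rng(0)
input_dimension = 10000                 # assumed size of each input vector
inputs = rng.random((4, input_dimension))          # a batch of 4 made-up samples
W = rng.normal(scale=0.01, size=(input_dimension, 16))
b = np.zeros(16)

output = dense_relu(inputs, W, b)
print(output.shape)   # each sample is now a 16-dimensional representation
```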
You can intuitively understand the dimensionality of your representation space as “how much freedom
you’re allowing the network to have when learning internal representations.” Having more hidden units (a
higher-dimensional representation space) allows your network to learn more-complex representations, but
it makes the network more computationally expensive and may lead to learning unwanted patterns
(patterns that will improve performance on the training data but not on the test data).
You’re passing your optimizer, loss function, and metrics as strings, which is possible because rmsprop,
binary_crossentropy, and accuracy are packaged as part of Keras.
Sometimes you may want to configure the parameters of your optimizer or pass a custom loss function or
metric function. The former can be done by passing an optimizer class instance as the optimizer
argument, as shown in listing 3.5; the latter can be done by passing function objects as the loss and/or
metrics arguments, as shown in listing 3.6.
Validating your approach
In order to monitor during training the accuracy of the model on data it has never seen before, you’ll
create a validation set by setting apart 10,000 samples from the original training data.
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
You’ll now train the model for 20 epochs (20 iterations over all samples in the x_train and y_train
tensors), in mini-batches of 512 samples. At the same time, you’ll monitor loss and accuracy on the
10,000 samples that you set apart. You do so by passing the validation data as the validation_data
argument.
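How epochs and mini-batches relate can be worked out with a little arithmetic. The sketch assumes 15,000 remaining training samples, which follows from the 25,000 reviews in the IMDB training set (a figure not stated in these notes) minus the 10,000 set apart for validation.

```python
import math

num_samples = 15000   # assumed: 25,000 training reviews minus 10,000 for validation
batch_size = 512
epochs = 20

batches_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (batches_per_epoch - 1) * batch_size
total_weight_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_size, total_weight_updates)
```

Each mini-batch leads to one weight update, so an epoch is made of many small updates rather than a single large one; the last batch of an epoch is smaller when the sample count is not a multiple of the batch size.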
On CPU, this will take less than 2 seconds per epoch—training is over in 20 seconds. At the end of every
epoch, there is a slight pause as the model computes its loss and accuracy on the 10,000 samples of the
validation data.
Note that the call to model.fit() returns a History object. This object has a member history, which is a
dictionary containing data about everything that happened during training.
Let’s look at it:
>>> history_dict = history.history
>>> history_dict.keys()
[u'acc', u'loss', u'val_acc', u'val_loss']
The dictionary contains four entries: one per metric that was being monitored during training and during
validation. In the following two listings, let’s use Matplotlib to plot the training and validation loss side by
side (see figure 3.7), as well as the training and validation accuracy (see figure 3.8). Note that your own
results may vary slightly due to a different random initialization of your network.
As you can see, the training loss decreases with every epoch, and the training accuracy increases with
every epoch. That’s what you would expect when running gradient-descent optimization: the quantity
you’re trying to minimize should be less with every iteration. But that isn’t the case for the validation loss
and accuracy: they seem to peak at the fourth epoch. This is an example of what we warned against
earlier: a model that performs better on the training data isn’t necessarily a model that will do better on
data it has never seen before. In precise terms, what you’re seeing is overfitting: after the second epoch,
you’re overoptimizing on the training data, and you end up learning representations that are specific to the
training data and don’t generalize to data outside of the training set. In this case, to prevent overfitting,
you could stop training after three epochs. In general, you can use a range of techniques to mitigate
overfitting.
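One simple way to read a stopping point off the history dictionary is to look for the epoch with the lowest validation loss. The numbers below are hypothetical, made up to mimic the pattern described above; they are not the results of the actual training run.

```python
# Hypothetical per-epoch values, for illustration only.
history_dict = {
    'loss':     [0.52, 0.33, 0.24, 0.19, 0.15, 0.12],
    'val_loss': [0.40, 0.32, 0.29, 0.28, 0.30, 0.33],
}

val_loss = history_dict['val_loss']
# Epochs are counted from 1, list indices from 0.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

training_keeps_improving = all(
    earlier > later
    for earlier, later in zip(history_dict['loss'], history_dict['loss'][1:])
)

print("lowest validation loss at epoch", best_epoch)
print("training loss still decreasing:", training_keeps_improving)
```

When the training loss keeps falling while the validation loss has already turned upward, further epochs mostly fit the training data more closely without helping on unseen data.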
Let’s train a new network from scratch for four epochs and then evaluate it on the test data.
This fairly naive approach achieves an accuracy of 88%. With state-of-the-art approaches, you should be
able to get close to 95%.
Classifying newswires: multiclass classification example
To vectorize the labels, there are two possibilities: you can cast the label list as an integer tensor, or you
can use one-hot encoding. One-hot encoding is a widely used format for categorical data, also called
categorical encoding. For a more detailed explanation of one-hot encoding, see section 6.1. In this case,
one-hot encoding of the labels consists of embedding each label as an all-zero vector with a 1 in the place
of the label index. Here’s an example
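The two label formats can be sketched as follows. The `to_one_hot` helper and the three sample labels are written for this illustration; the dimension of 46 matches the number of output classes discussed in these notes.

```python
import numpy as np

def to_one_hot(labels, dimension=46):
    """Embed each label as an all-zero vector with a 1 at the label index."""
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.0
    return results

labels = [3, 0, 45]                       # made-up topic indices

one_hot_labels = to_one_hot(labels)       # option 1: one-hot (categorical) encoding
integer_labels = np.array(labels)         # option 2: plain integer tensor

print(one_hot_labels.shape)
print(integer_labels)
```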
Building your network
This topic-classification problem looks similar to the previous movie-review classification problem: in
both cases, you’re trying to classify short snippets of text. But there is a new constraint here: the number
of output classes has gone from 2 to 46. The dimensionality of the output space is much larger. In a stack
of Dense layers like the ones you’ve been using, each layer can only access information present in the output
of the previous layer. If one layer drops some information relevant to the classification problem, this
information can never be recovered by later layers: each layer can potentially become an information
bottleneck. In the previous example, you used 16-dimensional intermediate layers, but a 16-dimensional
space may be too limited to learn to separate 46 different classes: such small layers may act as
information bottlenecks, permanently dropping relevant information. For this reason you’ll use larger
layers. Let’s go with 64 units.
There are two other things you should note about this architecture:
You end the network with a Dense layer of size 46. This means for each input sample, the network will
output a 46-dimensional vector. Each entry in this vector (each dimension) will encode a different output
class.
The last layer uses a softmax activation. You saw this pattern in the MNIST example. It means the
network will output a probability distribution over the 46 different output classes: for every input
sample, the network will produce a 46-dimensional output vector, where output[i] is the probability that
the sample belongs to class i. The 46 scores will sum to 1. The best loss function to use in this case is
categorical_crossentropy. It measures the distance between two probability distributions: here, between
the probability distribution output by the network and the true distribution of the labels. By minimizing
the distance between these two distributions you train the network to output something as close as
possible to the true labels.
This approach reaches an accuracy of ~80%. With a balanced binary classification problem, the accuracy
reached by a purely random classifier would be 50%. But in this case it’s closer to 19%, so the results
seem pretty good, at least when compared to a random baseline.
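A random baseline of this kind can be estimated by shuffling a copy of the labels and counting how often the shuffled labels happen to match the originals. The sketch below uses made-up labels with four imbalanced classes rather than the real newswire labels, so the number it prints is not the 19% quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up, imbalanced labels standing in for the real test labels.
class_probabilities = np.array([0.4, 0.3, 0.2, 0.1])
test_labels = rng.choice(4, size=10000, p=class_probabilities)

shuffled_labels = test_labels.copy()
rng.shuffle(shuffled_labels)              # a "classifier" that guesses at random

baseline_accuracy = np.mean(test_labels == shuffled_labels)
print(baseline_accuracy)
```

With imbalanced classes the baseline lands above one divided by the number of classes, because frequent classes are matched by chance more often.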
Generating predictions on new data
You can verify that the predict method of the model instance returns a
probability distribution over all 46 topics. Let’s generate topic predictions for all of the test data.