DLT Experiment 3

The document outlines an experiment to design a neural network for classifying Reuters newswires into 46 topics, focusing on multi-class, single-label classification. It details the use of the Reuters dataset, data preparation, network architecture, and training methodology, including the use of larger layers and softmax activation for output. The experiment concludes with validation and evaluation of the network's performance on the test set.

EXPERIMENT-3

Aim: Design a neural network for classifying newswires (multi-class classification) using the Reuters dataset.
Theory and code:
In the previous experiment, you saw how to classify vector inputs into two mutually exclusive classes using a densely connected neural network. But what happens when you have more than two classes?

In this experiment, you'll build a network to classify Reuters newswires into 46 mutually exclusive topics. Because you have many classes, this problem is an instance of multiclass classification; and because each data point should be classified into only one category, the problem is more specifically an instance of single-label, multiclass classification. If each data point could belong to multiple categories (in this case, topics), you'd be facing a multilabel, multiclass classification problem.
The Reuters dataset

You'll work with the Reuters dataset, a set of short newswires and their topics, published by Reuters in 1986. It's a simple, widely used toy dataset for text classification. There are 46 different topics; some topics are more represented than others, but each topic has at least 10 examples in the training set.

Like IMDB and MNIST, the Reuters dataset comes packaged as part of Keras. Let's take a look.
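A minimal sketch of loading the data, assuming a TensorFlow 2.x installation with the bundled Keras API (the variable names train_data, train_labels, and so on are illustrative):

from tensorflow.keras.datasets import reuters

# Load the Reuters newswire dataset, keeping only the 10,000 most
# frequent words; rarer words are discarded to keep the data manageable.
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(
    num_words=10000)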
As with the IMDB dataset, the argument num_words=10000 restricts the data to the 10,000 most frequently occurring words found in the data.
You have 8,982 training examples and 2,246 test examples:
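Continuing the loading sketch above, the split sizes can be checked directly:

print(len(train_data))   # 8,982 training newswires
print(len(test_data))    # 2,246 test newswires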

As with the IMDB reviews, each example is a list of integers (word indices):
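Inspecting one newswire shows this list-of-integers format (the exact indices shown here are only illustrative):

print(train_data[10])  # e.g. [1, 245, 273, 207, ...] -- each integer is a word index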
Decoding newswires back to text:
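One way to do this, sketched here using the word index bundled with the dataset; note that indices are offset by 3 because 0, 1, and 2 are reserved:

# Build a mapping from integer indices back to words.
word_index = reuters.get_word_index()
reverse_word_index = {value: key for (key, value) in word_index.items()}

# Indices are offset by 3: 0, 1, and 2 are reserved for "padding",
# "start of sequence", and "unknown".
decoded_newswire = " ".join(
    reverse_word_index.get(i - 3, "?") for i in train_data[0])
print(decoded_newswire)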
The label associated with an example is an integer between 0 and 45 (a topic index):
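For example:

print(train_labels[10])  # an integer between 0 and 45, e.g. 3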

Preparing the data

You can vectorize the data with the exact same code as in the previous example.
Encoding the data
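A sketch of that vectorization: each newswire becomes a 10,000-dimensional vector of 0s and 1s marking which words are present, using the same helper as in the previous experiment:

import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # Create an all-zero matrix of shape (len(sequences), dimension).
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set the indices of present words to 1
    return results

x_train = vectorize_sequences(train_data)  # vectorized training data
x_test = vectorize_sequences(test_data)    # vectorized test data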
To vectorize the labels, there are two possibilities: you can cast the label list as an integer tensor, or you can use one-hot encoding. One-hot encoding is a widely used format for categorical data, also called categorical encoding. In this case, one-hot encoding of the labels consists of embedding each label as an all-zero vector with a 1 in the place of the label index.
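A minimal sketch of one-hot encoding the labels by hand; Keras also ships a to_categorical utility that produces the same result:

import numpy as np

def to_one_hot(labels, dimension=46):
    # Each label becomes a 46-dimensional vector that is all zeros
    # except for a 1 at the position of the label index.
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results

one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)

# Equivalent built-in:
# from tensorflow.keras.utils import to_categorical
# one_hot_train_labels = to_categorical(train_labels)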

Building your network

This topic-classification problem looks similar to the previous movie-review classification problem: in both cases, you're trying to classify short snippets of text. But there is a new constraint here: the number of output classes has gone from 2 to 46. The dimensionality of the output space is much larger.

In a stack of Dense layers like the ones you've been using, each layer can only access information present in the output of the previous layer. If one layer drops some information relevant to the classification problem, this information can never be recovered by later layers: each layer can potentially become an information bottleneck. In the previous example, you used 16-dimensional intermediate layers, but a 16-dimensional space may be too limited to learn to separate 46 different classes: such small layers may act as information bottlenecks, permanently dropping relevant information.

For this reason you'll use larger layers. Let's go with 64 units.
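A sketch of such a network using the Keras Sequential API (TensorFlow 2.x style; two 64-unit intermediate layers and a 46-way output):

from tensorflow.keras import models, layers

model = models.Sequential()
model.add(layers.Dense(64, activation="relu", input_shape=(10000,)))
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(46, activation="softmax"))  # one probability per topic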

There are two other things you should note about this architecture:

You end the network with a Dense layer of size 46. This means for each input sample, the network will output a 46-dimensional vector. Each entry in this vector (each dimension) will encode a different output class.

The last layer uses a softmax activation. You saw this pattern in the MNIST example. It means the network will output a probability distribution over the 46 different output classes: for every input sample, the network will produce a 46-dimensional output vector, where output[i] is the probability that the sample belongs to class i. The 46 scores will sum to 1.
The best loss function to use in this case is categorical_crossentropy. It measures the distance between two probability distributions: here, between the probability distribution output by the network and the true distribution of the labels. By minimizing the distance between these two distributions, you train the network to output something as close as possible to the true labels.
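Compiling the model with this loss might look like the following; rmsprop is used here as in the previous experiment and should be read as an illustrative choice:

model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])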

Validating your approach

Let's set apart 1,000 samples in the training data to use as a validation set.
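A sketch of the hold-out split and an initial training run; the 20-epoch count here is illustrative, chosen long enough to make the overfitting point visible:

# Hold out the first 1,000 samples for validation.
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))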
Plotting the training and validation accuracy
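One way to plot these curves from the History object returned by fit, assuming matplotlib is available (the metric keys match the compile step above):

import matplotlib.pyplot as plt

history_dict = history.history
acc = history_dict["accuracy"]
val_acc = history_dict["val_accuracy"]
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, "bo", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()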
The network begins to overfit after nine epochs. Let's train a new network from scratch for nine epochs and then evaluate it on the test set.
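A sketch of that final run, continuing from the sketches above: rebuild the model, train for nine epochs, and evaluate on the test data:

model = models.Sequential()
model.add(layers.Dense(64, activation="relu", input_shape=(10000,)))
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(46, activation="softmax"))
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

model.fit(partial_x_train,
          partial_y_train,
          epochs=9,
          batch_size=512,
          validation_data=(x_val, y_val))

test_loss, test_acc = model.evaluate(x_test, one_hot_test_labels)
print(test_acc)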

Conclusions:
