Text Classification With Deep Learning - Code
(https://www.nvidia.com/dli/)
http://ec2-18-222-127-142.us-east-2.compute.amazonaws.com/VjVvup…task2/task/Text%20Classification%20with%20Deep%20Learning.ipynb# Page 1 of 24
Text Classification with Deep Learning 12/11/19, 1:24 AM
Introduction
Text classification is a classic problem in Natural Language Processing. Given multiple
individual spans of texts (sentences, paragraphs, documents, etc.), the task is to assign each
span one or multiple labels - or classes - out of k possible ones. Some possible applications
of text classification are:
Genre identification - Does this text contain news, sports, finance, etc?
Language detection - Is this text in English, German or any other language?
Sentiment analysis - What type of sentiment (positive/negative/neutral) is present in this
text? Additionally, if multiple subjects or topics are discussed, what sentiment is
associated with each such subject/topic?
A particular type of text classification is the problem of authorship attribution. In this task we
are given some documents and a set of possible authors, and we assign to each document
the author or authors who we believe wrote it. Presumably, we also have a large set of
documents whose authors are already known, so we can extract features and
characteristics to help us with the unknown ones.
1. Building a linguistic style model to extract author-specific features from a set of texts
(known as a corpus)
2. Using these features for building a classification model for authorship attribution
3. Applying the model for identifying the author of a set of unknown documents
of America. In later years, a list emerged where the author of each one of the 85 papers was
identified. Nevertheless, for a subset of these papers the author is still in question. The
problem of the Federalist Papers authorship attribution has been a subject of much research
in statistical NLP in the past (see the above Wikipedia article for details). We will try to use
Deep Learning to re-create this research.
In concrete terms, the problem is identifying - for each one of the disputed papers - whether
Alexander Hamilton (AH) or James Madison (JM) is the author. We will assume that each
paper has a single author (i.e., that no collaboration took place) and that each author has a
well-defined writing style that is displayed across all the papers.
Approach
We will take the following approach with this problem:
Use the non-disputed documents as labeled data for an end-to-end model. The model is
composed of two distinct parts (see Figure 1 below):
Use the model to determine the author for each disputed paper
Style Extraction
Given a sequence of tokens (subsets of the input text), we would like the extractor to return a
representation of this sequence such that sequences with similar stylistic qualities have
similar representations. In other words, we would like to find a mapping from the sequence to
a vector space that uses style properties as its basis. Some possible properties that the
extractor may learn include (but are not limited to):
The above features are well-known in stylometry. We would like, however, for the model to
learn both these and other features that may be applicable.
In order to use sequences we will need models that can deal with time-varying inputs (i.e.,
multiple timesteps with different inputs at each timestep). In our case, we will have one token
every timestep and a fixed sequence length. These will be our hyperparameters - a set of
parameters that we determine empirically and that do not change during the training. The
model itself will be a Recurrent Neural Net (RNN) and specifically, a variant named Long Short
Term Memory (LSTM) (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) that is
ideally suited for NLP problems (see Figure 2).
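For reference, the standard LSTM update at timestep t can be written as below. This is the textbook formulation rather than anything taken from the lab's code, and the symbols are introduced here for illustration only: x_t is the input (one character embedding), h_t the hidden state, c_t the cell state, and W, U, b are learned weights and biases.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

With a fixed sequence length, the hidden state after the last timestep can serve as the fixed-size representation of the whole sequence.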
Ideally, our feature extractor will be a type of language model. That is, given a set of tokens
the model can predict the next token with high accuracy. Let's look at an example:
If we have such a model available - namely, one that predicts sequences that appear in our
training data with high probability - the natural conclusion is that it has learned how our
authors tend to write and we can then attempt to use that knowledge. Here we encounter our
primary issue: training language models on words requires potentially millions of examples.
We have a relatively small corpus, so a word language model may severely overfit (in other
words, this model may pick up specific patterns that just happen to appear in our corpus).
This is a tough issue - fortunately, in this case it is easy to overcome. Since our corpus is
composed of proper English text (i.e., no foreign characters or emojis) we can simply use
characters as the tokens and not words. For example, given the phrase "to the people of New
York", we will have the sequence: ['t', 'o', ' ', 't', 'h', 'e', ' ', 'p',...] Note that whitespace and
punctuation are also characters and will be part of the sequences. This will hopefully assist us
in learning the features mentioned above.
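As a minimal sketch of character-level tokenization (plain Python, not the Keras Tokenizer used later in the lab), the phrase above can be split into characters and mapped to integer ids. The helper names char_tokens and build_vocab are hypothetical, and lower-casing is assumed to match the preprocessing described later.

```python
def char_tokens(text):
    """Split text into single-character tokens, keeping spaces and punctuation."""
    return list(text.lower())


def build_vocab(tokens):
    """Map each distinct character to an integer id, starting at 1."""
    return {ch: i for i, ch in enumerate(sorted(set(tokens)), start=1)}


phrase = "to the people of New York"
tokens = char_tokens(phrase)
vocab = build_vocab(tokens)
ids = [vocab[ch] for ch in tokens]

print(tokens[:8])               # ['t', 'o', ' ', 't', 'h', 'e', ' ', 'p']
print(len(tokens), len(vocab))  # 25 13
```

Even this short phrase uses only 13 distinct characters, which illustrates why a character vocabulary stays small no matter how large the corpus is.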
Another trick is to use character embeddings as opposed to just the characters
themselves (for instance, in a 1-hot encoded representation). Recall that embeddings are
dense representations of data that can learn the semantics of the domain. Here the
terminology may be a bit confusing, but the final result is that the embedding of a single
character can represent features about the context of the character: which characters tend to
appear before and/or after this one. This is precisely what we need.
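The difference between a 1-hot vector and an embedding can be sketched as follows. This is an illustration only: the embedding size of 8 and the random initialization are assumptions, and in the real model the table entries are learned during training. The vocabulary size of 53 is the character count the notebook reports for this corpus.

```python
import random

VOCAB_SIZE = 53      # character count reported by the notebook for this corpus
EMBEDDING_SIZE = 8   # hypothetical value, chosen only for illustration


def one_hot(index, size):
    """Sparse representation: a single 1.0 at the character's index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


rng = random.Random(0)
# One dense row per character; random here, learned in a real model.
embedding_table = [
    [rng.uniform(-0.05, 0.05) for _ in range(EMBEDDING_SIZE)]
    for _ in range(VOCAB_SIZE)
]

char_id = 7
sparse = one_hot(char_id, VOCAB_SIZE)
dense = embedding_table[char_id]  # an embedding lookup is just row selection

# The lookup is equivalent to multiplying the 1-hot vector by the table.
product = [
    sum(sparse[r] * embedding_table[r][c] for r in range(VOCAB_SIZE))
    for c in range(EMBEDDING_SIZE)
]

print(len(sparse), len(dense))
```

The 1-hot vector carries no information beyond the character's identity, while the dense row has free parameters that training can shape to reflect the contexts in which the character appears.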
Classifier
As noted above, the output of the style feature extractor (or feature encoder) is a fixed-size
vector. We use this vector as the input to a simple multi-layer feed-forward network. Each
layer in the network can then extract features that help it determine whether the character
sequence (that is, the fixed-size representation) was written by AH or JM. The final layer is
composed of a single neuron with a sigmoid (https://en.wikipedia.org/wiki/Sigmoid_function)
activation, so the model outputs a single value between 0 and 1.
A very useful property of the entire model (both the feature encoder and the classifier) is that it
can be trained end-to-end - there is no need for specifically building either the feature
encoder or classifier. Furthermore, it also means that we can use this model to easily infer the
author of each of the disputed papers. For each such document, we perform the following
procedure:
1. Break the entire document into sequences of the same length, as determined by the
hyperparameter
2. Retrieve an author prediction for each one of these sequences
3. Determine which author has received more 'votes'. We will then use this author as our
prediction for the entire document. (Note: in order to have a clear majority, we need to
ensure that the number of sequences is odd).
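The voting procedure above can be sketched in a few lines of plain Python. The function name majority_vote, the 0.5 threshold, and the convention that values at or above the threshold count as votes for AH are assumptions made for this illustration; the lab's own label encoding may differ.

```python
def majority_vote(probabilities, threshold=0.5):
    """Turn per-sequence model outputs into one document-level prediction.

    Outputs >= threshold are counted as votes for AH, the rest for JM
    (an assumed encoding). Returns (winner, votes_ah, votes_jm).
    """
    votes_ah = sum(1 for p in probabilities if p >= threshold)
    votes_jm = len(probabilities) - votes_ah
    if votes_ah == votes_jm:
        return "tie", votes_ah, votes_jm
    winner = "AH" if votes_ah > votes_jm else "JM"
    return winner, votes_ah, votes_jm


per_sequence = [0.91, 0.40, 0.77, 0.15, 0.66]
print(majority_vote(per_sequence))  # ('AH', 3, 2)
```

With an odd number of sequences the "tie" branch is never reached, which is the reason for the note in step 3.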
Convert all text into lower-case (ignoring the fact that capitalization may be a stylistic
property)
Convert all newlines and multiple whitespaces into single whitespaces
Remove any mention of the authors' names, otherwise we risk data leakage
(https://machinelearningmastery.com/data-leakage-machine-learning/)
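def preprocess_text(file_path):
    """ Read and preprocess the text from a specific file.
    Preprocessing includes:
    * Replace newlines by spaces
    * Replace double spaces by single spaces
    * Lower-cases the text
    * Removes the names of the authors
    # Arguments
        file_path: the path to read the file from
    # Returns
        The preprocessed file
    """
    with open(file_path, 'r') as f:
        lines = f.readlines()
    # The first line of each file is skipped, as in the original cell.
    # NOTE: the rest of this statement is cut off in the printed notebook. The
    # lower-casing and name removal below are reconstructed from the docstring
    # and are an assumption about the original code.
    text = ' '.join(lines[1:]).replace("\n", ' ').replace('  ', ' ').lower()
    text = text.replace('hamilton', ' ').replace('madison', ' ')
    # Collapse any remaining runs of whitespace into single spaces.
    text = ' '.join(text.split())
    return text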
import os
all_madison = ''  # accumulates the text of every Madison paper
for x in os.listdir('./federalist_papers/JM/'):
    all_madison += preprocess_text('./federalist_papers/JM/' + x)
Note that there is much more text available for AH than JM. We will need to address this issue
in order to not bias the model towards AH.
The next step is to break the long text for each author into many small sequences. As
described above, we empirically choose a length for the sequence and use it throughout the
model's lifecycle. We get our full dataset by labeling each sequence with its author.
To break the long texts into smaller sequences we use the Tokenizer class from the Keras
framework. In particular, note that we set it up to tokenize according to characters and not
words.
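# NOTE: the def line and the start of the docstring are missing from the printed
# notebook. The signature is reconstructed from the call
# make_subsequences(disputed_long_sequence, UNKNOWN) and the arguments below.
def make_subsequences(long_sequence, label, sequence_length=SEQ_LEN):
    """ Break a long sequence into overlapping, labeled subsequences.
    # Arguments
        long_sequence: the long sequence to break into smaller sequences
        label: the label to assign to each subsequence
        sequence_length: the length of each subsequence
    # Returns
        X: matrix of size [len_sequences - sequence_length + 1, sequence_length]
        y: matrix of size [len_sequences - sequence_length + 1, 1] with label d…
    """
    len_sequences = len(long_sequence)
    # One row per window start position: a sliding window with stride 1.
    X = np.zeros(((len_sequences - sequence_length)+1, sequence_length))
    y = np.zeros((X.shape[0], 1))
    for i in range(X.shape[0]):
        X[i] = long_sequence[i:i+sequence_length]
        y[i] = label
    return X,y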
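# We use the Tokenizer class from Keras to convert the long texts into a se
tokenizer = Tokenizer(char_level=True)
# The tokenizer must be fitted before texts_to_sequences can map characters to
# ids. This call is not visible in the printed cell and is assumed here.
tokenizer.fit_on_texts([all_madison, all_hamilton])
madison_long_sequence = tokenizer.texts_to_sequences([all_madison])[0]
hamilton_long_sequence = tokenizer.texts_to_sequences([all_hamilton])[0]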
Number of characters: 53
Madison sequences: (271316, 30)
Hamilton sequences: (672139, 30)
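The sequence counts above follow directly from the sliding-window scheme: a text of n tokens yields n - L + 1 windows of length L. The sketch below checks this on a toy list and then works backwards from the printed shapes. The helper make_windows is hypothetical, and the sequence length of 30 is read off the second dimension of the shapes above.

```python
def make_windows(sequence, window_length):
    """Return every contiguous window of the given length (stride 1)."""
    return [
        sequence[i:i + window_length]
        for i in range(len(sequence) - window_length + 1)
    ]


toy = list(range(10))
windows = make_windows(toy, 4)
print(len(windows))  # 7, i.e. 10 - 4 + 1

SEQ_LEN = 30
madison_windows = 271316
hamilton_windows = 672139
# Invert n - L + 1 to recover the number of characters per author.
madison_chars = madison_windows + SEQ_LEN - 1
hamilton_chars = hamilton_windows + SEQ_LEN - 1
print(madison_chars, hamilton_chars)  # 271345 672168
```

Because consecutive windows overlap in all but one character, the number of training examples is almost equal to the number of characters, even though the amount of independent text is much smaller.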
Compare the number of raw characters to the number of labeled sequences for each author.
Deep Learning requires many examples of each input. The following code calculates the
number of total and unique words in the texts.
word_tokenizer = Tokenizer()
word_tokenizer.fit_on_texts([all_madison, all_hamilton])
Exercise: Do you think a word or a character embedding model is appropriate here? Write
down your reasoning in the following cell.
We begin by addressing the discrepancy in the amounts of data available for AH vs. JM. We
choose a simple solution here by choosing the same number of sequences for AH as are
available for JM and discarding the rest. Depending on the performance of the model, this
may or may not be a good idea in general.
The training set is used by the model to learn the weights in the neural network. The
model will iterate over this data many times, until performance is deemed to be
acceptable. A single pass through all the data is known as an epoch. Each training loop
works on a subset of the data known as a mini-batch. The number of instances in this
mini-batch is known as the batch size.
The validation set is used at the end of each epoch to assess performance of the model.
We present the model with data it has not seen before in order to evaluate its ability to
generalize. Had we used the training set instead, the model would have no 'motivation' to
learn the internal structure of the data - it would just try to 'memorize' the original data.
We stop the training when the validation set performance begins to drop, as this means
that the model now specializes (i.e., overfits) on the training set and is losing its ability to
deal with unseen data.
The test set is the final measure of performance that we report for the model. Once again,
we feed the model with data that it has not seen before in order to see how well it can
generalize. We do not use the validation set as we have already used it to determine
when to stop training, so effectively our model is biased towards good validation
performance. The test set is a brand new set of data that should only be used at the end
of the model training procedure.
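To make the terms epoch, mini-batch and batch size concrete, here is a small sketch in plain Python. The generator iterate_minibatches and the example count of 10000 are hypothetical; the batch size of 4096 is the value used in the training cell of this lab.

```python
import math


def iterate_minibatches(examples, batch_size):
    """Yield consecutive slices of at most batch_size examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


BATCH_SIZE = 4096
num_examples = 10000  # hypothetical dataset size

# One epoch is one full pass over all mini-batches.
batches = list(iterate_minibatches(list(range(num_examples)), BATCH_SIZE))
batches_per_epoch = math.ceil(num_examples / BATCH_SIZE)

print(batches_per_epoch)           # 3
print([len(b) for b in batches])   # [4096, 4096, 1808]
```

The last mini-batch is usually smaller than the others unless the dataset size is an exact multiple of the batch size.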
We take 80% of the original data for the training set, and use the remaining 20% for test. We
then split the resulting training set again, and use 90% for actual training and the other 10%
for validation.
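The two-stage split described above works out to 72% training, 8% validation and 20% test. The following sketch shows the arithmetic in plain Python; split_dataset is a hypothetical helper and not the lab's own code, which is not visible in this printout.

```python
import random


def split_dataset(examples, test_fraction=0.2, validation_fraction=0.1, seed=0):
    """Shuffle, hold out a test set, then carve a validation set out of the rest."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    test_set = shuffled[:n_test]
    remaining = shuffled[n_test:]
    # The validation fraction applies to what is left after removing the test set.
    n_val = int(round(len(remaining) * validation_fraction))
    validation_set = remaining[:n_val]
    train_set = remaining[n_val:]
    return train_set, validation_set, test_set


train_set, validation_set, test_set = split_dataset(range(1000))
print(len(train_set), len(validation_set), len(test_set))  # 720 80 200
```

Note that with overlapping character windows, a purely random split places near-identical sequences in different sets, so the reported test performance may be optimistic.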
Exercise: Make sure that the data is in the proper shape for use in an RNN. See here for a
hint.
# Take equal amounts of sequences from both authors
X = np.vstack((X_madison, X_hamilton[:X_madison.shape[0]]))
y = np.vstack((y_madison, y_hamilton[:y_madison.shape[0]]))
# Data is to be fed into RNN - ensure that the actual data is of size [batc
X_train = X_train.reshape(-1,SEQ_LEN,1)##TODO## : Reshape the data to fit a
X_test = X_test.reshape(-1,SEQ_LEN,1)##TODO## : Reshape the data to fit an
Finally, we construct the model graph and perform the training procedure. Notice how each
part of the model we describe above is implemented in TensorFlow code.
A single training epoch takes around 8.5 minutes on a K80 GPU. We have therefore provided
pretrained weights for the model at 1, 10 and 20 epochs. Simply run the code with no
changes to use the pretrained weights. If you'd like to perform the training yourself, change
the value of the RUN_TRAINING variable in the second cell below to True. You can also
control the number of training epochs using the NUM_EPOCHS variable.
# Arguments
sequences: character sequence data of size (batch size, sequence le
embedding_size: size of embedding vector to be generated for each c
lstm_size: size of vector that will be output by the LSTM style mod
# Returns
result: output of the entire model, a value between 0 and 1
lstm_output: the last output of the LSTM for each input sequence
"""
    with tf.variable_scope("Hidden1"):
        w1 = tf.get_variable("w1", (lstm_size, 128), initializer=tf.initial
        b1 = tf.get_variable("b1", 128)
        result = tf.nn.relu(tf.matmul(lstm_value, w1) + b1)
    with tf.variable_scope("Hidden2"):
        w2 = tf.get_variable("w2", (128, 64), initializer=tf.initializers
        b2 = tf.get_variable("b2", 64)
        result = tf.nn.relu(tf.matmul(result, w2) + b2)
    with tf.variable_scope("Output"):
        w3 = tf.get_variable("w3", (64, 1), initializer=tf.initializers.
        b3 = tf.get_variable("b3", 1)
        result = tf.nn.sigmoid(tf.matmul(result, w3) + b3)
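The classifier layers above (two ReLU hidden layers of 128 and 64 units followed by a single sigmoid neuron) can be mirrored in plain Python to make the computation explicit. This is a sketch under assumptions: the LSTM output size of 16 and the Gaussian initialization are hypothetical, and the helper names are not from the lab.

```python
import math
import random


def dense(inputs, weights, biases):
    """Compute inputs @ weights + biases, with weights of shape [n_in][n_out]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an assumed initialization)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def classify(style_vector, layers):
    """Apply ReLU layers, then a single sigmoid output neuron."""
    hidden = style_vector
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(hidden, weights, biases)[0])


rng = random.Random(0)
LSTM_SIZE = 16  # hypothetical size of the style vector
layers = [
    init_layer(LSTM_SIZE, 128, rng),
    init_layer(128, 64, rng),
    init_layer(64, 1, rng),
]
style_vector = [rng.uniform(-1.0, 1.0) for _ in range(LSTM_SIZE)]
probability = classify(style_vector, layers)
print(0.0 < probability < 1.0)  # True
```

Whatever the weights are, the sigmoid keeps the output strictly between 0 and 1, which is what allows it to be read as the probability of one of the two authors.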
# Arguments
logits: output value of the model, as a logit
labels: real labels of the data
# Returns
loss: the loss value
num_correct: number of instances that were correctly classified
"""
    loss = tf.losses.log_loss(labels, logits)
    preds = tf.round(logits)
    equality = tf.equal(tf.cast(labels, tf.float32), preds)
    num_correct = tf.reduce_sum(tf.cast(equality, tf.float32))
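The loss and accuracy computed above can be written out by hand. The sketch below uses the usual binary cross-entropy definition with a small epsilon for numerical stability, and assumes (as the sigmoid output layer suggests) that the value the code calls logits is already a probability between 0 and 1. The helper names and the example values are hypothetical.

```python
import math


def log_loss(labels, probabilities, epsilon=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        total += -(y * math.log(p + epsilon) + (1 - y) * math.log(1 - p + epsilon))
    return total / len(labels)


def count_correct(labels, probabilities, threshold=0.5):
    """Number of predictions whose thresholded value matches the label."""
    return sum(
        1 for y, p in zip(labels, probabilities)
        if (1 if p >= threshold else 0) == y
    )


labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.6]
loss = log_loss(labels, probabilities)
correct = count_correct(labels, probabilities)
print(correct)  # 2
```

Confident wrong predictions are penalized much more heavily than hesitant ones, which is why the loss can keep improving even when accuracy has stopped changing.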
# Arguments
loss: loss value for the model
learning_rate: the learning rate to use for the training procedure
"""
return tf.train.RMSPropOptimizer(learning_rate=learning_rate, momentum
The following cell contains the code to perform the training loop on our data. Note the use of
TensorFlow iterators for importing different datasets without changes to the code.
### Hyperparameters
BATCH_SIZE = 4096
NUM_EPOCHS = 20 # Change this to shorten training time at the expense o
def make_dataset(X,y):
    """ Creates a dataset composed of (data, label) instances, to be used f
    # Arguments
        X: the data to be used for training
        y: the labels to be used for training
    # Returns
        ds: a Dataset object to be used for creating iterators
    """
    ds = tf.data.Dataset.zip(
        (tf.data.Dataset.from_tensor_slices(X), tf.data.
    ).shuffle(len(X), reshuffle_each_iteration=True).
    return ds
def evaluate(dataset):
    """ Perform evaluation of the model given a specific dataset
    # Arguments
        dataset: the dataset to be used for the evaluation
    # Returns
        mean of the loss value over all batches in this dataset
        accuracy score for the dataset
    """
    total_inputs = 0
    total_correct = 0
    losses = []
    try:
        total_inputs += logits_value.shape[0]
        total_correct += value_correct
        losses.append(loss_value)
    except tf.errors.OutOfRangeError:
        # This exception is expected. Simply continue.
        pass
def train():
    """ Perform a single training epoch of a model
    # Returns
        mean of the training loss value over all batches in this dataset
        accuracy score for the dataset
        duration: time elapsed for performing a single epoch
    """
    losses = []
    duration = 0
    accuracy = 0
    start = time.time()
    try:
def validate():
    """ Evaluate a validation set on a model
    # Returns
        Results of evaluating a validation set on a model
    """
    return evaluate(validate_ds)
def test():
    """ Evaluate a test set on a model
    # Returns
        Results of evaluating a test set on a model
    """
    return evaluate(test_ds)
tf.reset_default_graph()
g = tf.Graph()
with g.as_default():
with tf.name_scope("input"):
sess = tf.Session(graph=g)
############# Flag for either training or restoring model from file ###
RUN_TRAINING = False
if not RUN_TRAINING:
    saver.restore(sess, "/dli/data/checkpoints/model.ckpt-20")  # Avail
    print("Restored model from file!")
else:
    sess.run(init_op)
Once the model has finished training, compare the validation and test losses to the training
loss. Notice that the test and validation loss values are similar, but not identical. This indicates
that the validation set is a good approximation for the performance of the test set and the
model's ability to generalize to new, unseen data. Notice also that the model's training loss is
lower than both validation and test loss. This may indicate that the model is beginning to
overfit by modelling the 'noise' in the training data. A good rule-of-thumb is to stop the
training when the validation loss begins to rise while the training loss continues to drop.
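The rule of thumb above can be expressed as a small early-stopping check. This is a sketch only: the function name, the patience parameter and the loss values are hypothetical and are not part of the lab's training loop.

```python
def early_stopping_epoch(validation_losses, patience=1):
    """Return the index of the best epoch seen before training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch


validation_losses = [0.69, 0.52, 0.41, 0.38, 0.40, 0.45]  # made-up values
print(early_stopping_epoch(validation_losses))  # 3
```

A patience larger than 1 tolerates brief fluctuations in the validation loss, at the cost of a few extra epochs of training.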
1. Check the output of the style feature encoder - does it coincide with our intuition of what
it should be doing?
2. Compare the output of the classifier with the published research on the disputed papers -
how accurate is our model?
In the next cell, we create a new dataset that uses the test set and shuffles it. Recall that the
test set contains data that has not been used during the model's training in any way. We
shuffle it to make sure that we get random instances and in particular, that we do not get
sequences that follow each other in the actual text. We then run a single batch of data
through the trained model. However, rather than look at the final output, we take the output
from the LSTM layer (which itself follows a character embedding layer). We also make sure to
include the real labels from this data. Run this cell multiple times to extract subsequent
batches for viewing.
We would now like to get a visual idea of the style features vector. In order to do this, we need
to apply a Dimensionality Reduction (https://en.wikipedia.org/wiki/Dimensionality_reduction)
technique to transform the high-dimensional vector into a 2 or 3-dimensional vector which we
can then visualize. We do this with a technique called Principal Component Analysis (PCA)
(https://en.wikipedia.org/wiki/Principal_component_analysis) and then plot the result.
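To give an idea of what PCA computes, the sketch below finds the first principal component of a small dataset using power iteration on the covariance matrix, in plain Python. This is not the implementation used in the lab, it computes only one component rather than two, and the function name and toy data are hypothetical. It also assumes the starting vector is not orthogonal to the leading component.

```python
import math


def top_principal_component(points, iterations=200):
    """Return (unit direction of largest variance, projections onto it)."""
    n = len(points)
    dims = len(points[0])
    # Center the data so that the covariance is taken around the mean.
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    cov = [
        [sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(dims)]
        for a in range(dims)
    ]
    # Power iteration: repeated multiplication converges to the top eigenvector.
    vector = [1.0] * dims
    for _ in range(iterations):
        product = [sum(cov[a][b] * vector[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(v * v for v in product))
        vector = [v / norm for v in product]
    projections = [sum(row[d] * vector[d] for d in range(dims)) for row in centered]
    return vector, projections


# Toy 3-dimensional data lying exactly along the direction (1, 2, -1).
points = [[float(t), 2.0 * t, -float(t)] for t in range(-3, 4)]
component, projections = top_principal_component(points)
print([round(c, 3) for c in component])
```

Projecting each high-dimensional vector onto the first two such directions gives the 2-dimensional coordinates used for plotting.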
plt.figure(figsize=(20,7))
plt.scatter(transformed_values[:,0], transformed_values[:,1], c=real_labels
colorbar = plt.colorbar();
colorbar.formatter = ticker.FuncFormatter(colorbar_labeler)
colorbar.update_ticks()
There are two interesting observations we can make from this plot:
1. The sequences in the test set seem to cluster together for each author, i.e. they are not
randomly placed in the figure. (If you have time, try reloading the model weights after a
single epoch and then re-running the plot.) This coincides with our hypothesis that each
author has a specific style and that the model can learn to identify it.
2. The two clusters have a certain amount of overlap. This is not surprising, as the authors
use many similar words and phrases.
Note, however, that the red cluster (AH) has a number of points (i.e., sequences) that seem to
lie deep in the yellow cluster (JM). Two possible interpretations for this phenomenon are as
follows:
1. Either AH occasionally uses vocabulary and style that closely resembles that of JM, or
2. The labels are not correct. Keep in mind that the data are just sequences from the text
and not the entire text itself, so labels refer to whether AH or JM wrote a particular
sequence. Hence, this may point to the fact that some papers are actually collaborations
between AH and JM. What do you think?
When applying PCA, we chose to use only the 2 most significant principal components which
only account for a small amount of variance in the data. Since we cannot plot more than 3
components, we need to use a different method if we'd like a more detailed look. Here we
apply a method called T-distributed Stochastic Neighbor Embedding (T-SNE)
(https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding).
Because T-SNE is very computationally intensive (and unfortunately, we do not have a GPU
implementation) we take a different approach: we apply PCA to reduce dimensionality of the
original vector, and apply T-SNE on the result.
In the following cell, change the NUM_DIMENSIONS parameter to control the amount of
variance plotted, and the NUM_ITERATIONS parameter to change the computation time of
the algorithm. Note that this may take several minutes to run.
NUM_DIMENSIONS = 30
NUM_ITERATIONS = 500 # No less than 250
tsne = TSNE(n_iter=NUM_ITERATIONS)
transformed_values = tsne.fit_transform(pca_values)
plt.figure(figsize=(20,7))
plt.scatter(transformed_values[:,0], transformed_values[:,1], c=real_labels
colorbar = plt.colorbar();
colorbar.formatter = ticker.FuncFormatter(colorbar_labeler)
colorbar.update_ticks()
disputed_text = preprocess_text('./federalist_papers/unknown/' + x)
disputed_long_sequence = tokenizer.texts_to_sequences([disputed_text
X_sequences, _ = make_subsequences(disputed_long_sequence, UNKNOWN)
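with g.as_default():
    votes_for_madison = 0
    votes_for_hamilton = 0
    try:
        while True:
            predictions = sess.run(tf.round(logits), feed_dict={"Optio
            # Count votes by comparing against each label directly. Indexing the
            # counts returned by np.unique fails or miscounts when a batch
            # contains predictions for only one author.
            votes_for_hamilton += int(np.sum(predictions == AH))
            votes_for_madison += int(np.sum(predictions == JM))
    except tf.errors.OutOfRangeError:
        # Expected once the dataset iterator is exhausted.
        pass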
Summary
In this lab, we discussed the problem of authorship attribution. We presented the Federalist
Papers debate, and built a Deep Learning model to address it. Finally, we looked at the model
internals to get an intuition for how it encodes stylometric properties.
For the Federalist Papers, we know for a fact that each debated paper was written by either
Alexander Hamilton or James Madison. Therefore we used a model that will output one or the
other. More generally, we could have used a model that returns confidence levels for each
author - we would then have multiple sigmoid outputs, each giving us the probability of the
input text having been written by an author. If no sigmoid exceeds a certain threshold (for
example, 0.5), we can declare that text's author as unknown.
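The multi-author variant described above can be sketched as follows. The function attribute_author, the author keys and the probability values are hypothetical; each value stands for one independent sigmoid output.

```python
def attribute_author(author_probabilities, threshold=0.5):
    """Pick the most likely author, or "unknown" if no output exceeds the threshold."""
    best_author = max(author_probabilities, key=author_probabilities.get)
    if author_probabilities[best_author] <= threshold:
        return "unknown"
    return best_author


print(attribute_author({"AH": 0.82, "JM": 0.31, "JJ": 0.05}))  # AH
print(attribute_author({"AH": 0.40, "JM": 0.45, "JJ": 0.10}))  # unknown
```

Because the sigmoids are independent, their values need not sum to 1, so several authors (or none) can exceed the threshold for the same text.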
Authorship attribution is one type of text classification problem, a family of problems that is
pervasive in practice. The models and approach you saw here are relevant to many other
types of text classification.
References
[1] The complete text of the Federalist Papers is available from Project Gutenberg here
(http://www.gutenberg.org/ebooks/18). The data has been split into multiple files for your
convenience, and those by John Jay have been removed.