08 Natural Language Processing in TensorFlow

The document discusses natural language processing (NLP) and provides examples of common NLP problems and modeling approaches. It outlines steps for modeling text data, including tokenization, embedding, and using recurrent neural networks (RNNs). It proposes experiments using models like LSTM, GRU, CNNs, and TensorFlow Hub pretrained feature extractors on a text classification task and evaluating results with metrics like accuracy and precision. The document appears to be instructional material for an NLP modeling workshop that will cover preprocessing text, building models, and evaluating performance.


Natural Language Processing (NLP) with TensorFlow

Where can you get help?
“If in doubt, run the code”

• Follow along with the code


• Try it for yourself
• Press SHIFT + CMD + SPACE to read the docstring
• Search for it
• Try again
• Ask (don’t forget the Discord chat!) (yes, including the “dumb” questions)
“What is an NLP problem?”

Example NLP problems and NLU… (natural language understanding)
“What tags should this article have?”

Machine learning
Representation learning
Artificial intelligence

(multiple label options per sample)

Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Classification, Text Generation, Machine Translation, Voice Assistants

(these are also referred to as sequence problems)
Other sequence problems

Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Other sequence problems
Image captioning

Input: (image)
Output: “A sledgehammer leaning up against a tire”

Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Other sequence problems
Sentiment analysis

Input: (text)
Output: Positive 👍

Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Other sequence problems
Time series forecasting

Input: (historical price data)
Output: Price at next timestamp (e.g. $59,678)

Sources: https://www.coindesk.com/price/bitcoin, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Other sequence problems
Machine Translation

Input → Output

Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
What we’re going to cover
(broadly)
• Downloading and preparing a text dataset

• How to prepare text data for modelling (tokenization and embedding)

• Setting up multiple modelling experiments with recurrent neural networks (RNNs)

• Building a text feature extraction model using TensorFlow Hub

• Finding the most wrong prediction examples

• Using a model we’ve built to make predictions on text from the wild

👩‍🍳 👩‍🔬
(we’ll be cooking up lots of code!)

How:
NLP inputs and outputs

“Is this Tweet about a disaster or not?”

Input: Tweet text, with actual outputs Disaster 🌪 or Not Disaster 👌

Numerical encoding (Tokenization + Embedding)
(often already exists, if not, you can build one):

[[0.22, 0.98, 0.02…],
 [0.09, 0.55, 0.87…],
 [0.53, 0.81, 0.79…],
 …]

Predicted output (comes from looking at lots of these):

[[0.97, 0.03],
 [0.81, 0.19],
 …]
Input and output shapes
(for a text classification example)
We’re going to be building RNNs/CNNs/feature extractors to do this part!

Input: text (gets represented as a tensor/embedding)
Shape = [batch_size, embedding_size]
Shape = [None, 512] or Shape = [32, 512]
(32 is a very common batch size)

Output: 👌 🌪 [0.99, 0.01] (prediction probabilities)
Shape = [2]

These will vary depending on the problem you’re working on/what embedding style you use.
Steps in modelling with TensorFlow

1. Turn all data into numbers (neural networks can’t handle text/natural language)
2. Make sure all of your tensors are the right shape (pad sequences which don’t fit) — a short sketch of both steps follows below
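For instance, both steps can be handled with tf.keras’s TextVectorization layer, which tokenizes text and pads every sequence to a fixed length. This is a minimal sketch; the vocabulary size, sequence length, and example sentences are assumptions, not the course’s exact settings.

```python
import tensorflow as tf

# Hypothetical example sentences (stand-ins for the real text dataset)
sentences = ["I love TensorFlow", "Natural language processing is fun"]

# Step 1: turn text into numbers (tokenization)
# Step 2: output_sequence_length pads/truncates so every tensor has the same shape
text_vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=10000,           # cap the vocabulary size
    output_mode="int",          # map each token to an integer
    output_sequence_length=8    # pad/truncate every sequence to length 8
)
text_vectorizer.adapt(sentences)   # build the vocabulary from the text

print(text_vectorizer(sentences))  # shape (2, 8), shorter sequences padded with 0
```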
“What is a recurrent neural
network (RNN)?”
Architecture of an RNN (typical)*
(what we’re working towards building)

(example output: Not Disaster 👌)

*Note: there is an almost unlimited number of ways you could stack together a recurrent neural network; this slide demonstrates only one.
Let’s code!
Tokenization vs Embedding
“I love TensorFlow” → 0 1 2 (I = 0, love = 1, TensorFlow = 2)

Tokenization — straight mapping from token to number (can be modelled but quickly gets too big)

One-hot encoding:
[[1, 0, 0],
 [0, 1, 0],
 [0, 0, 1],
 …]

Embedding — richer representation of relationships between tokens (can limit size + can be learned)

Embedding:
[[0.492, 0.005, 0.019],
 [0.060, 0.233, 0.899],
 [0.741, 0.983, 0.567],
 …]
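To make the contrast concrete, here is a small sketch (the vocabulary size of 10,000 and embedding size of 128 are arbitrary assumptions) showing a one-hot encoding next to a learnable embedding layer:

```python
import tensorflow as tf

tokens = tf.constant([0, 1, 2])  # "I love TensorFlow" after tokenization

# One-hot encoding: each token becomes a vector as wide as the vocabulary
print(tf.one_hot(tokens, depth=3))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Embedding: each token becomes a dense vector whose values are learned during training
embedding = tf.keras.layers.Embedding(input_dim=10000,  # vocabulary size
                                      output_dim=128)   # embedding size
print(embedding(tokens).shape)  # (3, 128)
```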
Experiments we’re running
Experiment Number | Model
0 | Naive Bayes with TF-IDF encoder (baseline)
1 | Feed-forward neural network (dense model)
2 | LSTM (RNN)
3 | GRU (RNN)
4 | Bidirectional-LSTM (RNN)
5 | 1D Convolutional Neural Network
6 | TensorFlow Hub Pretrained Feature Extractor
7 | TensorFlow Hub Pretrained Feature Extractor (10% of data)
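The baseline (experiment 0) needs only a few lines of scikit-learn. The sentences and labels below are made-up placeholders standing in for the disaster Tweets data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up placeholder data (1 = disaster, 0 = not disaster)
train_sentences = ["a huge fire is spreading near the city",
                   "this new song is fire, love it"]
train_labels = [1, 0]

model_0 = Pipeline([
    ("tfidf", TfidfVectorizer()),   # turn text into TF-IDF features
    ("clf", MultinomialNB()),       # classify with Multinomial Naive Bayes
])
model_0.fit(train_sentences, train_labels)
print(model_0.predict(["flood warning issued for the river valley"]))
```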


Classification evaluation methods (some common)


Key: tp = True Positive, tn = True Negative, fp = False Positive, fn = False Negative

Metric Name | Formula | Code | When to use

Accuracy | Accuracy = (tp + tn) / (tp + tn + fp + fn) | tf.keras.metrics.Accuracy() or sklearn.metrics.accuracy_score() | Default metric for classification problems. Not the best for imbalanced classes.

Precision | Precision = tp / (tp + fp) | tf.keras.metrics.Precision() or sklearn.metrics.precision_score() | Higher precision leads to fewer false positives.

Recall | Recall = tp / (tp + fn) | tf.keras.metrics.Recall() or sklearn.metrics.recall_score() | Higher recall leads to fewer false negatives.

F1-score | F1-score = 2 · (precision · recall) / (precision + recall) | sklearn.metrics.f1_score() | Combination of precision and recall, usually a good overall metric for a classification model.

Confusion matrix | NA | Custom function or sklearn.metrics.confusion_matrix() | When comparing predictions to truth labels to see where the model gets confused. Can be hard to use with large numbers of classes.
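A small helper like the one below (a sketch, not taken from the slides) computes the four scalar metrics in one go; the y_true/y_pred values are made up:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def calculate_results(y_true, y_pred):
    """Return accuracy, precision, recall and F1-score for binary predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }

# Made-up ground truth labels and model predictions
print(calculate_results(y_true=[1, 0, 1, 1, 0], y_pred=[1, 0, 0, 1, 0]))
```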
Architecture of an RNN
(coloured block edition)

Standard RNN
Types of RNN cells

Name | When to use | Learn more | Code

LSTM (long short-term memory) | Default RNN layer for sequence problems. | Understanding LSTM Networks by Chris Olah | tf.keras.layers.LSTM

GRU (gated recurrent unit) | Performs very similarly to an LSTM (could be used as a default). | Illustrated Guide to LSTM’s and GRU’s by Michael Phi | tf.keras.layers.GRU

Bidirectional LSTM (goes forwards and backwards on a sequence) | Good for sequences which may benefit from passing forwards and backwards (e.g. translation or longer passages of text). | Same as above | tf.keras.layers.Bidirectional
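As a rough sketch of how these cells drop into a Keras model (layer sizes, sequence length, and vocabulary size here are assumptions rather than the course’s exact values):

```python
import tensorflow as tf

# Assumed text-to-number front-end (remember to adapt the vectorizer on training text first)
text_vectorizer = tf.keras.layers.TextVectorization(max_tokens=10000,
                                                    output_sequence_length=15)
embedding = tf.keras.layers.Embedding(input_dim=10000, output_dim=128)

inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = embedding(text_vectorizer(inputs))

# Swap in whichever RNN cell you're experimenting with:
x = tf.keras.layers.LSTM(64)(x)                                   # LSTM
# x = tf.keras.layers.GRU(64)(x)                                  # GRU
# x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(x)  # Bidirectional LSTM

outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # disaster / not disaster
model = tf.keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```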
Architecture of a Sequence Conv1D Model
(coloured block edition)

Conv1D Sequence model
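A comparable sketch of the 1D convolutional version (filter count and kernel size are assumptions); the convolution slides across the sequence dimension of the token embeddings:

```python
import tensorflow as tf

# Same assumed text-to-number front-end as in the RNN sketch above
text_vectorizer = tf.keras.layers.TextVectorization(max_tokens=10000,
                                                    output_sequence_length=15)
embedding = tf.keras.layers.Embedding(input_dim=10000, output_dim=128)

inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = embedding(text_vectorizer(inputs))
x = tf.keras.layers.Conv1D(filters=64, kernel_size=5, padding="same",
                           activation="relu")(x)   # 1D filters over the token embeddings
x = tf.keras.layers.GlobalMaxPooling1D()(x)        # collapse the sequence dimension
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model_conv1d = tf.keras.Model(inputs, outputs)
```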


Model we’re building (USE* feature extractor)
*USE = Universal Sentence Encoder
Source: https://tfhub.dev/google/universal-sentence-encoder/4

Encoder (encodes sequences into a numerical representation)
Decoder (decodes sequences into the desired output)
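A sketch of the TensorFlow Hub feature-extractor model built from the USE linked above (the trainable flag and the dense layer size are assumptions):

```python
import tensorflow as tf
import tensorflow_hub as hub

# The pretrained Universal Sentence Encoder maps whole sentences to 512-dim embeddings
sentence_encoder_layer = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder/4",
    input_shape=[],       # each example is a single string
    dtype=tf.string,
    trainable=False,      # keep the pretrained weights frozen
)

model_use = tf.keras.Sequential([
    sentence_encoder_layer,                          # text -> 512-dim embedding
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # disaster / not disaster
])
model_use.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```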
Ideal speed/performance trade-off

Ideal position for speed/performance: high performance + high speed
Improving a model (from a model’s perspective)

Smaller model ↔ Larger model

Common ways to improve a deep model:

• Adding layers
• Increasing the number of hidden units
• Changing the activation functions
• Changing the optimization function
• Changing the learning rate
• Fitting on more data
• Fitting for longer

(Because you can alter each of these, they’re hyperparameters — a small sketch of tweaking a couple of them follows below.)
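A tiny illustrative sketch of where a couple of these knobs live in Keras (the layer sizes, activation, optimizer, and learning rate are placeholder assumptions, not recommended settings):

```python
import tensorflow as tf

# "Larger model" = more layers / more hidden units; "smaller model" = fewer of both
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),    # change units/activation here
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    loss="binary_crossentropy",
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # changed learning rate
    metrics=["accuracy"],
)
# Fitting on more data / fitting for longer happen via model.fit(..., epochs=...)
```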


What is overfitting?
Overfitting — when a model over-learns patterns in a particular dataset and isn’t able to generalise to unseen data.

For example, a student who studies the course materials too hard and then isn’t able to perform well on the final exam. Or tries to put their knowledge into practice at the workplace and finds what they learned has nothing to do with the real world.

Underfitting | Balanced (goldilocks zone) | Overfitting
Improving a model (from a data perspective)

Method to improve a model (reduce overfitting) | What does it do?

More data | Gives a model more of a chance to learn patterns between samples (e.g. if a model is performing poorly on images of pizza, show it more images of pizza).

Data augmentation (usually for images) | Increases the diversity of your training dataset without collecting more data (e.g. take your photos of pizza and randomly rotate them 30°). Increased diversity forces a model to learn more generalisable patterns.

Better data | Not all data samples are created equally. Removing poor samples from or adding better samples to your dataset can improve your model’s performance.

Use transfer learning | Take a model’s pre-learned patterns from one problem and tweak them to suit your own problem. For example, take a model trained on pictures of cars to recognise pictures of trucks.
The machine learning explorer’s
motto
“Visualize, visualize, visualize”
Data
Model
Training
Predictions

It’s a good idea to visualize these as often as possible.
The machine learning practitioner’s
motto

“Experiment, experiment, experiment”

👩‍🍳 👩‍🔬
(try lots of things and see what tastes good)
