CH 6. Applications of AI-NLP
Applications of AI
● Natural Language Processing: Language Models, Grammar, Parsing, Augmented Grammars, Complications of Real
Natural Language
● Natural Language Tasks
Natural Language Processing
Natural language processing is a subfield of linguistics, computer science, and artificial intelligence concerned
with the interactions between computers and human language, in particular how to program computers to
process and analyze large amounts of natural language data. In effect, the goal is to give computers the ability to
understand text and spoken words in much the same way human beings can.
Three primary reasons for computers to do NLP:
1. To communicate with humans. In many situations it is convenient for humans to use speech to interact
with computers, and in most situations it is more convenient to use natural language rather than a formal
language such as first-order predicate calculus.
2. To learn. Humans have written down a lot of knowledge using natural language. Wikipedia alone has 30
million pages of facts such as “Bush babies are small nocturnal primates,” whereas there are hardly any
sources of facts like this written in formal logic. If we want our system to know a lot, it had better
understand natural language.
3. To advance the scientific understanding of languages and language use, using the tools of AI in
conjunction with linguistics, cognitive psychology, and neuroscience.
Language Models
A language model is a probability distribution over sequences of words. Given such a sequence of length m, a
language model assigns a probability to the whole sequence. Language models generate
probabilities by training on text corpora in one or many languages.
We define a language model as a probability distribution describing the likelihood of any string. Such a model
should say that “Do I dare disturb the universe?” has a reasonable probability as a string of English, but
“Universe dare the I disturb do?” is extremely unlikely.
Applications of Language Models
Bag-of-Words Model
The bag-of-words model is a simplifying representation used in natural language processing and
information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the
bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity.
Let’s say we have two text documents:
Doc1: “John likes to watch movies. Mary likes movies too.”
Doc2: “Mary also likes to watch football games.”
Based on these two text documents, a list is constructed as follows for each document:
"John","likes","to","watch","movies","Mary","likes","movies","too"
"Mary","also","likes","to","watch","football","games"
Form a matrix
Word   John  likes  to  watch  movies  Mary  too  also  football  games
Doc1    1     2     1    1      2       1     1    0     0         0
Doc2    0     1     1    1      0       1     0    1     1         1
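As a quick illustration, here is a minimal Python sketch (standard library only) that reproduces the count matrix above; the two document strings and the toy tokenizer are assumptions made for illustration.

```python
from collections import Counter

# The two example documents.
docs = {
    "Doc1": "John likes to watch movies. Mary likes movies too.",
    "Doc2": "Mary also likes to watch football games.",
}

def tokenize(text):
    # Toy whitespace tokenizer that strips trailing punctuation.
    return [w.strip(".,") for w in text.split()]

counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
vocab = ["John", "likes", "to", "watch", "movies",
         "Mary", "too", "also", "football", "games"]

for name, c in counts.items():
    # Each row is the bag-of-words vector for one document.
    print(name, [c[w] for w in vocab])
# Doc1 [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
# Doc2 [0, 1, 1, 1, 0, 1, 0, 1, 1, 1]
```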
The bag-of-words model has limitations. For example, the word “quarter” is common in both the business and
sports categories, but the four-word sequence “first quarter earnings report” is common only in business, and
“fourth quarter touchdown passes” is common only in sports.
A sequence of written symbols of length n is called an n-gram, with special cases “unigram” for 1-gram, “bigram” for
2-gram, and “trigram” for 3-gram.
[ "John likes", "likes to", "to watch", "watch movies", "Mary likes", "likes movies", "movies too",]
N-Gram Model
Taking the same two text documents:
Based on these two text documents, a list is constructed as follows for each document using bigrams:
"John likes", "likes to", "to watch", "watch movies", "Mary likes", "likes movies", "movies too"
"Mary also", "also likes", "likes to", "to watch", "watch football", "football games"
Form a matrix
Bigram            Doc1  Doc2
John likes         1     0
likes to           1     1
to watch           1     1
watch movies       1     0
Mary likes         1     0
likes movies       1     0
movies too         1     0
Mary also          0     1
also likes         0     1
watch football     0     1
football games     0     1
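The bigram counts above can be reproduced with the same toy tokenizer; this is only a counting sketch, not a full n-gram language model, and splitting on "." so that bigrams do not cross sentence boundaries is an assumption made to match the bigram lists above.

```python
from collections import Counter

docs = {
    "Doc1": "John likes to watch movies. Mary likes movies too.",
    "Doc2": "Mary also likes to watch football games.",
}

def bigrams(text):
    pairs = []
    # Split into sentences so bigrams do not cross sentence boundaries.
    for sentence in text.split("."):
        tokens = [w.strip(",") for w in sentence.split()]
        # Pair each token with its successor: ("John", "likes"), ("likes", "to"), ...
        pairs += list(zip(tokens, tokens[1:]))
    return pairs

for name, text in docs.items():
    print(name, Counter(bigrams(text)))
```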
● One basic way to categorize words is by their part of speech (POS), also called lexical category or tag: noun, verb,
adjective, and so on.
● The Penn Treebank is a corpus of over three million words of text annotated with part-of-speech tags.
● The task of assigning a part of speech to each word in a sentence is called part-of-speech tagging.
Example of POS tags (Penn Treebank tagset): e.g., NN (noun), VB (verb), JJ (adjective), RB (adverb), DT (determiner).
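As a hedged sketch, NLTK's default tagger produces Penn-Treebank-style tags; this assumes nltk is installed and its tokenizer/tagger data have been downloaded (resource names can vary slightly between NLTK versions).

```python
import nltk

# One-time downloads of tokenizer and tagger data (names may differ by version).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "John likes to watch movies"
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# Roughly: [('John', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'),
#           ('watch', 'VB'), ('movies', 'NNS')]
```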
Hidden Markov Model
Markov Chain: https://www.youtube.com/watch?v=i3AkTO9HLXo&list=PLoD3ZpkiaA_pwMrAMlES6CGBnS1UBOVpk&index=2&t=16s
HMM: https://www.youtube.com/watch?v=RWkHJnFj5rY
Markov Chain Example
Hidden Markov Model Example
(Worked examples shown on the slides; the final computed probability for the HMM example is 0.00391.)
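Since the worked numbers from the slides are not reproduced here, the sketch below scores one hidden-state sequence against one observation sequence for a small hypothetical weather/activity HMM; every probability in it is made up for illustration.

```python
# Hypothetical two-state HMM (all numbers are illustrative only).
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def joint_probability(state_seq, obs_seq):
    """P(states, observations) = start * product of transition * emission terms."""
    p = start[state_seq[0]] * emit[state_seq[0]][obs_seq[0]]
    for prev, cur, obs in zip(state_seq, state_seq[1:], obs_seq[1:]):
        p *= trans[prev][cur] * emit[cur][obs]
    return p

# 0.6*0.5 * 0.7*0.4 * 0.3*0.6 = 0.01512
print(joint_probability(["Rainy", "Rainy", "Sunny"], ["clean", "shop", "walk"]))
```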
Grammar
A grammar is a set of rules that defines the tree structure of allowable phrases, and a language is the set of
sentences that follow those rules.
Syntactic categories such as noun phrase or verb phrase help to constrain the probable words at each point
within a sentence, and the phrase structure provides a framework for the meaning or semantics of the
sentence.
Example: The sentence “I ate a banana” is fine, but “Me ate a banana” is ungrammatical, and “I ate a
bandanna” is unlikely. Here, “I ate a bandanna” is syntactically correct but semantically wrong.
Lexicon
A lexicon is a dictionary or the vocabulary of a language, a people, or a subject. An example of a lexicon is
YourDictionary.com.
Parsing
Parsing is the process of analyzing a string of words to uncover its phrase structure, according to the rules
of a grammar. We can think of it as a search for a valid parse tree whose leaves are the words of the string.
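As a sketch tied to the “I ate a banana” example above, the toy grammar below is parsed with NLTK's chart parser; the grammar rules are made up for illustration and nltk is assumed to be installed.

```python
import nltk

# A tiny hand-written grammar (illustrative only).
grammar = nltk.CFG.fromstring("""
  S   -> NP VP
  NP  -> 'I' | Det N
  VP  -> V NP
  Det -> 'a'
  N   -> 'banana'
  V   -> 'ate'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I ate a banana".split()):
    print(tree)
# (S (NP I) (VP (V ate) (NP (Det a) (N banana))))
```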
Dependency Parsing
The term Dependency Parsing (DP) refers to the process of examining the dependencies between the
words of a sentence in order to determine its grammatical structure.
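A hedged dependency-parsing sketch with spaCy, assuming spaCy and its small English model en_core_web_sm are installed:

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("I ate a banana")

for token in doc:
    # Each word points to its syntactic head via a labeled dependency relation.
    print(f"{token.text:10} --{token.dep_:>6}--> {token.head.text}")
# Typical output: I -> ate (nsubj), ate -> ate (ROOT),
#                 a -> banana (det), banana -> ate (dobj)
```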
Natural Language Tasks
1. Speech recognition
2. Text-to-speech
3. Machine translation
4. Information extraction
5. Question Answering
Word Embeddings
One of the most significant findings to emerge from the application of deep learning
to language tasks is that a great deal of mileage comes from re-representing
individual words as vectors in a high-dimensional space—so-called word
embeddings (see Section 24.1). The vectors are usually extracted from the
weights of the first hidden layer of a network trained on large quantities of text, and
they capture the statistics of the lexical contexts in which words are used. Because
words with similar meanings are used in similar contexts, they end up close to
each other in the vector space. This allows the network to generalize effectively
across categories of words, without the need for humans to predefine those
categories. For example, a sentence beginning “John bought a watermelon and
two pounds of ...” is likely to continue with “apples” or “bananas” but not with
“thorium” or “geography.” Such a prediction is much easier to make if “apples” and
“bananas” have similar representations in the internal layer.
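A hedged sketch with gensim's Word2Vec (gensim 4.x API, assumed installed); the toy corpus is far too small to learn meaningful embeddings, so this only shows the mechanics.

```python
from gensim.models import Word2Vec

# Toy corpus (real embeddings need very large amounts of text).
sentences = [
    ["john", "bought", "a", "watermelon", "and", "two", "pounds", "of", "apples"],
    ["mary", "bought", "bananas", "and", "apples"],
    ["students", "study", "geography", "and", "history"],
]

# Train tiny 10-dimensional embeddings.
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, seed=0)

print(model.wv["apples"])                        # learned vector for "apples"
print(model.wv.similarity("apples", "bananas"))  # cosine similarity of two words
```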
Recurrent Neural Networks for NLP
For example, in the sentence “Eduardo told me that Miguel was very sick so I took him to the hospital,”
knowing that him refers to Miguel and not Eduardo requires context that spans from the first to the last word of
the 14-word sentence.
Once the model has been trained, we can use it to generate random text. We give
the model an initial input word, from which it produces an output: a softmax
probability distribution over words. We sample a single word from that distribution,
record it as the output for the current time step, and feed it back in as the next
input word. We repeat for as long as desired. In sampling from the output distribution
we have a choice: we could always take the most likely word; we could sample
according to the probability of each word; or we could oversample the less-likely
words, in order to inject more variety into the generated output. The sampling weight
is a hyperparameter of the model.
Fig: Example of RNN
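A minimal NumPy sketch of just the sampling step: given made-up logits over a tiny vocabulary, a temperature parameter controls whether we mostly pick the most likely word or inject more variety.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["apples", "bananas", "thorium", "geography"]
logits = np.array([2.0, 1.8, -1.0, -1.5])  # hypothetical scores from the model

def sample_next_word(logits, temperature=1.0):
    # temperature < 1 sharpens the distribution (closer to argmax);
    # temperature > 1 flattens it (more variety).
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

print(vocab[sample_next_word(logits, temperature=0.5)])  # usually "apples"
print(vocab[sample_next_word(logits, temperature=2.0)])  # more varied choices
```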
Sequence to Sequence Model
Sequence to Sequence (often abbreviated to seq2seq) models are a special class of Recurrent Neural
Network architectures that we typically use (but are not restricted to) for solving complex language problems like
Machine Translation, Question Answering, creating Chatbots, Text Summarization, etc.
Transformers
A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the
significance of each part of the input data. It is used primarily in the fields of natural language processing (NLP)
and computer vision (CV).
Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as
natural language, with applications towards tasks such as translation and text summarization. However, unlike
RNNs, transformers process the entire input all at once. The attention mechanism provides context for any
position in the input sequence. For example, if the input data is a natural language sentence, the transformer does
not have to process one word at a time. This allows for more parallelization than RNNs and therefore reduces
training times.
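A minimal NumPy sketch of the scaled dot-product self-attention computation at the heart of the transformer; the input and projection matrices are random stand-ins, not trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 4, 8                      # 4 input tokens, 8-dimensional features
X = rng.normal(size=(seq_len, d))      # stand-in token representations

# Random projections standing in for the learned W_q, W_k, W_v matrices.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
output = weights @ V                   # every position attends to every position

print(weights.round(2))  # each row sums to 1: attention over all positions
```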
Transformers were introduced in 2017 by a team at Google Brain[1] and are increasingly the model of choice for
NLP problems,[3] replacing RNN models such as long short-term memory (LSTM).
The Transformer Architecture
Pretraining: a form of transfer learning in which we use a large amount of shared general-domain language
data to train an initial version of an NLP model. From there, we can use a smaller amount of domain-specific
data (perhaps including some labeled data) to refine the model. The refined model can learn the vocabulary,
idioms, syntactic structures, and other linguistic phenomena that are specific to the new domain.
Masked language model (MLM). MLMs are trained by masking (hiding) individual words in the input and
asking the model to predict the masked words. For this task, one can use a deep bidirectional RNN or
transformer on top of the masked sentence. For example, given the input sentence “The river rose five feet,”
we can mask the middle word to get “The river ___ five feet” and ask the model to fill in the blank.
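A hedged sketch using the Hugging Face transformers fill-mask pipeline with a pretrained BERT model; it assumes the transformers library is installed, the model is downloaded on first use, and the exact predictions will vary.

```python
from transformers import pipeline

# Load a pretrained masked language model.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT's mask token is [MASK]; the model proposes fillers for the blank.
for prediction in unmasker("The river [MASK] five feet."):
    print(prediction["token_str"], round(prediction["score"], 3))
```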
State of the Art
Deep learning and transfer learning have markedly advanced the state of the art for NLP. It started with simple
word embeddings from systems such as WORD2VEC in 2013 and GloVe in 2014. Researchers can download
such a model or train their own relatively quickly without access to supercomputers.
State of the Art Models
● BERT
● XLNET
● State of the Art models became feasible only after hardware advances (GPUs and TPUs).
● The transformer model allowed for efficient training of much larger and deeper neural networks than was
previously possible
● A ROBERTA model with some fine-tuning achieves state-of-the-art results in question answering and
reading comprehension tests
● GPT-2, a transformer-like language model with 1.5 billion parameters trained on 40GB of Internet text,
achieves good results on such diverse tasks as translation between French and English
● T5 (the Text-to-Text Transfer Transformer) is designed to produce textual responses to various kinds of
textual input.
Prev Contents
● Computer Vision:
○ Image Formation,
○ Simple Image Features,
○ Classifying Images,
○ Detecting Objects,
○ The 3D World,
○ Using Computer Vision
● Robotics:
○ Robot Hardware,
○ Robotic Perception,
○ Planning and Control,
○ Planning Uncertain Movements,
○ Reinforcement Learning in Robotics,
○ Humans and Robots,
○ Application Domains
Computer Vision
Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive
meaningful information from digital images, videos and other visual inputs — and take actions or make
recommendations based on that information.
Image Formation
Simple Image Features
- Edges
- Boundary
- Texture
- Segmentation
Noise here means changes to the value of a pixel that don’t have to do with an edge.
Classifying Images
Using Computer Vision
Robotic Perception
Perception is the process by which robots map sensor measurements into internal representations of the environment.
Much of it uses the computer vision techniques from the previous chapter. But perception for robotics must deal with
additional sensors like lidar and tactile sensors. Good internal representations for robots have three properties:
● They contain enough information for the robot to make good decisions.
● They are structured so that they can be updated efficiently.
● They are natural in the sense that internal variables correspond to natural state variables in the physical world.
Planning and Control
The robot’s deliberations ultimately come down to deciding how to move, from the abstract task level all the
way down to the currents that are sent to its motors. In this section, we simplify by assuming that perception
(and, where needed, prediction) are given, so the world is observable. We further assume deterministic
transitions (dynamics) of the world.
We start by separating motion from control. We define a path as a sequence of points in geometric space that
a robot (or a robot part, such as an arm) will follow. Here we mean a sequence of points in space rather than a
sequence of discrete actions. The task of finding a good path is called motion planning.
Once we have a path, the task of executing a sequence of actions to follow the path is called trajectory tracking
control. A trajectory is a path that has a time associated with each point on the path. A path just says “go from
A to B to C, etc.” and a trajectory says “start at A, take 1 second to get to B, and another 1.5 seconds to get to
C, etc.”
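As a small illustration of the distinction (the waypoints and timings below are hypothetical):

```python
# A path: an ordered list of points in space, with no timing information.
path = [(0.0, 0.0), (1.0, 2.0), (3.0, 4.0)]   # "go from A to B to C"

# A trajectory: the same points, each paired with a time stamp.
trajectory = [((0.0, 0.0), 0.0),   # start at A at t = 0 s
              ((1.0, 2.0), 1.0),   # reach B after 1 s
              ((3.0, 4.0), 2.5)]   # reach C another 1.5 s later
```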
Planning Uncertain Movements
In robotics, uncertainty arises from partial observability of the environment and from the stochastic (or
unmodeled) effects of the robot’s actions. Errors can also arise from the use of approximation algorithms
such as particle filtering, which does not give the robot an exact belief state even if the environment is
modeled perfectly.
Applications
● Industries − Robots are used for handling material, cutting, welding, color coating, drilling, polishing, etc.
● Military − Autonomous robots can reach inaccessible and hazardous zones during war. A robot named
Daksh, developed by the Defence Research and Development Organisation (DRDO), is used to safely
destroy life-threatening objects.
● Medicine − Robots are capable of carrying out hundreds of clinical tests simultaneously, rehabilitating
permanently disabled people, and performing complex surgeries such as brain tumor removal.
● Exploration − Robot rock climbers used for space exploration and underwater drones used for ocean
exploration are a few examples.
● Entertainment − Disney’s engineers have created hundreds of robots for movie making.