
Prerequisites for Natural Language Processing
● Python
● Basic concepts of Machine Learning and Deep Learning
Natural language processing
Natural language processing (NLP) is a subfield of linguistics, computer
science, information engineering, and artificial intelligence concerned
with the interactions between computers and human (natural)
languages, in particular how to program computers to process and
analyze large amounts of natural language data.

Challenges in natural language processing frequently involve speech recognition,
natural language understanding, and natural language generation.
Types of NLP
How NLP, DNLP and DL are involved
Applications
Used by
NLP Working
Natural Language Understanding
Ambiguity:
Lexical Ambiguity : The tank is full of water. ("tank" may be a container or an armored vehicle)
Syntactic Ambiguity : Ill men and women get to hospital. (does "ill" modify only the men, or the women too?)
Semantic Ambiguity : The bike hit the pole while it was running. ("it" could refer to the bike or the pole)
Pragmatic Ambiguity : The army is coming. (a plain statement, a warning, or a reassurance, depending on context)

Phonology – This science deals with the patterns of sound and with speech as a physical entity.

Pragmatics – This science studies the different uses of language in context.

Morphology – This science deals with the structure of words and the systematic relations
between them.

Syntax – This science deals with the structure of sentences.

Semantics – This science deals with the literal meaning of words, phrases as well as
sentences.
Natural Language Generation
Based on NL-Understanding, it decides:
● What to say to the user.
● Responses should be intelligent and conversational, like a human.
● Makes use of structured data.
● Involves text planning and sentence planning.
Tokenization
Tokenization is the process of splitting a piece of text into smaller units called
tokens (sentences, words, subwords or characters) so that the text can be
processed by an NLP pipeline.
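A minimal tokenization sketch using NLTK, one of the libraries listed on the next slide (a sketch only; it assumes nltk is installed and the punkt tokenizer data has been downloaded):

# Word and sentence tokenization with NLTK.
# Assumes `pip install nltk`; the punkt models are a one-time download.
import nltk
nltk.download("punkt", quiet=True)

from nltk.tokenize import word_tokenize, sent_tokenize

text = "Natural language processing is fun. Tokenization splits text into tokens."

print(sent_tokenize(text))  # ['Natural language processing is fun.', 'Tokenization splits text into tokens.']
print(word_tokenize(text))  # ['Natural', 'language', 'processing', 'is', 'fun', '.', 'Tokenization', ...]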
Tokenization
There are many libraries / frameworks for solving NLP problems:

1. Natural Language Toolkit (NLTK)
2. TextBlob
3. CoreNLP
4. Gensim
5. spaCy
6. polyglot
7. scikit-learn
8. Pattern
So let's move to Colab for practical work...
Bag of words
The bag-of-words model is a simplifying representation
used in natural language processing and information
retrieval (IR). In this, a text (such as a sentence or a
document) is represented as the bag (multiset) of its
words, disregarding grammar and even word order but
keeping multiplicity.
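A minimal bag-of-words sketch with scikit-learn's CountVectorizer, using the two example sentences that appear later in this deck:

# Bag-of-words: count each word, ignore grammar and word order.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "The car is driven on the road",
    "The truck is driven on the highway",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)       # sparse document-term count matrix

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(bow.toarray())                       # word counts per document (order lost, multiplicity kept)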
Comments
Are you ready to start this course?
Training and Testing
NLP

Deep NLP
Features Extraction in NLP
Frequency (TF): This summarizes how often a given word appears within a document.

Document Frequency: This counts how many documents a word appears in; it is used to downscale
words that appear in many documents.

Inverse Document Frequency (IDF): a weight indicating how commonly a word is used across
documents. The more frequently a word appears across documents, the lower its IDF score, and the
less important the word becomes for distinguishing documents.

For example, the word "the" appears in almost all English texts and would thus have a very low IDF
score, as it carries very little "topic" information. In contrast, the word "coffee", while common, is not
used as widely as "the". Thus, "coffee" would have a higher IDF score than "the".

TF-IDF: a numerical statistic that is intended to reflect how important a word is to a document in a
collection or corpus.
Sentence 1 : The car is driven on the road.
Sentence 2: The truck is driven on the highway.
Worked example with three short documents:

1. Fair men
2. Fair women
3. men women Fair

Word counts (bag of words):

           Sent. 1   Sent. 2   Sent. 3
fair          1         1         1
men           1         0         1
women         0         1         1

IDF per word (IDF = log(N / df), with N = 3 documents):

words    df   IDF
men      2    log(3/2)
women    2    log(3/2)
fair     3    log(3/3) = 0

TF-IDF features (f1 = men, f2 = women, f3 = fair; each cell = count x IDF):

             f1 (men)    f2 (women)   f3 (fair)
Sentence 1   log(3/2)    0            0
Sentence 2   0           log(3/2)     0
Sentence 3   log(3/2)    log(3/2)     0

Note that "fair" gets weight 0 because it appears in every document.
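Putting it together, TF-IDF(w, d) = TF(w, d) x IDF(w). A minimal scikit-learn sketch on the three documents above (a sketch only; scikit-learn uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, plus L2 normalization, so its numbers differ slightly from the hand calculation):

# TF-IDF on the three example documents above.
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["fair men", "fair women", "men women fair"]

tfidf = TfidfVectorizer()
features = tfidf.fit_transform(sentences)

print(tfidf.get_feature_names_out())   # ['fair' 'men' 'women']
print(features.toarray().round(2))     # one TF-IDF row per sentence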
Hashing with HashingVectorizer in NLP
Count Vectorizer: The most straightforward one; it counts the number of times a
token shows up in the document and uses this value as its weight.

Hash Vectorizer: This one is designed to be as memory-efficient as possible. Instead
of storing the tokens as strings, the vectorizer applies the hashing trick to encode them
as numerical indexes. The downside of this method is that once vectorized, the
features' names can no longer be retrieved.

TF-IDF Vectorizer: TF-IDF stands for "term frequency-inverse document frequency",
meaning the weight assigned to each token depends not only on its frequency in a
document but also on how recurrent that term is in the entire corpus.
Hashing with HashingVectorizer in NLP
Counts and frequencies can be very useful, but one limitation of these
methods is that the vocabulary can become very large. This, in turn, will
require large vectors for encoding documents, impose large memory
requirements and slow down algorithms. A clever workaround is to use a
one-way hash of words to convert them to integers. The clever part is that
no vocabulary is required and you can choose an arbitrarily long, fixed-length
vector. A downside is that the hash is a one-way function, so there is no way
to convert the encoding back to a word.
Hashing with HashingVectorizer in NLP
The HashingVectorizer class implements this approach: it can be used to
consistently hash words, then tokenize and encode documents as needed.
The example below demonstrates the HashingVectorizer for encoding a
single document; an arbitrary fixed-length vector size of 20 was chosen.
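A minimal sketch of that example (the sample sentence is mine; n_features=20 gives the fixed-length vector of size 20 mentioned above):

# Encode one document with scikit-learn's HashingVectorizer.
from sklearn.feature_extraction.text import HashingVectorizer

text = ["The quick brown fox jumped over the lazy dog."]

vectorizer = HashingVectorizer(n_features=20)
vector = vectorizer.transform(text)

print(vector.shape)       # (1, 20): one document, 20 hashed features
print(vector.toarray())   # l2-normalized values at hashed index positions (signs may alternate, collisions possible)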
How to Prepare Text Data
With Keras
Keras is an open-source neural-network library written in Python. It is capable
of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML.
Designed to enable fast experimentation with deep neural networks, it focuses on
being user-friendly, modular, and extensible. It was developed as part of the research
effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating
System).

https://keras.io/
How to Prepare Text Data with
Keras
Let's understand the following topics using Keras:

• Split words with text_to_word_sequence

• Encoding with one_hot

• Hash encoding with hashing_trick

• Tokenizer API
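A minimal sketch of these four Keras text-preparation utilities (assumes a TensorFlow install; these helpers live in tensorflow.keras.preprocessing.text and are deprecated in the newest releases):

# Keras text preparation: text_to_word_sequence, one_hot, hashing_trick, Tokenizer.
from tensorflow.keras.preprocessing.text import (
    text_to_word_sequence, one_hot, hashing_trick, Tokenizer)

text = "The quick brown fox jumped over the lazy dog."

# 1. Split words
print(text_to_word_sequence(text))          # lower-cased, punctuation-stripped tokens

# 2. One-hot (hash-based) integer encoding
vocab_size = 20
print(one_hot(text, n=vocab_size))          # one integer index per word

# 3. Hash encoding with an explicit hash function
print(hashing_trick(text, n=vocab_size, hash_function="md5"))

# 4. Tokenizer API: fit on documents, then encode them
docs = ["Well done!", "Good work", "Great effort", "nice work", "Excellent!"]
tokenizer = Tokenizer()
tokenizer.fit_on_texts(docs)
print(tokenizer.word_index)                           # learned vocabulary
print(tokenizer.texts_to_matrix(docs, mode="count"))  # document-term count matrix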
N-grams in NLP
N-grams of texts are extensively used in text mining and natural language
processing tasks. They are basically a set of co-occurring words within a given
window, and when computing the n-grams you typically move one word forward
(although you can move X words forward in more advanced scenarios). For
example, take the sentence "The quick brown fox jump over the lazy dog". If N=2
(known as bigrams), the n-grams are listed below (with the N=3 trigrams alongside).
OR
A contiguous sequence of n items from a given sample of text.
Tokens (N=1):
['The', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazy', 'dog']

Bigrams (N=2):
['The quick', 'quick brown', 'brown fox', 'fox jump', 'jump over',
 'over the', 'the lazy', 'lazy dog']

Trigrams (N=3):
['The quick brown', 'quick brown fox', 'brown fox jump', 'fox jump over',
 'jump over the', 'over the lazy', 'the lazy dog']
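A minimal sketch that produces these lists with a simple sliding window (plain Python; nltk.util.ngrams would work just as well):

# Build word n-grams for a sentence with a sliding window of size n.
def ngrams(text, n):
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "The quick brown fox jump over the lazy dog"
print(ngrams(sentence, 2))   # 8 bigrams
print(ngrams(sentence, 3))   # 7 trigrams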
N-grams in NLP
How many n-grams are in a sentence?
If X = number of words in a given sentence K, the number of n-grams for
sentence K would be:

Ngrams(K) = X - (N - 1)

For example, the 9-word sentence above has 9 - (2 - 1) = 8 bigrams and 9 - (3 - 1) = 7 trigrams.
What is Machine Learning
Machine learning is an application of artificial intelligence (AI) that provides
systems the ability to automatically learn and improve from experience without
being explicitly programmed.
Machine Learning in NLP
Logistic Regression
Machine Learning in NLP
In linear regression, the outcome (dependent variable) is continuous. It can have any
one of an infinite number of possible values.
In logistic regression, the outcome (dependent variable) has only a limited number of
possible values.

The dependent variable:

Logistic regression is used when the response variable is categorical in nature. For
instance, yes/no, true/false, red/green/blue, 1st/2nd/3rd/4th, etc.

Linear regression is used when your response variable is continuous. For instance,
weight, height, number of hours, etc.

Linear regression: Y = mX + C
Logistic (sigmoid) function: g(x) = 1 / (1 + e^-x)
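A minimal sketch of logistic regression for text classification on TF-IDF features (the toy sentences and labels are made up purely for illustration):

# Logistic regression on TF-IDF features: categorical (positive/negative) outcome.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie", "loved the film", "terrible acting", "boring and bad"]
labels = [1, 1, 0, 0]    # 1 = positive, 0 = negative

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["what a great film", "bad and boring movie"]))  # e.g. [1 0]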
Machine Learning in NLP
In machine learning, support-vector machines (SVMs, also support-vector
networks) are supervised learning models with associated learning algorithms that
analyze data used for classification and regression analysis. The Support Vector
Machine (SVM) algorithm is a popular machine learning tool that offers solutions for
both classification and regression problems.

A Support Vector Machine (SVM) is a discriminative classifier formally defined
by a separating hyperplane. In other words, given labeled training data (supervised
learning), the algorithm outputs an optimal hyperplane which categorizes new
examples. In two-dimensional space this hyperplane is a line dividing the plane into
two parts, with each class lying on either side.
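A minimal sketch of a linear SVM on toy 2-D points, just to show the separating hyperplane it learns (the points and labels are made up for illustration):

# Fit a linear SVM and inspect the separating hyperplane w.x + b = 0.
from sklearn.svm import SVC

X = [[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [6, 7]]   # two well-separated clusters
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)

print(clf.coef_, clf.intercept_)       # w and b of the hyperplane
print(clf.support_vectors_)            # the points that define the margin
print(clf.predict([[2, 2], [6, 6]]))   # new examples fall on either side: [0 1]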
Machine Learning in NLP
K-nearest neighbors algorithm
The k-nearest neighbors algorithm (k-NN) is a non-parametric method proposed
by Thomas Cover used for classification and regression. In both cases, the input
consists of the k closest training examples in the feature space. The output depends on
whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified
by a plurality vote of its neighbors, with the object being assigned to the class most
common among its k nearest neighbors (k is a positive integer, typically small). If k = 1,
then the object is simply assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object. This value is
the average of the values of its k nearest neighbors.
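A minimal k-NN sketch showing both modes on a tiny made-up 1-D dataset (k = 3):

# k-NN: classification by majority vote, regression by averaging the k neighbors.
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[1], [2], [3], [10], [11], [12]]          # one feature per sample
y_class = [0, 0, 0, 1, 1, 1]                   # class labels
y_value = [1.0, 1.2, 1.1, 9.8, 10.1, 10.3]     # continuous targets

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
print(clf.predict([[2.5]]))    # [0]: majority class of the 3 nearest neighbors

reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_value)
print(reg.predict([[11.0]]))   # average of the 3 nearest target values (~10.07)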
Steps of Working
Example
Naïve Bayes Classifier algorithm
Naive Bayes classifiers are a family of simple "probabilistic classifiers" based on
applying Bayes' theorem with strong (naive) independence assumptions between the
features. They are among the simplest Bayesian network models, but coupled with
kernel density estimation they can achieve higher accuracy levels.

Naive Bayes classifiers are highly scalable, requiring a number of parameters
linear in the number of variables (features/predictors) in a learning problem. Maximum-
likelihood training can be done by evaluating a closed-form expression, which takes
linear time, rather than by the expensive iterative approximation used for many other
types of classifiers.

In the statistics and computer science literature, naive Bayes models are known
under a variety of names, including simple Bayes and independence Bayes. All these
names reference the use of Bayes' theorem in the classifier's decision rule, but naive
Bayes is not (necessarily) a Bayesian method.
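A minimal Naive Bayes sketch for text classification on bag-of-words counts (the toy spam/ham examples are mine, for illustration only):

# Multinomial Naive Bayes on word counts for a toy spam/ham task.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win money now", "limited offer win prize", "meeting at noon", "see you at lunch"]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["win a prize now", "lunch meeting today"]))  # e.g. ['spam' 'ham']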
Naïve Bayes Classifier algorithm
Naïve Bayes Example
Naïve Bayes algorithm
Word Embedding using Word2Vec
Word Embedding is a language modeling technique used for mapping words to
vectors of real numbers. It represents words or phrases in a vector space with several
dimensions. Word embeddings can be generated using various methods such as neural
networks, co-occurrence matrices, probabilistic models, etc.

Word2Vec consists of models for generating word embeddings. These models are
shallow, two-layer neural networks having one input layer, one hidden layer and one
output layer. Given enough data, usage and contexts, word2vec can make highly
accurate guesses about a word's meaning based on past appearances. Those guesses
can be used to establish a word's association with other words, e.g. "man is to boy
what woman is to girl".

Word2Vec utilizes two architectures:

• CBOW (Continuous Bag of Words)

• Skip Gram
Word Embedding using Word2Vec
CBOW (Continuous Bag of Words)
The CBOW model predicts the current word given the context words within a specific
window. The input layer contains the context words and the output layer contains the
current word. The hidden layer contains the number of dimensions in which we want
to represent the current word present at the output layer.
Word Embedding using Word2Vec
Skip Gram : Skip-gram predicts the surrounding context words
within a specific window, given the current word. The input layer
contains the current word and the output layer contains the
context words. The hidden layer contains the number of
dimensions in which we want to represent the current word
present at the input layer.
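A minimal gensim sketch training both architectures on a tiny made-up corpus (far too small for meaningful vectors; in gensim, sg=0 selects CBOW and sg=1 selects Skip-gram):

# Train CBOW and Skip-gram Word2Vec models with gensim on a toy corpus.
from gensim.models import Word2Vec

corpus = [
    ["natural", "language", "processing", "is", "fun"],
    ["word", "embeddings", "map", "words", "to", "vectors"],
    ["word2vec", "learns", "word", "vectors", "from", "context"],
]

cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)      # CBOW
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)  # Skip-gram

print(cbow.wv["word"].shape)             # (50,): the embedding for 'word'
print(skipgram.wv.most_similar("word"))  # nearest words by cosine similarity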
How does it work
In simple words, Word2vec is just a vector representation of words in an n-dimensional
(usually 300) space. It is also called an embedding.

Why do we use cosine similarity? To measure the similarity between two words.

How does it work
Cosine similarity = 1 - cosine distance.
Cosine distance is nothing but the distance between two vectors in n-dimensional
space. The distance represents how closely words are related to each other.
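A minimal sketch of cosine similarity between two word vectors (the 3-dimensional vectors are made up; real word2vec vectors would have hundreds of dimensions):

# Cosine similarity between two toy word vectors.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

king = np.array([0.9, 0.8, 0.1])
queen = np.array([0.85, 0.75, 0.2])

sim = cosine_similarity(king, queen)
print(sim)        # close to 1.0 -> the words are closely related
print(1 - sim)    # cosine distance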
Topic Modeling
In machine learning and natural language processing, a topic model is a type of
statistical model for discovering the abstract "topics" that occur in a collection of
documents. Topic modeling is a frequently used text-mining tool for discovery of
hidden semantic structures in a text body. Intuitively, given that a document is about a
particular topic, one would expect particular words to appear in the document more or
less frequently: "dog" and "bone" will appear more often in documents about dogs,
"cat" and "meow" will appear in documents about cats, and "the" and "is" will appear
approximately equally in both. A document typically concerns multiple topics in
different proportions; thus, in a document that is 10% about cats and 90% about dogs,
there would probably be about 9 times more dog words than cat words. The "topics"
produced by topic modeling techniques are clusters of similar words. A topic model
captures this intuition in a mathematical framework, which allows examining a set of
documents and discovering, based on the statistics of the words in each, what the
topics might be and what each document's balance of topics is.
Latent Dirichlet Allocation (LDA)
Johann Peter Gustav Lejeune Dirichlet was a German
mathematician in the 1800s who contributed widely to the field of
modern mathematics. A probability distribution is named
after him: the "Dirichlet distribution".

In natural language processing, latent Dirichlet allocation (LDA) is a
generative statistical model that allows sets of observations to be explained
by unobserved groups that explain why some parts of the data are similar. For
example, if observations are words collected into documents, it posits that
each document is a mixture of a small number of topics and that each word's
presence is attributable to one of the document's topics. LDA is an example of
a topic model and belongs to the machine learning toolbox and, in a wider sense,
to the artificial intelligence toolbox.
Latent Dirichlet Allocation (LDA)

Later, in 2003, the paper "Latent Dirichlet Allocation" was published in the
Journal of Machine Learning Research: a graphical model for topic discovery.
Latent Dirichlet Allocation (LDA)

LDA assumes that documents are produced in the following fashion:

• Choose a topic mixture for the document (according to a Dirichlet distribution over a
fixed set of K topics), e.g. 60% pet, 20% resident, 10% food.
• For each word, pick one of the document's topics, then use that topic to generate the word
itself (according to the topic's multinomial distribution).
Step by Step work

Let's move towards practical work to understand more…
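A minimal LDA sketch with scikit-learn's LatentDirichletAllocation on a tiny made-up corpus (with so little data the topics are only illustrative; gensim's LdaModel is another common choice):

# LDA topic modeling on raw word counts of a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "dog bone dog bark",
    "cat meow cat purr",
    "dog cat pet food",
    "meow purr cat kitten",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)            # LDA works on raw counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)             # per-document topic mixture

words = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[-3:][::-1]]
    print(f"Topic {k}: {top}")                     # top words per topic
print(doc_topics.round(2))                         # topic proportions per document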
Non-negative matrix
factorization (NMF or NNMF)
Non-negative matrix factorization (NMF or NNMF), also called non-negative
matrix approximation, is a group of algorithms in multivariate analysis and
linear algebra where a matrix V is factorized into (usually) two matrices W
and H, with the property that all three matrices have no negative
elements.

It is used for dimensionality reduction and clustering.

We can use it in conjunction with TF-IDF to model topics across documents.
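A minimal NMF sketch on a TF-IDF matrix, again on a tiny made-up corpus (V is approximated by W x H, with W holding document-topic weights and H holding topic-word weights):

# NMF topic modeling: factorize the non-negative TF-IDF matrix V into W and H.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "dog bone dog bark",
    "cat meow cat purr",
    "dog cat pet food",
    "meow purr cat kitten",
]

tfidf = TfidfVectorizer()
V = tfidf.fit_transform(docs)              # non-negative TF-IDF matrix

nmf = NMF(n_components=2, random_state=0)
W = nmf.fit_transform(V)                   # document-topic weights
H = nmf.components_                        # topic-word weights

words = tfidf.get_feature_names_out()
for k, topic in enumerate(H):
    top = [words[i] for i in topic.argsort()[-3:][::-1]]
    print(f"Topic {k}: {top}")             # top words per topic
print(W.round(2))                          # which topic dominates each document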