NLP_DeepNLP
for
Natural language processing
● Python
● Basic concepts of Machine Learning and Deep Learning
Natural language processing
Natural language processing (NLP) is a subfield of linguistics, computer
science, information engineering, and artificial intelligence concerned
with the interactions between computers and human (natural)
languages, in particular how to program computers to process and
analyze large amounts of natural language data.
Phonology – This science deals with the patterns present in sounds and speech, treating sound
as a physical entity.
Morphology – This science deals with the structure of words and the systematic relations
between them.
Semantics – This science deals with the literal meaning of words, phrases, as well as
sentences.
Natural Language Generation
Based on NL understanding, natural language generation decides:
● What to say to the user.
● How to be intelligent and conversational, like a human.
● How to use structured data.
● How to plan the text at the sentence level.
Tokenization
Tokenization is the process of splitting a stream of text into smaller
units called tokens (typically words, subwords, or sentences), which
become the basic units that an NLP pipeline processes.
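As a quick illustration, here is a minimal sketch of word-level tokenization in Python using NLTK (one of many NLP libraries; the sample sentence is made up for the example):

from nltk.tokenize import word_tokenize

# NLTK's tokenizer models must be downloaded once:
# import nltk; nltk.download("punkt")

text = "Natural language processing helps computers understand text."

# Split the raw string into word-level tokens (punctuation becomes its own token).
tokens = word_tokenize(text)
print(tokens)
# ['Natural', 'language', 'processing', 'helps', 'computers', 'understand', 'text', '.']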
Tokenization
There are many libraries / frameworks for solving NLP problems.
Comments
Are you ready to start this course?
Training and Testing
NLP
Deep NLP
Feature Extraction in NLP
Term Frequency (TF): This summarizes how often a given word appears within a document.
Document Frequency (DF): This counts how many documents a given word appears in; words that
appear in many documents get downscaled.
Inverse Document Frequency (IDF): This is a weight indicating how commonly a word is used across
the corpus. The more frequent its usage across documents, the lower its score; the lower the score,
the less important the word becomes.
For example, the word the appears in almost all English texts and would thus have a very low IDF score
as it carries very little “topic” information. In contrast, if you take the word coffee, while it is common, it’s
not used as widely as the word the. Thus, coffee would have a higher IDF score than the.
TF-IDF: is a numerical statistic that is intended to reflect how important a word is to a document in a
collection or corpus.
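Putting these together, for a term t in a document d drawn from a corpus of N documents (using the standard formulation; the logarithm base is a convention, commonly 10 or e):
TF-IDF(t, d) = TF(t, d) x IDF(t), where IDF(t) = log(N / DF(t))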
Sentence 1 : The car is driven on the road.
Sentence 2: The truck is driven on the highway.
Example corpus of three sentences:
1. Fair men
2. Fair women
3. men women Fair

IDF per word (N = 3 documents):
words    documents containing word    IDF = log(N / DF)
men      2                            log(3/2)
women    2                            log(3/2)
fair     3                            log(3/3) = 0

TF-IDF matrix (features f1 = men, f2 = women, f3 = fair, using raw counts as TF):
             f1 (men)    f2 (women)    f3 (fair)
Sentence 1   log(3/2)    0             0
Sentence 2   0           log(3/2)      0
Sentence 3   log(3/2)    log(3/2)      0
The word "fair" scores 0 in every sentence because it appears in all documents.
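The same computation with scikit-learn's TfidfVectorizer, as a minimal sketch (scikit-learn uses a smoothed IDF by default, so its numbers differ slightly from the hand calculation above):

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["Fair men", "Fair women", "men women Fair"]

vec = TfidfVectorizer()
X = vec.fit_transform(corpus)

print(vec.get_feature_names_out())   # ['fair' 'men' 'women']
print(X.toarray())                   # one row per sentence, one column per word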
Hashing with HashingVectorizer in NLP
Count Vectorizer: The most straightforward one, it counts the number of times a
token shows up in the document and uses this value as its weight.
Hashing Vectorizer: Instead of storing a vocabulary, it applies a hash function to
each token to determine its column index, which saves memory on large corpora.
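A short sketch contrasting the two vectorizers in scikit-learn (the toy corpus is made up for the example):

from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

corpus = ["the quick brown fox", "the lazy dog"]

# CountVectorizer: builds an explicit vocabulary and stores raw token counts.
cv = CountVectorizer()
print(cv.fit_transform(corpus).toarray())
print(cv.get_feature_names_out())

# HashingVectorizer: no stored vocabulary; a hash function maps each token
# to one of n_features columns (collisions are possible but rare when
# n_features is large).
hv = HashingVectorizer(n_features=16)
print(hv.transform(corpus).toarray().shape)   # (2, 16)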
https://keras.io/
How to Prepare Text Data with
scikit-learn
Let's understand the following topic using Keras:
• Tokenizer API
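A minimal sketch of the Keras Tokenizer API (from tf.keras; the short documents are made up for the example):

from tensorflow.keras.preprocessing.text import Tokenizer

docs = ["Well done!", "Good work", "Great effort", "nice work"]

# Build the vocabulary from the documents.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(docs)

print(tokenizer.word_index)                 # word -> integer index
print(tokenizer.texts_to_sequences(docs))   # each doc as a list of indices

# Encode each document as a fixed-length vector of word counts.
print(tokenizer.texts_to_matrix(docs, mode="count"))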
N-grams in NLP
N-grams of texts are extensively used in text mining and natural language
processing tasks. They are basically a set of co-occurring words within a given
window, and when computing the n-grams you typically move one word forward
(although you can move X words forward in more advanced scenarios). In other
words, an n-gram is a contiguous sequence of n items from a given sample of text.
For example, take the sentence "The quick brown fox jumps over the lazy dog".
If N=1 (unigrams):
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

If N=2 (bigrams):
['The quick', 'quick brown', 'brown fox', 'fox jumps', 'jumps over',
'over the', 'the lazy', 'lazy dog']

If N=3 (trigrams):
['The quick brown', 'quick brown fox', 'brown fox jumps', 'fox jumps over',
'jumps over the', 'over the lazy', 'the lazy dog']
N-grams in NLP
How many N-grams in a sentence?
If X = the number of words in a given sentence K, the number of n-grams for
sentence K would be:
N-grams(K) = X - (N - 1)
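A quick sketch in plain Python showing both the generation and the count formula:

def ngrams(words, n):
    # Slide a window of size n over the word list, one word at a time.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "The quick brown fox jumps over the lazy dog".split()

bigrams = ngrams(words, 2)
print(bigrams)        # ['The quick', 'quick brown', ..., 'lazy dog']
print(len(bigrams))   # 8 == X - (N - 1) == 9 - 1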
What is Machine Learning
Machine learning is an application of artificial intelligence (AI) that provides
systems the ability to automatically learn and improve from experience without
being explicitly programmed.
Machine Learning in NLP
Logistic Regression
Machine Learning in NLP
In linear regression, the outcome (dependent variable) is continuous. It can have any
one of an infinite number of possible values.
In logistic regression, the outcome (dependent variable) has only a limited number of
possible values.
Logistic regression is used when the response variable is categorical in nature. For
instance, yes/no, true/false, red/green/blue, 1st/2nd/3rd/4th, etc.
Linear regression is used when your response variable is continuous. For instance,
weight, height, number of hours, etc.
Linear regression: Y = mX + C
Logistic regression applies the sigmoid function: g(x) = 1 / (1 + e^-x)
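As a minimal sketch, logistic regression for text classification in scikit-learn, reusing TF-IDF features from earlier (the tiny labeled corpus is made up for the example):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy sentiment data: 1 = positive, 0 = negative.
texts = ["great movie", "awful movie", "great acting", "awful plot"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

# The model learns a linear score and squashes it with the sigmoid
# g(x) = 1 / (1 + e^-x) to get a probability between 0 and 1.
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["great plot"])))   # e.g. [1]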
Machine Learning in NLP
In machine learning, support-vector machines (SVMs, also called support-vector
networks) are supervised learning models with associated learning algorithms
that analyze data for classification and regression analysis. The Support Vector
Machine (SVM) algorithm is a popular machine learning tool that offers solutions
for both classification and regression problems.
k-Nearest Neighbors (k-NN) can likewise be used for both tasks. In k-NN
regression, the output is the property value for the object: the average of the
values of its k nearest neighbors.
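As a short sketch, an SVM text classifier in scikit-learn; swapping in KNeighborsClassifier reuses the same features for k-NN (the tiny labeled corpus is made up for the example):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

texts = ["free prize money", "meeting at noon", "win free cash", "lunch at noon"]
labels = ["spam", "ham", "spam", "ham"]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

# SVM: finds the maximum-margin boundary between the classes.
svm = SVC(kernel="linear").fit(X, labels)
print(svm.predict(vec.transform(["free cash now"])))   # e.g. ['spam']

# k-NN: classifies by majority vote among the k nearest training points.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
print(knn.predict(vec.transform(["lunch meeting"])))   # e.g. ['ham']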
Steps of Working
Example
Naïve Bayes Classifier algorithm
Naive Bayes classifiers are a family of simple "probabilistic classifiers" based on
applying Bayes' theorem with strong (naive) independence assumptions between the
features. They are among the simplest Bayesian network models, but coupled with
kernel density estimation they can achieve higher accuracy levels.
In the statistics and computer science literature, naive Bayes models are known
under a variety of names, including simple Bayes and independence Bayes. All these
names reference the use of Bayes' theorem in the classifier's decision rule, but naive
Bayes is not (necessarily) a Bayesian method.
Naïve Bayes Classifier algorithm
Naïve Bayes Example
Naïve Bayes algorithm
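As a minimal sketch, a multinomial Naive Bayes text classifier in scikit-learn; it estimates P(word | class) from word counts and combines them under the naive independence assumption (the toy data is made up for the example):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["buy cheap pills", "team lunch today",
         "cheap pills online", "project update today"]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(texts)

# MultinomialNB applies Bayes' theorem to the per-class word-count
# distributions, assuming words are independent given the class.
nb = MultinomialNB().fit(X, labels)
print(nb.predict(vec.transform(["cheap pills today"])))   # e.g. ['spam']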
Word Embedding using Word2Vec
Word Embedding is a language modeling technique used for mapping words to
vectors of real numbers. It represents words or phrases in vector space with several
dimensions. Word embeddings can be generated using various methods like neural
networks, co-occurrence matrix, probabilistic models, etc.
Word2Vec consists of models for generating word embeddings. These models are
shallow two-layer neural networks having one input layer, one hidden layer, and one
output layer. Given enough data, usage, and contexts, Word2Vec can make highly
accurate guesses about a word's meaning based on past appearances. Those guesses
can be used to establish a word's association with other words, e.g. man is to boy
what woman is to girl. It comes in two flavors:
• CBOW (Continuous Bag of Words)
• Skip Gram
Word Embedding using Word2Vec
CBOW (Continuous Bag of Words)
The CBOW model predicts the current word given the context words within a specific
window. The input layer contains the context words and the output layer contains the
current word. The hidden layer's size is the number of dimensions in which we want
to represent the current word present at the output layer.
Word Embedding using Word2Vec
Skip Gram: Skip gram predicts the surrounding context words
within a specific window, given the current word. The input layer
contains the current word and the output layer contains the
context words. The hidden layer's size is the number of
dimensions in which we want to represent the current word
present at the input layer.
How does it work?
In simple words, Word2Vec is just a vector representation of words in an
n-dimensional (usually 300-dimensional) space. This representation is also called an embedding.
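As a minimal sketch, training embeddings with Gensim's Word2Vec implementation (parameter names follow Gensim 4.x; sg=1 selects Skip Gram, sg=0 selects CBOW; the tiny corpus is made up, so real training needs far more text):

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "the", "dog"],
    ["the", "woman", "walks", "the", "dog"],
]

# vector_size = embedding dimensions, window = context size.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["king"].shape)           # (50,) - the word's embedding vector
print(model.wv.most_similar("king"))    # nearest words in the vector space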