Lecture#11

Word Embedding and How It Performs

Uploaded by Qareena sadiq

Machine Learning (ML) for Natural

Language Processing (NLP)


Machine Learning for NLP
• Machine learning (ML) for natural language processing (NLP) and
text analytics involves using machine learning algorithms and
“narrow” artificial intelligence (AI) to understand the meaning of
text documents. These documents can be just about anything that
contains text: social media comments, online reviews, survey
responses, even financial, medical, legal and regulatory
documents. In essence, the role of machine learning and AI in
natural language processing and text analytics is to improve,
accelerate and automate the underlying text analytics functions
and NLP features that turn this unstructured text into usable data
and insights.
• Most importantly, “machine learning” really means “machine
teaching.” We know what the machine needs to learn, so our task
is to create a learning framework and provide properly-formatted,
relevant, clean data for the machine to learn from.
Supervised Machine Learning for Natural Language
Processing and Text Analytics
Machine learning for NLP and text
analytics involves a set of statistical
techniques for identifying parts of speech,
entities, sentiment, and other aspects of
text. The techniques can be expressed
as a model that is then applied to other
text, also known as supervised machine
learning. It also could be a set of
algorithms that work across large sets of
data to extract meaning, which is known
as unsupervised machine learning. It’s
important to understand the difference
between supervised and unsupervised
learning, and how you can get the best of
both in one system.
Supervised Machine Learning for Natural Language
Processing and Text Analytics
In supervised machine learning, a batch of text documents
is tagged or annotated with examples of what the
machine should look for and how it should interpret each
aspect. These documents are used to “train” a statistical
model, which is then given untagged text to analyze.
The most popular supervised NLP machine learning
algorithms are:
•Support Vector Machines
•Naive Bayes
•Maximum Entropy
•Conditional Random Field
•Neural Networks/Deep Learning
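As a minimal sketch of this supervised workflow, the following uses scikit-learn's LogisticRegression (the maximum-entropy classifier from the list above) trained on a few hand-tagged sentences. The tiny corpus and its labels are invented for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny hand-annotated corpus (invented for illustration): 1 = positive, 0 = negative.
docs = ["great product, works well",
        "terrible support, very slow",
        "excellent value and fast shipping",
        "broken on arrival, awful"]
labels = [1, 0, 1, 0]

# Turn raw text into bag-of-words counts, then fit a maximum-entropy model.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
clf = LogisticRegression().fit(X, labels)

# Apply the trained model to un-tagged text.
print(clf.predict(vectorizer.transform(["fast and excellent"])))
```

Any of the listed algorithms could replace LogisticRegression here; the vectorize-train-predict shape of the pipeline stays the same.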
Support Vector Machines
Support Vector Machine (SVM) is one of the most popular
supervised learning algorithms. It can be used for both classification
and regression problems, but in machine learning it is primarily
used for classification.

The goal of the SVM algorithm is to find the best line or decision
boundary that segregates n-dimensional space into classes, so that
new data points can easily be placed in the correct category in the
future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the
hyperplane. These extreme cases are called support vectors, which
is why the algorithm is termed a Support Vector Machine. Consider the
diagram below, in which two different categories are
classified using a decision boundary, or hyperplane.
SVM is supervised learning
Which hyperplane is best?
Pros:
• It works really well when there is a clear margin of
separation.
• It is effective in high-dimensional spaces.
• It is effective in cases where the number of
dimensions is greater than the number of
samples.
• It uses a subset of training points in the
decision function (called support vectors),
so it is also memory efficient.
Cons:
• It doesn’t perform well on large data sets,
because the required training time is
higher.
• It also doesn’t perform very well when the data
set has more noise, i.e. when the target classes
overlap.
• SVM doesn’t directly provide probability
estimates; these are calculated using an
expensive five-fold cross-validation, as
implemented in the related SVC method of the
Python scikit-learn library.
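A minimal sketch of the idea, using scikit-learn's SVC on invented 2-D toy points: the linear-kernel SVM finds the maximum-margin hyperplane, and the support vectors are the extreme training points that define it.

```python
from sklearn.svm import SVC

# Toy 2-D points in two linearly separable classes (invented for illustration).
X = [[0, 0], [1, 1], [0, 1], [8, 8], [9, 9], [8, 9]]
y = [0, 0, 0, 1, 1, 1]

# A linear-kernel SVM fits the maximum-margin hyperplane between the classes.
clf = SVC(kernel="linear").fit(X, y)

# The support vectors are the extreme points that define the margin.
print(clf.support_vectors_)
print(clf.predict([[7, 8]]))  # a point near the second cluster
```

Note that passing probability=True to SVC is what triggers the expensive internal cross-validation mentioned in the cons above; leaving it off keeps training cheap.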
Naive Bayes Algorithm
• It is a classification technique based on Bayes’ Theorem with an
assumption of independence among predictors. In simple terms, a
Naive Bayes classifier assumes that the presence of a particular
feature in a class is unrelated to the presence of any other feature.
• For example, a fruit may be considered to be an apple if it is red,
round, and about 3 inches in diameter. Even if these features
depend on each other or on the existence of the other features,
all of these properties independently contribute to the probability
that this fruit is an apple, which is why it is known as “naive”.
• The Naive Bayes model is easy to build and particularly useful for very
large data sets. Along with its simplicity, Naive Bayes is known to
outperform even highly sophisticated classification methods.
Bayes’ theorem computes the posterior as P(c|x) = P(x|c) × P(c) / P(x), where:
• P(c|x) is the posterior probability of class c (target) given
predictor x (attributes).
• P(c) is the prior probability of the class.
• P(x|c) is the likelihood, i.e. the probability of the predictor
given the class.
• P(x) is the prior probability of the predictor.
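These quantities combine via Bayes' theorem, P(c|x) = P(x|c) × P(c) / P(x). A minimal arithmetic sketch for the fruit example, with all probabilities invented for illustration:

```python
# Invented numbers for the fruit example: probability a fruit is an
# apple given that it is red, computed via Bayes' theorem.
p_apple = 0.30            # P(c): prior probability of class "apple"
p_red_given_apple = 0.80  # P(x|c): likelihood of "red" given "apple"
p_red = 0.40              # P(x): prior probability of predictor "red"

# Posterior: P(c|x) = P(x|c) * P(c) / P(x)
p_apple_given_red = p_red_given_apple * p_apple / p_red
print(p_apple_given_red)  # 0.6
```

A Naive Bayes classifier applies this same update once per feature (red, round, 3 inches), multiplying the per-feature likelihoods together under the independence assumption.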
Unsupervised Machine Learning for Natural
Language Processing and Text Analytics
• Unsupervised machine learning involves training a model
without pre-tagging or annotating. Some of these techniques are
surprisingly easy to understand.
• Clustering means grouping similar documents together into
groups or sets. These clusters are then sorted based on
importance and relevancy (hierarchical clustering).
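As a sketch of clustering without labels, the following groups a few invented documents by TF-IDF similarity using scikit-learn's KMeans. No annotations are provided; the grouping emerges purely from the text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# A few un-annotated documents about two topics (invented for illustration).
docs = ["engine exhaust manifold",
        "engine oil exhaust",
        "chocolate cake recipe",
        "cake recipe book"]

# No labels: KMeans groups the documents purely by vector similarity.
X = TfidfVectorizer().fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # same label for documents in the same cluster
```

Hierarchical clustering (e.g. scikit-learn's AgglomerativeClustering) would additionally arrange these clusters into a tree, which is what allows them to be sorted by importance and relevancy.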
Latent Semantic Indexing (LSI)
• Another type of unsupervised learning is Latent Semantic
Indexing (LSI). This technique identifies words and phrases
that frequently occur with each other. Data scientists use LSI
for faceted searches, or for returning search results that aren’t
exact matches for the search term.
• For example, the terms “manifold” and “exhaust” are closely
related in documents that discuss internal combustion engines.
So, when you Google “manifold”, you also get results that
contain “exhaust”.
Thanks for Listening
