AP For NLP-Word 2 Vec
CONTENTS:
• NLP
• NLTK
• NLP Pre-processing
WHAT IS NATURAL LANGUAGE PROCESSING (NLP)?
• Natural Language Processing is an interdisciplinary field of Artificial Intelligence.
• It is a set of techniques for teaching computers to understand and interpret human languages, much as we do.
• It is the art of extracting information and hidden insights from unstructured text.
• It is a sophisticated field that enables computers to process text data at large scale.
• The ultimate goal of NLP is to make computers and computer-controlled bots understand and interpret human languages, just as we do.
COMPONENTS OF NLP
• Natural Language Understanding
• Natural Language Generation
Figure: Components of NLP
NATURAL LANGUAGE UNDERSTANDING:-
• NLU helps the machine understand and analyze human language by extracting elements such as keywords, emotions, relations, and semantics from large volumes of text.
Let’s see what challenges a machine faces:
He is looking for a match.
• What do you understand by the ‘match’ keyword?
• This is Lexical Ambiguity, which occurs when a word has multiple meanings. Lexical ambiguity can be resolved using part-of-speech (POS) tagging techniques.
The Fish is ready to eat.
• What do you understand by the above example?
• This is Syntactic Ambiguity, also called Grammatical Ambiguity, which occurs when a sequence of words admits more than one meaning.
NATURAL LANGUAGE GENERATION:-
• Tokenization
• POS- Tagging
• NER
• Lemmatization
• Sentence Boundary Detection
• Text Classification
WHAT IS Gensim?
• Topic modeling
• Word 2 Vec
• Document Similarity
• Stop words are words that are filtered out because they are repetitive and don’t hold much information. For example, words like {that, these, below, is, are, etc.} don’t provide much information, so they are removed from the text. Stop words are considered noise. NLTK provides a large list of stop words.
• Very common words like 'in', 'is', and 'an' are often used as stop words since they don’t add a lot of meaning to a text in and of themselves.
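A minimal stop-word filter can be sketched as below. The stop list here is a small hand-picked sample for illustration; NLTK's stopwords corpus (`nltk.corpus.stopwords.words('english')`) provides a much larger list.

```python
# Small hand-picked stop list (NLTK's English list is far larger).
STOP_WORDS = {"that", "these", "below", "is", "are", "in", "an", "a", "the"}

def remove_stop_words(text):
    # Lowercase, split on whitespace, and drop any token in the stop list.
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

print(remove_stop_words("The fish is ready to eat"))  # ['fish', 'ready', 'to', 'eat']
```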
STEMMING VS LEMMATIZATION
Stemming:
• Stemming is a process that stems or removes the last few characters from a word, often leading to incorrect meanings and spellings.
• For instance, stemming the word ‘Caring’ would return ‘Car’.
• Stemming is used for large datasets where performance is an issue.
Lemmatization:
• Lemmatization considers the context and converts the word to its meaningful base form, which is called the Lemma.
• For instance, lemmatizing the word ‘Caring’ would return ‘Care’.
• Lemmatization is computationally expensive since it involves look-up tables.
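The contrast above can be sketched with toy implementations. The naive suffix-stripper below is NOT NLTK's PorterStemmer, and the lemma table is a tiny hand-made stand-in for a real look-up resource such as WordNet; both are assumptions for illustration only.

```python
def naive_stem(word):
    # Blindly chop common suffixes, which can mangle the word ('caring' -> 'car').
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

# Hypothetical mini look-up table standing in for a real lemma dictionary.
LEMMAS = {"caring": "care", "better": "good", "feet": "foot"}

def naive_lemmatize(word):
    # Look the word up; fall back to the word itself if it is unknown.
    return LEMMAS.get(word, word)

print(naive_stem("caring"))       # car
print(naive_lemmatize("caring"))  # care
```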
CHUNKING
• Semantic Similarity: Words with similar meanings are located close to each
other in the vector space. For example, the words "king" and "queen" might be
close to each other.
• Word2Vec:
• FastText:
• Transformers:
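The "close in the vector space" idea above is usually measured with cosine similarity. The tiny 3-dimensional vectors below are made up for illustration (real embeddings typically have 100-300 dimensions):

```python
import math

# Toy vectors: related words ('king', 'queen') point in similar directions.
vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "mat":   [0.10, 0.20, 0.95],
}

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# 'king' is far more similar to 'queen' than to 'mat'.
print(cosine(vectors["king"], vectors["queen"]))
print(cosine(vectors["king"], vectors["mat"]))
```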
Word 2 Vec Model
• It is a popular technique used in natural language processing
(NLP) to transform words into numerical vectors of fixed
dimensionality.
• It has two model architectures: Continuous Bag of Words (CBOW) and Skip-gram.
How Word2Vec Works
• Continuous Bag of Words (CBOW):
• The CBOW model predicts the target word given the context
words.
• It takes the context of each word (a few words before and after
the target word) and tries to predict the target word based on
these context words.
How Word2Vec Works
• Skip-gram
• For each word in the text, the model uses the word to predict
the words within a certain range before and after it.
How Word2Vec Works: Example
• Consider a simple sentence: "The cat sat on the mat."
• In CBOW, for the target word "sat," the context words are
["The", "cat", "on", "the", "mat"]. The model uses these context
words to predict "sat."
• In Skip-gram, for the target word "sat," the model will use "sat"
to predict each of the context words ["The", "cat", "on", "the",
"mat"].
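The training pairs both architectures see for this sentence can be sketched in pure Python (this only extracts context windows, it does not train a model; a window of 3 words on each side is assumed so the context matches the example above):

```python
sentence = "The cat sat on the mat".lower().split()

def context(tokens, i, window=3):
    # Words up to `window` positions before and after position i.
    left = tokens[max(0, i - window): i]
    right = tokens[i + 1: i + 1 + window]
    return left + right

i = sentence.index("sat")

# CBOW: context words -> target word.
print(context(sentence, i), "->", sentence[i])  # ['the', 'cat', 'on', 'the', 'mat'] -> sat

# Skip-gram: target word -> each context word.
for ctx in context(sentence, i):
    print(sentence[i], "->", ctx)
```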