
Natural Language Processing (NLP) With Deep NLP: From Zero To Hero

This document discusses Natural Language Processing (NLP) from zero to hero. It provides an introduction to NLP and outlines prerequisites including Python and basic concepts of machine learning and deep learning. The author, Fahad Hussain, offers further assistance through code, slides and a YouTube channel.

Natural Language Processing (NLP)

with Deep NLP


from Zero to Hero
FAHAD HUSSAIN
MCS, MSCS, DAE(CIT)
Computer Science Instructor of well known international Center
Also, Machine Learning and Deep learning Practitioner

For further assistance, code and slides: https://fahadhussaincs.blogspot.com/
YouTube Channel: https://www.youtube.com/channel/UCapJpINJKHzflWwCQ8Kse2g/playlists
Prerequisites for Natural Language Processing

Python
Basic concepts of Machine Learning and Deep Learning
Natural language processing
Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural language generation.
Types of NLP
How NLP, DNLP and DL are involved
Applications
Used by
Thanks. Next, we will discuss how NLP works.


NLP Working
Natural Language Understanding
Ambiguity:
Lexical Ambiguity: "The tank is full of water." (A single word has more than one meaning: "tank" may be a container or a military vehicle.)
Syntactic Ambiguity: "Ill men and women get to hospital." (The structure allows two readings: "ill" may describe only the men, or both the men and the women.)
Semantic Ambiguity: "The bike hit the pole while it was running." (It is unclear whether "it" refers to the bike or the pole.)
Pragmatic Ambiguity: "The army is coming." (Depending on context, this can be read as reassurance or as a warning.)

Phonology – The study of the sound patterns of a language and of speech sounds as physical entities.

Pragmatics – The study of how language is used in different contexts.

Morphology – The study of the structure of words and the systematic relations between them.

Syntax – The study of the structure of sentences.

Semantics – The study of the literal meaning of words, phrases, and sentences.

Natural Language Generation
Based on natural language understanding, it decides:
● What should be said to the user.
● How to be intelligent and conversational, like a human.
● How to use structured data.
● Text and sentence planning.
Thanks. Next, we will discuss data processing and tokenization.
Tokenization
In NLP, tokenization is the process of splitting text into smaller units called tokens, such as sentences, words, or subwords, so that each unit can be processed on its own.

Note that in data security the same term means something different: replacing sensitive data with unique identification symbols that retain all the essential information about the data without compromising its security.
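As a rough illustration of tokenization in the NLP sense, the sketch below splits text into sentence and word tokens using only Python's standard `re` module. The function names `sentence_tokenize` and `word_tokenize` and the regular expressions are assumptions made for this example; libraries such as NLTK or spaCy apply far more careful rules.

```python
import re

def sentence_tokenize(text):
    """Split text into sentences at ., ! or ? followed by whitespace (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def word_tokenize(text):
    """Split text into word tokens and standalone punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

text = "NLP is fun. Tokenization splits text into tokens!"
sentences = sentence_tokenize(text)
words = word_tokenize(sentences[0])
print(sentences)
print(words)
```

The naive sentence rule will break on abbreviations such as "Dr." or "e.g.", which is one reason dedicated tokenizers are normally preferred in practice.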
Tokenization
There are many libraries / frameworks for solving NLP problems:

1. Natural Language Toolkit (NLTK)
2. TextBlob
3. CoreNLP
4. Gensim
5. spaCy
6. polyglot
7. scikit-learn
8. Pattern

So let's move to Colab for practical work...
Bag of words
The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity.
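A minimal sketch of the bag-of-words idea, using only the standard library. The helper name `bag_of_words` and the two example sentences are made up for illustration; real pipelines usually tokenize more carefully than a plain `split()`.

```python
from collections import Counter

def bag_of_words(text):
    """Return a multiset (word -> count) for the text, ignoring order and case."""
    return Counter(text.lower().split())

doc_a = "the cat sat on the mat"
doc_b = "the mat sat on the cat"
bag_a = bag_of_words(doc_a)
bag_b = bag_of_words(doc_b)

# Fix a vocabulary order so that each document becomes a count vector.
vocabulary = sorted(set(bag_a) | set(bag_b))
vector_a = [bag_a[w] for w in vocabulary]

print(vocabulary)
print(vector_a)
print(bag_a == bag_b)  # word order is lost, so the two bags compare equal
```

The two sentences differ in word order but produce the same bag, which shows both the simplicity and the main limitation of the representation.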

Comments
Are you ready to start this course?
Yes / No / None / Don't know

Training and Testing
NLP

Deep NLP

Feature Extraction in NLP
Term Frequency (TF): This summarizes how often a given word appears within a document.

Document Frequency (DF): This counts the number of documents in which a given word appears.

Inverse Document Frequency (IDF): This is a weight indicating how commonly a word is used across documents; it downscales words that appear a lot. The more frequent a word's usage across documents, the lower its score, and the lower the score, the less important the word becomes.

For example, the word "the" appears in almost all English texts and would thus have a very low IDF score, as it carries very little "topic" information. In contrast, the word "coffee", while common, is not used as widely as "the". Thus, "coffee" would have a higher IDF score than "the".

TF-IDF: This is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus.
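One common textbook formulation of these weights is sketched below. The notation is assumed here rather than taken from the slides: f_{t,d} is the number of times term t occurs in document d, N is the number of documents, and df(t) is the number of documents that contain t. Libraries often use smoothed or rescaled variants of the same idea.

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)}, \qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

With this definition a word that occurs in every document has idf = log(1) = 0, which matches the observation that "the" carries very little topic information.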

Sentence 1 : The car is driven on the road.
Sentence 2: The truck is driven on the highway.

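Example sentences:
1. Fair men
2. Fair women
3. men women Fair

Table: words (fair, men, women) against sentences (Sent. 1, Sent. 2, Sent. 3).
Table: IDF for each word (men, women, fair).
Table: features f1 (men), f2 (women), f3 (fair) for Sentence 1, Sentence 2 and Sentence 3.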
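The values for the three small sentences above can be computed with a few lines of plain Python. This is only a sketch under stated assumptions: TF is taken as the word count divided by the sentence length, and IDF as the natural log of N / df with no smoothing. The slides do not say which variant they use, and scikit-learn's `TfidfVectorizer` would give different numbers because it smooths and normalizes by default.

```python
import math

sentences = ["Fair men", "Fair women", "men women Fair"]
docs = [s.lower().split() for s in sentences]
vocabulary = ["men", "women", "fair"]  # feature order f1, f2, f3

def term_frequency(word, doc):
    # Assumption: TF = occurrences of the word / number of words in the sentence.
    return doc.count(word) / len(doc)

def inverse_document_frequency(word, docs):
    # Assumption: IDF = ln(N / df), with no smoothing.
    df = sum(1 for doc in docs if word in doc)
    return math.log(len(docs) / df)

idf = {w: inverse_document_frequency(w, docs) for w in vocabulary}
tfidf = [[term_frequency(w, doc) * idf[w] for w in vocabulary] for doc in docs]

for word in vocabulary:
    print(f"{word}: idf = {idf[word]:.3f}")
for i, row in enumerate(tfidf, start=1):
    print(f"Sentence {i}:", [round(v, 3) for v in row])
```

Because "fair" occurs in all three sentences, its IDF is zero under this definition, so it contributes nothing to any sentence's feature vector.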

Hashing with HashingVectorizer in NLP
Count Vectorizer: The most straightforward one. It counts the number of times a token shows up in the document and uses this value as its weight.

Hash Vectorizer: This one is designed to be as memory efficient as possible. Instead of storing the tokens as strings, the vectorizer applies the hashing trick to encode them as numerical indexes. The downside of this method is that, once vectorized, the features' names can no longer be retrieved.

TF-IDF Vectorizer: TF-IDF stands for "term frequency-inverse document frequency", meaning the weight assigned to each token depends not only on its frequency in a document but also on how recurrent that term is in the entire corpus.
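A short sketch of the count and TF-IDF vectorizers, assuming scikit-learn is installed and that the classes meant here are its `CountVectorizer` and `TfidfVectorizer`. The two-sentence corpus is the car/truck example from the slides; everything else is illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "The car is driven on the road.",
    "The truck is driven on the highway.",
]

count_vectorizer = CountVectorizer()
counts = count_vectorizer.fit_transform(corpus)   # sparse matrix of raw token counts
print(sorted(count_vectorizer.vocabulary_))       # learned vocabulary (lowercased tokens)
print(counts.toarray())

tfidf_vectorizer = TfidfVectorizer()
weights = tfidf_vectorizer.fit_transform(corpus)  # same layout, TF-IDF weights instead of counts
print(weights.toarray().round(2))
```

Both vectorizers learn a vocabulary during `fit_transform`, which is what makes the feature names recoverable and also what makes memory grow with the number of distinct tokens.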

Hashing with HashingVectorizer in NLP
Counts and frequencies can be very useful, but one limitation of these methods is that the vocabulary can become very large. This, in turn, requires large vectors for encoding documents, which increases memory use and slows down algorithms. A clever workaround is to use a one-way hash of words to convert them to integers. The clever part is that no vocabulary is required and you can choose a fixed-length vector of arbitrary size. A downside is that the hash is a one-way function, so there is no way to convert the encoding back to a word.
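The hashing trick itself can be sketched in plain Python. The function name `hashed_vector` and the choice of MD5 as the hash are assumptions for this illustration (MD5 is used only because it is stable across runs, unlike Python's salted built-in `hash()`); library implementations use faster hash functions.

```python
import hashlib

def hashed_vector(text, n_features=20):
    """Encode text as a fixed-length count vector using the hashing trick."""
    vector = [0] * n_features
    for word in text.lower().split():
        # Hash the word to a large integer, then fold it into the vector length.
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        index = int(digest, 16) % n_features
        vector[index] += 1
    return vector

vec = hashed_vector("the quick brown fox jumped over the lazy dog")
print(vec)
```

No vocabulary is stored, so memory use is fixed by `n_features`. The price is that different words can collide on the same index, and an index cannot be mapped back to a word.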

Hashing with HashingVectorizer in NLP
The HashingVectorizer class in scikit-learn implements this approach. It can be used to consistently hash words, then tokenize and encode documents as needed. In the accompanying example, HashingVectorizer encodes a single document, and an arbitrary fixed-length vector size of 20 was chosen.
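A minimal sketch of such an encoding, assuming scikit-learn is installed. The example sentence is made up; `n_features=20` matches the vector size mentioned above.

```python
from sklearn.feature_extraction.text import HashingVectorizer

text = ["The quick brown fox jumped over the lazy dog."]

# No fit step is needed because hashing does not learn a vocabulary.
vectorizer = HashingVectorizer(n_features=20)
vector = vectorizer.transform(text)

print(vector.shape)
print(vector.toarray())
```

With the default settings the resulting values are normalized and may be negative, so they are not raw word counts.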

How to Prepare Text Data with Keras
Keras is an open-source neural-network library written in Python. At the time of writing, it could run on top of TensorFlow, Microsoft Cognitive Toolkit (CNTK), Theano, or PlaidML, and it can also be used from R. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System).

https://keras.io/

How to Prepare Text Data with Keras
Let's understand the following topics using Keras:

• Splitting words with text_to_word_sequence

• Encoding with one_hot

• Hash encoding with hashing_trick

• Tokenizer API
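The ideas behind these four utilities can be sketched in plain Python without installing Keras. The helpers below (`split_words`, `hash_encode`, `build_word_index`) are hypothetical stand-ins written for this illustration, not the Keras functions themselves. They only approximate the behavior: for example, the punctuation set and the hash function are assumptions.

```python
import hashlib
import string
from collections import Counter

def split_words(text):
    """Lowercase, replace punctuation with spaces, and split on whitespace."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.lower().translate(table).split()

def hash_encode(text, vocab_size):
    """Map each word to an integer in [1, vocab_size - 1] with a stable hash."""
    return [
        int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16) % (vocab_size - 1) + 1
        for w in split_words(text)
    ]

def build_word_index(texts):
    """Assign integer ids starting at 1, most frequent word first."""
    counts = Counter(w for t in texts for w in split_words(t))
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}

docs = ["Well done!", "Good work", "Great effort", "nice work", "Excellent!"]
words = split_words("The quick brown fox jumped over the lazy dog.")
encoded = hash_encode("The quick brown fox jumped over the lazy dog.", vocab_size=50)
word_index = build_word_index(docs)

print(words)
print(encoded)
print(word_index)
```

`split_words` plays the role of word splitting, `hash_encode` covers both the one-hot style and hashing-trick style integer encodings, and `build_word_index` mimics the word-to-id mapping that a tokenizer object learns from a set of documents.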

N-grams in NLP
N-grams of texts are extensively used in text mining and natural language processing tasks. They are basically a set of co-occurring words within a given window, and when computing the n-grams you typically move one word forward (although you can move X words forward in more advanced scenarios). Put another way, an n-gram is a contiguous sequence of n items from a given sample of text.

For example, take the sentence "The quick brown fox jump over the lazy dog".

If N=1 (unigrams), the n-grams would be:
['The', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazy', 'dog']

If N=2 (known as bigrams), the n-grams would be:
['The quick', 'quick brown', 'brown fox', 'fox jump', 'jump over', 'over the', 'the lazy', 'lazy dog']

If N=3 (trigrams), the n-grams would be:
['The quick brown', 'quick brown fox', 'brown fox jump', 'fox jump over', 'jump over the', 'over the lazy', 'the lazy dog']
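The sliding window can be sketched in a few lines of Python. The function name `ngrams` is chosen for this example, and splitting on whitespace is a simplifying assumption.

```python
def ngrams(text, n):
    """Return the list of n-grams (as strings) obtained by sliding one word at a time."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "The quick brown fox jump over the lazy dog"
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)

print(bigrams)
print(trigrams)
```

For this nine-word sentence the function returns 8 bigrams and 7 trigrams.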

N-grams in NLP
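How many N-grams in a sentence?
If X = number of words in a given sentence K, the number of n-grams for sentence K would be:
Ngrams(K) = X - (N - 1)
For example, the nine-word sentence "The quick brown fox jump over the lazy dog" has 9 - (2 - 1) = 8 bigrams.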

What is Machine Learning
Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve automatically from experience without being explicitly programmed.

Machine Learning in NLP
Logistic Regression

Machine Learning in NLP
In linear regression, the outcome (dependent variable) is continuous: it can take any one of an infinite number of possible values.
In logistic regression, the outcome (dependent variable) has only a limited number of possible values.

The dependent variable:

Logistic regression is used when the response variable is categorical in nature, for instance yes/no, true/false, red/green/blue, 1st/2nd/3rd/4th, etc.

Linear regression is used when the response variable is continuous, for instance weight, height, number of hours, etc.

Linear regression (straight line): Y = mX + C
Logistic regression (sigmoid function): g(x) = 1 / (1 + e^(-x))
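The two formulas can be compared directly in code. In the sketch below the slope `m`, intercept `c` and the 0.5 threshold are made-up illustrative values, not fitted parameters, and the function names are chosen for this example.

```python
import math

def linear(x, m, c):
    """Linear regression prediction: an unbounded continuous value."""
    return m * x + c

def sigmoid(z):
    """Logistic function: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, m, c, threshold=0.5):
    """Logistic regression style prediction: a probability first, then a 0/1 label."""
    probability = sigmoid(linear(x, m, c))
    return probability, int(probability >= threshold)

# m and c are illustrative values only; in practice they are learned from data.
m, c = 2.0, -1.0
for x in (-2.0, 0.5, 3.0):
    p, label = predict_class(x, m, c)
    print(f"x={x}: linear={linear(x, m, c):.2f}, probability={p:.3f}, class={label}")
```

The linear output can be any real number, while passing it through the sigmoid yields a value between 0 and 1 that can be thresholded into a category such as yes/no.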