NLP - Natural Language Processing
NLP – Natural Language Processing
• I. Machine learning in Natural Language Processing
• II. Word2Vec
• III. Transformers
Part 1 Machine learning in Natural Language Processing
Several Options
Bag of words with randomly assigned numbers
e.g. 1 3 8 7 2 4 15 10 0 789 92 34 47 71 79
TF-IDF
Explanation
Term frequency (TF): the ratio of the number of times the word
appears in a document to the total number of words
in that document
TF-IDF
[Chart: words 1–16 of the vocabulary with their raw frequency and their TF-IDF weight]
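A minimal sketch of computing TF-IDF weights with scikit-learn (assuming scikit-learn ≥ 1.0 for get_feature_names_out; note that TfidfVectorizer applies IDF smoothing and L2 normalization, so the numbers differ slightly from the raw ratio defined above). The two sentences reuse the example documents from the bag-of-words slides below:

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the blue car have two red doors",
    "on the blue doors there are poster of red car",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)    # rows = documents, columns = vocabulary terms

print(vectorizer.get_feature_names_out())   # the learned vocabulary
print(tfidf.toarray().round(2))             # TF-IDF weight of each word in each document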
NLP Problem: How to rate a review
Two algorithms:
Machine Learning
• Naive Bayes
• Decision Tree
Naive Bayes
Naive Bayes (example from NLP)
• Rate reviews on a scale of 1 to 5:
• "it's beautiful!" → 4/5
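A minimal sketch of the Naive Bayes option using scikit-learn (MultinomialNB over bag-of-words counts; the tiny review/rating dataset below is invented purely for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reviews = ["it's beautiful", "awful, waste of money", "really great product", "terrible quality"]
ratings = [5, 1, 5, 1]                       # invented star ratings for the sketch

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)        # bag-of-words counts

model = MultinomialNB()
model.fit(X, ratings)

print(model.predict(vectorizer.transform(["it's beautiful !"])))   # -> predicted rating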
Reminder - Confusion Matrix - Accuracy
[Confusion matrix: predicted Positive / Negative (rainy / sunny) against actual Yes / No]
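For reference, accuracy is computed from the four cells of the confusion matrix (true positives TP, true negatives TN, false positives FP, false negatives FN):

Accuracy = (TP + TN) / (TP + TN + FP + FN)

i.e. the fraction of all predictions that were correct.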
Vocabulary (12 words):
["the", "blue", "car", "have", "two", "red", "doors", "on", "there", "are", "poster", "of"]
• 1) The blue car have two red doors => [1,1,1,1,1,1,1,0,0,0,0,0]
• 2) On the blue doors, there are poster of red car => [1,1,1,0,0,1,1,1,1,1,1,1]
Bag of Words
• [1,1,1,1,1,1,1,0,0,0,0,0]
• [1,1,1,0,0,1,1,1,1,1,1,1]
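The same two vectors can be reproduced with scikit-learn's CountVectorizer (binary=True gives 0/1 presence indicators; note the columns follow the vectorizer's alphabetical vocabulary order rather than the order listed above):

from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the blue car have two red doors",
    "on the blue doors there are poster of red car",
]

vectorizer = CountVectorizer(binary=True)    # 0/1 presence instead of raw counts
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())
print(X.toarray())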
Bag of Words
• Weaknesses:
- word order is lost: "John likes Mary" gets the same vector as "Mary likes John"
(possible fix: n-grams?)
Linear Model: y = slope · x + intercept (bias)
x: independent variable, y: dependent variable
NLP – Natural Language Processing
• Bag-of-words
• Word2Vec
• I. The Goal
• II. Implementation
• Reminder :
• Linear Regression
• Cost Function
• Neural Network
• Training of the Network
Error Function: SSE
Sum of Squared Errors (SSE) = ½ Σ (Actual House Price – Predicted House Price)²
= ½ Σ (Y – Y_pred)²
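A quick numeric check of the formula in Python (the three house prices are invented for the sketch):

import numpy as np

y      = np.array([300_000, 450_000, 200_000])   # actual house prices (toy values)
y_pred = np.array([320_000, 430_000, 210_000])   # predicted house prices (toy values)

sse = 0.5 * np.sum((y - y_pred) ** 2)            # SSE = 1/2 * sum((Y - Y_pred)^2)
print(sse)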
NLP – Natural Language Processing
• Bag-of-words
• Word2Vec
• I. The Goal
• II. Implementation
• Reminder :
• Linear Regression
• Cost Function
• Neural Network
• Training of the Network
Solution: Neural network
• We had a single neuron and an input
• We add one (or more) neurons to the output layer
• We add at least one hidden layer
• The result is a neural network
https://www.youtube.com/watch?v=lGLto9Xd7bU
Backpropagation
https://www.youtube.com/watch?v=GJXKOrqZauk
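A minimal numpy sketch of the ideas from the two videos above: a network with one hidden layer trained by backpropagation and gradient descent (the toy data, layer sizes and learning rate are arbitrary choices for the sketch, not values from the slides):

import numpy as np

rng = np.random.default_rng(0)

# Toy data (invented): 4 samples, 3 input features, 1 target value each
X = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))

# One hidden layer of 5 neurons, one linear output neuron
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)
lr = 0.01                                  # learning rate (arbitrary)

for step in range(2000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)               # hidden layer activations
    y_pred = h @ W2 + b2                   # network output

    # Cost: SSE = 1/2 * sum((Y - Y_pred)^2), as in the reminder above
    loss = 0.5 * np.sum((y - y_pred) ** 2)

    # Backpropagation: apply the chain rule layer by layer
    d_out = y_pred - y                     # dLoss / dY_pred
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)

    # Gradient descent step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(loss)                                # much smaller than at step 0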
NLP – Natural Language Processing
• Bag-of-words
• Word2Vec
• I. The Goal
• II. Implementation
• Reminder :
• Linear Regression
• Cost Function
• Neural Network
• Training of the Network
Context → Target pairs: BIG data
Skip-Gram (SG)
• Pineapples are spiky and yellow
Context → Target pairs: SMALL data
Skip-Gram (SG)
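A minimal sketch of how Skip-Gram (target, context) training pairs can be generated from the example sentence above (window size 2 is an arbitrary choice; plain Python, no library assumed):

# Generate (target, context) training pairs for Skip-Gram with a window of 2
sentence = "pineapples are spiky and yellow".split()
window = 2

pairs = []
for i, target in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((target, sentence[j]))

print(pairs)
# e.g. ('spiky', 'pineapples'), ('spiky', 'are'), ('spiky', 'and'), ('spiky', 'yellow'), ...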
Similarity
Example:
"Natural language processing and machine learning is fun and exciting"
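Similarity between two word vectors is typically measured with the cosine of the angle between them; a minimal numpy sketch (the two toy vectors are invented, not real Word2Vec output):

import numpy as np

def cosine_similarity(u, v):
    # cos(angle) between the two vectors; close to 1 means "similar" words
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

v_language   = np.array([0.2, 0.8, 0.1])    # toy embedding, not real Word2Vec output
v_processing = np.array([0.25, 0.75, 0.0])  # toy embedding
print(cosine_similarity(v_language, v_processing))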
Transformer
Is a more efficient alternative to Word2Vec possible?
• Embeddings that allow us to have multiple (more than one) vector (numeric) representations for the same word
• Word2Vec embeddings do not take into account the word position.
• The Transformer model explicitly takes as input the position (index) of each word in the sentence.
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
Transformers structure
Encoder
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
The encoder
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
Step 1: Input Embedding
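Step 1 can be pictured as a lookup table that maps each token id to a learned d_model-dimensional vector; a minimal numpy sketch (the vocabulary size and the token ids are illustrative, d_model = 512 as on the later slides):

import numpy as np

vocab_size, d_model = 10_000, 512                          # illustrative sizes
embedding = np.random.normal(size=(vocab_size, d_model))   # learned during training

token_ids = np.array([5, 72, 991])   # hypothetical ids of the words in the input sentence
x = embedding[token_ids]             # shape (3, 512): one vector per input word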
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
Step 2: Positional Encoding
For the positional encoding (PE), for each position pos and each
dimension i of the d_model = 512 word embedding vector:
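The standard sinusoidal formulation from the original Transformer paper ("Attention Is All You Need"), which matches the d_model = 512 setting on this slide:

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

A minimal numpy sketch of this encoding (the sequence length is an arbitrary choice):

import numpy as np

d_model, seq_len = 512, 50
pos = np.arange(seq_len)[:, None]                   # positions 0 .. seq_len-1
i = np.arange(0, d_model, 2)[None, :]               # even embedding dimensions

pe = np.zeros((seq_len, d_model))
pe[:, 0::2] = np.sin(pos / 10000 ** (i / d_model))  # sine on even dimensions
pe[:, 1::2] = np.cos(pos / 10000 ** (i / d_model))  # cosine on odd dimensions
# pe is added to the input embeddings so the model knows each word's position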
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
Step 3: Multi-Head Attention
Three matrices, randomly initialized, whose optimal values are calculated (learned) during training
• Each word is related to itself more than to the other words
Attention Matrix
Example: does "come back" relate more strongly to the dog? To the food?
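A minimal numpy sketch of the attention computation behind the attention matrix: the word embeddings are projected by the three learned matrices into queries, keys and values, and the softmax of the scaled dot products gives how strongly each word attends to every other word (all sizes and values here are toy choices; d_k = 64 follows the original paper):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d_model, d_k = 4, 512, 64               # toy sentence of 4 words
x = np.random.normal(size=(seq_len, d_model))    # one embedding (+ position) per word

# The three matrices, randomly initialized, whose values are learned in training
W_q = np.random.normal(size=(d_model, d_k))
W_k = np.random.normal(size=(d_model, d_k))
W_v = np.random.normal(size=(d_model, d_k))

Q, K, V = x @ W_q, x @ W_k, x @ W_v
attention = softmax(Q @ K.T / np.sqrt(d_k))      # seq_len x seq_len attention matrix
output = attention @ V                           # each row: weighted mix of value vectors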
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
Step 4: Add & Norm
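A minimal numpy sketch of Step 4: the sub-layer's output is added back to its input (residual connection) and the sum is layer-normalized (the shapes and values are toy choices):

import numpy as np

def layer_norm(z, eps=1e-6):
    # normalize each word's vector to zero mean and unit variance
    mean = z.mean(axis=-1, keepdims=True)
    std = z.std(axis=-1, keepdims=True)
    return (z - mean) / (std + eps)

x = np.random.normal(size=(4, 512))              # sub-layer input (toy values)
sublayer_out = np.random.normal(size=(4, 512))   # e.g. multi-head attention output
out = layer_norm(x + sublayer_out)               # Add (residual) & Norm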
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
Step 5: Feed Forward
Two neural layers
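A minimal numpy sketch of Step 5's two layers applied to each position (the hidden size d_ff = 2048 follows the original Transformer paper; the rest is toy):

import numpy as np

d_model, d_ff = 512, 2048
W1, b1 = np.random.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = np.random.normal(size=(d_ff, d_model)), np.zeros(d_model)

def feed_forward(x):
    # layer 1 + ReLU, then layer 2 projecting back to d_model
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

x = np.random.normal(size=(4, d_model))          # output of Add & Norm for 4 words
y = feed_forward(x)                              # same shape: (4, d_model)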
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
The Decoder
Decoder