
NLP – Natural Language Processing
• I. Machine learning in Natural Language Processing
• II. Word2Vec
• III. Transformers
Part 1 Machine learning in Natural Language Processing
Several Options
Bag of words with randomly assigned numbers (each word is mapped to an arbitrary index):

1 3 8 7 2 4 15 10 0 789 92 34 47 71 79

TF-IDF
Explanation:
TF (term frequency): the ratio of the number of times the word appears in a document to the total number of words in that document.
IDF (inverse document frequency): lowers the weight of words that appear in many documents.
[Figure: two bar charts over 16 vocabulary words – raw frequency vs. TF-IDF weight]
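As a quick illustration (not from the slides), a minimal sketch computing TF-IDF weights with scikit-learn's TfidfVectorizer, using the two example sentences from the bag-of-words part of this deck:

# A hedged TF-IDF sketch using scikit-learn (not the exact code from the slides).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The blue car have two red doors",
    "On the blue doors there are poster of red car",
]

vectorizer = TfidfVectorizer()          # TF-IDF weighting instead of raw counts
X = vectorizer.fit_transform(docs)      # sparse matrix: documents x vocabulary

print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))             # words shared by both documents get lower weights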
NLP problem: how to rate a review?
Two machine-learning algorithms:
• Naive Bayes
• Decision tree
Naive Bayes
Naive Bayes (example from NLP)
• Rate reviews:
• “it’s beautiful !”
[Figure: example reviews with their ratings]
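A hedged sketch of the review-rating idea with scikit-learn's Multinomial Naive Bayes; the extra reviews and star labels below are invented for illustration:

# A minimal Naive Bayes review-rating sketch (scikit-learn); labels are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ["it's beautiful !", "I hated it", "really great movie", "terrible, waste of time"]
ratings = [5, 1, 5, 1]                      # hypothetical star ratings used as class labels

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, ratings)                 # learns P(word | rating) from word counts
print(model.predict(["beautiful movie"]))   # expected to predict the 5-star class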
Reminder – Confusion Matrix – Accuracy

Rows: actual Positive / Negative; columns: predicted Positive / Negative (cells: TP, FN, FP, TN)

Accuracy = (TP + TN) / (TP + FN + FP + TN)


Reminder – Accuracy (continued)
Decision tree
[Figure: decision tree on a weather example – the outlook splits into rainy / overcast / sunny, with further yes / no splits leading to “play” or “not play” leaves]
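A hedged sketch of a decision tree on a toy weather dataset like the one in the figure above; the numeric encoding of the features is an assumption for illustration (scikit-learn):

# A minimal decision-tree sketch on a toy weather dataset (feature encoding invented).
from sklearn.tree import DecisionTreeClassifier, export_text

# outlook: 0 = sunny, 1 = overcast, 2 = rainy ; windy: 0 = no, 1 = yes
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = ["play", "not play", "play", "play", "play", "not play"]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["outlook", "windy"]))  # shows the learned splits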
NLP – Natural Language Processing
• I. Machine learning in Natural Language Processing
• II. Word2Vec
• III. Transformers
NLP – Natural Language Processing
• Bag-of-words
• Word2Vec
• I. The Goal
• II. Implementation
• Reminder :
• Linear Regression
• Cost Function
• Neural Network
• Training of the Network

• Skip-Gram (SG) / Continuous Bag Of Word (CBOW)


Bag of Words
Example:
• 1) The blue car have two red doors
• 2) On the blue doors, there are poster of red car
Vocabulary : ( 12 words)
[“the“, “blue“, “car“ , “have“, “two“, “red“, “doors“ , “on“ , “there“, “are“, “poster“, “of“ ]

One Hot Encoding :


“the“ => [1,0,0,0,0,0,0,0,0,0,0,0] , “blue“=> [0,1,0,0,0,0,0,0,0,0,0,0]

“two“=> [0,0,0,0,1,0,0,0,0,0,0,0] , “red“=> [0,0,0,0,0,1,0,0,0,0,0,0] ,

“poster“=> [0,0,0,0,0,0,0,0,0,0,1,0] , “of“=> [0,0,0,0,0,0,0,0,0,0,0,1]
Bag of Words

Vocabulary : ( 12 words)
[“the“, “blue“, “car“ , “have“, “two“, “red“, “doors“ , “on“ , “there“, “are“,
“poster“, “of“ ]
• 1) The blue car have two red doors => [1,1,1,1,1,1,1,0,0,0,0,0]
• 2) On the blue doors, there are poster of red car => [1,1,1,0,0,1,1,1,1,1,1,1]
Bag of Words
• [1,1,1,1,1,1,1,0,0,0,0,0]
• [1,1,1,0,0,1,1,1,1,1,1,1]
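A small sketch (plain Python) that reproduces the two binary bag-of-words vectors above from the fixed 12-word vocabulary:

# Minimal bag-of-words sketch: presence/absence vectors over a fixed vocabulary.
import re

vocab = ["the", "blue", "car", "have", "two", "red", "doors",
         "on", "there", "are", "poster", "of"]

def bow_vector(sentence):
    words = set(re.findall(r"[a-z]+", sentence.lower()))   # presence only, not counts
    return [1 if w in words else 0 for w in vocab]

print(bow_vector("The blue car have two red doors"))
# -> [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(bow_vector("On the blue doors, there are poster of red car"))
# -> [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1]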
Bag of Words
• Weaknesses:
- word order is lost: “John likes Mary” ~= “Mary likes John” (possible fix: n-grams?)
- the meaning of the underlying words is ignored: “King” != “Queen”, with no notion of how similar they are


NLP – Natural Language Processing
• Bag-of-words
• Word2Vec
• I. The Goal
• II. Implementation
• Reminder :
• Linear Regression
• Cost Function
• Neural Network
• Training of the Network

• Skip-Gram (SG) / Continuous Bag Of Word (CBOW)


Word2Vec (Word To Vector)- The Goal
Similarity
Word2Vec (Word To Vector)- The Goal
BOW (bag of words):
Apple    [1, 0, 0]
Mango    [0, 1, 0]
Elephant [0, 0, 1]

Word2Vec-style embedding (is_fruit, is_animal, is_eatable):
Apple    [0.9,  0.01, 1]
Mango    [0.85, 0.02, 1]
Elephant [0.1,  0.9,  1]
NLP – Natural Language Processing
• Bag-of-words
• Word2Vec
• I. The Goal
• II. Implementation
• Reminder :
• Linear Regression
• Cost Function
• Neural Network
• Training of the Network

• Skip-Gram (SG) / Continuous Bag Of Word (CBOW)


Linear Regression
[Figure: dependent variable Y plotted against independent variable X, with a fitted line]

Linear model: Y = slope · X + intercept (bias)
NLP – Natural Language Processing
• Bag-of-words
• Word2Vec
• I. The Goal
• II. Implementation
• Reminder :
• Linear Regression
• Cost Function
• Neural Network
• Training of the Network

• Skip-Gram (SG) / Continuous Bag Of Word (CBOW)


Error Function
[Figure: errors between the data points and the fitted line, along X]
Error Function – SSE

Sum of Squared Errors (SSE) = ½ Σ (Actual House Price – Predicted House Price)²
                            = ½ Σ (Y – Y_pred)²
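A hedged sketch (numpy, toy data) of fitting the linear model by minimising the SSE cost above with gradient descent:

# Fitting y = w*x + b by gradient descent on the SSE cost (toy data, invented values).
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])         # e.g. house size
Y = np.array([2.0, 4.1, 6.0, 8.2])         # e.g. house price

w, b, lr = 0.0, 0.0, 0.01                  # slope, intercept, learning rate
for _ in range(5000):
    Y_pred = w * X + b
    sse = 0.5 * np.sum((Y - Y_pred) ** 2)  # the cost from the slide
    w -= lr * np.sum((Y_pred - Y) * X)     # dSSE/dw
    b -= lr * np.sum(Y_pred - Y)           # dSSE/db
print(round(w, 2), round(b, 2), round(sse, 3))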
NLP – Natural Language Processing
• Bag-of-words
• Word2Vec
• I. The Goal
• II. Implementation
• Reminder :
• Linear Regression
• Cost Function
• Neural Network
• Training of the Network

• Skip-Gram (SG) / Continuous Bag Of Word (CBOW)


Neural Network
Solution: a neural network
• We had a single neuron and an input
• We add one more neuron (or more) to the output layer
• We add (at least) one hidden layer
• The result is a neural network

• In this network all the neurons are connected to one another (we will see other structures later)
NLP – Natural Language Processing
• Bag-of-words
• Word2Vec
• I. The Goal
• II. Implementation
• Reminder :
• Linear Regression
• Cost Function
• Neural Network
• Training of the Network

• Skip-Gram (SG) / Continuous Bag Of Word (CBOW)


Forward Propagation

https://www.youtube.com/watch?v=lGLto9Xd7bU
Backpropagation

https://www.youtube.com/watch?v=GJXKOrqZauk
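As a complement to the videos, a hedged numpy sketch of one forward pass and one backpropagation update for a tiny fully connected network like the one described above (layer sizes and data are invented):

# One forward pass + one gradient-descent update for a one-hidden-layer network.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(3)                          # one input example, 3 features
t = np.array([0.0, 1.0])                   # target output, 2 classes
W1, b1 = rng.random((4, 3)), np.zeros(4)   # hidden layer: 4 neurons
W2, b2 = rng.random((2, 4)), np.zeros(2)   # output layer: 2 neurons
lr = 0.1

# forward propagation
h = sigmoid(W1 @ x + b1)
y = sigmoid(W2 @ h + b2)
loss = 0.5 * np.sum((y - t) ** 2)          # SSE, as in the earlier slides

# backpropagation (chain rule, layer by layer)
delta2 = (y - t) * y * (1 - y)             # gradient at the output layer
delta1 = (W2.T @ delta2) * h * (1 - h)     # gradient at the hidden layer
W2 -= lr * np.outer(delta2, h); b2 -= lr * delta2
W1 -= lr * np.outer(delta1, x); b1 -= lr * delta1
print(round(loss, 4))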
NLP – Natural Language Processing
• Bag-of-words
• Word2Vec
• I. The Goal
• II. Implementation
• Reminder :
• Linear Regression
• Cost Function
• Neural Network
• Training of the Network

• Continuous Bag Of Word (CBOW) / Skip-Gram (SG)


Continuous Bag of Words (CBOW) / Skip-Gram (SG)
Continuous Bag of Words (CBOW)
• Example: “Pineapples are spiky and yellow”
• Input: the context words; Output: the target word
• Works best with BIG data
Skip-Gram (SG)
• Example: “Pineapples are spiky and yellow”
• Input: the target word; Output: the context words
• Works well even with SMALL data

Skip-Gram (SG) – Input: target word, Output: context words
• “Pineapples are spiky and yellow” – for a given target word, the surrounding words form its context
Skip-Gram (SG)
Similarity
Example: “natural language processing and machine learning is fun and exciting”

Is_fruit is_animal is_eatable


Apple [ 0.9 , 0.01 , 1 ]
Mango [ 0.85 , 0.02 , 1 ]
Elephant [ 0.1 , 0.9 , 1 ]
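A hedged sketch of training skip-gram embeddings with the gensim library (hyperparameters and toy sentences chosen for illustration, not taken from the slides):

# Training Word2Vec skip-gram embeddings on two toy sentences with gensim.
from gensim.models import Word2Vec

sentences = [
    ["natural", "language", "processing", "is", "fun", "and", "exciting"],
    ["pineapples", "are", "spiky", "and", "yellow"],
]

model = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1)  # sg=1 -> skip-gram
print(model.wv["fun"][:5])                   # first values of the learned 50-dim vector
print(model.wv.most_similar("fun", topn=3))  # nearest words by cosine similarity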
NLP – Natural Language Processing
• I. Machine learning in Natural Language Processing
• II. Word2Vec
• III. Transformers
Part 3

Transformers
Is it possible to be more efficient than Word2Vec?
• Word2Vec embeddings do not take into account the word position.
• The Transformer model explicitly takes as input the position (index) of each word in the sentence.
• This gives embeddings that allow us to have multiple (more than one) vector (numeric) representations for the same word.

Example: “John likes Mary” ~= “Mary likes John”

Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward

• The Decoder
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward

• The Decoder
Transformers structure
Encoder
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward

• The Decoder
The encoder
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward

• The Decoder
Step 1: Input Embedding
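A hedged sketch of step 1: each token id is looked up in a learned embedding table (here a random numpy matrix stands in for the learned weights; vocabulary and sizes are invented):

# Input embedding as a lookup table: one d_model-dimensional vector per word.
import numpy as np

vocab = {"john": 0, "likes": 1, "mary": 2}
d_model = 8                                    # the original Transformer uses 512
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((len(vocab), d_model))  # learned during training

sentence = ["john", "likes", "mary"]
token_ids = [vocab[w] for w in sentence]
embedded = embedding_table[token_ids]          # shape (3, d_model): one vector per word
print(embedded.shape)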
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward

• The Decoder
Step 2: Positional Encoding

For the positional encoding (PE), for each position pos and each dimension i of the d_model = 512 word-embedding vector:

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
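A small numpy sketch of the sinusoidal encoding above, with a reduced d_model for readability:

# Sinusoidal positional encoding: sin on even dimensions, cos on odd dimensions.
import numpy as np

def positional_encoding(max_len, d_model):
    pe = np.zeros((max_len, d_model))
    pos = np.arange(max_len)[:, None]              # positions 0 .. max_len-1
    i = np.arange(0, d_model, 2)                   # even dimension indices (= 2i)
    angle = pos / np.power(10000, i / d_model)     # pos / 10000^(2i / d_model)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

print(positional_encoding(max_len=4, d_model=8).round(2))   # one row per word position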
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
Step 3: Multi-Head Attention
• Three matrices (queries, keys, values), initialized randomly; their optimal values are computed (learned) during training.
• A word is usually related to itself more than to the other words.
Attention Matrix
[Figure: attention weights for an example – does “come back” refer to the dog? to the food?]
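A hedged numpy sketch of single-head scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V; the Q, K, V matrices below are random stand-ins for the three learned matrices mentioned above:

# Scaled dot-product attention for a 4-word sentence (single head, random Q, K, V).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_words, d_k = 4, 8                        # 4 words in the sentence, key dimension 8
Q = rng.standard_normal((n_words, d_k))    # queries
K = rng.standard_normal((n_words, d_k))    # keys
V = rng.standard_normal((n_words, d_k))    # values

scores = Q @ K.T / np.sqrt(d_k)            # how much each word attends to every other word
weights = softmax(scores)                  # the attention matrix (rows sum to 1)
output = weights @ V                       # weighted mix of the value vectors
print(weights.round(2))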
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
Step 4: Add & Norm
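A minimal sketch of the Add & Norm step: a residual connection followed by layer normalisation (numpy, without the learned scale and shift parameters; the vectors below are invented):

# Add & Norm: add the sub-layer output to its input, then normalise each vector.
import numpy as np

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

x = np.array([[0.5, -1.0, 2.0, 0.1]])             # sub-layer input (e.g. embeddings + PE)
sublayer_out = np.array([[0.2, 0.3, -0.5, 1.0]])  # e.g. multi-head attention output
print(layer_norm(x + sublayer_out))               # residual connection, then normalisation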
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
Step 5: Feed Forward
• Two neural (fully connected) layers
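A sketch of the position-wise feed-forward block: two linear layers with a ReLU in between (numpy, toy sizes; the original paper uses d_model = 512 and d_ff = 2048):

# Position-wise feed-forward: expand to d_ff with ReLU, then project back to d_model.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32                      # reduced sizes for readability
W1, b1 = rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.standard_normal((d_ff, d_model)), np.zeros(d_model)

x = rng.standard_normal((4, d_model))      # 4 word positions
hidden = np.maximum(0, x @ W1 + b1)        # first layer + ReLU
out = hidden @ W2 + b2                     # second layer, back to d_model
print(out.shape)                           # (4, 8): same shape as the input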
Transformers
• Transformers structure
• The encoder:
• Input Embedding
• Positional Encoding
• Multi-Head Attention
• Add & Norm
• Feed Forward
• The Decoder
The Decoder

The original Transformer was trained on a 4.5-million-sentence-pair English–German dataset and a 36-million-sentence English–French dataset.
