
Chapter 24. Deep Learning for Natural Language Processing
TABLE OF CONTENTS

1. Word embedding
2. RNN for NLP
3. Sequence-to-sequence Models
4. The Transformer Architecture
5. Pretraining and Transfer Learning
6. Summary
1. Word embedding
• If we want to plug words into a Neural Network, or some other machine learning
algorithm, we need a way to turn the words into numbers
• Word embedding is a technique used in NLP and ML to represent words as numerical
vectors

“She is beautiful”
She 1
Is 2
Beautiful 3



1. Word embedding

“She is beautiful”            “She is pretty”

She        1                  She        1
Is         2                  Is         2
Beautiful  3                  Pretty     4

“Beautiful” and “Pretty” mean similar things, but they are assigned completely different numbers
-> The neural network will need a lot more complexity and training

-> It would be nice if similar words that are used in similar ways could be given similar numbers,
so that learning how to use one word also helps with learning how to use the other



1. Word embedding

• Word embeddings are learned automatically from the data


• The feature space has the property that similar words end up having similar vectors
• These numerical representations capture semantic relationships between words:
words with similar meanings will have similar vector representations
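
A minimal sketch of this idea, using made-up toy vectors (not taken from any real pretrained model):

```python
import numpy as np

# Toy 4-dimensional embedding vectors, invented purely for illustration.
embeddings = {
    "beautiful": np.array([0.81, 0.10, 0.43, 0.55]),
    "pretty":    np.array([0.78, 0.14, 0.40, 0.58]),
    "pizza":     np.array([0.02, 0.91, 0.35, 0.07]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1 for similar words."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["beautiful"], embeddings["pretty"]))  # high
print(cosine_similarity(embeddings["beautiful"], embeddings["pizza"]))   # low
```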



2. Recurrent Neural Networks for NLP
2.1. Language model with RNNs
• In an RNN language model, each input word is encoded as a word embedding vector
• RNN training involves predicting the next word given the previous words and updating the weights through
backpropagation
• RNNs can generate text by sampling from the output distribution, or simply by choosing the most likely word

[Figure: an RNN language model unrolled over time, with hidden-state vectors z1, z2, ... at successive timesteps]

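
A minimal sketch of such an RNN language model in PyTorch; the vocabulary size, layer sizes, and class names are illustrative choices:

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)     # word ID -> embedding vector
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)         # hidden state -> next-word scores

    def forward(self, word_ids):
        x = self.embed(word_ids)          # (batch, seq_len, embed_dim)
        z, _ = self.rnn(x)                # hidden states z_t for every timestep
        return self.out(z)                # logits over the vocabulary at each position

# Training step: predict each next word from the previous words.
vocab_size = 10
model = RNNLanguageModel(vocab_size)
tokens = torch.randint(0, vocab_size, (2, 6))             # 2 toy sequences of 6 word IDs
logits = model(tokens[:, :-1])                             # inputs: all but the last word
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))  # targets: the next words
loss.backward()                                            # backpropagation updates the weights
```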


2. Recurrent Neural Networks for NLP
2.2. Classification with recurrent neural networks
• For classification tasks, RNNs require labeled data
• To capture the context on the right, we can use a bidirectional RNN, which concatenates a separate right-to-left
model onto the left-to-right model
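
A sketch of a bidirectional RNN text classifier in PyTorch (here using an LSTM); the pooling choice and layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class BiRNNClassifier(nn.Module):
    """Left-to-right and right-to-left LSTMs run in parallel;
    their hidden states are concatenated at every position."""
    def __init__(self, vocab_size, num_classes, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden_dim, num_classes)  # 2x: both directions

    def forward(self, word_ids):
        x = self.embed(word_ids)
        states, _ = self.rnn(x)             # (batch, seq_len, 2 * hidden_dim)
        sentence_repr = states.mean(dim=1)  # simple pooling over timesteps
        return self.classify(sentence_repr) # class logits

logits = BiRNNClassifier(vocab_size=100, num_classes=3)(torch.randint(0, 100, (4, 7)))
print(logits.shape)  # torch.Size([4, 3])
```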



2. Recurrent Neural Networks for NLP
2.3. LSTMs for NLP tasks
Exploding/vanishing gradient problem

---> Long Short-Term Memory (LSTM)

• An LSTM is a kind of RNN that can choose to remember some parts of the input, copying them over to the next timestep, and
forget other parts.
• Unlike traditional RNNs, LSTMs use gating units to selectively retain or forget information over timesteps, enabling
them to better preserve relevant information for NLP tasks.
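
A small sketch of the two memories an LSTM carries across timesteps, using PyTorch's LSTMCell; the sizes and toy inputs are illustrative:

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=8, hidden_size=16)

h = torch.zeros(1, 16)   # short-term memory (hidden state)
c = torch.zeros(1, 16)   # long-term memory (cell state)

inputs = torch.randn(5, 1, 8)   # 5 timesteps of toy embedding vectors
for x_t in inputs:
    # The gates inside the cell decide what to write into c (remember),
    # what to erase from c (forget), and what to expose in h.
    h, c = cell(x_t, (h, c))

print(h.shape, c.shape)  # both torch.Size([1, 16])
```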



2. Recurrent Neural Networks for NLP
2.3. LSTMs for NLP tasks
[Figure: an LSTM cell, separating long-term memories (the cell state) from short-term memories (the hidden state)]



3. Sequence-to-sequence Models
• One of the most studied tasks in NLP is machine translation (MT)

Source language → Target language

• The model most commonly used for machine translation is called a sequence-to-sequence model
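
A minimal sequence-to-sequence sketch in PyTorch; the choice of GRUs, the layer sizes, and the class names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """The encoder reads the source sentence into a fixed vector; the decoder
    generates the target sentence conditioned on that vector."""
    def __init__(self, src_vocab, tgt_vocab, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, context = self.encoder(self.src_embed(src_ids))    # final encoder state
        dec_states, _ = self.decoder(self.tgt_embed(tgt_ids), context)
        return self.out(dec_states)                            # next-word logits

model = Seq2Seq(src_vocab=50, tgt_vocab=60)
logits = model(torch.randint(0, 50, (2, 9)), torch.randint(0, 60, (2, 7)))
print(logits.shape)  # torch.Size([2, 7, 60])
```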



3. Sequence-to-sequence Models
3.1. Attention
“Don’t eat the delicious looking and smelling pizza”

If the model forgets the early word “Don’t” by the time it decodes, the meaning is reversed:

“Eat the delicious looking and smelling pizza”



3. Sequence-to-sequence Models
3.1. Attention
• The main idea of attention is to add a bunch of new paths from the encoder to the decoder, one per input
value, so that each step of the decoder can directly access the input values



3. Sequence-to-sequence Models
3.1. Attention

• First, the attention component itself has no learned weights and supports variable-length
sequences on both the source and target sides
• Second, attention is entirely latent
• Attention can also be combined with multilayer RNNs
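
A small numpy sketch of this kind of attention: no learned weights, and it works for any source length. The decoder state and encoder states are random toy vectors:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Toy values: 4 encoder hidden states and 1 current decoder state (dimension 3).
encoder_states = np.random.randn(4, 3)
decoder_state = np.random.randn(3)

# Dot-product attention: score every input position against the decoder state.
scores = encoder_states @ decoder_state          # one score per input position
weights = softmax(scores)                        # attention distribution over inputs
context = weights @ encoder_states               # weighted sum fed to this decoder step

print(weights.round(2), context.shape)           # weights sum to 1, context is (3,)
```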



3. Sequence-to-sequence Models
3.2. Decoding
• Decoding is the procedure in which we generate the target sentence one word at a time and then feed
the generated word back in as input at the next timestep
• To improve decoding, beam search is often used
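
A sketch of beam search over a hypothetical next_word_logprobs(prefix) scoring function; the function name and the toy probability table are made up for illustration:

```python
def beam_search(next_word_logprobs, start, end, beam_size=3, max_len=10):
    """Keep the `beam_size` highest-scoring partial sequences at every step.

    `next_word_logprobs(prefix)` is assumed to return a dict mapping each
    candidate next word to its log-probability given the prefix.
    """
    beams = [([start], 0.0)]                      # (sequence, total log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:                    # finished hypotheses stay as-is
                candidates.append((seq, score))
                continue
            for word, logp in next_word_logprobs(seq).items():
                candidates.append((seq + [word], score + logp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == end for seq, _ in beams):
            break
    return beams[0][0]

# Toy usage with a hard-coded next-word distribution (purely illustrative).
table = {"<s>": {"the": -0.2, "a": -1.8}, "the": {"cat": -0.5, "</s>": -1.0},
         "a": {"cat": -0.7, "</s>": -1.2}, "cat": {"</s>": -0.1}}
print(beam_search(lambda seq: table[seq[-1]], "<s>", "</s>"))
```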



4. The Transformer Architecture
4.1. Self-attention
• Self-attention in sequence-to-sequence models allows each hidden state sequence to attend
to itself, capturing both nearby and long-distance context.
• The basic method of self-attention computes the attention matrix directly from the dot
product of input vectors, leading to a bias towards attending to oneself.
-> To address this, transformers project each input vector x_i into three different representations using
separate weight matrices:
- The query vector: q_i = W_q x_i
- The key vector: k_i = W_k x_i
- The value vector: v_i = W_v x_i
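
A numpy sketch of self-attention with these three projections; the dimensions and the randomly initialized weight matrices are illustrative:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d = 5, 8                      # 5 input word vectors of dimension 8
X = rng.normal(size=(seq_len, d))

# Three separate projection matrices (random here; learned in a real model).
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v    # query, key, value vectors for every position

# Each position attends to every position; scaling by sqrt(d) keeps scores moderate.
attention_weights = softmax(Q @ K.T / np.sqrt(d))   # (seq_len, seq_len), rows sum to 1
output = attention_weights @ V                       # contextualized representations

print(attention_weights.shape, output.shape)         # (5, 5) (5, 8)
```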



4. The Transformer Architecture
4.2. From self-attention to transformer
• The transformer model comprises multiple layers, each containing sub-layers
• Self-attention is applied first in each layer, followed by feedforward layers with ReLU activation
• To mitigate vanishing gradients, residual connections are employed
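
A sketch of one such layer in PyTorch, using the built-in MultiheadAttention module; real transformer layers usually also apply layer normalization, which this sketch omits for brevity:

```python
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    """One layer: a self-attention sublayer, then a ReLU feedforward sublayer,
    each wrapped in a residual (skip) connection."""
    def __init__(self, d_model=64, num_heads=4, d_ff=256):
        super().__init__()
        self.self_attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.feedforward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        attended, _ = self.self_attention(x, x, x)   # queries, keys, values all from x
        x = x + attended                             # residual connection
        x = x + self.feedforward(x)                  # residual connection
        return x

x = torch.randn(2, 10, 64)             # batch of 2 sequences, 10 positions, d_model=64
layers = nn.Sequential(*[TransformerLayer() for _ in range(3)])  # stack of layers
print(layers(x).shape)                 # torch.Size([2, 10, 64])
```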



5. Pretraining and Transfer Learning
5.1. Pretrained word embeddings
• Word embedding algorithms: Word2vec, GloVe, FastText, ELMo, BERT

GloVe (Global Vectors) model
• Derives semantic relationships between words using a word-word co-occurrence matrix



5. Pretraining and Transfer Learning
5.1. Pretrained word embeddings
Corpus: “I love cats”, “I love you”        Window = 1
Vocabulary: I, love, cats, you

Word-word co-occurrence matrix:

          I    love   cats   you
I         0    2      0      0
love      2    0      1      1
cats      0    1      0      0
you       0    1      0      0
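
A small sketch that builds this window-1 co-occurrence matrix from the two toy sentences:

```python
import numpy as np

sentences = [["I", "love", "cats"], ["I", "love", "you"]]
vocab = ["I", "love", "cats", "you"]
index = {word: i for i, word in enumerate(vocab)}

window = 1
X = np.zeros((len(vocab), len(vocab)), dtype=int)
for sentence in sentences:
    for pos, word in enumerate(sentence):
        # Count every neighbor within `window` positions on either side.
        for offset in range(-window, window + 1):
            neighbor_pos = pos + offset
            if offset != 0 and 0 <= neighbor_pos < len(sentence):
                X[index[word], index[sentence[neighbor_pos]]] += 1

print(X)
# [[0 2 0 0]
#  [2 0 1 1]
#  [0 1 0 0]
#  [0 1 0 0]]
```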



5. Pretraining and Transfer Learning
5.1. Pretrained word embeddings

Word-word co-occurrence matrix (window = 1):

          I    love   cats   you
I         0    2      0      0
love      2    0      1      1
cats      0    1      0      0
you       0    1      0      0

The probability that word j appears in the context of word i:

P_ij = X_ij / X_i,   where X_i = Σ_j X_ij

Example: P(I | love) = X_love,I / X_love = 2 / (2 + 0 + 1 + 1) = 0.5
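
Continuing the example, a small sketch that turns the count matrix into the probabilities P_ij by normalizing each row:

```python
import numpy as np

vocab = ["I", "love", "cats", "you"]
X = np.array([[0, 2, 0, 0],    # co-occurrence counts from the matrix above
              [2, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]])

X_i = X.sum(axis=1, keepdims=True)   # X_i = sum_j X_ij (row totals)
P = X / X_i                          # P_ij = X_ij / X_i

i, j = vocab.index("love"), vocab.index("I")
print(P[i, j])   # P(I | love) = 2 / 4 = 0.5
```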



5. Pretraining and Transfer Learning
5.2. Pretrained contextual representation
• A contextual representation maps both a word and its surrounding context of words into a word embedding
vector



5. Pretraining and Transfer Learning
5.3. Masked language models
• A masked language model (MLM) is trained by masking (hiding) individual words in the input and asking
the model to predict the masked words.
• For this task, one can use a deep bidirectional RNN or a transformer on top of the masked sentence.
• For example, given the input sentence “The river rose five feet”, we can mask the middle word to get
“The river [MASK] five feet” and ask the model to fill in the blank

[Figure: stacked transformer layers read “The river [MASK] five feet” and predict “rose” at the masked position]
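
As an aside, a pretrained masked language model can be queried with the fill-mask pipeline of the Hugging Face transformers library (assuming the library is installed and the bert-base-uncased checkpoint can be downloaded):

```python
# Requires: pip install transformers torch
from transformers import pipeline

# Load a pretrained masked language model (BERT) as a fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Ask the model to predict the masked word.
for prediction in fill_mask("The river [MASK] five feet."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Plausible completions such as "rose" should rank highly.
```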



6. Summary

This chapter emphasizes:

1. Word embeddings provide robust, continuous representations of words, pretrained on unlabeled text data.
2. Recurrent neural networks (RNNs) excel at capturing local and long-distance context.
3. Sequence-to-sequence models are valuable for machine translation and text generation.
4. Transformers, with self-attention, effectively model both local and long-range context, and make efficient use of hardware matrix multiplication.
5. Transfer learning, leveraging pretrained contextual word embeddings, enables versatile model development.



Thank you for listening
