
An Intro to Deep Learning for NLP

Mausam
Disclaimer: this is an outsider’s understanding. Some details may be inaccurate

(several slides by Yoav Goldberg & Graham Neubig)


NLP before DL #1
Assumptions
- doc: bag/sequence/tree of words
- model: bag of features (linear)
- feature: symbolic (different weight for each)

Pipeline: supervised training data → features → model (NB, SVM, CRF); optimize a function (LL, sqd error, margin, …) to learn the feature weights.
NLP before DL #2
Assumptions
- doc/query/word is a vector of numbers (z1 z2 …)
- dot product can compute similarity (via the distributional hypothesis)

Pipeline: unsupervised co-occurrence data → model (MF, LSA, IR); optimize a function (LL, sqd error, margin, …) to learn the vectors.
NLP with DL
Assumptions
- doc/query/word is a vector of numbers (z1 z2 …)
- doc: bag/sequence/tree of words
- feature: neural (weights are shared)
- model: bag/seq of features (non-linear)

Pipeline: supervised training data → neural features (z1 z2 …) → model, where NN = (NB, SVM, CRF, +++ feature discovery); optimize a function (LL, sqd error, margin, …) to learn the feature weights + vectors.
Meta-thoughts
Features
• Learned
• in a task-specific, end2end way
• not limited by human creativity
Everything is a “Point”
• Word embedding
• Phrase embedding
• Sentence embedding
• Word embedding in context of sentence
• Etc.

• Also known as dense/distributed representations

Points are good → reduce sparsity by weight sharing;
a single (complex) model can handle all points
Universal Representations
• Non-linearities
– Allow complex functions

• Put anything computable in the loss function
– Any additional insight about data/external knowledge
Make symbolic operations continuous
• Symbolic → continuous
– Yes/No → a number between 0 and 1
– Good/bad → a number between -1 and 1
– Either remember or forget → partially remember
– Select from n things → weighted avg over n things
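
A minimal sketch of the last relaxation, using PyTorch (the deck does not prescribe a framework; names are illustrative): a hard "select one of n things" becomes a softmax-weighted average over n vectors, which is differentiable.

```python
import torch

# n candidate vectors to "select" from, each of dimension d
n, d = 4, 8
candidates = torch.randn(n, d)

# unnormalized preference scores for each candidate (e.g. produced by a network)
scores = torch.randn(n)

# hard, symbolic selection: pick exactly one candidate
hard_choice = candidates[scores.argmax()]

# continuous relaxation: softmax turns scores into weights in (0, 1) summing to 1,
# and "selection" becomes a weighted average over all n candidates
weights = torch.softmax(scores, dim=0)
soft_choice = weights @ candidates          # shape (d,), differentiable w.r.t. scores

print(hard_choice.shape, soft_choice.shape)
```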
Encoder-Decoder

Symbolic input (word) → Encoder → neural features (z1 …) → Decoder → symbolic output (class, sentence, …)

Different assumptions on data create different architectures
Building Blocks
• sum: x + y
• concatenation: [x ; y]
• dot product: x · y
• matrix-mult, gate, non-linearity
• Can also try dimension-wise max (later: weighted sum)
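
A sketch of these building blocks in PyTorch (framework choice is mine, not the deck's): sum, concatenation, dot product, dimension-wise max, and a matrix multiplication followed by a non-linearity.

```python
import torch

d = 5
x, y = torch.randn(d), torch.randn(d)

s = x + y                       # sum: same dimension as x and y
c = torch.cat([x, y])           # concatenation [x ; y]: dimension 2d
dot = x @ y                     # dot product: a single similarity score
m = torch.maximum(x, y)         # dimension-wise max

# matrix multiplication + non-linearity, the core g(Ax + b) block
A = torch.randn(3, d)
b = torch.randn(3)
h = torch.tanh(A @ x + b)       # non-linearity g = tanh here

print(s.shape, c.shape, dot.item(), m.shape, h.shape)
```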
Concat vs. Sum
• Concatenating feature vectors: the "role" of each vector is retained.

[prev word ; current word ; next word]

• Different features can have vectors of different dimensions.

• Fixed number of features in each example (need to feed into a fixed-dim layer).
Concat vs. Sum
• Summing feature vectors: a "bag of features"

word + word + word

• Different feature vectors should have the same dimension.

• Can encode a bag of an arbitrary number of features.
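A small illustration of the two slides above (PyTorch, illustrative names): concatenating the previous/current/next word vectors keeps their roles and yields a fixed-size input, while summing gives an order-free bag that works for any number of vectors.

```python
import torch

d = 4
prev_w, cur_w, next_w = torch.randn(d), torch.randn(d), torch.randn(d)

# concat: the role of each position is preserved, input size fixed at 3*d
window = torch.cat([prev_w, cur_w, next_w])        # shape (12,)

# sum: a "bag of features", dimension stays d no matter how many vectors we add
bag = torch.stack([prev_w, cur_w, next_w]).sum(0)  # shape (4,)

print(window.shape, bag.shape)
```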


x.y
• degree of closeness
• alignment

• Uses
– question aligns with answer //QA
– sentence aligns with sentence //paraphrase
– word aligns with (~important for) sentence //attention
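
One concrete use of the dot product from this slide is attention: score how much each word vector aligns with a sentence/query vector, then use those scores as weights. A hedged PyTorch sketch; the names are illustrative.

```python
import torch

d, n_words = 8, 6
word_vecs = torch.randn(n_words, d)   # one vector per word in the sentence
query = torch.randn(d)                # e.g. a question or sentence representation

scores = word_vecs @ query            # dot product = degree of alignment, shape (n_words,)
attn = torch.softmax(scores, dim=0)   # normalize to weights
context = attn @ word_vecs            # weighted average of word vectors, shape (d,)
```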
g(Ax+b)
• 1-layer MLP
• Take x
– project it into a different space //relevant to the task
– add a bias b (only shifts it up or down)
– apply the non-linearity g to convert it into the required output

• 2-layer MLP: g2(A2 g1(A1x + b1) + b2)
– Common way to convert input to output
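
The g(Ax+b) block as code: a 1-layer projection and the common 2-layer MLP. A PyTorch sketch; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

d_in, d_hidden, d_out = 10, 32, 3

# 1-layer MLP: project x, shift by bias b, squash with non-linearity g
one_layer = nn.Sequential(nn.Linear(d_in, d_out), nn.Tanh())

# 2-layer MLP: the common input -> hidden -> output converter
two_layer = nn.Sequential(
    nn.Linear(d_in, d_hidden),   # A1 x + b1
    nn.Tanh(),                   # g1
    nn.Linear(d_hidden, d_out),  # A2 h + b2
)

x = torch.randn(d_in)
print(one_layer(x).shape, two_layer(x).shape)
```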
Loss Functions

• Cross Entropy
• Binary Cross Entropy
• Max Margin
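
A quick look at the first two losses in PyTorch (the deck itself is framework-agnostic); max margin is sketched after the "Common Loss Functions" slide below.

```python
import torch
import torch.nn as nn

# cross entropy: multi-class classification over 5 classes
logits = torch.randn(2, 5)               # batch of 2, unnormalized scores
gold = torch.tensor([3, 0])              # gold class indices y*
ce = nn.CrossEntropyLoss()(logits, gold)

# binary cross entropy: a single yes/no decision per example
bin_logits = torch.randn(2)
bin_gold = torch.tensor([1.0, 0.0])
bce = nn.BCEWithLogitsLoss()(bin_logits, bin_gold)

print(ce.item(), bce.item())
```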
Encoder-Decoder

Symbolic input (word) → Encoder → neural features (z1 …) → Decoder → symbolic output (class, sentence, …)
LOSS: compares the predicted P(y) with the gold output y*
Common Loss Functions
• Max Margin
Loss = max(0, 1 − (score(y*) − score(ybest)))

• Ranking loss (max margin: x ranked over x’)
Loss = max(0, 1 − (score(x) − score(x’)))
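
The two hinge losses above written out directly (PyTorch sketch; score() stands for whatever scoring network the model uses).

```python
import torch

def max_margin(score_gold, score_best_wrong):
    # Loss = max(0, 1 - (score(y*) - score(y_best)))
    return torch.clamp(1 - (score_gold - score_best_wrong), min=0)

def ranking_loss(score_x, score_x_neg):
    # Loss = max(0, 1 - (score(x) - score(x')))  -- x should outrank x'
    return torch.clamp(1 - (score_x - score_x_neg), min=0)

# toy scores
print(max_margin(torch.tensor(2.3), torch.tensor(1.9)))    # margin violated -> positive loss
print(ranking_loss(torch.tensor(3.0), torch.tensor(0.5)))  # margin satisfied -> 0
```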
Regularization
• L1
• L2
• Elastic Net
• DropOut
• Batch Normalization
• Layer Normalization
• Problem-specific regularizations
• Early Stopping
• https://towardsdatascience.com/different-normalization-layers-in-deep-learning-1a7214ff71d6
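A sketch of how two of these show up in PyTorch code (illustrative values): dropout as a layer, and L2 regularization applied as the optimizer's weight decay.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Dropout(p=0.5),      # dropout: randomly zero units during training
    nn.Linear(50, 2),
)

# L2-style regularization is commonly applied as weight decay in the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```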
Some Practical Advice
Optimization
• Stochastic Gradient Descent
• Mini-Batch Gradient Descent
• AdaGrad
• AdaDelta
• RMSProp
• Adam

Learning rate schedules

https://ruder.io/optimizing-gradient-descent/
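
A minimal training-loop skeleton showing one optimizer from the list plus a learning-rate schedule (PyTorch sketch; the model, data, and hyperparameters are placeholders).

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# learning rate schedule: decay the LR by 10x every 5 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

for epoch in range(10):
    # one toy mini-batch; in practice iterate over a DataLoader
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
```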
Glorot/Xavier Initialization (tanh)
• Initializing a W matrix of dimensionality d_in × d_out

He Initialization (ReLU)
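The slide's formulas did not survive extraction; for reference, the standard Glorot/Xavier and He initializations, as implemented in PyTorch, look like this.

```python
import torch
import torch.nn as nn

d_in, d_out = 300, 100
W = torch.empty(d_out, d_in)

# Glorot/Xavier (suited to tanh): uniform in [-sqrt(6/(d_in+d_out)), +sqrt(6/(d_in+d_out))]
nn.init.xavier_uniform_(W)

# He/Kaiming (suited to ReLU): normal with std sqrt(2/d_in)
nn.init.kaiming_normal_(W, nonlinearity='relu')
```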


Batching
• Padding
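Batching variable-length sentences requires padding them to a common length; a PyTorch sketch using pad_sequence (the token IDs are made up).

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# three "sentences" of different lengths, as token-id tensors
sents = [torch.tensor([4, 7, 2]), torch.tensor([9, 1]), torch.tensor([5, 3, 8, 6])]

# pad with 0 so they form one (batch, max_len) tensor
batch = pad_sequence(sents, batch_first=True, padding_value=0)
print(batch.shape)   # torch.Size([3, 4])
```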
Vanishing and Exploding Gradients
• Clipping
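Gradient clipping in one line (PyTorch sketch): rescale the gradients before the optimizer step so their total norm never exceeds a threshold.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(8, 10)).sum()
loss.backward()

# clip the global gradient norm to at most 1.0 to limit exploding gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```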
