07 DL Intro
Mausam
Disclaimer: this is an outsider’s understanding. Some details may be inaccurate
[Figure: Francesconi, 2022]
The Localist vs. Distributed Debate
Distributed vs. Localist Representations
Localist: “..one computing element for each entity”
Distributed:
• “Each entity is represented by a pattern of activity distributed over many computing elements”
• “each computing element is involved in representing many different entities”
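A minimal numpy sketch (illustrative, not from the slides) contrasting a localist one-hot code with a distributed dense code; the vocabulary and dimensions are made up:

```python
import numpy as np

# Localist: one computing element (dimension) per entity.
vocab = ["cat", "dog", "car"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
# one_hot["cat"] -> [1., 0., 0.]   (exactly one unit active per entity)

# Distributed: each entity is a pattern over many elements,
# and each element takes part in representing many entities.
rng = np.random.default_rng(0)
dense = {w: rng.normal(size=4) for w in vocab}   # illustrative 4-d vectors
```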
Local Representation [Thorpe 1989]
Distributed Representation [Thorpe 1989]
Semi-Distributed Representation [Thorpe 1989]
Distributed Representations: Pros and Cons
Distributed representations:
• Efficient 😀
• Continuous 😀
• Degrade gracefully 😀
• Less interpretable 😢
Localist representations:
• Easier to work with(?) 😀
• More interpretable 😀
[Pate 2002]
So, who won?
[Diagram sequence: from feature engineering to feature learning]
• Hand-crafted Features → Model (NB, SVM, CRF)
• Model (MF, LSA, IR)
• Learned features z1, z2, … → Neural Model
• NN = (NB, SVM, CRF) + feature discovery
• Supervised Training Data → optimize a function (LL, sqd error, margin, …) → learn feature weights + vectors
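A minimal numpy sketch of that last step, assuming a toy dataset and a log-likelihood (logistic) loss; all names and numbers are illustrative:

```python
import numpy as np

# Toy supervised data: 4 examples, 3 features each, binary labels.
X = np.array([[1., 0., 1.], [0., 1., 1.], [1., 1., 0.], [0., 0., 1.]])
y = np.array([1., 1., 0., 0.])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=3)    # feature weights to be learned
b = 0.0
lr = 0.5

for step in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))                     # P(y=1 | x)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))   # negative log-likelihood
    g = (p - y) / len(y)                                        # gradient w.r.t. the logits
    w -= lr * (X.T @ g)                                         # gradient descent on weights
    b -= lr * g.sum()
```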
NLP with DL
Assumptions:
• doc/query/word is a vector of numbers z1, z2, …
• doc: bag/sequence/tree of words
• feature: neural (weights are shared)
• model: bag/seq of features (non-linear)
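A minimal numpy sketch of these assumptions: a doc as a bag of word vectors drawn from a shared embedding table (vocabulary, dimensions, and values are made up):

```python
import numpy as np

# Shared word-vector table: the same weights are reused for every
# occurrence of a word in every document (weight sharing).
rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}
E = rng.normal(size=(len(vocab), 5))                 # 5-d word vectors

doc = ["the", "movie", "was", "great"]
word_vecs = np.stack([E[vocab[w]] for w in doc])     # doc as a sequence of vectors

doc_vec = word_vecs.mean(axis=0)                     # bag-of-words view: one vector per doc
```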
Meta-thoughts
Features
• Learned
• in a task-specific, end-to-end way
• not limited by human creativity
Everything is a “Point”
• Word embedding
• Phrase embedding
• Sentence embedding
• Word embedding in context of sentence
• Etc
[Diagram: Symbolic Input (word) → Encoder → features z1 → Neural Model → Decoder → Symbolic Output (class, sentence, …)]
Vector operations: + (addition), ; (concatenation), . (dot product)
• Uses
– question aligns with answer //QA
– sentence aligns with sentence //paraphrase
– word aligns with (~important for) sentence //attention
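A minimal numpy sketch of dot-product alignment used this way (an attention-style weighting; all vectors are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(4, 5))   # word vectors of a sentence
query_vec = rng.normal(size=5)        # e.g. a question / context vector

scores = word_vecs @ query_vec        # dot product: how well each word aligns with the query
weights = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights
summary = weights @ word_vecs         # weighted sum of the words "important for" the query
```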
g(Ax+b)
• 1-layer MLP
• Take x
– project it into a different space (Ax) //relevant to task
– add a bias b (only shifts it up/down)
– apply the non-linearity g to convert it into the required output
• 2-layer MLP
– Common way to convert input to output
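A minimal numpy sketch of these two forms, with made-up dimensions and tanh as g:

```python
import numpy as np

def g(z):
    return np.tanh(z)             # the non-linearity

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 5, 8, 3
x = rng.normal(size=d_in)

# 1-layer MLP: g(Ax + b)
A, b = rng.normal(size=(d_hid, d_in)), np.zeros(d_hid)
h = g(A @ x + b)                  # project, shift, squash

# 2-layer MLP: a common way to map an input vector to output scores
W, c = rng.normal(size=(d_out, d_hid)), np.zeros(d_out)
scores = W @ h + c                # e.g. fed to a softmax over classes
```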
Loss Functions
Cross Entropy
Binary Cross Entropy
Max Margin
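For reference, the standard forms these three losses usually denote (with gold label y* and predicted probability \hat{p}; the max-margin form matches the one given later in these slides):

\[
L_{\text{CE}} = -\log P(y^{*}\mid x),\qquad
L_{\text{BCE}} = -\bigl[\,y\log\hat{p} + (1-y)\log(1-\hat{p})\,\bigr],\qquad
L_{\text{MM}} = \max\bigl(0,\; 1 - (\mathrm{score}(y^{*}) - \mathrm{score}(\hat{y}))\bigr)
\]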
Encoder-Decoder
[Diagram: Symbolic Input (word) → Encoder → features z1 → Neural Model → Decoder → Symbolic Output (class, sentence, …); the predicted P(y) is scored against the gold y* by the LOSS]
Common Loss Functions
• Max Margin
Loss = max(0, 1 - (score(y*) - score(y_best)))
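A minimal numpy sketch of this hinge-style loss, reading y_best as the highest-scoring competing output (toy scores, purely illustrative):

```python
import numpy as np

scores = np.array([2.0, 2.5, 0.3])   # model scores for each candidate output
gold = 0                              # index of the correct output y*

best_wrong = np.delete(scores, gold).max()           # score of the best competing output
loss = max(0.0, 1.0 - (scores[gold] - best_wrong))   # max(0, 1 - (score(y*) - score(y_best)))
# here: max(0, 1 - (2.0 - 2.5)) = 1.5
```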
https://ruder.io/optimizing-gradient-descent/
Glorot/Xavier Initialization (tanh)
• Initializing a W matrix of dimensionality d_in × d_out
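A minimal numpy sketch of the standard Glorot/Xavier uniform initialization (Glorot & Bengio, 2010), the variant commonly paired with tanh; the function name and dimensions are illustrative:

```python
import numpy as np

def glorot_uniform(d_in, d_out, rng=None):
    # Sample W uniformly in [-limit, +limit] with limit = sqrt(6 / (d_in + d_out)),
    # chosen so activation/gradient variance stays roughly constant across layers.
    if rng is None:
        rng = np.random.default_rng(0)
    limit = np.sqrt(6.0 / (d_in + d_out))
    return rng.uniform(-limit, limit, size=(d_in, d_out))

W = glorot_uniform(128, 64)   # a d_in x d_out weight matrix
```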