0% found this document useful (0 votes)

104 views35 pages

cs224n Lecture Notes

Uploaded by

Bonnie

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

104 views35 pages

cs224n Lecture Notes

Uploaded by

Bonnie

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 35

Natural Language Processing

with Deep Learning

CS224N/Ling284

Diyi Yang / Tatsunori Hashimoto

Lecture 1: Introduction and Word Vectors
Lecture Plan
Lecture 1: Introduction and Word Vectors
1. The course (10 mins)
2. Human language and word meaning (15 mins)
3. Word2vec introduction (15 mins)
4. Word2vec objective function gradients (25 mins)
5. Optimization basics (5 mins)
6. Looking at word vectors (10 mins or less)

Key learning today: The (astounding!) result that word meaning can be represented rather
well by a (high-dimensional) vector of real numbers

2
Course logistics in brief
• Instructor: Diyi Yang, Tatsunori Hashimoto
• Head TA: Nelson Liu
• Course Manager: John Cho
• TAs: Many wonderful people! See website
• Time: Tu/Th 4:30–5:50 Pacific time, Nvidia Aud. (à video)
• Email list: [email protected]
• We’ve put a lot of other important information on the class webpage. Please read it!
• http://cs224n.stanford.edu/
a.k.a., http://www.stanford.edu/class/cs224n/
• TAs, syllabus, help sessions/office hours, Ed (for all course questions/discussion)
• Office hours start Wednesday!
• Python/numpy and then PyTorch tutorials: First two Fridays. First is 4:30-5:20, Skilling Auditorium.
• Slide PDFs uploaded before each lecture
3
4
What do we hope to teach? (A.k.a. “learning goals”)
1. The foundations of the effective modern methods for deep learning applied to NLP
• Basics first, then key methods used in NLP in 2023: Word vectors, feed-forward
networks, recurrent networks, attention, encoder-decoder models, transformers,
pretraining, post-training (RLHF, SFT), efficient adaptation, benchmarking and
evaluation, human centered NLP, etc.

2. A big picture understanding of human languages and the difficulties in understanding

and producing them via computers

3. An understanding of and ability to build systems (in PyTorch) for some of the major
problems in NLP:
• Word meaning, dependency parsing, machine translation, question answering

5
Course work and grading policy
• 5 x 1-week Assignments: 6% + 4 x 12%: 54%
• HW1 is released today! Due next Tuesday! At 4:30 p.m.
• Submitted to Gradescope in Canvas (i.e., using @stanford.edu email for your Gradescope account)
• Final Default or Custom Course Project (1–3 people): 43%
• Project proposal: 5%, milestone: 5%, poster or web summary: 3%, report: 30%
• Participation: 3%
• Guest lecture reactions, Ed, course evals, karma – see website!
• Late day policy
• 6 free late days; afterwards, 1% off course grade per day late
• Assignments not accepted more than 3 days late per assignment unless given permission in advance

6
Course work and grading policy
• Collaboration policy:
• Please read the website and the Honor Code! Understand allowed collaboration and how to
document it: Don’t take code off the web; acknowledge working with other students; write your own
assignment solutions
• AI tools policy
• Must independently submit their solutions to CS224N homework
• Collaboration with AI tools is allowed; however, the direct solicitation is strictly prohibited
• Employing AI tools to substantially complete assignments will be considered a violation of the Honor
Code (see Generative AI Policy Guidance here for more details)

7
High-Level Plan for Assignments (to be completed individually!)
• Hw1 is hopefully an easy on ramp – a Jupyter/IPython Notebook
• Hw2 is pure Python (numpy) but expects you to do (multivariate) calculus, so you really
understand the basics
• Hw3 introduces PyTorch, building a feed-forward network for dependency parsing
• Hw4 and Hw5 use PyTorch on a GPU (Google Cloud)
• Libraries like PyTorch, Tensorflow, and Jax are now the standard tools of DL
• For Final Project, more details presented later, but you either:
• Do the default project, which is a question answering system
• Open-ended but an easier start; a good choice for many
• Propose a custom final project, which we approve
• You will receive feedback from a mentor (TA/prof/postdoc/PhD)
• Can work in teams of 1–3; can use any language/packages
8
Lecture Plan
1. The course (10 mins)
2. Human language and word meaning (15 mins)
3. Word2vec introduction (15 mins)
4. Word2vec objective function gradients (25 mins)
5. Optimization basics (5 mins)
6. Looking at word vectors (10 mins or less)

9
Trained on text data, neural machine translation is quite good!

https://kiswahili.tuko.co.ke/
Free-text question answering: Next gen search
when did Kendrick lamar’s
first album come out?
July 2, 2011

E.g., YONO (Lee et al. 2021, https://arxiv.org/pdf/2112.07381.pdf)

uses a T5-Large model fine-tuned for QA 3 times to run entire QA pipeline

12
GPT-3: A first step on the path to foundation models
The SEC said, “Musk, your tweets are a S: I broke the window.
blight. Q: What did I break?
S: I gracefully saved the day.
They really could cost you your job,
Q: What did I gracefully save?
if you don't stop all this tweeting at night.” S: I gave John flowers.
Then Musk cried, “Why? Q: Who did I give flowers to?
The tweets I wrote are not mean, S: I gave her a rose and a guitar.
I don't use all-caps Q: Who did I give a rose and a guitar to?
and I'm sure that my tweets are clean.” How many users have signed up since the start of 2020?
“But your tweets can move markets SELECT count(id) FROM users
WHERE created_at > ‘2020-01-01’
and that's why we're sore.
What is the average number of influencers each user is
You may be a genius and a billionaire, subscribed to?
but it doesn't give you the right to SELECT avg(count) FROM ( SELECT user_id, count(*)
be a bore!” FROM subscribers GROUP BY user_id )
AS avg_subscriptions_per_user
a train going over the Golden Gate bridge

a train going over the Golden

Gate bridge with the bay in the
background

a train going over the

Golden Gate bridge
detailed pencil drawing

cars and a train on the

Pictures by
Golden Gate bridge
OpenAI’s DALL-E 2 detailed pencil drawing
ChatGPT, GPT-4, and more
How do we represent the meaning of a word?

Definition: meaning (Webster dictionary)

• the idea that is represented by a word, phrase, etc.
• the idea that a person wants to express by using words, signs, etc.
• the idea that is expressed in a work of writing, art, etc.

Commonest linguistic way of thinking of meaning:

signifier (symbol) ⟺ signified (idea or thing)
= denotational semantics

tree ⟺ {🌳, 🌲, 🌴, …}
16
How do we have usable meaning in a computer?
Previously commonest NLP solution: Use, e.g., WordNet, a thesaurus containing lists of
synonym sets and hypernyms (“is a” relationships)
e.g., synonym sets containing “good”: e.g., hypernyms of “panda”:
from nltk.corpus import wordnet as wn from nltk.corpus import wordnet as wn
poses = { 'n':'noun', 'v':'verb', 's':'adj (s)', 'a':'adj', 'r':'adv'}
for synset in wn.synsets("good"): panda = wn.synset("panda.n.01")
print("{}: {}".format(poses[synset.pos()], hyper = lambda s: s.hypernyms()
", ".join([l.name() for l in synset.lemmas()]))) list(panda.closure(hyper))

noun: good
[Synset('procyonid.n.01'),
noun: good, goodness
Synset('carnivore.n.01'),
noun: good, goodness
noun: commodity, trade_good, good Synset('placental.n.01'),
Synset('mammal.n.01'),
adj: good
Synset('vertebrate.n.01'),
adj (sat): full, good
Synset('chordate.n.01'),
adj: good
Synset('animal.n.01'),
adj (sat): estimable, good, honorable, respectable
adj (sat): beneficial, good Synset('organism.n.01'),
Synset('living_thing.n.01'),
adj (sat): good
Synset('whole.n.02'),
adj (sat): good, just, upright
Synset('object.n.01'),
…
Synset('physical_entity.n.01'),
adverb: well, good
adverb: thoroughly, soundly, good Synset('entity.n.01')]

17
Problems with resources like WordNet
• A useful resource but missing nuance:
• e.g., “proficient” is listed as a synonym for “good”
This is only correct in some contexts
• Also, WordNet list offensive synonyms in some synonym sets without any
coverage of the connotations or appropriateness of words
• Missing new meanings of words:
• e.g., wicked, badass, nifty, wizard, genius, ninja, bombest
• Impossible to keep up-to-date!
• Subjective
• Requires human labor to create and adapt
• Can’t be used to accurately compute word similarity (see following slides)

18
Representing words as discrete symbols
In traditional NLP, we regard words as discrete symbols:
hotel, conference, motel – a localist representation

Means one 1, the rest 0s

Such symbols for words can be represented by one-hot vectors:

motel = [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
hotel = [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]

Vector dimension = number of words in vocabulary (e.g., 500,000+)

19
Sec. 9.2.2

Problem with words as discrete symbols

Example: in web search, if a user searches for “Seattle motel”, we would like to match
documents containing “Seattle hotel”

But:
motel = [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
hotel = [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
These two vectors are orthogonal
There is no natural notion of similarity for one-hot vectors!

Solution:
• Could try to rely on WordNet’s list of synonyms to get similarity?
• But it is well-known to fail badly: incompleteness, etc.
• Instead: learn to encode similarity in the vectors themselves
20
Representing words by their context
• Distributional semantics: A word’s meaning is given
by the words that frequently appear close-by
• “You shall know a word by the company it keeps” (J. R. Firth 1957: 11)
• One of the most successful ideas of modern statistical NLP!
• When a word w appears in a text, its context is the set of words that appear nearby
(within a fixed-size window).
• We use the many contexts of w to build up a representation of w

…government debt problems turning into banking crises as happened in 2009…

…saying that Europe needs unified banking regulation to replace the hodgepodge…
…India has just given its banking system a shot in the arm…

21 These context words will represent banking

Word vectors
We will build a dense vector for each word, chosen so that it is similar to vectors of words
that appear in similar contexts, measuring similarity as the vector dot (scalar) product

0.286 0.413
0.792 0.582
−0.177 −0.007
banking = −0.107 monetary = 0.247
0.109 0.216
−0.542 −0.718
0.349 0.147
0.271 0.051

Note: word vectors are also called (word) embeddings or (neural) word representations
They are a distributed representation
22
Word meaning as a neural word vector – visualization

0.286
0.792
−0.177
−0.107
expect = 0.109
−0.542
0.349
0.271
0.487

23
3. Word2vec: Overview
Word2vec is a framework for learning word vectors
(Mikolov et al. 2013)

Idea:
• We have a large corpus (“body”) of text: a long list of words
• Every word in a fixed vocabulary is represented by a vector
• Go through each position t in the text, which has a center
word c and context (“outside”) words o
• Use the similarity of the word vectors for c and o to calculate
the probability of o given c (or vice versa)
Skip-gram model
• Keep adjusting the word vectors to maximize this probability (Mikolov et al. 2013)

24
Word2Vec Overview
Example windows and process for computing 𝑃 𝑤!"# | 𝑤!

𝑃 𝑤!%$ | 𝑤! 𝑃 𝑤!"$ | 𝑤!
𝑃 𝑤!%# | 𝑤! 𝑃 𝑤!"# | 𝑤!

… problems turning into banking crises as …

outside context words center word outside context words

in window of size 2 at position t in window of size 2

25
Word2Vec Overview
Example windows and process for computing 𝑃 𝑤!"# | 𝑤!

𝑃 𝑤!%$ | 𝑤! 𝑃 𝑤!"$ | 𝑤!
𝑃 𝑤!%# | 𝑤! 𝑃 𝑤!"# | 𝑤!

… problems turning into banking crises as …

outside context words center word outside context words

in window of size 2 at position t in window of size 2

26
Word2vec: objective function
For each position 𝑡 = 1, … , 𝑇, predict context words within a window of fixed size m,
given center word 𝑤! . Data likelihood:
&
Likelihood = 𝐿 𝜃 = - - 𝑃 𝑤!"# | 𝑤! ; 𝜃
𝜃 is all variables !$% '()#)(
to be optimized #*+

sometimes called a cost or loss function

The objective function 𝐽 𝜃 is the (average) negative log likelihood:

&
1 1
𝐽 𝜃 = − log 𝐿(𝜃) = − 6 6 log 𝑃 𝑤!"# | 𝑤! ; 𝜃
𝑇 𝑇
!$% '()#)(
#*+
Minimizing objective function ⟺ Maximizing predictive accuracy
27
Word2vec: objective function
• We want to minimize the objective function:
'
1
𝐽 𝜃 =− + + log 𝑃 𝑤!"* | 𝑤! ; 𝜃
𝑇
!&# %()*)(
*+,

• Question: How to calculate 𝑃 𝑤!"# | 𝑤! ; 𝜃 ?

• Answer: We will use two vectors per word w:
• 𝑣- when w is a center word
• 𝑢- when w is a context word
• Then for a center word c and a context word o:

exp(𝑢-& 𝑣. )
𝑃 𝑜𝑐 = &𝑣 )
∑/∈1 exp(𝑢/ .
28
Word2vec: prediction function
② Exponentiation makes anything positive
① Dot product compares similarity of o and c.
𝑢! 𝑣 = 𝑢. 𝑣 = ∑%"#$ 𝑢" 𝑣"
exp(𝑢-& 𝑣. ) Larger dot product = larger probability
𝑃 𝑜𝑐 = &𝑣 )
∑/∈1 exp(𝑢/ .
③ Normalize over entire vocabulary
to give probability distribution

• This is an example of the softmax function ℝ2 → (0,1)2 Open

region
exp(𝑥3 )
softmax 𝑥3 = 2 = 𝑝3
∑#$% exp(𝑥# )
• The softmax function maps arbitrary values 𝑥3 to a probability distribution 𝑝3
• “max” because amplifies probability of largest 𝑥.
But sort of a weird name
• “soft” because still assigns some probability to smaller 𝑥. because it returns a distribution!
• Frequently used in Deep Learning

30
To train the model: Optimize value of parameters to minimize loss
To train a model, we gradually adjust parameters to minimize a loss

• Recall: 𝜃 represents all the

model parameters, in one
long vector
• In our case, with
d-dimensional vectors and
V-many words, we have à
• Remember: every word has
two vectors

• We optimize these parameters by walking down the gradient (see right figure)
• We compute all vector gradients!
31
Interactive Session!

• 𝐿 𝜃 = ∏&!$% ∏'()#)( 𝑃 𝑤!"# | 𝑤! ; 𝜃

#*+
9:;(=!" ># )
• For a center word c and a context word o: 𝑃 𝑜 𝑐 = ∑$∈& 9:;(=$"> )
#

32
4.

36
37
38
39

MD5 Mental Ability Test Answer Sheet
67% (18)
MD5 Mental Ability Test Answer Sheet
3 pages
Understanding Natural Language Understanding
No ratings yet
Understanding Natural Language Understanding
514 pages
Raymond S. T. Lee - Natural Language Processing. A Textbook With Python Implementation-Springer (2024)
No ratings yet
Raymond S. T. Lee - Natural Language Processing. A Textbook With Python Implementation-Springer (2024)
454 pages
SBM Checklist
100% (4)
SBM Checklist
10 pages
NLP Semester 7
No ratings yet
NLP Semester 7
1,072 pages
Crypt of Cthulhu 023 Cryptic.1984 Wolfhound
100% (4)
Crypt of Cthulhu 023 Cryptic.1984 Wolfhound
58 pages
tiaSYSUP1500 - 07 - PLC - Tags - en - 28 31 01 2020
No ratings yet
tiaSYSUP1500 - 07 - PLC - Tags - en - 28 31 01 2020
25 pages
Price vs. Innodata (Case)
No ratings yet
Price vs. Innodata (Case)
14 pages
NLP Handwritten Notes
No ratings yet
NLP Handwritten Notes
26 pages
Tutorial 3 Bail
No ratings yet
Tutorial 3 Bail
9 pages
Birth and Death
No ratings yet
Birth and Death
36 pages
Full Stack Java Notes
No ratings yet
Full Stack Java Notes
3 pages
Speech and Language Processing - J&M
No ratings yet
Speech and Language Processing - J&M
599 pages
Ed 3 Book
No ratings yet
Ed 3 Book
577 pages
Slides Lec1-3
No ratings yet
Slides Lec1-3
225 pages
cs224n 2025 Lecture01 Wordvecs1
No ratings yet
cs224n 2025 Lecture01 Wordvecs1
36 pages
2 - 6N302 Natural Language Processing
No ratings yet
2 - 6N302 Natural Language Processing
6 pages
Natural Language Processing-Course Handout September 2022
No ratings yet
Natural Language Processing-Course Handout September 2022
8 pages
Doing Narrative Analysis From A Narrativ
No ratings yet
Doing Narrative Analysis From A Narrativ
23 pages
cs224n spr2024 Lecture01 Wordvecs1
No ratings yet
cs224n spr2024 Lecture01 Wordvecs1
40 pages
Intro DL 10 NLP
No ratings yet
Intro DL 10 NLP
99 pages
Lecture 10
No ratings yet
Lecture 10
86 pages
Cs224n 2024 Lecture02 Wordvecs2
No ratings yet
Cs224n 2024 Lecture02 Wordvecs2
45 pages
Cs224n 2025 Lecture03 Neuralnets
No ratings yet
Cs224n 2025 Lecture03 Neuralnets
96 pages
1 Intro
No ratings yet
1 Intro
72 pages
Natural Language Processing Dossier 20231110 141736 0000
No ratings yet
Natural Language Processing Dossier 20231110 141736 0000
114 pages
XCS224N Module1 Slides
No ratings yet
XCS224N Module1 Slides
72 pages
Module 5
No ratings yet
Module 5
76 pages
Market Makers and Institutional Brokers - Switzerland
No ratings yet
Market Makers and Institutional Brokers - Switzerland
1 page
Session 15-2 Future NLP & Deep Learning
No ratings yet
Session 15-2 Future NLP & Deep Learning
81 pages
Week 2 and 3
No ratings yet
Week 2 and 3
76 pages
Lecture 4 - Language Representation
No ratings yet
Lecture 4 - Language Representation
86 pages
Christopher Manning Lecture 1: Introduction and Word Vectors
No ratings yet
Christopher Manning Lecture 1: Introduction and Word Vectors
42 pages
Hocken Maier 25
No ratings yet
Hocken Maier 25
46 pages
01-Transformer Based NLP Applications
No ratings yet
01-Transformer Based NLP Applications
55 pages
Large Language Models Ryan Cotterell, Mrinmaya Sachan, Florian Tramèr, Ce ZhangLecture 1
No ratings yet
Large Language Models Ryan Cotterell, Mrinmaya Sachan, Florian Tramèr, Ce ZhangLecture 1
37 pages
Natural Language Processing
No ratings yet
Natural Language Processing
77 pages
Dealing With Textual Data
No ratings yet
Dealing With Textual Data
67 pages
Party Rock Anthem Bass in BB
No ratings yet
Party Rock Anthem Bass in BB
1 page
Rockwell HR-300 400
No ratings yet
Rockwell HR-300 400
6 pages
NLP 160709201345
No ratings yet
NLP 160709201345
61 pages
cs224n 2025 Lecture02 Wordvecs2
No ratings yet
cs224n 2025 Lecture02 Wordvecs2
46 pages
Language Models: A Guide For The Perplexed
No ratings yet
Language Models: A Guide For The Perplexed
35 pages
tdt4310 2024 Lect1 Full
No ratings yet
tdt4310 2024 Lect1 Full
42 pages
NLP Materia
No ratings yet
NLP Materia
29 pages
Gradivo ChatGPT in Umetna Inteligenca V Praksi
No ratings yet
Gradivo ChatGPT in Umetna Inteligenca V Praksi
38 pages
Semantic Networks
100% (1)
Semantic Networks
68 pages
NLP File
No ratings yet
NLP File
21 pages
NLP 1
No ratings yet
NLP 1
15 pages
cs224n Winter2023 Lecture1 Notes Draft
No ratings yet
cs224n Winter2023 Lecture1 Notes Draft
13 pages
NLP Final Review
No ratings yet
NLP Final Review
32 pages
NLP Introduction
No ratings yet
NLP Introduction
35 pages
2020 NLPDeepLearning
No ratings yet
2020 NLPDeepLearning
72 pages
Participation Game (DSI 27 Apr) - A Post-Turing Frontier For Generative AI Systems
No ratings yet
Participation Game (DSI 27 Apr) - A Post-Turing Frontier For Generative AI Systems
39 pages
Madhav Institute of Technology & Science, Gwalior
No ratings yet
Madhav Institute of Technology & Science, Gwalior
13 pages
What Is Natural Language Processing (NLP)
No ratings yet
What Is Natural Language Processing (NLP)
15 pages
Against AI Understanding and Sentience-Large Language Models, Meaning, and The-Durt, Christoph, Froese, Tom, Fuchs, Thomas 2023 LLMs
No ratings yet
Against AI Understanding and Sentience-Large Language Models, Meaning, and The-Durt, Christoph, Froese, Tom, Fuchs, Thomas 2023 LLMs
15 pages
NLP Short Que Ans
No ratings yet
NLP Short Que Ans
21 pages
MOTM
No ratings yet
MOTM
3 pages
Land Bank v. Cacayuran
No ratings yet
Land Bank v. Cacayuran
9 pages
Complete NLP Mastery Study Plan
No ratings yet
Complete NLP Mastery Study Plan
18 pages
Natural Language Processing With Deep Learning CS224N/Ling284
No ratings yet
Natural Language Processing With Deep Learning CS224N/Ling284
34 pages
Natural Language Processing With Deep Learning CS224N/Ling284
No ratings yet
Natural Language Processing With Deep Learning CS224N/Ling284
36 pages
NLP Tutorial1
No ratings yet
NLP Tutorial1
7 pages
Ai NLP
No ratings yet
Ai NLP
9 pages
Word2Vec - A Baby Step in Deep Learning But A Giant Leap Towards Natural Language Processing
100% (1)
Word2Vec - A Baby Step in Deep Learning But A Giant Leap Towards Natural Language Processing
12 pages
Unit 5 DL
No ratings yet
Unit 5 DL
11 pages
HR Email Contact
No ratings yet
HR Email Contact
3 pages
Unit 1 and 2
No ratings yet
Unit 1 and 2
5 pages
Entrepreneurship
No ratings yet
Entrepreneurship
3 pages
NLP Lab Manual-1
No ratings yet
NLP Lab Manual-1
18 pages
Syllabus 2
No ratings yet
Syllabus 2
6 pages
Philosophy of Data Science 2 NLP Syllabus
No ratings yet
Philosophy of Data Science 2 NLP Syllabus
6 pages
Regulatory Alert Unbranded Enforcement - Digitas Health - May 2010
No ratings yet
Regulatory Alert Unbranded Enforcement - Digitas Health - May 2010
2 pages
Chapter Four The Foreign Exchange Market
No ratings yet
Chapter Four The Foreign Exchange Market
22 pages
考研英语语法10天速成复习
No ratings yet
考研英语语法10天速成复习
4 pages
Natural Language Processing With Deep Learning CS224N/Ling284
No ratings yet
Natural Language Processing With Deep Learning CS224N/Ling284
33 pages
Internationale Bewerbende Merkblatt Auswahlkommission English
No ratings yet
Internationale Bewerbende Merkblatt Auswahlkommission English
4 pages
A Latent Variable Model Approach To PMI-based Word Embeddings
No ratings yet
A Latent Variable Model Approach To PMI-based Word Embeddings
16 pages
Module 4
No ratings yet
Module 4
7 pages
12-093 - Nguyen Thi Kim Hoang - MKT 601 - Final Assignment
No ratings yet
12-093 - Nguyen Thi Kim Hoang - MKT 601 - Final Assignment
28 pages
Full Science 7 Notes
No ratings yet
Full Science 7 Notes
22 pages
الإمبريالية الغربية تهدد المسلمين
No ratings yet
الإمبريالية الغربية تهدد المسلمين
44 pages
1伦博教育Adam and the Bear 课件
No ratings yet
1伦博教育Adam and the Bear 课件
23 pages
CCS369
No ratings yet
CCS369
2 pages
General Knowledge
No ratings yet
General Knowledge
8 pages
CNS - Unit 1
No ratings yet
CNS - Unit 1
13 pages
Roma
No ratings yet
Roma
2 pages
Mathematical Modeling of Inland Vessel Maneuverability Considering Rudder Hydrodynamics Jialun Liu Instant Download
No ratings yet
Mathematical Modeling of Inland Vessel Maneuverability Considering Rudder Hydrodynamics Jialun Liu Instant Download
55 pages
B Cisco Nexus 9000 NX Os Quality of Service Configuration Guide 93x - Chapter - 0111
No ratings yet
B Cisco Nexus 9000 NX Os Quality of Service Configuration Guide 93x - Chapter - 0111
18 pages
5 - MC01 - Exterior Walkaround & Normal Checklist - REV 05.1
No ratings yet
5 - MC01 - Exterior Walkaround & Normal Checklist - REV 05.1
2 pages
Un System Chart
No ratings yet
Un System Chart
1 page

cs224n Lecture Notes

Uploaded by

cs224n Lecture Notes

Uploaded by

Natural Language Processing

with Deep Learning

Diyi Yang / Tatsunori Hashimoto

2. A big picture understanding of human languages and the difficulties in understanding

E.g., YONO (Lee et al. 2021, https://arxiv.org/pdf/2112.07381.pdf)

a train going over the Golden

a train going over the

cars and a train on the

Definition: meaning (Webster dictionary)

Commonest linguistic way of thinking of meaning:

Means one 1, the rest 0s

Such symbols for words can be represented by one-hot vectors:

Vector dimension = number of words in vocabulary (e.g., 500,000+)

Problem with words as discrete symbols

…government debt problems turning into banking crises as happened in 2009…

21 These context words will represent banking

… problems turning into banking crises as …

outside context words center word outside context words

… problems turning into banking crises as …

outside context words center word outside context words

sometimes called a cost or loss function

The objective function 𝐽 𝜃 is the (average) negative log likelihood:

• Question: How to calculate 𝑃 𝑤!"# | 𝑤! ; 𝜃 ?

• This is an example of the softmax function ℝ2 → (0,1)2 Open

• Recall: 𝜃 represents all the

• 𝐿 𝜃 = ∏&!$% ∏'()#)( 𝑃 𝑤!"# | 𝑤! ; 𝜃

You might also like