Index: Book 2

This document is the table of contents of a textbook on Natural Language Processing (NLP). Part I covers fundamental algorithms and techniques, including tokenization, n-gram and neural language models, classification, and neural networks, together with their evaluation; it also treats advanced topics such as large language models, masked language models, and model alignment. Part II covers applications including machine translation, question answering, dialogue systems, and speech recognition, and Part III covers the annotation of linguistic structure.

Contents

I Fundamental Algorithms for NLP 1


1 Introduction 3
2 Regular Expressions, Tokenization, Edit Distance 4
2.1 Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Corpora . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Simple Unix Tools for Word Tokenization . . . . . . . . . . . . . 17
2.5 Word and Subword Tokenization . . . . . . . . . . . . . . . . . . 18
2.6 Word Normalization, Lemmatization and Stemming . . . . . . . . 23
2.7 Sentence Segmentation . . . . . . . . . . . . . . . . . . . . . . . 25
2.8 Minimum Edit Distance . . . . . . . . . . . . . . . . . . . . . . . 25
2.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 30
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 N-gram Language Models 32
3.1 N-Grams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Evaluating Language Models: Training and Test Sets . . . . . . . 38
3.3 Evaluating Language Models: Perplexity . . . . . . . . . . . . . . 39
3.4 Sampling sentences from a language model . . . . . . . . . . . . . 42
3.5 Generalizing vs. overfitting the training set . . . . . . . . . . . . . 43
3.6 Smoothing, Interpolation, and Backoff . . . . . . . . . . . . . . . 45
3.7 Advanced: Perplexity’s Relation to Entropy . . . . . . . . . . . . 49
3.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 52
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4 Naive Bayes, Text Classification, and Sentiment 56
4.1 Naive Bayes Classifiers . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 Training the Naive Bayes Classifier . . . . . . . . . . . . . . . . . 60
4.3 Worked example . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4 Optimizing for Sentiment Analysis . . . . . . . . . . . . . . . . . 62
4.5 Naive Bayes for other text classification tasks . . . . . . . . . . . 64
4.6 Naive Bayes as a Language Model . . . . . . . . . . . . . . . . . 65
4.7 Evaluation: Precision, Recall, F-measure . . . . . . . . . . . . . . 66
4.8 Test sets and Cross-validation . . . . . . . . . . . . . . . . . . . . 69
4.9 Statistical Significance Testing . . . . . . . . . . . . . . . . . . . 70
4.10 Avoiding Harms in Classification . . . . . . . . . . . . . . . . . . 73
4.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 75
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5 Logistic Regression 77
5.1 The sigmoid function . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2 Classification with Logistic Regression . . . . . . . . . . . . . . . 80
5.3 Multinomial logistic regression . . . . . . . . . . . . . . . . . . . 84
5.4 Learning in Logistic Regression . . . . . . . . . . . . . . . . . . . 87
5.5 The cross-entropy loss function . . . . . . . . . . . . . . . . . . . 88
5.6 Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.7 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.8 Learning in Multinomial Logistic Regression . . . . . . . . . . . . 96
5.9 Interpreting models . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.10 Advanced: Deriving the Gradient Equation . . . . . . . . . . . . . 98
5.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 100
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6 Vector Semantics and Embeddings 101
6.1 Lexical Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2 Vector Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.3 Words and Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.4 Cosine for measuring similarity . . . . . . . . . . . . . . . . . . . 110
6.5 TF-IDF: Weighing terms in the vector . . . . . . . . . . . . . . . 111
6.6 Pointwise Mutual Information (PMI) . . . . . . . . . . . . . . . . 114
6.7 Applications of the tf-idf or PPMI vector models . . . . . . . . . . 116
6.8 Word2vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.9 Visualizing Embeddings . . . . . . . . . . . . . . . . . . . . . . . 123
6.10 Semantic properties of embeddings . . . . . . . . . . . . . . . . . 124
6.11 Bias and Embeddings . . . . . . . . . . . . . . . . . . . . . . . . 126
6.12 Evaluating Vector Models . . . . . . . . . . . . . . . . . . . . . . 127
6.13 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 129
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7 Neural Networks 132
7.1 Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.2 The XOR problem . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.3 Feedforward Neural Networks . . . . . . . . . . . . . . . . . . . . 138
7.4 Feedforward networks for NLP: Classification . . . . . . . . . . . 142
7.5 Training Neural Nets . . . . . . . . . . . . . . . . . . . . . . . . 145
7.6 Feedforward Neural Language Modeling . . . . . . . . . . . . . . 152
7.7 Training the neural language model . . . . . . . . . . . . . . . . . 155
7.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 157
8 RNNs and LSTMs 158
8.1 Recurrent Neural Networks . . . . . . . . . . . . . . . . . . . . . 158
8.2 RNNs as Language Models . . . . . . . . . . . . . . . . . . . . . 162
8.3 RNNs for other NLP tasks . . . . . . . . . . . . . . . . . . . . . . 165
8.4 Stacked and Bidirectional RNN architectures . . . . . . . . . . . . 168
8.5 The LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
8.6 Summary: Common RNN NLP Architectures . . . . . . . . . . . 174
8.7 The Encoder-Decoder Model with RNNs . . . . . . . . . . . . . . 174
8.8 Attention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 182
9 The Transformer 184
9.1 Attention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
9.2 Transformer Blocks . . . . . . . . . . . . . . . . . . . . . . . . . 190
9.3 Parallelizing computation using a single matrix X . . . . . . . . . 193
9.4 The input: embeddings for token and position . . . . . . . . . . . 196
9.5 The Language Modeling Head . . . . . . . . . . . . . . . . . . . 198
9.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 201
10 Large Language Models 203
10.1 Large Language Models with Transformers . . . . . . . . . . . . . 204
10.2 Sampling for LLM Generation . . . . . . . . . . . . . . . . . . . 207
10.3 Pretraining Large Language Models . . . . . . . . . . . . . . . . 210
10.4 Evaluating Large Language Models . . . . . . . . . . . . . . . . . 214
10.5 Dealing with Scale . . . . . . . . . . . . . . . . . . . . . . . . . . 216
10.6 Potential Harms from Language Models . . . . . . . . . . . . . . 219
10.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 220
11 Masked Language Models 223
11.1 Bidirectional Transformer Encoders . . . . . . . . . . . . . . . . . 223
11.2 Training Bidirectional Encoders . . . . . . . . . . . . . . . . . . . 226
11.3 Contextual Embeddings . . . . . . . . . . . . . . . . . . . . . . . 230
11.4 Fine-Tuning for Classification . . . . . . . . . . . . . . . . . . . . 234
11.5 Fine-Tuning for Sequence Labeling: Named Entity Recognition . 237
11.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 241
12 Model Alignment, Prompting, and In-Context Learning 242
12.1 Prompting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
12.2 Post-training and Model Alignment . . . . . . . . . . . . . . . . . 248
12.3 Model Alignment: Instruction Tuning . . . . . . . . . . . . . . . . 249
12.4 Chain-of-Thought Prompting . . . . . . . . . . . . . . . . . . . . 254
12.5 Automatic Prompt Optimization . . . . . . . . . . . . . . . . . . . 255
12.6 Evaluating Prompted Language Models . . . . . . . . . . . . . . . 258
12.7 Model Alignment with Human Preferences: RLHF and DPO . . . 259
12.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 259
II NLP Applications 261
13 Machine Translation 263
13.1 Language Divergences and Typology . . . . . . . . . . . . . . . . 264
13.2 Machine Translation using Encoder-Decoder . . . . . . . . . . . . 268
13.3 Details of the Encoder-Decoder Model . . . . . . . . . . . . . . . 272
13.4 Decoding in MT: Beam Search . . . . . . . . . . . . . . . . . . . 274
13.5 Translating in low-resource situations . . . . . . . . . . . . . . . . 278
13.6 MT Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
13.7 Bias and Ethical Issues . . . . . . . . . . . . . . . . . . . . . . . 284
13.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 286
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
14 Question Answering, Information Retrieval, and RAG 289
14.1 Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . 290
14.2 Information Retrieval with Dense Vectors . . . . . . . . . . . . . . 298
14.3 Answering Questions with RAG . . . . . . . . . . . . . . . . . . 301
14.4 Evaluating Question Answering . . . . . . . . . . . . . . . . . . . 304
14.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 306
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
15 Chatbots & Dialogue Systems 309
15.1 Properties of Human Conversation . . . . . . . . . . . . . . . . . 311
15.2 Frame-Based Dialogue Systems . . . . . . . . . . . . . . . . . . . 314
15.3 Dialogue Acts and Dialogue State . . . . . . . . . . . . . . . . . . 317
15.4 Chatbots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
15.5 Dialogue System Design . . . . . . . . . . . . . . . . . . . . . . . 325
15.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 328
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
16 Automatic Speech Recognition and Text-to-Speech 331
16.1 The Automatic Speech Recognition Task . . . . . . . . . . . . . . 332
16.2 Feature Extraction for ASR: Log Mel Spectrum . . . . . . . . . . 334
16.3 Speech Recognition Architecture . . . . . . . . . . . . . . . . . . 339
16.4 CTC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
16.5 ASR Evaluation: Word Error Rate . . . . . . . . . . . . . . . . . 346
16.6 TTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
16.7 Other Speech Tasks . . . . . . . . . . . . . . . . . . . . . . . . . 353
16.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 354
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
III Annotating Linguistic Structure 359
17 Sequence Labeling for Parts of Speech and Named Entities 362
17.1 (Mostly) English Word Classes . . . . . . . . . . . . . . . . . . . 363
17.2 Part-of-Speech Tagging . . . . . . . . . . . . . . . . . . . . . . . 365
17.3 Named Entities and Named Entity Tagging . . . . . . . . . . . . . 367
17.4 HMM Part-of-Speech Tagging . . . . . . . . . . . . . . . . . . . 369
17.5 Conditional Random Fields (CRFs) . . . . . . . . . . . . . . . . . 376
17.6 Evaluation of Named Entity Recognition . . . . . . . . . . . . . . 381
17.7 Further Details . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
17.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 384
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
18 Context-Free Grammars and Constituency Parsing 387
18.1 Constituency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
18.2 Context-Free Grammars . . . . . . . . . . . . . . . . . . . . . . . 388
18.3 Treebanks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
18.4 Grammar Equivalence and Normal Form . . . . . . . . . . . . . . 394
18.5 Ambiguity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
18.6 CKY Parsing: A Dynamic Programming Approach . . . . . . . . 397
18.7 Span-Based Neural Constituency Parsing . . . . . . . . . . . . . . 403
18.8 Evaluating Parsers . . . . . . . . . . . . . . . . . . . . . . . . . . 405
18.9 Heads and Head-Finding . . . . . . . . . . . . . . . . . . . . . . 406
18.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 408
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
19 Dependency Parsing 411
19.1 Dependency Relations . . . . . . . . . . . . . . . . . . . . . . . . 412
19.2 Transition-Based Dependency Parsing . . . . . . . . . . . . . . . 416
19.3 Graph-Based Dependency Parsing . . . . . . . . . . . . . . . . . 425
19.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
19.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 433
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
20 Information Extraction: Relations, Events, and Time 435
20.1 Relation Extraction . . . . . . . . . . . . . . . . . . . . . . . . . 436
20.2 Relation Extraction Algorithms . . . . . . . . . . . . . . . . . . . 438
20.3 Extracting Events . . . . . . . . . . . . . . . . . . . . . . . . . . 446
20.4 Representing Time . . . . . . . . . . . . . . . . . . . . . . . . . . 447
20.5 Representing Aspect . . . . . . . . . . . . . . . . . . . . . . . . . 450
20.6 Temporally Annotated Datasets: TimeBank . . . . . . . . . . . . . 451
20.7 Automatic Temporal Analysis . . . . . . . . . . . . . . . . . . . . 452
20.8 Template Filling . . . . . . . . . . . . . . . . . . . . . . . . . . . 456
20.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 459
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
21 Semantic Role Labeling 461
21.1 Semantic Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
21.2 Diathesis Alternations . . . . . . . . . . . . . . . . . . . . . . . . 462
21.3 Semantic Roles: Problems with Thematic Roles . . . . . . . . . . 464
21.4 The Proposition Bank . . . . . . . . . . . . . . . . . . . . . . . . 465
21.5 FrameNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
21.6 Semantic Role Labeling . . . . . . . . . . . . . . . . . . . . . . . 468
21.7 Selectional Restrictions . . . . . . . . . . . . . . . . . . . . . . . 472
21.8 Primitive Decomposition of Predicates . . . . . . . . . . . . . . . 476
21.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 478
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
22 Lexicons for Sentiment, Affect, and Connotation 481
22.1 Defining Emotion . . . . . . . . . . . . . . . . . . . . . . . . . . 482
22.2 Available Sentiment and Affect Lexicons . . . . . . . . . . . . . . 484
22.3 Creating Affect Lexicons by Human Labeling . . . . . . . . . . . 485
22.4 Semi-supervised Induction of Affect Lexicons . . . . . . . . . . . 487
22.5 Supervised Learning of Word Sentiment . . . . . . . . . . . . . . 490
22.6 Using Lexicons for Sentiment Recognition . . . . . . . . . . . . . 495
22.7 Using Lexicons for Affect Recognition . . . . . . . . . . . . . . . 496
22.8 Lexicon-based methods for Entity-Centric Affect . . . . . . . . . . 497
22.9 Connotation Frames . . . . . . . . . . . . . . . . . . . . . . . . . 497
22.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 500
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
23 Coreference Resolution and Entity Linking 501
23.1 Coreference Phenomena: Linguistic Background . . . . . . . . . . 504
23.2 Coreference Tasks and Datasets . . . . . . . . . . . . . . . . . . . 509
23.3 Mention Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 510
23.4 Architectures for Coreference Algorithms . . . . . . . . . . . . . 513
23.5 Classifiers using hand-built features . . . . . . . . . . . . . . . . . 515
23.6 A neural mention-ranking algorithm . . . . . . . . . . . . . . . . 517
23.7 Entity Linking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
23.8 Evaluation of Coreference Resolution . . . . . . . . . . . . . . . . 524
23.9 Winograd Schema problems . . . . . . . . . . . . . . . . . . . . . 525
23.10 Gender Bias in Coreference . . . . . . . . . . . . . . . . . . . . . 526
23.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 528
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
24 Discourse Coherence 531
24.1 Coherence Relations . . . . . . . . . . . . . . . . . . . . . . . . . 533
24.2 Discourse Structure Parsing . . . . . . . . . . . . . . . . . . . . . 536
24.3 Centering and Entity-Based Coherence . . . . . . . . . . . . . . . 540
24.4 Representation learning models for local coherence . . . . . . . . 544
24.5 Global Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . 546
24.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . . . . . 550
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
Bibliography 553
Subject Index 585
