UNIT 5a
Prepared By
J. Kamal Vijetha | Anuradha Surabhi | Asha Jyothi
Unit 5a
Best practices in developing NLP applications
Objectives
• Design an NLP model
• Train an NLP model
• Make neural network inference more efficient by sorting, padding, and masking tokens
• Apply character-based and BPE tokenization to split text into tokens
• Avoid overfitting
• Deal with imbalanced datasets by using upsampling, downsampling, and loss weighting
• Optimize hyperparameters
5.0 Building NLP applications
The typical structure of a modern NLP application covers:
• Handling unknown words: tokenization for neural models with character models and subword models
• Dealing with imbalanced datasets: using appropriate evaluation metrics, upsampling and downsampling, and weighting losses
• Here we will focus on how to analyze texts and break them down into tokens, a process called tokenization.
• Many neural NLP models operate with a fixed, finite vocabulary of tokens.
• In general, the out-of-vocabulary (OOV) problem of unknown words is more serious for language-generation systems (including machine translation and conversational AI) than for NLP systems that make predictions (sentiment analysis, POS tagging, and so on).
How to solve the OOV token problem in NLP
• Handling OOV tokens is a major problem in NLP, and a lot of research work has gone into dealing with it.
• Two commonly used techniques for building robust neural NLP models are character-based and subword-based models.
Handling OOV: Character models
• An effective solution to the OOV problem is to treat characters as tokens.
• Break the input text into individual characters, including punctuation and whitespace, and treat them as if they were regular tokens.
• The rest of the application is unchanged: "word" embeddings are assigned to characters, which are processed by the model as before. If the model produces text, it does so character by character, as in a character-level language model or text generator.
• Instead of generating text word by word, the RNN produces text one character at a time, as illustrated in the figure below. Thanks to this strategy, the model can produce words that look like English but actually aren't.
• If the model operated on words, it would produce only known words (or UNKs when unsure), and this would not be possible. A minimal character-tokenization sketch follows the figure below.
Figure: A language-generation model that generates text character by character (including whitespace)
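
The following is a minimal Python sketch of character-level tokenization; the function and variable names are illustrative, not from any particular library. Every character, including whitespace and punctuation, becomes a token, so even unseen words can be represented.

# A minimal sketch of character-level tokenization (illustrative names only).

def char_tokenize(text):
    """Treat every character, including whitespace and punctuation, as a token."""
    return list(text)

def build_char_vocab(corpus, unk_token="<unk>"):
    """Map each character seen in the corpus to an integer ID."""
    vocab = {unk_token: 0}
    for text in corpus:
        for ch in char_tokenize(text):
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab

corpus = ["The dishwasher is running.", "I bought a new dishwasher!"]
vocab = build_char_vocab(corpus)

# Even an unseen word is representable, because all of its characters are known.
ids = [vocab.get(ch, vocab["<unk>"]) for ch in char_tokenize("dishwashing")]
print(ids)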
• The word-based approach is efficient but not great with unknown words.
• The character-based approach is great with unknown words but is inefficient.
• Is there something in between?
Tokenization - unknown words
• Is there a tokenization scheme that is both efficient and robust to unknown words?
• Subword models are a relatively recent invention that addresses this problem for neural networks.
• In subword models, the input text is segmented into units called subwords, which simply means something smaller than words.
• There is no formal linguistic definition of what subwords actually are, but they roughly correspond to parts of words that appear frequently.
• For example, one way to segment "dishwasher" is "dish + wash + er," although other segmentations are possible.
• Several algorithms (such as WordPiece and SentencePiece) tokenize input into subwords, but by far the most widely used is byte-pair encoding (BPE).
Byte-pair encoding (BPE)
• Byte-pair encoding (BPE) was originally a compression algorithm and is now used as a tokenization method for neural models, particularly in machine translation.
• The idea behind BPE is to keep frequent words (such as "the" and "you") and frequent character n-grams (such as "-able" and "anti-") unsegmented, while breaking up rarer words (such as "dishwasher") into subwords ("dish + wash + er").
• Keeping frequent words and n-grams together helps the model process those tokens efficiently, whereas breaking up rare words ensures there are no UNK tokens, because everything can ultimately be broken up into individual characters.
• By flexibly choosing where to tokenize based on frequency, BPE achieves the best of both worlds: it is efficient while also addressing the unknown-word problem.
N-grams
An n-gram is a contiguous sequence of one or more linguistic units, such as characters or words. Unigrams, bigrams, and trigrams are n-grams with n = 1, 2, and 3, respectively.
For example, a string of 4 characters yields 4 unigrams, 3 bigrams, and 2 trigrams, and a string of 6 characters yields 6 unigrams, 5 bigrams, and 4 trigrams.
In search and information retrieval, n-grams often mean character n-grams used for indexing documents.
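
As a small illustration of character n-grams, here is a short sketch; the helper function and the example word "book" are my own, chosen to match the counts above.

# Illustrative helper for extracting character n-grams from a string.

def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

word = "book"  # 4 characters
print(char_ngrams(word, 1))  # 4 unigrams: ['b', 'o', 'o', 'k']
print(char_ngrams(word, 2))  # 3 bigrams:  ['bo', 'oo', 'ok']
print(char_ngrams(word, 3))  # 2 trigrams: ['boo', 'ook']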
BPE
Figure: BPE learns subword units by iteratively merging consecutive units that co-occur frequently.
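
Below is a toy Python sketch of the merge loop described in the figure: it repeatedly merges the most frequent pair of adjacent units. The corpus and function names are illustrative; real BPE implementations also use word frequency tables and end-of-word markers.

from collections import Counter

def learn_bpe(words, num_merges):
    # Start from character sequences, one list of units per word occurrence.
    sequences = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        # Count how often each pair of adjacent units co-occurs across the corpus.
        pair_counts = Counter()
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                pair_counts[(a, b)] += 1
        if not pair_counts:
            break
        best = pair_counts.most_common(1)[0][0]
        merges.append(best)
        merged = best[0] + best[1]
        # Apply the merge everywhere it occurs.
        new_sequences = []
        for seq in sequences:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_sequences.append(out)
        sequences = new_sequences
    return merges, sequences

words = ["dishwasher", "dishwashing", "washer", "wash", "dish", "dish"]
merges, segmented = learn_bpe(words, num_merges=10)
print(merges)      # learned merge operations, most frequent first
print(segmented)   # how each word is segmented after the merges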
5.3 Avoiding overfitting
• Overfitting is one of the most common and important issues in building any machine learning application.
• An ML model is said to overfit when it fits the given data so well that it loses its ability to generalize to unseen data.
• Such a model may capture the training data very well and show good performance on it, but it fails to capture the underlying patterns and shows poor performance on test data that it has never seen before.
• A number of algorithms and techniques help avoid overfitting:
⮚ Regularization, e.g., L2 regularization (weight decay)
⮚ Dropout
⮚ Early stopping
⮚ Cross-validation
⮚ Callbacks
These are popular in ML applications in general (not just NLP) and are worth getting under your belt; a combined sketch follows the notes on regularization below.
Regularization
• Regularization in ML refers to techniques that encourage the simplicity and generalization of the model.
The validation loss curve flattens out around the eighth epoch and then creeps back up; this is the point where early stopping would halt training.
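
The following is a minimal sketch (using PyTorch with made-up synthetic data) combining three of the techniques above: dropout, L2 regularization via the optimizer's weight_decay argument, and early stopping based on the validation loss.

import torch
from torch import nn

# Synthetic data standing in for real features and labels.
torch.manual_seed(0)
X_train, y_train = torch.randn(200, 10), torch.randint(0, 2, (200,))
X_val, y_val = torch.randn(50, 10), torch.randint(0, 2, (50,))

model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout: randomly zero activations during training
    nn.Linear(32, 2),
)
loss_fn = nn.CrossEntropyLoss()
# weight_decay adds an L2 penalty on the weights (weight decay).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

best_val, patience, bad_epochs = float("inf"), 3, 0
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()

    # Early stopping: stop once validation loss stops improving for `patience` epochs.
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Stopping early at epoch {epoch}, best val loss {best_val:.3f}")
            break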
Cross-validation
• Cross-validation is not exactly a regularization method, but it addresses a related problem: if the training data is small, the model is validated and tested on just a few dozen instances, which can make the estimated metrics unstable.
In k-fold cross-validation, the dataset is split into k equally sized folds; each fold is used for validation once while the remaining folds are used for training, and the resulting metrics are averaged over the k runs.
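
A minimal pure-Python sketch of k-fold cross-validation follows; the dataset and the placeholder "majority-class" model are invented for illustration, and in practice the training and evaluation steps would train and score your NLP model.

import random

def k_fold_indices(n_examples, k, seed=0):
    """Split example indices into k roughly equal, shuffled folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(dataset, k=5):
    folds = k_fold_indices(len(dataset), k)
    scores = []
    for val_idx in folds:
        val_set = [dataset[j] for j in val_idx]
        train_set = [dataset[j] for fold in folds if fold is not val_idx for j in fold]
        # Placeholder "model": predict the majority label of the training set.
        labels = [label for _, label in train_set]
        majority = max(set(labels), key=labels.count)
        accuracy = sum(label == majority for _, label in val_set) / len(val_set)
        scores.append(accuracy)
    return sum(scores) / len(scores)  # average metric across the k folds

dataset = [("text %d" % i, i % 2) for i in range(100)]
print(cross_validate(dataset, k=5))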
5.4 How to deal with imbalanced datasets
• The class imbalance problem is often encountered when building NLP and ML models.
• E.g., the goal of a classification task is to assign one of the classes (e.g., spam or non-spam) to each instance.
• In document classification, some topics (such as politics or sports) are usually more popular than other topics.
• A dataset in which some classes have far more instances than others is called imbalanced.
• Techniques used to deal with an imbalanced dataset (minimal sketches follow this list):
⮚ Using appropriate evaluation metrics, e.g., the F1-measure instead of accuracy
⮚ Upsampling and downsampling
⮚ Weighting losses
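
The following minimal sketches, using an invented spam/non-spam dataset, illustrate upsampling, downsampling, and loss weighting; the class weights shown are illustrative only.

import random
from collections import Counter

random.seed(0)
data = [("spam text", "spam")] * 10 + [("normal text", "nonspam")] * 90

by_class = {}
for example in data:
    by_class.setdefault(example[1], []).append(example)
largest = max(len(v) for v in by_class.values())
smallest = min(len(v) for v in by_class.values())

# Upsampling: sample every class with replacement up to the size of the largest class.
upsampled = []
for examples in by_class.values():
    upsampled += random.choices(examples, k=largest)

# Downsampling: sample every class without replacement down to the size of the smallest class.
downsampled = []
for examples in by_class.values():
    downsampled += random.sample(examples, k=smallest)

print(Counter(label for _, label in upsampled))    # balanced at 90 per class
print(Counter(label for _, label in downsampled))  # balanced at 10 per class

# Loss weighting (PyTorch): give the rare class a larger weight so each class
# contributes comparably to the loss, instead of resampling the data.
# import torch
# weights = torch.tensor([1.0, 9.0])           # e.g., inverse class frequencies
# loss_fn = torch.nn.CrossEntropyLoss(weight=weights)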
How to deal with imbalanced datasets
a) Calculating the F1 score: when classes are imbalanced, the F1 score is a more informative evaluation metric than accuracy (see the worked example and formulas below).
Key points:
▪ Subword tokenization algorithms such as BPE split words into units smaller than words to mitigate the out-of-vocabulary problem in neural network models.
▪ Data upsampling, downsampling, or loss weighting can be used to address the data imbalance issue.
▪ Hyperparameters are parameters of the model or the training algorithm. They can be optimized using manual, grid, or random search (a random-search sketch follows).
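
A small sketch of random search over hyperparameters follows; the search space and the train_and_evaluate placeholder are invented for illustration.

import random

search_space = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "hidden_size": [64, 128, 256],
    "dropout": [0.1, 0.3, 0.5],
}

def train_and_evaluate(config):
    """Placeholder: train a model with `config` and return a validation metric."""
    return random.random()  # stands in for, e.g., validation F1

random.seed(0)
best_score, best_config = float("-inf"), None
for _ in range(20):  # random search: sample 20 configurations
    config = {name: random.choice(values) for name, values in search_space.items()}
    score = train_and_evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_config, best_score)
# Grid search would instead loop over every combination in `search_space`;
# manual search tweaks one hyperparameter at a time by hand.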
Calculating accuracy, precision, and recall
2-class problem
Instance     Actual label    Predicted label
Review 1     Positive        Positive
Review 2     Negative        Negative
Review 3     Positive        Positive
Review 4     Positive        Negative
Review 5     Negative        Positive
Review 6     Positive        Negative
Review 7     Negative        Negative
Review 8     Negative        Positive
Review 9     Positive        Positive
Review 10    Negative        Negative
3-class problem (positive / neutral / negative)
Confusion matrix over the three classes (rows: actual class, columns: predicted class):

            positive   neutral   negative
positive        3          1         2
neutral         2          2         1
negative        1          0         3

For each class, TP, FP, FN, and TN are obtained by treating that class as "positive" and the remaining two classes as "others" (positive vs. others, neutral vs. others, negative vs. others). For the positive class, for example: TP = 3, FP = 3, FN = 3, TN = 6.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 score = 2 * Precision * Recall / (Precision + Recall)
Macro-averaged accuracy = (A1 + A2 + A3) / 3, i.e., the mean of the per-class accuracies A1, A2, and A3.
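
As a check on the formulas, the following Python snippet computes accuracy, precision, recall, and the F1 score for the 10-review 2-class example above (labels abbreviated to pos/neg).

# Actual and predicted labels copied from the 2-class review table.
actual    = ["pos", "neg", "pos", "pos", "neg", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "pos", "neg", "pos", "neg", "neg", "pos", "pos", "neg"]

tp = sum(a == "pos" and p == "pos" for a, p in zip(actual, predicted))
tn = sum(a == "neg" and p == "neg" for a, p in zip(actual, predicted))
fp = sum(a == "neg" and p == "pos" for a, p in zip(actual, predicted))
fn = sum(a == "pos" and p == "neg" for a, p in zip(actual, predicted))

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)

print(tp, tn, fp, fn)                   # 3 3 2 2
print(accuracy, precision, recall, f1)  # 0.6 0.6 0.6 0.6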