Lecture Three - (Chapter Two: N-gram Language Models)
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2022)
Outline
Introduction
The role of language models
Simple N-gram models
Estimating parameters and smoothing
Evaluating language models
Introduction
Introduction …
Ŵ = argmax_W P(W | Y)
i.e., choose the word sequence W with the highest probability given the observed input Y.
Introduction …
[Figure: probabilities of the observed elements, P(s)]
Role of Language Models
Role of Language Models…
Completion Prediction
A language model also supports predicting the completion of a
sentence.
Please turn off your cell _____
Your program does not ______
Predictive text input systems can guess what you are typing and offer choices for completing it.
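As a minimal sketch (the bigram counts below are invented for illustration; a real system would collect them from a corpus), completion prediction reduces to ranking the words that most often follow the current word:

```python
from collections import Counter

# Hypothetical bigram counts: how often each word followed "cell" / "your"
# in some training corpus. These numbers are made up for illustration.
bigram_counts = {
    "cell": Counter({"phone": 120, "wall": 15, "membrane": 9}),
    "your": Counter({"program": 40, "cell": 35, "name": 30}),
}

def suggest(prev_word, k=3):
    """Return the k most likely completions after prev_word."""
    return [w for w, _ in bigram_counts.get(prev_word, Counter()).most_common(k)]

print(suggest("cell"))   # ['phone', 'wall', 'membrane']
print(suggest("your"))   # ['program', 'cell', 'name']
```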
01/02/23 9
Simple N-Gram Models
Two benefits of n-gram models (and algorithms that use them) are
simplicity and scalability – with larger n, a model can store more
context with a well-understood space–time tradeoff, enabling small
experiments to scale up efficiently.
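A minimal sketch of that tradeoff (the function name is ours, not from the lecture): raising n captures more context, but the number of distinct entries the model must store grows with it.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "the cat sat on the mat because the cat was tired".split()
for n in (1, 2, 3):
    counts = ngram_counts(tokens, n)
    print(n, "->", len(counts), "distinct n-grams")  # table size grows with n
```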
Estimating Probabilities
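For a bigram model, the maximum-likelihood estimate normalizes the bigram count by the count of the preceding word:

P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})

and, for a general N-gram, by the count of the preceding N-1 words:

P(w_n | w_{n-N+1}^{n-1}) = C(w_{n-N+1}^{n-1} w_n) / C(w_{n-N+1}^{n-1})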
Example
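A hypothetical worked example (the three-sentence toy corpus is ours): estimate bigram probabilities by counting, with <s> and </s> marking sentence boundaries.

```python
from collections import Counter

# Toy corpus, invented for illustration.
corpus = ["<s> I am Sam </s>", "<s> Sam I am </s>", "<s> I do not like Sam </s>"]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = sent.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(w, prev):
    """Maximum-likelihood bigram probability P(w | prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p("I", "<s>"))   # 2/3: "I" starts two of the three sentences
print(p("am", "I"))    # 2/3: "I" is followed by "am" in two of its three uses
```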
Estimating Parameters and Smoothing
Estimating Parameters
Parameter estimation is fundamental to many statistical approaches to NLP.
Because of the high-dimensional nature of natural language, it is often easy to generate an extremely large number of features.
The challenge of parameter estimation is to find a combination of the typically noisy, redundant features that accurately predicts the target output variable while avoiding overfitting.
List of potential parameter estimators:
Maximum Entropy (ME) estimation with L2 regularization, the Averaged Perceptron (AP), Boosting, ME estimation with L1 regularization using a novel optimization algorithm, and BLasso, a version of Boosting with Lasso (L1) regularization, etc.
Estimating Parameters and Smoothing…
Estimating Parameters…
Intuitively, this can be achieved either:
By selecting a small number of highly effective features and ignoring the others, or
By averaging over a large number of weakly informative features.
The first intuition motivates feature selection methods such as Boosting and BLasso, which usually work best when many features are completely irrelevant.
L1 (Lasso) regularization of linear models embeds feature selection into regularization: assessing the reliability of a feature and deciding whether to remove it happen in the same framework. It has recently generated a large amount of interest in the NLP community.
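A minimal sketch of that embedded feature selection (scikit-learn's logistic regression stands in for the ME estimators named above; the data are synthetic): an L1 penalty drives the weights of most irrelevant features exactly to zero, while L2 merely shrinks them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))          # 50 features, most of them irrelevant
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

print("nonzero weights, L1:", np.count_nonzero(l1.coef_))  # few survive
print("nonzero weights, L2:", np.count_nonzero(l2.coef_))  # all stay nonzero
```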
Estimating Parameters and Smoothing…
Estimating Parameters…
If, on the other hand, most features are noisy but at least weakly correlated with the target, it may be reasonable to reduce noise by averaging over all of the features.
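A minimal averaged-perceptron sketch (our own simplification of the AP mentioned earlier): instead of keeping only the final weight vector, it averages the weights across all updates, which smooths over many weakly informative features.

```python
import numpy as np

def averaged_perceptron(X, y, epochs=5):
    """Binary perceptron with labels in {-1, +1}; returns averaged weights."""
    w = np.zeros(X.shape[1])
    w_sum = np.zeros_like(w)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:      # misclassified: standard update
                w = w + yi * xi
            w_sum += w                  # accumulate after every example
    return w_sum / (epochs * len(X))    # average over all steps

# Tiny synthetic check: two linearly separable points.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1, -1])
w_avg = averaged_perceptron(X, y)
print(np.sign(X @ w_avg))  # [ 1. -1.]
```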
Thank You !!!