Language Models
http://courses.engr.illinois.edu/cs447
Lecture 3:
Language models
Julia Hockenmaier
[email protected]
3324 Siebel Center
Last lecture’s key concepts
Morphology (word structure): stems, affixes
Derivational vs. inflectional morphology
Compounding
Stem changes
Morphological analysis and generation
Finite-state automata
Finite-state transducers
Composing finite-state transducers
Sample space Ω:
The set of all possible outcomes
(all shapes; all words in Alice in Wonderland)
Event ω ⊆ Ω:
A set of outcomes (a subset of Ω)
(predicting ‘the’, picking a triangle)
0 ≤ P(ω) ≤ 1
P(∅) = 0 and P(Ω) = 1
∑_i P(ωi) = 1 if ωi ∩ ωj = ∅ for all j ≠ i, and ⋃_i ωi = Ω
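As a quick check (a minimal sketch; the tiny word list is made up, not actual Alice in Wonderland counts), relative frequencies estimated from a corpus form a distribution that satisfies these axioms:

from collections import Counter

# Toy "corpus"; the tokens are hypothetical stand-ins for corpus counts.
tokens = ["the", "rabbit", "ran", "down", "the", "hole", "the", "end"]
counts = Counter(tokens)
total = sum(counts.values())

# Relative frequencies define a probability distribution over the outcomes.
P = {w: c / total for w, c in counts.items()}

assert all(0.0 <= p <= 1.0 for p in P.values())    # 0 <= P(w) <= 1
assert abs(sum(P.values()) - 1.0) < 1e-12          # disjoint outcomes covering Omega sum to 1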
P(X | Y) = P(X, Y) / P(Y)
P(blue | ) = 2/5
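A minimal sketch of the definition above, with made-up joint counts over colors and shapes (the numbers are hypothetical, chosen so that P(blue | triangle) = 2/5):

from collections import Counter

# Hypothetical joint counts over (color, shape) outcomes.
joint = Counter({("blue", "triangle"): 2, ("red", "triangle"): 3,
                 ("blue", "circle"): 1, ("red", "circle"): 4})
total = sum(joint.values())

def p_joint(color, shape):
    return joint[(color, shape)] / total

def p_shape(shape):
    return sum(c for (_, s), c in joint.items() if s == shape) / total

# P(X | Y) = P(X, Y) / P(Y)
print(p_joint("blue", "triangle") / p_shape("triangle"))   # 0.4 = 2/5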
[Diagram: sampling an outcome: the interval [0,1] is split into segments of length p1, …, p5 with boundaries 0, p1, p1+p2, p1+p2+p3, p1+p2+p3+p4, 1; outcome xi corresponds to the i-th segment.]
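The diagram above corresponds to inverse-transform (roulette-wheel) sampling: draw a uniform number in [0,1) and return the outcome whose cumulative-probability segment it falls into. A minimal sketch, with placeholder outcomes and probabilities:

import random
from itertools import accumulate

outcomes = ["x1", "x2", "x3", "x4", "x5"]
probs    = [0.4, 0.3, 0.15, 0.1, 0.05]     # hypothetical p1..p5, summing to 1

def sample(outcomes, probs):
    """Return the outcome whose cumulative boundary first exceeds a uniform draw."""
    r = random.random()                    # uniform in [0, 1)
    for outcome, cum in zip(outcomes, accumulate(probs)):
        if r < cum:
            return outcome
    return outcomes[-1]                    # guard against floating-point round-off

print(sample(outcomes, probs))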
A LM with lower perplexity is better because it assigns a higher probability to the unseen test corpus.

PP(w1…wN) = ( ∏_{i=1}^{N} 1 / P(wi | w1…wi−1) )^(1/N)
Perplexity PP(w1…wN)
Given a test corpus with N tokens, w1…wN,
and an n-gram model P(wi | wi−1, …, wi−n+1)
we compute its perplexity PP(w1…wN) as follows:
PP(w1…wN) = P(w1…wN)^(−1/N)

          = ( 1 / P(w1…wN) )^(1/N)

          = ( ∏_{i=1}^{N} 1 / P(wi | w1…wi−1) )^(1/N)          (chain rule)

          =def ( ∏_{i=1}^{N} 1 / P(wi | wi−n+1…wi−1) )^(1/N)   (n-gram model)

with

PP(w1…wN) =def exp( −(1/N) ∑_{i=1}^{N} log P(wi | wi−1, …, wi−n+1) )
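A minimal sketch of this computation in code, assuming some function lm_prob(history, word) that returns P(wi | wi−n+1 … wi−1); that function and the toy bigram numbers below are placeholders, not part of the slides. Summing log probabilities avoids numerical underflow on long test corpora:

import math

def perplexity(test_tokens, lm_prob, n=2):
    """PP(w1...wN) = exp(-(1/N) * sum_i log P(wi | wi-n+1 ... wi-1))."""
    N = len(test_tokens)
    log_prob = 0.0
    for i, w in enumerate(test_tokens):
        history = tuple(test_tokens[max(0, i - n + 1):i])
        p = lm_prob(history, w)            # must be > 0: unseen events need smoothing
        log_prob += math.log(p)
    return math.exp(-log_prob / N)

# Usage with a toy (hypothetical) bigram table; unseen events get a tiny floor probability.
toy_bigram = {((), "the"): 0.2, (("the",), "cat"): 0.1, (("cat",), "sat"): 0.3}
print(perplexity(["the", "cat", "sat"], lambda h, w: toy_bigram.get((h, w), 1e-6)))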
Task-based evaluation:
- Train model A, plug it into your system for performing task T
- Evaluate performance of system A on task T.
- Train model B, plug it in, evaluate system B on same task T.
- Compare scores of system A and system B on task T (see the sketch below).
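A minimal sketch of that loop; train_lm, build_system, and evaluate_on_task are hypothetical placeholders for whatever training procedure, downstream system, and task metric you actually use:

def compare_extrinsically(train_data, task_data, train_lm, build_system, evaluate_on_task):
    """Train two LMs, plug each into the downstream system, and compare task scores."""
    scores = {}
    for name, config in [("A", {"n": 2}), ("B", {"n": 3})]:     # e.g. bigram vs. trigram
        lm = train_lm(train_data, **config)                     # train model A / B
        system = build_system(lm)                               # plug it into the system for task T
        scores[name] = evaluate_on_task(system, task_data)      # evaluate on task T
    return scores                                               # compare the two scores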
[Figure: English words sorted by frequency on log-log axes: frequency of the r-th most common word wr (y-axis, log scale) vs. rank (x-axis, log scale). A few words are very frequent; most words are very rare. w1 = the, w2 = to, …, w5346 = computer, …]
In natural language:
- A small number of events (e.g. words) occur with high frequency
- A large number of events occur with very low frequency (see the sketch below)
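A minimal sketch of how to see this in a corpus (the file name is a placeholder): count word frequencies and sort by rank; the head is a handful of very frequent words, the tail is a huge number of words seen only once.

from collections import Counter

# Hypothetical corpus file; any large plain-text file shows the same pattern.
with open("corpus.txt", encoding="utf-8") as f:
    words = f.read().lower().split()

ranked = Counter(words).most_common()          # word types sorted by frequency
print(ranked[:5])                              # a few very frequent words (the, to, of, ...)

singletons = sum(1 for _, c in ranked if c == 1)
print(f"{singletons} of {len(ranked)} word types occur exactly once")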
So…
… we can't actually evaluate our MLE models on unseen test data (or system output): any word or n-gram that never occurred in the training data gets probability zero under MLE, so the test corpus as a whole gets probability zero and its perplexity becomes infinite.
Today’s reading:
Jurafsky and Martin, Chapter 4, sections 1-4 (2008 edition)
Chapter 3 (3rd Edition)