0% found this document useful (0 votes)

157 views28 pages

NLP Programming en 01 Unigramlm

This document provides an introduction to unigram language models for natural language processing. It explains that unigram models assign a probability to each word independently of context by calculating the frequency of that word in a training corpus. The document outlines how to estimate these probabilities using maximum likelihood estimation and adjust for unknown words. It also describes how to evaluate language models on test data by calculating the likelihood, perplexity, and coverage of the model for the test sentences. Exercises are provided to implement a program for training a unigram model from data and testing it on a corpus.

Uploaded by

dhashrath

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

157 views28 pages

NLP Programming en 01 Unigramlm

Uploaded by

dhashrath

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 28

NLP Programming Tutorial 1 Unigram Language Model

NLP Programming Tutorial 1 Unigram Language Models

Graham Neubig Nara Institute of Science and Technology (N IST!

NLP Programming Tutorial 1 Unigram Language Model

Language Model "asics

NLP Programming Tutorial 1 Unigram Language Model

#hy Language Models$

#e ha%e an &nglish s'eech recognition system( )hich ans)er is better$

#1 * s'eech recognition system #+ * s'eech cognition system #- * s'ec. 'odcast histamine #, *

S'eech

NLP Programming Tutorial 1 Unigram Language Model

#hy Language Models$

#e ha%e an &nglish s'eech recognition system( )hich ans)er is better$

#1 * s'eech recognition system #+ * s'eech cognition system #- * s'ec. 'odcast histamine #, *

S'eech

Language models tell us the ans)er/

NLP Programming Tutorial 1 Unigram Language Model

Probabilistic Language Models

Language models assign a 'robability to each sentence P(#1! * ,01+1 2 11-P(#+! * 304-+ 2 11-, P(#-! * +0,-+ 2 11-5 P(#,! * 401+, 2 11-+-

#1 * s'eech recognition system #+ * s'eech cognition system #- * s'ec. 'odcast histamine #, *

#e )ant P(#1! 6 P(#+! 6 P(#-! 6 P(#,!

(or P(#,! 6 P(#1!( P(#+!( P(#-! for 7a'anese$!

NLP Programming Tutorial 1 Unigram Language Model

8alculating Sentence Probabilities

#e )ant the 'robability of

# * s'eech recognition system

9e'resent this mathematically as:

P(;#; * -( )1*<s'eech<( )+*<recognition<( )-*<system<!

NLP Programming Tutorial 1 Unigram Language Model

8alculating Sentence Probabilities

#e )ant the 'robability of

# * s'eech recognition system

9e'resent this mathematically as (using chain rule!:

P()1*=s'eech< ; )1 * =>s6<! 2 P()+*<recognition< ; )1 * =>s6<( )1*=s'eech<! 2 P()-*<system< ; )1 * =>s6<( )1*=s'eech<( )+*<recognition<!
2 P(),*<>?s6< ; )1 * =>s6<( )1*=s'eech<( )+*<recognition<( )-*<system<!

P(;#; * -( )1<s'eech<( )+<recognition<( )-<system<!

N@T&: sentence start >s6 and end >?s6 symbol

N@T&: P()1 * >s6! * 1

NLP Programming Tutorial 1 Unigram Language Model

Incremental 8om'utation

Pre%ious eAuation can be )ritten:

P ( W )=i =1 P ( wiw 0 wi1 )

W + 1

Bo) do )e decide 'robability$

P ( wiw 0 wi1 )

NLP Programming Tutorial 1 Unigram Language Model

MaCimum Li.elihood &stimation

8alculate )ord strings in cor'us( ta.e fraction

c ( w1 wi ) P ( wiw 1 w i1)= c ( w 1 w i 1)
i li%e in osa.a 0 >?s6 i am a graduate student 0 >?s6 my school is in nara 0 >?s6 P(li%e ; >s6 i! * c(>s6 i li%e!?c(>s6 i! * 1 ? + * 10D P(am ; >s6 i! * c(>s6 i am!?c(>s6 i! * 1 ? + * 10D

NLP Programming Tutorial 1 Unigram Language Model

Problem #ith Eull &stimation

#ea. )hen counts are lo):

Training:

i li%e in osa.a 0 >?s6 i am a graduate student 0 >?s6 my school is in nara 0 >?s6 >s6 i li%e in nara 0 >?s6

Test:

P(nara;>s6 i li%e in! * 1?1 * 1 P(#>s6 i li%e in nara 0 >?s6! 1

NLP Programming Tutorial 1 Unigram Language Model

Unigram Model

Fo not use history:

P ( wiw 1 w i1) P ( wi )=

c ( wi ) ) w c (w

P(nara! * 1?+1 * 101D i li%e in osa.a 0 >?s6 * +?+1 * 101 i am a graduate student 0 >?s6 P(i! my school is in nara 0 >?s6 P(>?s6! * -?+1 * 101D P(#*i li%e in nara 0 >?s6! * 101 2 101D 2 101 2 101D 2 101D 2 101D * D0G+D 2 11 -5
11

NLP Programming Tutorial 1 Unigram Language Model

"e 8areful of Integers/

Fi%ide t)o integers( you get an integer (rounded do)n!

$ ./my-program.py 0

8on%ert one integer to a float( and you )ill be @H

$ ./my-program.py 0.5

NLP Programming Tutorial 1 Unigram Language Model

#hat about Un.no)n #ords$/

Sim'le ML estimation doesnIt )or.

i li%e in osa.a 0 >?s6 i am a graduate student 0 >?s6 my school is in nara 0 >?s6 P(nara! * 1?+1 * 101D P(i! * +?+1 * 101 P(.yoto! * 1?+1 * 1

@ften( un.no)n )ords are ignored ( S9! "etter )ay to sol%e

Sa%e some 'robability for un.no)n )ords (Jun. * 1-J1! Guess total %ocabulary siKe (N!( including un.no)ns

1 P ( wi )=1 P ML ( wi )+( 1 1) N
13

NLP Programming Tutorial 1 Unigram Language Model

Un.no)n #ord &Cam'le

Total %ocabulary siKe: N*11G Un.no)n )ord 'robability: Jun.*101D (J1 * 104D!

1 P ( wi )=1 P ML ( wi )+( 1 1) N
P(nara! * 104D2101D L 101D2(1?11G! * 101,5D111D P(i! * 104D21011 L 101D2(1?11G! * 1014D1111D P(.yoto! * 104D21011 L 101D2(1?11G! * 101111111D
14

NLP Programming Tutorial 1 Unigram Language Model

&%aluating Language Models

NLP Programming Tutorial 1 Unigram Language Model

&C'erimental Setu'

Use training and test sets

Train Model

Training Fata
i li%e in osa.a i am a graduate student my school is in nara 000

Model

Testing Fata
i li%e in nara i am a student i ha%e lots of home)or. M

Test Model Model ccuracy Li.elihood Log Li.elihood &ntro'y Per'leCity 16

NLP Programming Tutorial 1 Unigram Language Model

Li.elihood

Li.elihood is the 'robability of some obser%ed data (the test set #test!( gi%en the model M

P ( W testM )= w W P ( wM )
test

i li%e in nara i am a student my classes are hard

P()<i li%e in nara<;M! P()<i am a student<;M! P()<my classes are hard<;M!

+0D+211-+1 -0,3211-14 +01D211--,

C C

* 1034211-517

NLP Programming Tutorial 1 Unigram Language Model

Log Li.elihood

Li.elihood uses %ery small numbers*underflo) Ta.ing the log resol%es this 'roblem

log P ( W testM )= w W log P ( wM )

test

i li%e in nara i am a student my classes are hard

log P()<i li%e in nara<;M! log P()<i am a student<;M!

-+10D3 -130,D

log P()<my classes are hard<;M! ---0G5

* -5+0G1
18

NLP Programming Tutorial 1 Unigram Language Model

8alculating Logs

PythonIs math 'ac.age has a function for logs

$ ./my-program.py 4.60517018599 2.0

NLP Programming Tutorial 1 Unigram Language Model

&ntro'y

&ntro'y B is a%erage negati%e log+ li.elihood 'er )ord

1 H ( W testM )= | W test |
i li%e in nara i am a student my classes are hard

w W

test

log 2 P ( wM )

log+ P()<i li%e in nara<;M! log+ P()<i am a student<;M!

log+ P()<my classes are hard<;M! 11103, N of )ords*

( )
G30,G10-+

L ?

1+ * +10120

2 note( )e can also count >?s6 in N of )ords (in )hich case it is 1D!

NLP Programming Tutorial 1 Unigram Language Model

&ntro'y and 8om'ression

&ntro'y B is also the a%erage number of bits needed to encode information (ShannonIs information theory!

a bird a cat a dog a >?s6

a bird cat dog >?s6 O1 O 111 O 111 O 111 O 111 P()* =a<! * 10D -log+ 10D * 1 P()* =bird<! * 101+D -log+ 101+D * P()* =cat<! * 101+D -log+ 101+D * P()* =dog<! * 101+D -log+ 101+D * P()* =>?s6<! * 101+D -log+ 101+D * 21

1 H= | W test | w W

wtest

log 2 P ( wM )

&ncoding

1111111111111111

NLP Programming Tutorial 1 Unigram Language Model

Per'leCity

&Aual to t)o to the 'o)er of 'er-)ord entro'y

PPL =2

(Mainly because it ma.es more im'ressi%e numbers! Eor uniform distributions( eAual to the siKe of %ocabulary

V =5

1 H =log 2 5

PPL =2 =2

log 2

1 5

log2 5

=5
22

NLP Programming Tutorial 1 Unigram Language Model

8o%erage

The 'ercentage of .no)n )ords in the cor'us a bird a cat a dog a >?s6
=dog< is an un.no)n )ord 8o%erage: 5?3 2

2 often omit the sentence-final symbol O G?5

NLP Programming Tutorial 1 Unigram Language Model

&Cercise

NLP Programming Tutorial 1 Unigram Language Model

&Cercise

#rite t)o 'rograms

train-unigram: 8reates a unigram model test-unigram: 9eads a unigram model and calculates entro'y and co%erage for the test set

Test them test?11-train-in'ut0tCt test?11-test-in'ut0tCt Train the model on data?)i.i-en-train0)ord 8alculate entro'y and co%erage on data?)i.i-entest0)ord 9e'ort your scores neCt )ee.
25

NLP Programming Tutorial 1 Unigram Language Model

train-unigram Pseudo-8ode
create a map counts create a variable total_count * 1 for each line in the training_file split line into an array of words append =>?s6< to the end of words for each word in words add 1 to countsPwordQ add 1 to total_count open the model_file for )riting for each )ord( count in counts probability * countsPwordQ?total_count print word( probability to model_file
26

NLP Programming Tutorial 1 Unigram Language Model

test-unigram Pseudo-8ode
1 = 104D, Jun. * 1-J1( R * 1111111( # * 1( B * 1

Load Model
create a map probabilities for each line in model_file split line into w and P set probabilitiesPwQ * P

Test and Print

for each line in test_file split line into an array of words append =>?s6< to the end of words for each w in words add 1 to W set P * Jun. ? R if probabilitiesPwQ eCists set P L* J1 2 'robabilitiesPwQ else add 1 to unk add -log2 P to H print =entro'y * H!W print =co%erage * < L (#-un.!?#
27

NLP Programming Tutorial 1 Unigram Language Model

Than. Sou/

NLP 2-5 Unit Notes
No ratings yet
NLP 2-5 Unit Notes
83 pages
04 Language Modeling
No ratings yet
04 Language Modeling
70 pages
NLP - Module 2
No ratings yet
NLP - Module 2
77 pages
Lecture4 421
No ratings yet
Lecture4 421
111 pages
Lecture 2 Language Model
No ratings yet
Lecture 2 Language Model
127 pages
Language Modeling and Spelling Correction
No ratings yet
Language Modeling and Spelling Correction
97 pages
Lec15 17 N Gram Language Model Part1
No ratings yet
Lec15 17 N Gram Language Model Part1
49 pages
NLP Week4 Ngrams
No ratings yet
NLP Week4 Ngrams
60 pages
NLP07 Generative Language Models
No ratings yet
NLP07 Generative Language Models
50 pages
Notes of NLP - Unit-2
No ratings yet
Notes of NLP - Unit-2
23 pages
Multimedia Application L6
No ratings yet
Multimedia Application L6
63 pages
Ai&Ml Bai601 NLP Lab Manual
No ratings yet
Ai&Ml Bai601 NLP Lab Manual
48 pages
Language Modeling
No ratings yet
Language Modeling
50 pages
IS 7118 Unit-4 N-Grams
100% (2)
IS 7118 Unit-4 N-Grams
93 pages
02 NLP LM
No ratings yet
02 NLP LM
99 pages
Deeplearning Ai
No ratings yet
Deeplearning Ai
69 pages
NLP Week 03
No ratings yet
NLP Week 03
33 pages
Multimedia Application L5
No ratings yet
Multimedia Application L5
35 pages
NLP Unit-4
No ratings yet
NLP Unit-4
48 pages
Lecture 4
No ratings yet
Lecture 4
87 pages
N Gram Presentation
No ratings yet
N Gram Presentation
29 pages
Lecture 5: Language Modeling (N-Gram, BOW)
No ratings yet
Lecture 5: Language Modeling (N-Gram, BOW)
25 pages
Module-1 ch-2
No ratings yet
Module-1 ch-2
31 pages
Lecture 4 N Grams
No ratings yet
Lecture 4 N Grams
29 pages
N-Gram in NLP
No ratings yet
N-Gram in NLP
15 pages
03 LanguageModel
No ratings yet
03 LanguageModel
41 pages
NLP 1.2
No ratings yet
NLP 1.2
22 pages
NLP Lab Manual
No ratings yet
NLP Lab Manual
25 pages
Adv. Natural Language Processing: Instructor: Dr. Muhammad Asfand-E-Yar
No ratings yet
Adv. Natural Language Processing: Instructor: Dr. Muhammad Asfand-E-Yar
54 pages
6.chapter6 LanguageModel
No ratings yet
6.chapter6 LanguageModel
33 pages
NLP - (Natural Language Processing Lab Manual)
No ratings yet
NLP - (Natural Language Processing Lab Manual)
12 pages
Lecture - 3 - Statistical Language Models
No ratings yet
Lecture - 3 - Statistical Language Models
56 pages
NLP PLM
No ratings yet
NLP PLM
35 pages
3 LM Jan 08 2021
No ratings yet
3 LM Jan 08 2021
77 pages
Unit 1
No ratings yet
Unit 1
17 pages
Lecture 6 To 8 N-Gram
No ratings yet
Lecture 6 To 8 N-Gram
19 pages
Unit 5 Notes Final
No ratings yet
Unit 5 Notes Final
14 pages
Statistical Inference
No ratings yet
Statistical Inference
38 pages
5) Lecture Feb11&13&17&18
No ratings yet
5) Lecture Feb11&13&17&18
21 pages
IJISRT18DC138
No ratings yet
IJISRT18DC138
6 pages
NLP Lec 11
No ratings yet
NLP Lec 11
6 pages
Project 4e Level 1 Student's Book Sample
100% (1)
Project 4e Level 1 Student's Book Sample
16 pages
13 Ngramlm
No ratings yet
13 Ngramlm
27 pages
Artificial Intelligence: Natural Language Processing
No ratings yet
Artificial Intelligence: Natural Language Processing
13 pages
Lec-3 Language Modeling N-Grams
No ratings yet
Lec-3 Language Modeling N-Grams
41 pages
Unit - 2
No ratings yet
Unit - 2
10 pages
NLP - N-Gram Language Model
No ratings yet
NLP - N-Gram Language Model
22 pages
Python For Beginners
From Everand
Python For Beginners
Célio Azevedo
No ratings yet
Julie Word Exercise 1 - Home Tab
No ratings yet
Julie Word Exercise 1 - Home Tab
5 pages
SSC English Sample 2023
No ratings yet
SSC English Sample 2023
146 pages
What Is A Performance Task
No ratings yet
What Is A Performance Task
104 pages
Introduction To Language Modeling Final
No ratings yet
Introduction To Language Modeling Final
69 pages
N Gram Model
No ratings yet
N Gram Model
4 pages
Nlp Basic 03-N-gram Language Model: Nguyễn Quốc Thái
No ratings yet
Nlp Basic 03-N-gram Language Model: Nguyễn Quốc Thái
31 pages
Readings in Philippine History LESSON 12
100% (1)
Readings in Philippine History LESSON 12
34 pages
A34 NLP Expt 02
No ratings yet
A34 NLP Expt 02
7 pages
19102B0052 - NLP - Exp - 4
No ratings yet
19102B0052 - NLP - Exp - 4
5 pages
Ngrams
100% (1)
Ngrams
22 pages
Trigram Language Models
No ratings yet
Trigram Language Models
19 pages
Language Models: CS6370: Natural Language Processing
No ratings yet
Language Models: CS6370: Natural Language Processing
35 pages
Lala Sita Ram
No ratings yet
Lala Sita Ram
29 pages
Open Mind Intermediate Teachers Book Unit 8 PDF
67% (3)
Open Mind Intermediate Teachers Book Unit 8 PDF
10 pages
CS 388: Natural Language Processing:: N-Gram Language Models
No ratings yet
CS 388: Natural Language Processing:: N-Gram Language Models
22 pages
NLP Exp03
No ratings yet
NLP Exp03
5 pages
Amazing Java: Learn Java Quickly
From Everand
Amazing Java: Learn Java Quickly
Andrei Besedin
No ratings yet
Analysis of Statistical Parsing in Natural Language Processing
No ratings yet
Analysis of Statistical Parsing in Natural Language Processing
6 pages
The Guide
No ratings yet
The Guide
9 pages
Language Modelling
No ratings yet
Language Modelling
3 pages
The Michelin Guide Tokyo 2018
No ratings yet
The Michelin Guide Tokyo 2018
18 pages
Speaking Speaking: Procedure, Script and Materials Procedure, Script and Materials
No ratings yet
Speaking Speaking: Procedure, Script and Materials Procedure, Script and Materials
8 pages
Roncesvalles - The Tale of Lute by Gao Ming
No ratings yet
Roncesvalles - The Tale of Lute by Gao Ming
13 pages
PD Questions
No ratings yet
PD Questions
9 pages
Baccmass - Chords
No ratings yet
Baccmass - Chords
7 pages
Tut 28 POYO & HQ & Supplementary Qns (Solutions)
No ratings yet
Tut 28 POYO & HQ & Supplementary Qns (Solutions)
9 pages
ENGLISH 4 4th Rating
No ratings yet
ENGLISH 4 4th Rating
47 pages
Full-Text Search and PostgresSQL Extensions PDF
No ratings yet
Full-Text Search and PostgresSQL Extensions PDF
26 pages
RWS Module 2
No ratings yet
RWS Module 2
33 pages
Regedit XLR8 - FFXX
No ratings yet
Regedit XLR8 - FFXX
3 pages
06 IP-20G First Login - 0316 - v6 - 8.3
No ratings yet
06 IP-20G First Login - 0316 - v6 - 8.3
15 pages
(影印) the Name and Nature of Translation Studies
No ratings yet
(影印) the Name and Nature of Translation Studies
13 pages
We Use Machines in Our Day To Day Life To Make Our Life Easier. Machines Can Carry Out Our Task
No ratings yet
We Use Machines in Our Day To Day Life To Make Our Life Easier. Machines Can Carry Out Our Task
8 pages
Machine Learning Lab Assignment 3
0% (1)
Machine Learning Lab Assignment 3
6 pages
Improved Lower Bound On The Shannon Capacity of C7
No ratings yet
Improved Lower Bound On The Shannon Capacity of C7
6 pages
Timeline Project and Rubric 2018
No ratings yet
Timeline Project and Rubric 2018
2 pages
Task 2 - Opinion Essay
No ratings yet
Task 2 - Opinion Essay
6 pages
Design and Verification of 16 Bit RISC Processor
No ratings yet
Design and Verification of 16 Bit RISC Processor
6 pages
Extra Layer of Authentication.: Keepassxc Is A Free and Open-Source Password Manager. It Started As A Community Fork
No ratings yet
Extra Layer of Authentication.: Keepassxc Is A Free and Open-Source Password Manager. It Started As A Community Fork
2 pages
Protein Purification Problem Set
No ratings yet
Protein Purification Problem Set
10 pages
S. G. Balekundri Institute of Technology: Shivabasavanagar, Belagavi-590 010, Karnataka - India
No ratings yet
S. G. Balekundri Institute of Technology: Shivabasavanagar, Belagavi-590 010, Karnataka - India
7 pages
SDS-PAGE Principle
No ratings yet
SDS-PAGE Principle
2 pages
Oop Set 9
No ratings yet
Oop Set 9
4 pages
ZID Maping
No ratings yet
ZID Maping
1 page
Resume Sample 1
No ratings yet
Resume Sample 1
2 pages
Bacteria Growth
No ratings yet
Bacteria Growth
1 page
Exercise 1
No ratings yet
Exercise 1
1 page

NLP Programming en 01 Unigramlm

Uploaded by

NLP Programming en 01 Unigramlm

Uploaded by

NLP Programming Tutorial 1 Unigram Language Model

NLP Programming Tutorial 1 Unigram Language Models

Graham Neubig Nara Institute of Science and Technology (N IST!

NLP Programming Tutorial 1 Unigram Language Model

Language Model "asics

NLP Programming Tutorial 1 Unigram Language Model

#hy Language Models$

#e ha%e an &nglish s'eech recognition system( )hich ans)er is better$

NLP Programming Tutorial 1 Unigram Language Model

#hy Language Models$

#e ha%e an &nglish s'eech recognition system( )hich ans)er is better$

Language models tell us the ans)er/

NLP Programming Tutorial 1 Unigram Language Model

Probabilistic Language Models

#1 * s'eech recognition system #+ * s'eech cognition system #- * s'ec. 'odcast histamine #, *

#e )ant P(#1! 6 P(#+! 6 P(#-! 6 P(#,!

(or P(#,! 6 P(#1!( P(#+!( P(#-! for 7a'anese$!

NLP Programming Tutorial 1 Unigram Language Model

8alculating Sentence Probabilities

#e )ant the 'robability of

9e'resent this mathematically as:

NLP Programming Tutorial 1 Unigram Language Model

8alculating Sentence Probabilities

#e )ant the 'robability of

9e'resent this mathematically as (using chain rule!:

P(;#; * -( )1*<s'eech<( )+*<recognition<( )-*<system<! *

N@T&: sentence start >s6 and end >?s6 symbol

N@T&: P()1 * >s6! * 1

NLP Programming Tutorial 1 Unigram Language Model

Pre%ious eAuation can be )ritten:

P ( W )=i =1 P ( wiw 0 wi1 )

Bo) do )e decide 'robability$

NLP Programming Tutorial 1 Unigram Language Model

MaCimum Li.elihood &stimation

8alculate )ord strings in cor'us( ta.e fraction

NLP Programming Tutorial 1 Unigram Language Model

Problem #ith Eull &stimation

#ea. )hen counts are lo):

P(nara;>s6 i li%e in! * 1?1 * 1 P(#*>s6 i li%e in nara 0 >?s6! * 1

NLP Programming Tutorial 1 Unigram Language Model

Fo not use history:

NLP Programming Tutorial 1 Unigram Language Model

"e 8areful of Integers/

Fi%ide t)o integers( you get an integer (rounded do)n!

8on%ert one integer to a float( and you )ill be @H

NLP Programming Tutorial 1 Unigram Language Model

#hat about Un.no)n #ords$/

Sim'le ML estimation doesnIt )or.

@ften( un.no)n )ords are ignored ( S9! "etter )ay to sol%e

NLP Programming Tutorial 1 Unigram Language Model

Un.no)n #ord &Cam'le

NLP Programming Tutorial 1 Unigram Language Model

&%aluating Language Models

NLP Programming Tutorial 1 Unigram Language Model

Use training and test sets

Test Model Model ccuracy Li.elihood Log Li.elihood &ntro'y Per'leCity 16

NLP Programming Tutorial 1 Unigram Language Model

i li%e in nara i am a student my classes are hard

P()*<i li%e in nara<;M! * P()*<i am a student<;M! * P()*<my classes are hard<;M! *

+0D+211-+1 -0,3211-14 +01D211--,

NLP Programming Tutorial 1 Unigram Language Model

log P ( W testM )= w W log P ( wM )

i li%e in nara i am a student my classes are hard

log P()*<i li%e in nara<;M! * log P()*<i am a student<;M! *

log P()*<my classes are hard<;M! * ---0G5

NLP Programming Tutorial 1 Unigram Language Model

PythonIs math 'ac.age has a function for logs

$ ./my-program.py 4.60517018599 2.0

NLP Programming Tutorial 1 Unigram Language Model

&ntro'y B is a%erage negati%e log+ li.elihood 'er )ord

log+ P()*<i li%e in nara<;M!* log+ P()*<i am a student<;M!*

log+ P()*<my classes are hard<;M!* 11103, N of )ords*

NLP Programming Tutorial 1 Unigram Language Model

&ntro'y and 8om'ression

a bird a cat a dog a >?s6

NLP Programming Tutorial 1 Unigram Language Model

P(;#; * -( )1<s'eech<( )+<recognition<( )-<system<!

P(nara;>s6 i li%e in! * 1?1 * 1 P(#>s6 i li%e in nara 0 >?s6! 1

P()<i li%e in nara<;M! P()<i am a student<;M! P()<my classes are hard<;M!

log P()<i li%e in nara<;M! log P()<i am a student<;M!

log P()<my classes are hard<;M! ---0G5

log+ P()<i li%e in nara<;M! log+ P()<i am a student<;M!

log+ P()<my classes are hard<;M! 11103, N of )ords*