NLP 5th unit

N-grams are contiguous sequences of 'n' items, commonly used in natural language processing, where 'n' can represent unigrams, bigrams, trigrams, etc. The document explains how n-gram models estimate the probability of a word based on the preceding words, utilizing techniques like Maximum Likelihood Estimation and Bayesian Estimation for parameter estimation. It also provides examples of generating n-grams using Python's NLTK library.


N-grams

 N-grams are contiguous sequences of 'n' items, typically words in the context of NLP.
 These items can be characters, words, or even syllables, depending on the granularity desired. The value of 'n' determines the order of the N-gram.
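Because the items need not be words, a quick character-level illustration may help. The snippet below is a minimal sketch (not from the original slides; the helper name char_ngrams is introduced only for this example) that builds character bigrams and trigrams with plain Python.

# Minimal sketch: character-level n-grams with plain Python (no external library)
def char_ngrams(text, n):
    # Return every contiguous run of n characters as a tuple
    return [tuple(text[i:i + n]) for i in range(len(text) - n + 1)]

print(char_ngrams("cat", 2))   # [('c', 'a'), ('a', 't')]
print(char_ngrams("cats", 3))  # [('c', 'a', 't'), ('a', 't', 's')]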
N-grams Example

Examples:
Unigrams (1-grams): Single words, e.g., “cat,” “dog.”
Bigrams (2-grams): Pairs of consecutive words, e.g., “natural language,” “deep learning.”
Trigrams (3-grams): Triplets of consecutive words, e.g., “machine learning model,” “data science approach.”
4-grams, 5-grams, etc.: Sequences of four, five, or more consecutive words.
N-grams Formula

P(w1, w2, ..., wn) ≈ ∏ (i = 1 to n) P(wi | wi−(n−1), ..., wi−1)

wi: the current word at position i
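For example, with bigrams (n = 2) each word is conditioned only on the single word that precedes it, so for the sentence "The cat eats fish" used below:

P(The, cat, eats, fish) ≈ P(The) · P(cat | The) · P(eats | cat) · P(fish | eats)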
N-grams

 The n in n-grams specifies the number of items to consider: unigram for n = 1, bigram for n = 2, trigram for n = 3, and so on.

 n-gram models are a specific type of language model that relies on how frequently sequences of tokens (words or characters) occur in a text.
bigrams

For the sentence "The cat eats fish.", the pairs of consecutive words are:
(The, cat)
(cat, eats)
(eats, fish)
(fish, .)
These are all 2-token sequences, hence bigrams.
trigrams

In the trigrams example, the 3-token sequences from the same sentence are:
(The, cat, eats)
(cat, eats, fish)
(eats, fish, .)
These are all 3-token sequences, hence trigrams.
N-gram model

 This formula estimates the probability of a word wn given all the previous words in the sequence.
 Since it is impractical to consider all previous words (especially for long sequences), n-gram models approximate this probability using only the last few words, specifically the last n−1 words.
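As a minimal sketch of this approximation (not part of the original slides; the toy corpus and the helper name bigram_prob are assumptions used only for illustration), the probability of a word given the previous word can be estimated directly from counts:

from collections import Counter
from nltk import ngrams

# Tiny toy corpus, chosen only for illustration (punctuation pre-separated by spaces)
corpus = "the cat eats fish . the cat sleeps . the dog eats fish ."
tokens = corpus.split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(ngrams(tokens, 2))

def bigram_prob(word, prev):
    # P(word | prev) ≈ count(prev, word) / count(prev)
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("cat", "the"))   # 2/3 ≈ 0.67
print(bigram_prob("eats", "cat"))  # 1/2 = 0.5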
N-Gram Model Example Program

import nltk
nltk.download('punkt')

from nltk import ngrams


from nltk.tokenize import word_tokenize

# Example sentence
sentence = "N-grams enhance language processing tasks."
N-Gram Model Example Program

# Tokenize the sentence
tokens = word_tokenize(sentence)

# Generate bigrams
bigrams = list(ngrams(tokens, 2))

# Generate trigrams
trigrams = list(ngrams(tokens, 3))
Example Program

# Print the results
print("Bigrams:", bigrams)
print("Trigrams:", trigrams)

Output:
Bigrams: [('N-grams', 'enhance'), ('enhance', 'language'), ('language', 'processing'), ('processing', 'tasks'), ('tasks', '.')]
Trigrams: [('N-grams', 'enhance', 'language'), ('enhance', 'language', 'processing'), ('language', 'processing', 'tasks'), ('processing', 'tasks', '.')]
Language modeling

 Language modeling (LM) is the use of various statistical and probabilistic techniques to determine the probability of a given sequence of words occurring in a sentence.

 Example: The sentence "I am going to school" is more probable than "School going I to am".
 Given "I am going to", the model might assign:
 "school" → 0.7
 "market" → 0.2
 "banana" → 0.0
Parameter estimation

 Parameter estimation is the process of finding the best values for a model's adjustable quantities (its "knobs", i.e., its parameters) based on the data the model is trained on.
 The goal is to adjust the parameters so that the model can accurately perform the desired NLP task (e.g., predicting the next word, classifying text, translating languages).
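A brief added illustration (consistent with the n-gram models above, not taken from the original slides): in a bigram language model the parameters are the conditional probabilities P(wi | wi−1), and each one is set from counts in the training data. If a corpus contains "the" 1,000 times and the pair "the cat" 30 times, the estimated parameter is P(cat | the) = 30 / 1000 = 0.03.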
Example: Parameter Estimation
Maximum Likelihood Estimation

 Maximum Likelihood Estimation (MLE) is a key method in statistical modeling, used to find the values of model parameters that make the observed data most probable.
 It works by maximizing a likelihood function, which tells us how likely it is to observe the data, given different parameter values.
Likelihood Estimation

 Likelihood Estimation is a statistical method used to estimate the parameters of a probability distribution or a statistical model based on observed data.
 Unlike traditional estimation methods that focus on finding the "best-fitting" parameters, likelihood estimation frames the problem in terms of the likelihood function.
Example: Maximum Likelihood Estimation

Example: Estimating the Probability of a Coin Toss
Suppose you have a coin, and you don't know whether it's fair (i.e., the probability of heads, P(H), might not be 0.5).
You toss the coin 10 times, and you observe:
7 Heads
3 Tails
Your goal is to estimate the probability of heads (θ) using Maximum Likelihood Estimation.
Example
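The worked solution does not survive in this text export, so what follows is a hedged reconstruction of the standard calculation: the likelihood of 7 heads and 3 tails is L(θ) = θ^7 · (1 − θ)^3, and it is maximized at θ̂ = 7/10 = 0.7 (the number of heads divided by the number of tosses). The short Python check below, with its brute-force grid search, is purely illustrative.

# Minimal sketch: find the theta that maximizes the likelihood of 7 heads, 3 tails
heads, tails = 7, 3

def likelihood(theta):
    # L(theta) = theta^heads * (1 - theta)^tails
    return theta ** heads * (1 - theta) ** tails

# Brute-force check over a fine grid of candidate theta values
thetas = [i / 1000 for i in range(1001)]
best = max(thetas, key=likelihood)

print(best)                     # 0.7
print(heads / (heads + tails))  # 0.7, the closed-form MLE

Setting the derivative of log L(θ) = 7 log θ + 3 log(1 − θ) to zero gives the same closed-form answer.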
Example 2: Parameter Estimation

 Bayesian Estimation is a method of statistical inference in which we estimate unknown parameters by combining:
 Prior beliefs (what we assume or know before seeing the data), and
 Observed data (evidence),
 using Bayes' Theorem to calculate a posterior probability distribution for the parameter.
Bayesian Formula
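The formula on this slide does not survive in the text export; it is the standard Bayes' Theorem, P(θ | data) = P(data | θ) · P(θ) / P(data), where P(θ) is the prior, P(data | θ) is the likelihood, and P(θ | data) is the posterior. The sketch below applies it to the earlier coin example; the choice of a uniform Beta(1, 1) prior is an assumption made only for illustration.

# Minimal sketch: Bayesian estimate of the coin's heads probability
# Assumption for illustration: a Beta(1, 1) (uniform) prior over theta
heads, tails = 7, 3
prior_a, prior_b = 1, 1

# With a Beta prior and coin-toss data, the posterior is also a Beta distribution:
# posterior = Beta(prior_a + heads, prior_b + tails) = Beta(8, 4)
post_a, post_b = prior_a + heads, prior_b + tails

posterior_mean = post_a / (post_a + post_b)
print(posterior_mean)            # 8/12 ≈ 0.667
print(heads / (heads + tails))   # 0.7, the MLE, for comparison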
