NLP 5th unit

N-grams are contiguous sequences of 'n' items, commonly used in natural language processing, where 'n' can represent unigrams, bigrams, trigrams, etc. The document explains how n-gram models estimate the probability of a word based on the preceding words, utilizing techniques like Maximum Likelihood Estimation and Bayesian Estimation for parameter estimation. It also provides examples of generating n-grams using Python's NLTK library.


N-grams

 N-grams are contiguous sequences of 'n' items, typically words in the context of NLP.
 These items can be characters, words, or even syllables, depending on the granularity desired. The value of 'n' determines the order of the N-gram.
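Because the items need not be words, a quick character-level illustration may help. The snippet below is a minimal sketch (not from the original slides; the helper name char_ngrams is introduced only for this example) that builds character bigrams and trigrams with plain Python.

# Minimal sketch: character-level n-grams with plain Python (no external library)
def char_ngrams(text, n):
    # Return every contiguous run of n characters as a tuple
    return [tuple(text[i:i + n]) for i in range(len(text) - n + 1)]

print(char_ngrams("cat", 2))   # [('c', 'a'), ('a', 't')]
print(char_ngrams("cats", 3))  # [('c', 'a', 't'), ('a', 't', 's')]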
N-grams Example

Examples:
Unigrams (1-grams): Single words, e.g., “cat,” “dog.”
Bigrams (2-grams): Pairs of consecutive words, e.g., “natural language,” “deep learning.”
Trigrams (3-grams): Triplets of consecutive words, e.g., “machine learning model,” “data science approach.”
4-grams, 5-grams, etc.: Sequences of four, five, or more consecutive words.
N-grams Formula

P(w1, w2, ..., wn) ≈ ∏ (i = 1 to n) P(wi | wi−(n−1), ..., wi−1)

wi: the current word at position i
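For example, with bigrams (n = 2) each word is conditioned only on the single word that precedes it, so for the sentence "The cat eats fish" used below:

P(The, cat, eats, fish) ≈ P(The) · P(cat | The) · P(eats | cat) · P(fish | eats)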
N-grams

 The n in n-grams specifies the number of items to consider: unigram for n = 1, bigram for n = 2, trigram for n = 3, and so on.

 n-gram models are a specific type of language model that relies on how frequently sequences of tokens (words or characters) occur in a text.
bigrams

For the sentence "The cat eats fish.", the pairs of consecutive words are:
(The, cat)
(cat, eats)
(eats, fish)
(fish, .)
These are all 2-token sequences, hence bigrams.
trigrams

In the trigrams example, the 3-token sequences from the same sentence are:
(The, cat, eats)
(cat, eats, fish)
(eats, fish, .)
These are all 3-token sequences, hence trigrams.
N-gram model

 This formula estimates the probability of a word wn given all the previous words in the sequence.
 Since it is impractical to consider all previous words (especially for long sequences), n-gram models approximate this probability using only the last few words, specifically the last n−1 words.
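As a minimal sketch of this approximation (not part of the original slides; the toy corpus and the helper name bigram_prob are assumptions used only for illustration), the probability of a word given the previous word can be estimated directly from counts:

from collections import Counter
from nltk import ngrams

# Tiny toy corpus, chosen only for illustration (punctuation pre-separated by spaces)
corpus = "the cat eats fish . the cat sleeps . the dog eats fish ."
tokens = corpus.split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(ngrams(tokens, 2))

def bigram_prob(word, prev):
    # P(word | prev) ≈ count(prev, word) / count(prev)
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("cat", "the"))   # 2/3 ≈ 0.67
print(bigram_prob("eats", "cat"))  # 1/2 = 0.5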
N-Gram Model Example Program

import nltk
nltk.download('punkt')

from nltk import ngrams


from nltk.tokenize import word_tokenize

# Example sentence
sentence = "N-grams enhance language processing tasks."
N-Gram Model Example Program

# Tokenize the sentence
tokens = word_tokenize(sentence)

# Generate bigrams
bigrams = list(ngrams(tokens, 2))

# Generate trigrams
trigrams = list(ngrams(tokens, 3))
Example Program

# Print the results
print("Bigrams:", bigrams)
print("Trigrams:", trigrams)

Output:
Bigrams: [('N-grams', 'enhance'), ('enhance', 'language'), ('language', 'processing'), ('processing', 'tasks'), ('tasks', '.')]
Trigrams: [('N-grams', 'enhance', 'language'), ('enhance', 'language', 'processing'), ('language', 'processing', 'tasks'), ('processing', 'tasks', '.')]
Language modeling

 Language modeling (LM) is the use of various statistical and probabilistic techniques to determine the probability of a given sequence of words occurring in a sentence.

 Example: The sentence "I am going to school" is more probable than "School going I to am".
 Given "I am going to", the model might assign:
 "school" → 0.7
 "market" → 0.2
 "banana" → 0.0
Parameter estimation

 Parameter estimation is the process of finding the best values for a model's adjustable quantities (its "knobs", i.e., its parameters) based on the data the model is trained on.
 The goal is to adjust the parameters so that the model can accurately perform the desired NLP task (e.g., predicting the next word, classifying text, translating languages).
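A brief added illustration (consistent with the n-gram models above, not taken from the original slides): in a bigram language model the parameters are the conditional probabilities P(wi | wi−1), and each one is set from counts in the training data. If a corpus contains "the" 1,000 times and the pair "the cat" 30 times, the estimated parameter is P(cat | the) = 30 / 1000 = 0.03.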
Example: Parameter Estimation
Maximum Likelihood Estimation

 Maximum Likelihood Estimation (MLE) is a key method in statistical modeling, used to find the values of model parameters that make the observed data most probable.
 It works by maximizing a likelihood function, which tells us how likely it is to observe the data, given different parameter values.
Likelihood Estimation

 Likelihood Estimation is a statistical method used to estimate the parameters of a probability distribution or a statistical model based on observed data.
 Unlike traditional estimation methods that focus on finding the "best-fitting" parameters, likelihood estimation frames the problem in terms of the likelihood function.
Example: Maximum Likelihood Estimation

Example: Estimating the Probability of a Coin Toss
Suppose you have a coin, and you don't know whether it's fair (i.e., the probability of heads, P(H), might not be 0.5).
You toss the coin 10 times, and you observe:
7 Heads
3 Tails
Your goal is to estimate the probability of heads (θ) using Maximum Likelihood Estimation.
Example
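The worked solution does not survive in this text export, so what follows is a hedged reconstruction of the standard calculation: the likelihood of 7 heads and 3 tails is L(θ) = θ^7 · (1 − θ)^3, and it is maximized at θ̂ = 7/10 = 0.7 (the number of heads divided by the number of tosses). The short Python check below, with its brute-force grid search, is purely illustrative.

# Minimal sketch: find the theta that maximizes the likelihood of 7 heads, 3 tails
heads, tails = 7, 3

def likelihood(theta):
    # L(theta) = theta^heads * (1 - theta)^tails
    return theta ** heads * (1 - theta) ** tails

# Brute-force check over a fine grid of candidate theta values
thetas = [i / 1000 for i in range(1001)]
best = max(thetas, key=likelihood)

print(best)                     # 0.7
print(heads / (heads + tails))  # 0.7, the closed-form MLE

Setting the derivative of log L(θ) = 7 log θ + 3 log(1 − θ) to zero gives the same closed-form answer.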
Example 2: Parameter Estimation

 Bayesian Estimation is a method of statistical inference in which we estimate unknown parameters by combining:
 Prior beliefs (what we assume or know before seeing the data), and
 Observed data (evidence),
 using Bayes' Theorem to calculate a posterior probability distribution for the parameter.
Bayesian Formula
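The formula on this slide does not survive in the text export; it is the standard Bayes' Theorem, P(θ | data) = P(data | θ) · P(θ) / P(data), where P(θ) is the prior, P(data | θ) is the likelihood, and P(θ | data) is the posterior. The sketch below applies it to the earlier coin example; the choice of a uniform Beta(1, 1) prior is an assumption made only for illustration.

# Minimal sketch: Bayesian estimate of the coin's heads probability
# Assumption for illustration: a Beta(1, 1) (uniform) prior over theta
heads, tails = 7, 3
prior_a, prior_b = 1, 1

# With a Beta prior and coin-toss data, the posterior is also a Beta distribution:
# posterior = Beta(prior_a + heads, prior_b + tails) = Beta(8, 4)
post_a, post_b = prior_a + heads, prior_b + tails

posterior_mean = post_a / (post_a + post_b)
print(posterior_mean)            # 8/12 ≈ 0.667
print(heads / (heads + tails))   # 0.7, the MLE, for comparison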
