NLP 5th unit
Examples:
Unigrams (1-grams): single words, e.g., "cat," "dog."
Bigrams (2-grams): pairs of consecutive words, e.g., "natural language," "deep learning."
Trigrams (3-grams): triplets of consecutive words, e.g., "machine learning model," "data science approach."
4-grams, 5-grams, etc.: sequences of four, five, or more consecutive words.
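The idea above can be sketched with a short sliding-window function in plain Python (the function name extract_ngrams is illustrative, not from a library):

```python
# A minimal sketch of n-gram extraction: slide a window of size n
# over the token list and collect each window as a tuple.
def extract_ngrams(words, n):
    """Return the list of n-grams (as tuples) from a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "natural language processing is fun".split()

unigrams = extract_ngrams(words, 1)  # single words
bigrams = extract_ngrams(words, 2)   # pairs of consecutive words
trigrams = extract_ngrams(words, 3)  # triples of consecutive words

print(bigrams)  # e.g. [('natural', 'language'), ('language', 'processing'), ...]
```

For a sentence of k tokens this yields k − n + 1 n-grams, which matches the examples listed above.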
N-grams
N-grams Formula
P(w1, w2, ..., wn) ≈ ∏ (i = 1 to n) P(wi | wi−(n−1), ..., wi−1)

wi: the current word at position i
wi−(n−1), ..., wi−1: the n−1 preceding words (the context)
N-Gram Model Example Program
import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize

nltk.download('punkt')

# Example sentence
sentence = "N-grams enhance language processing tasks."

# Tokenize the sentence into words
tokens = word_tokenize(sentence)

# Generate bigrams
bigrams = list(ngrams(tokens, 2))

# Generate trigrams
trigrams = list(ngrams(tokens, 3))

print(bigrams)
print(trigrams)
Example: The sentence "I am going to school" is more probable than "School
going I to am".
Given "I am going to", the model might assign:
"school" → 0.7
"market" → 0.2
"banana" → 0.0
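The ranking behaviour above can be reproduced with a tiny unsmoothed bigram model (the corpus is invented for illustration, so the exact probabilities differ from the 0.7/0.2/0.0 figures shown):

```python
from collections import Counter

# Toy training corpus (illustrative)
corpus = ("i am going to school . i am going to market . "
          "i am going to school .").split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def sentence_prob(sentence):
    """Score a sentence as the product of its bigram probabilities."""
    words = sentence.split()
    p = 1.0
    for prev, word in zip(words, words[1:]):
        # Unseen bigrams get probability 0 in this unsmoothed sketch
        p *= bigram_counts[(prev, word)] / unigram_counts[prev]
    return p

print(sentence_prob("i am going to school"))  # positive probability
print(sentence_prob("school going i to am"))  # 0.0: its bigrams never occur
```

The fluent word order scores higher because every one of its bigrams was observed in training, while the scrambled order contains unseen bigrams; avoiding such zero probabilities is what smoothing (part of parameter estimation) addresses.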
Parameter estimation