N Gram Model

This document discusses building an n-gram model in Python for natural language processing. It imports NLTK libraries, tokenizes a sample text into words, creates an n-gram dictionary to store 3-grams and their following words, and then uses the dictionary to generate new text by randomly selecting the next word based on the previous 3-gram. The result demonstrates the model generating additional text in a similar style to the original sample.

Uploaded by

Premjit Sengupta
Copyright
© All Rights Reserved

4/17/2020 Untitled12 - Jupyter Notebook

In [8]:

# Natural Language Processing using Python

# N-Gram Modelling - Word Grams


# Importing libraries
import random
import nltk

In [10]:

text = """Global warming or climate change has become a worldwide concern. It is gradually

In [11]:

n = 3

In [12]:

ngrams = {}

In [13]:

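# Building the model: map every n-word sequence to the words that follow it

words = nltk.word_tokenize(text)
for i in range(len(words)-n):
    # Key: n consecutive words joined by single spaces
    gram = ' '.join(words[i:i+n])
    if gram not in ngrams:
        ngrams[gram] = []
    # Value: every word seen right after this n-gram (repeats are kept)
    ngrams[gram].append(words[i+n])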
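
To see what the loop above stores, here is a minimal sketch on a toy sentence. It assumes whitespace splitting with `str.split` instead of `nltk.word_tokenize` (so it runs without NLTK's tokenizer data). The helper name `build_ngrams` and the toy sentence are illustrative and not part of the notebook.

```python
from collections import defaultdict


def build_ngrams(words, n):
    """Map each n-word window (joined by spaces) to the list of words that follow it.

    Hypothetical helper: it mirrors the notebook's loop but is not the author's code.
    """
    table = defaultdict(list)
    for i in range(len(words) - n):
        gram = ' '.join(words[i:i + n])
        table[gram].append(words[i + n])
    return dict(table)


# Assumption: whitespace tokenization stands in for nltk.word_tokenize.
toy_words = "the cat sat on the cat sat down".split()
toy_model = build_ngrams(toy_words, 2)

for gram, followers in toy_model.items():
    print(repr(gram), '->', followers)
```

Repeated followers stay in the list on purpose: a word that follows a given n-gram twice is twice as likely to be picked when sampling uniformly from that list.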

In [14]:

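# Testing the model: start from the first n words and extend the text

currentGram = ' '.join(words[0:n])
result = currentGram
for i in range(30):
    # Stop when the current n-gram was never followed by a word in the text
    if currentGram not in ngrams:
        break
    possibilities = ngrams[currentGram]
    # Pick one of the recorded followers uniformly at random
    nextItem = random.choice(possibilities)
    result += ' '+nextItem
    # The last n words of the result form the next n-gram
    rWords = nltk.word_tokenize(result)
    currentGram = ' '.join(rWords[len(rWords)-n:len(rWords)])

print(result)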

Global warming or climate change has become a worldwide concern . It is gradually developing into an unprecedented environmental crisis evident in melting glaciers , changing weather patterns , rising sea levels ,
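
The generation cell re-tokenizes the whole result string on every step. One possible alternative, sketched here under stated assumptions, keeps the generated tokens in a list and reads the last `n` of them directly. The names `build_ngrams` and `generate_text`, the `seed` parameter, the sample sentence, and the use of whitespace tokenization in place of `nltk.word_tokenize` are all illustrative choices, not part of the notebook.

```python
import random


def build_ngrams(words, n):
    """Hypothetical helper: n-gram string -> list of words observed right after it."""
    table = {}
    for i in range(len(words) - n):
        table.setdefault(' '.join(words[i:i + n]), []).append(words[i + n])
    return table


def generate_text(words, table, n, max_new_words=30, seed=None):
    """Extend the first n words by sampling followers, tracking tokens in a list."""
    rng = random.Random(seed)      # a seed makes a run repeatable (assumption, not in the notebook)
    output = list(words[:n])
    for _ in range(max_new_words):
        gram = ' '.join(output[-n:])   # last n generated tokens, no re-tokenizing
        followers = table.get(gram)
        if not followers:              # unseen n-gram: nothing to sample from
            break
        output.append(rng.choice(followers))
    return ' '.join(output)


# Assumption: whitespace tokenization stands in for nltk.word_tokenize.
sample_words = "the cat sat on the mat and the cat sat on the rug".split()
sample_table = build_ngrams(sample_words, 3)
generated = generate_text(sample_words, sample_table, 3, max_new_words=10, seed=0)
print(generated)
```

Keeping a token list avoids any chance that tokenizing the joined string yields different tokens than the ones that were appended, and it does a constant amount of work per generated word.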

