NLTK Cheatsheet
nltk.download
Downloads the NLTK datasets and models a script needs. It's essential for setting up your environment with the required resources.
import nltk
nltk.download('punkt')  # e.g. the Punkt tokenizer models used by word_tokenize
word_tokenize
Tokenizes a string into individual words.
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
print(tokens)
sent_tokenize
Tokenizes a string into individual sentences.
from nltk.tokenize import sent_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources."
sentences = sent_tokenize(text)
print(sentences)
pos_tag
Tags each token with its part of speech.
from nltk import pos_tag
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
tagged_tokens = pos_tag(tokens)
print(tagged_tokens)
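pos_tag can also map to the simplified Universal tagset; a quick sketch (assumes the 'universal_tagset' resource is downloaded):
print(pos_tag(tokens, tagset='universal'))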
ne_chunk
Performs named entity recognition on a list of tagged tokens.
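A minimal sketch (the sentence is illustrative; ne_chunk requires the 'maxent_ne_chunker' and 'words' resources):
from nltk import ne_chunk, pos_tag
from nltk.tokenize import word_tokenize
text = "Barack Obama was born in Hawaii."
print(ne_chunk(pos_tag(word_tokenize(text))))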
FreqDist
Counts how often each token occurs in a text.
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources."
tokens = word_tokenize(text)
fdist = FreqDist(tokens)
print(fdist.most_common(5))
ConditionalFreqDist
Counts frequencies conditioned on a grouping key, e.g. word frequencies per genre of the Brown corpus.
from nltk.probability import ConditionalFreqDist
from nltk.corpus import brown
cfd = ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre)
)
print(cfd['news'].most_common(10))
Text
Wraps a list of tokens for interactive exploration (concordance, similar, common_contexts, and more).
from nltk.text import Text
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
text_obj = Text(tokens)
text_obj.concordance('NLTK')
concordance
Shows every occurrence of a word with its surrounding context.
from nltk.text import Text
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. NLTK provides easy-to-use interfaces to over 50 corpora and lexical resources."
tokens = word_tokenize(text)
text_obj = Text(tokens)
text_obj.concordance('NLTK')
similar
Lists words that appear in contexts similar to those of the given word.
from nltk.text import Text
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. NLTK provides easy-to-use interfaces to over 50 corpora and lexical resources."
tokens = word_tokenize(text)
text_obj = Text(tokens)
text_obj.similar('NLTK')
common_contexts
Finds contexts shared by two or more words.
from nltk.text import Text
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. NLTK provides easy-to-use interfaces to over 50 corpora and lexical resources."
tokens = word_tokenize(text)
text_obj = Text(tokens)
text_obj.common_contexts(['NLTK', 'platform'])
dispersion_plot
Plots where the given words occur across the text (requires matplotlib).
from nltk.draw.dispersion import dispersion_plot
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. NLTK provides easy-to-use interfaces to over 50 corpora and lexical resources."
tokens = word_tokenize(text)
dispersion_plot(tokens, ['NLTK', 'platform'])
generate
Generates random text from a language model trained on the text (available in NLTK 3.4+).
from nltk.text import Text
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. NLTK provides easy-to-use interfaces to over 50 corpora and lexical resources."
tokens = word_tokenize(text)
text_obj = Text(tokens)
text_obj.generate()
bigrams
Generates bigrams (pairs of adjacent tokens) from a sequence.
from nltk import bigrams
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
bigrams_list = list(bigrams(tokens))
print(bigrams_list)
ngrams
Generates n-grams of any order from a sequence of tokens.
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
trigrams_list = list(ngrams(tokens, 3))
print(trigrams_list)
wordnet.synsets
Returns all synsets (word senses) for a word.
from nltk.corpus import wordnet
synsets = wordnet.synsets('dog')
print(synsets)
wordnet.synset.lemmas
Returns the lemmas (word forms) of a synset.
synset = wordnet.synset('dog.n.01')
lemmas = synset.lemmas()
print(lemmas)
wordnet.synset
Looks up a single synset by its identifier, in the form 'word.pos.sense_number'.
synset = wordnet.synset('dog.n.01')
print(synset)
wordnet.morphy
Finds the base form of a word using WordNet's morphological rules.
base_form = wordnet.morphy('running')
print(base_form)
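morphy tries each part of speech in turn, so for verbs it is safer to pass the POS explicitly:
base_form = wordnet.morphy('running', wordnet.VERB)  # 'run'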
wordnet.synset.lemma_names
Returns the lemma names of a synset as plain strings.
synset = wordnet.synset('dog.n.01')
lemma_names = synset.lemma_names()
print(lemma_names)
wordnet.synset.definition
Returns the dictionary definition of a synset.
synset = wordnet.synset('dog.n.01')
definition = synset.definition()
print(definition)
wordnet.synset.examples
Returns example sentences for a synset.
synset = wordnet.synset('dog.n.01')
examples = synset.examples()
print(examples)
wordnet.synset.hypernyms
Returns the more general synsets (hypernyms) of a synset.
synset = wordnet.synset('dog.n.01')
hypernyms = synset.hypernyms()
print(hypernyms)
wordnet.synset.hyponyms
Returns the more specific synsets (hyponyms) of a synset.
synset = wordnet.synset('dog.n.01')
hyponyms = synset.hyponyms()
print(hyponyms)
wordnet.synset.member_holonyms
Returns the synsets of which this synset is a member.
synset = wordnet.synset('dog.n.01')
member_holonyms = synset.member_holonyms()
print(member_holonyms)
wordnet.synset.part_meronyms
Returns the synsets that name parts of this synset.
synset = wordnet.synset('dog.n.01')
part_meronyms = synset.part_meronyms()
print(part_meronyms)
nltk.corpus.words.words
Returns a list of English words from the Words corpus.
from nltk.corpus import words
word_list = words.words()
print(word_list[:10])
nltk.corpus.stopwords.words
Returns the stopword list for a language.
from nltk.corpus import stopwords
stopword_list = stopwords.words('english')
print(stopword_list)
nltk.corpus.gutenberg.raw
Returns the raw text of a document in the Project Gutenberg corpus.
from nltk.corpus import gutenberg
raw_text = gutenberg.raw('austen-emma.txt')
print(raw_text[:1000])
nltk.corpus.brown.words
Returns the words of the Brown corpus.
from nltk.corpus import brown
brown_words = brown.words()
print(brown_words[:10])
nltk.corpus.reuters.words
Returns the words of the Reuters corpus.
from nltk.corpus import reuters
reuters_words = reuters.words()
print(reuters_words[:10])
nltk.corpus.inaugural.words
Returns the words of the US presidential inaugural address corpus.
from nltk.corpus import inaugural
inaugural_words = inaugural.words()
print(inaugural_words[:10])
nltk.corpus.webtext.words
Returns the words of the Web Text corpus.
from nltk.corpus import webtext
webtext_words = webtext.words()
print(webtext_words[:10])
nltk.corpus.treebank.parsed_sents
Returns parsed (tree-structured) sentences from the Penn Treebank sample.
from nltk.corpus import treebank
parsed_sents = treebank.parsed_sents()
print(parsed_sents[:1])
nltk.corpus.semcor.tagged_sents
Gets tagged sentences from the SemCor corpus.
from nltk.corpus import semcor
tagged_sents = semcor.tagged_sents()
print(tagged_sents[:1])
nltk.corpus.names.words
Returns a list of first names from the Names corpus.
from nltk.corpus import names
names_list = names.words()
print(names_list[:10])
nltk.corpus.sentiwordnet.senti_synset
Returns the sentiment scores (positive, negative, objective) for a synset.
from nltk.corpus import sentiwordnet as swn
senti_synset = swn.senti_synset('dog.n.01')
print(senti_synset)
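A SentiSynset exposes its scores through accessor methods:
print(senti_synset.pos_score(), senti_synset.neg_score(), senti_synset.obj_score())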
nltk.corpus.wordnet_ic.ic
Loads an information content (IC) file for use with WordNet similarity measures.
from nltk.corpus import wordnet_ic
ic = wordnet_ic.ic('ic-brown.dat')
print(ic)
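The loaded IC dictionary is meant to be passed to an information-content similarity measure; a minimal sketch:
from nltk.corpus import wordnet
dog = wordnet.synset('dog.n.01')
cat = wordnet.synset('cat.n.01')
print(dog.res_similarity(cat, ic))  # Resnik similarity weighted by Brown corpus IC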
nltk.corpus.nps_chat.tagged_posts
Returns POS-tagged chat posts from the NPS Chat corpus.
from nltk.corpus import nps_chat
tagged_posts = nps_chat.tagged_posts()
print(tagged_posts[:1])
nltk.corpus.movie_reviews.words
Returns the words of the Movie Reviews corpus.
from nltk.corpus import movie_reviews
movie_reviews_words = movie_reviews.words()
print(movie_reviews_words[:10])
nltk.corpus.twitter_samples.strings
Returns tweets from the Twitter samples corpus as strings.
from nltk.corpus import twitter_samples
tweets = twitter_samples.strings()
print(tweets[:1])
nltk.classify.NaiveBayesClassifier
A Naive Bayes classifier for text classification, shown here on the movie reviews corpus.
import nltk
import random
from nltk.corpus import movie_reviews
from nltk.classify import NaiveBayesClassifier
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_words)[:2000]  # the 2000 most frequent words as features
def document_features(document):
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains({})'.format(word)] = (word in document_words)
    return features
featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))
classifier.show_most_informative_features(5)
nltk.classify.DecisionTreeClassifier
A decision tree classifier for text classification.
import nltk
import random
from nltk.classify import DecisionTreeClassifier
from nltk.corpus import movie_reviews
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_words)[:100]  # keep the feature set small: decision trees train slowly
def document_features(document):
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains({})'.format(word)] = (word in document_words)
    return features
featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = DecisionTreeClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))
nltk.tag.PerceptronTagger
A fast pretrained part-of-speech tagger (requires the 'averaged_perceptron_tagger' resource).
from nltk.tag import PerceptronTagger
from nltk.tokenize import word_tokenize
tagger = PerceptronTagger()
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
tagged_tokens = tagger.tag(tokens)
print(tagged_tokens)
nltk.tag.hmm.HiddenMarkovModelTrainer
Trains a Hidden Markov Model part-of-speech tagger on tagged sentences.
from nltk.tag import hmm
from nltk.corpus import treebank
from nltk.tokenize import word_tokenize
trainer = hmm.HiddenMarkovModelTrainer()
tagged_sents = treebank.tagged_sents()
tagger = trainer.train(tagged_sents)
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
tagged_tokens = tagger.tag(tokens)
print(tagged_tokens)
nltk.chunk.RegexpParser
Chunks tagged tokens with a regular-expression grammar, e.g. to extract noun phrases.
from nltk.chunk import RegexpParser
from nltk import pos_tag
from nltk.tokenize import word_tokenize
text = "The quick brown fox jumps over the lazy dog."
tokens = word_tokenize(text)
tagged_tokens = pos_tag(tokens)
grammar = "NP: {<DT>?<JJ>*<NN>}"
parser = RegexpParser(grammar)
print(parser.parse(tagged_tokens))
nltk.translate.bleu_score
Computes the BLEU score, a standard metric for comparing a candidate translation against reference translations.
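A minimal sketch, with illustrative reference and candidate sentences:
from nltk.translate.bleu_score import sentence_bleu
reference = [['the', 'cat', 'is', 'on', 'the', 'mat']]
candidate = ['the', 'cat', 'is', 'on', 'the', 'mat']
print(sentence_bleu(reference, candidate))  # 1.0 for an exact match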
nltk.stem.PorterStemmer
The classic Porter stemming algorithm.
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
word = "running"
stemmed_word = stemmer.stem(word)
print(stemmed_word)
nltk.stem.LancasterStemmer
The Lancaster stemmer, more aggressive than Porter.
from nltk.stem import LancasterStemmer
stemmer = LancasterStemmer()
word = "running"
stemmed_word = stemmer.stem(word)
print(stemmed_word)
nltk.stem.SnowballStemmer
The Snowball stemmer, which supports multiple languages.
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer("english")
word = "running"
stemmed_word = stemmer.stem(word)
print(stemmed_word)
nltk.stem.WordNetLemmatizer
Lemmatizes words to their dictionary form using WordNet (requires the 'wordnet' resource).
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
word = "running"
lemmatized_word = lemmatizer.lemmatize(word, pos='v')
print(lemmatized_word)
Code Examples Using NLTK for Common Use Cases
1. Sentiment Analysis with a Naive Bayes Classifier
import nltk
from nltk.corpus import movie_reviews
import random
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy as nltk_accuracy
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_words)[:2000]
def document_features(document):
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains({})'.format(word)] = (word in document_words)
    return features
featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = NaiveBayesClassifier.train(train_set)
print('Accuracy:', nltk_accuracy(classifier, test_set))
2. Named Entity Recognition
import nltk
from nltk import ne_chunk, pos_tag
from nltk.tokenize import word_tokenize
text = "Barack Obama was born in Hawaii. He was elected president in 2008."
tokens = word_tokenize(text)
tagged_tokens = pos_tag(tokens)
named_entities = ne_chunk(tagged_tokens)
print(named_entities)
3. Part-of-Speech Tagging with PerceptronTagger
import nltk
from nltk.tag import PerceptronTagger
from nltk.tokenize import word_tokenize
tagger = PerceptronTagger()
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
tagged_tokens = tagger.tag(tokens)
print(tagged_tokens)
4. Building a Language Model
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline
# Sample text
text = "NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources."
# Tokenize into sentences, then words, and fit a bigram maximum-likelihood model
tokenized = [word_tokenize(sent.lower()) for sent in sent_tokenize(text)]
train_data, padded_sents = padded_everygram_pipeline(2, tokenized)
model = MLE(2)
model.fit(train_data, padded_sents)
# Generate text from a seed context
context = ('nltk', 'is')
print('Generated text:', ' '.join(model.generate(10, text_seed=context)))
5. Finding Synonyms of a Word
import nltk
from nltk.corpus import wordnet
word = "happy"
synonyms = []
for syn in wordnet.synsets(word):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
print(set(synonyms))
6. Word Sense Disambiguation
import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize
sentence = word_tokenize("I went to the bank to deposit my money")
sense = lesk(sentence, 'bank', 'n')
print(sense, '-', sense.definition())
7. Chunking with a Regular-Expression Grammar
import nltk
from nltk.chunk import RegexpParser
from nltk.tokenize import word_tokenize
from nltk import pos_tag
text = "The quick brown fox jumps over the lazy dog."
tokens = word_tokenize(text)
tagged_tokens = pos_tag(tokens)
grammar = "NP: {<DT>?<JJ>*<NN>}"
parser = RegexpParser(grammar)
print(parser.parse(tagged_tokens))
8. Machine Translation Evaluation with BLEU
import nltk
from nltk.translate.bleu_score import sentence_bleu
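A minimal sketch, with illustrative reference and candidate sentences (bigram weights keep the score non-zero for such short inputs):
reference = [['the', 'cat', 'is', 'on', 'the', 'mat']]
candidate = ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(sentence_bleu(reference, candidate, weights=(0.5, 0.5)))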
9. Text Classification with a Decision Tree
import nltk
from nltk.classify import DecisionTreeClassifier
from nltk.corpus import movie_reviews
import random
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_words)[:100]  # small feature set: decision trees train slowly
def document_features(document):
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains({})'.format(word)] = (word in document_words)
    return features
featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = DecisionTreeClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))
10. Stemming Words
import nltk
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
words = ["running", "jumps", "easily", "fairly"]
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)
11. Lemmatizing Words
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
words = ["running", "jumps", "easily", "fairly"]
lemmatized_words = [lemmatizer.lemmatize(word, pos='v') for word in words]
print(lemmatized_words)
12. Frequency Distribution
import nltk
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. I
tokens = word_tokenize(text)
fdist = FreqDist(tokens)
print(fdist.most_common(5))
13. Conditional Frequency Distribution
import nltk
from nltk.probability import ConditionalFreqDist
from nltk.corpus import brown
cfd = ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre)
)
print(cfd['news'].most_common(10))
14. Finding Synonyms with WordNet
import nltk
from nltk.corpus import wordnet
word = "good"
synonyms = []
for syn in wordnet.synsets(word):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
print(set(synonyms))
15. Part-of-Speech Tagging
import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
tagged_tokens = pos_tag(tokens)
print(tagged_tokens)
16. Named Entity Recognition
import nltk
from nltk import ne_chunk, pos_tag
from nltk.tokenize import word_tokenize
text = "Barack Obama was born in Hawaii. He was elected president in 2008."
tokens = word_tokenize(text)
print(ne_chunk(pos_tag(tokens)))
17. Concordance
import nltk
from nltk.text import Text
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. N
tokens = word_tokenize(text)
text_obj = Text(tokens)
text_obj.concordance('NLTK')
18. Finding Similar Words
import nltk
from nltk.text import Text
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. N
tokens = word_tokenize(text)
text_obj = Text(tokens)
text_obj.similar('NLTK')
19. Common Contexts
import nltk
from nltk.text import Text
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. N
tokens = word_tokenize(text)
text_obj = Text(tokens)
text_obj.common_contexts(['NLTK', 'platform'])
20. Dispersion Plot
import nltk
from nltk.draw.dispersion import dispersion_plot
from nltk.tokenize import word_tokenize
text = "NLTK is a leading platform for building Python programs to work with human language data. N
tokens = word_tokenize(text)
dispersion_plot(tokens, ['NLTK', 'platform'])