
7 TextAnalysis

The document contains Python code demonstrating the use of the TextBlob and NLTK libraries for text processing tasks such as spelling correction, tokenization, filtering stopwords, stemming, and lemmatization. It includes examples of how to analyze text, visualize word frequency, and perform part-of-speech tagging. Additionally, it highlights the differences between stemming and lemmatization in natural language processing.


!pip install textblob


!pip install nltk

from textblob import TextBlob


import nltk

b = TextBlob("I ahve good spelling")  # "ahve" is an intentional misspelling


b.correct()  # returns a new TextBlob with best-guess spelling corrections
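# For per-word correction candidates, TextBlob's Word.spellcheck() returns
# (candidate, confidence) pairs sorted by confidence. A minimal sketch
# (the example word is mine, not from the original):
from textblob import Word
w = Word("ahve")
print(w.spellcheck())        # e.g. [('have', ...), ...]
print(w.spellcheck()[0][0])  # top suggestion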

import nltk
nltk.download('punkt') # will download the Punkt tokenizer models.
b1 = TextBlob("beautifull is bettter level than ugly")
b1.words

b1.sentences

b1.words[3].pluralize()

sen = TextBlob("My name name name is anthony gonsalvis main duniya mein akela hoon")
# the Hindi part means "I am alone in the world"
sen.word_counts["name"]

print(sen.parse())
# parse() returns a shallow parse: each token with its POS and chunk tags

sen[0:19]
# slicing returns a TextBlob substring

b1.upper()

b1.find("ugly")
# index of the character at which "ugly" is first found (-1 if not present)

apple = TextBlob("apples")
banana = TextBlob("banana")
apple > banana  # TextBlobs compare like strings (lexicographically), so this is False

b1.ngrams(n=3)
# An n-gram is a contiguous sequence of n items from a given sample of text or speech.
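# A quick illustration (the sample sentence is mine, not from the original);
# ngrams() returns a list of WordList objects:
tb = TextBlob("the quick brown fox jumps")
print(tb.ngrams(n=2))  # four bigrams: the/quick, quick/brown, brown/fox, fox/jumps
print(tb.ngrams(n=3))  # three trigrams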

import nltk
from nltk import tokenize
from nltk.tokenize import sent_tokenize
text = """ Goood day it was today in pune, I loved the weather in Pune. Pune is the best city to live in."""
text

tokenized_text = sent_tokenize(text)

print(tokenized_text)
# splits a text into a list of sentences, using an algorithm that considers punctuation and capitalization

from nltk.tokenize import word_tokenize


tokenizer_word = word_tokenize(text)
# splits the text into word and punctuation tokens (not just on whitespace)
print(tokenizer_word)

from nltk.probability import FreqDist


fd = FreqDist(tokenizer_word)
print(fd)

fd.most_common(4)

import matplotlib.pyplot as plt


fd.plot(30, cumulative=False)
# plots the 30 most common tokens in the frequency distribution

nltk.download('stopwords')
# Stopwords are commonly used words (such as "the", "a", "an", "in", "on", etc.).
# Downloading the stopwords corpus gives you a predefined list of stopwords
# that you can use to filter out irrelevant words during text preprocessing.

from nltk.corpus import stopwords


st = set(stopwords.words('english'))
print(st)

filtered_sent = []
for w in tokenizer_word:
    if w not in st:
        filtered_sent.append(w)

print('tokenized sentence : ', tokenizer_word)


print('Filtered sentence :', filtered_sent)
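# The same filtering is often written as a list comprehension; a sketch
# ("filtered_lower" is an illustrative name). Lowercasing each token first
# matches the all-lowercase stopword list, so "The" is filtered as well as "the":
filtered_lower = [w for w in tokenizer_word if w.lower() not in st]
print(filtered_lower)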

from nltk.stem import PorterStemmer


from nltk.tokenize import sent_tokenize, word_tokenize
ps = PorterStemmer()
stemmed_words = []
for w in filtered_sent:
    stemmed_words.append(ps.stem(w))

print("Filtered sent :", filtered_sent)


print("Stemmed sentence: ", stemmed_words)
# removing common word endings to reduce words to their base or root form.
# eg running -> run
# runs -> run
# ran -> ran
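# A quick check of the examples above with the same PorterStemmer:
for w in ["running", "runs", "ran"]:
    print(w, "->", ps.stem(w))
# Stemming is rule-based, so the irregular form "ran" is left unchanged.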

nltk.download("wordnet")

from nltk.stem.wordnet import WordNetLemmatizer


lem = WordNetLemmatizer()
from nltk.stem.porter import PorterStemmer
stem = PorterStemmer()
word = "flying"
print("Lemmatizer Word: ", lem.lemmatize(word, "v")) # v means verb here, n->noun,
a->adjective, r->adverb
print("Stemmed Word ", stem.stem(word))

# The WordNet lemmatizer is based on WordNet, a lexical database of the English language.
# It is another way of reducing words to their base form.
# Unlike stemming, lemmatization takes into account the morphological analysis of words,
# ensuring that the resulting lemma is a valid word.
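# A short side-by-side sketch (illustrative words, not from the original):
# the stemmer can emit non-words ("studi"), while the lemmatizer returns
# valid dictionary forms.
print("studies | stem:", stem.stem("studies"), "| lemma:", lem.lemmatize("studies", "v"))
print("better  | stem:", stem.stem("better"), "| lemma:", lem.lemmatize("better", "a"))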

sent = "Albert Einstien was born in Ulm, Germany in 1879."


tokens = nltk.word_tokenize(sent)
print(tokens)

nltk.download('averaged_perceptron_tagger')

nltk.pos_tag(tokens)
# NNP: proper noun
# VBD: verb, past tense
# IN: preposition
# ,: punctuation mark
# CD: cardinal number
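# NLTK can describe any Penn Treebank tag directly; a small sketch that
# needs the 'tagsets' resource to be downloaded first:
nltk.download('tagsets')
nltk.help.upenn_tagset('NNP')  # prints the definition and examples for NNP
nltk.help.upenn_tagset('VBD')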

from collections import Counter


sent = "Texas is the city in america i guess i dont know"
fq = Counter(sent) # for letter for words use sent.split()
fw = Counter(sent.split())
fw
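# FreqDist is in fact a Counter subclass, so most_common works the same way here:
print(fw.most_common(3))  # the three most frequent words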
