Omkar Nimbalkar Ass3

The document outlines various Python code snippets for text processing, including sentiment analysis, word frequency analysis, and data visualization using libraries such as NLTK, TextBlob, and WordCloud. It demonstrates techniques for cleaning text, tokenizing sentences and words, and generating word clouds from text data. Additionally, it showcases how to analyze social media and chat data for insights and sentiment classification.


In [8]: import re

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from heapq import nlargest

# Download punkt data if not already downloaded
nltk.download('punkt')

# Input text
text = """Artificial Intelligence (AI) is transforming industries by automating tasks,

# Preprocess: Remove special characters and digits
cleaned_text = re.sub(r'[^a-zA-Z\s]', '', text)

# Tokenize sentences
sentences = sent_tokenize(cleaned_text)

# Tokenize words and calculate word frequencies
word_frequencies = {}
for word in word_tokenize(cleaned_text.lower()):
    if word.isalpha():  # Only count words (ignore numbers and punctuation)
        word_frequencies[word] = word_frequencies.get(word, 0) + 1

# Score sentences based on word frequency
sentence_scores = {sent: sum(word_frequencies.get(word.lower(), 0) for word in word_tokenize(sent))
                   for sent in sentences}

# Extract top 3 sentences for a more concise summary
summary_sentences = nlargest(3, sentence_scores, key=sentence_scores.get)

# Join the summary sentences
summary = ' '.join(summary_sentences)

# Clean summary (spacing fix, restore full forms, etc.)
summary = summary.replace("AI", "Artificial Intelligence") # Ensure proper usage of f
summary = summary.replace("datadriven", "data-driven") # Fix hyphenated words
summary = summary.replace("selfdriving", "self-driving") # Fix hyphenated words
summary = summary.replace("Machine learning", "Machine Learning") # Capitalize Machin

# Remove extra 'Artificial Intelligence' mentions
summary = summary.replace("Artificial Intelligence Artificial Intelligence", "Artifici

# Display the final summary
print("Summary:", summary)

Summary: Artificial Intelligence is transforming industries by automating tasks improving
efficiency and enabling data-driven decisionmaking Artificial Intelligence applications
range from self-driving cars and virtual assistants to healthcare diagnostics and financial
predictions Machine Learning a subset of Artificial Intelligence allows computers to learn
patterns and make decisions without explicit programming However Artificial Intelligence
also raises ethical concerns such as job displacement and privacy risks As Artificial
Intelligence continues to evolve balancing innovation with ethical considerations is
crucial for responsible implementation

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Practical\AppData\Roaming\nltk_data...
[nltk_data] Package punkt is already up-to-date!
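Note: because punctuation is stripped before sent_tokenize, the cleaned text has no sentence boundaries left, so the "top 3" selection effectively returns the whole paragraph. A minimal alternative sketch (not part of the original cell, and assuming the same text variable): split sentences on the raw text and score them with stopword-free frequencies.

import re
from heapq import nlargest

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download('punkt')
nltk.download('stopwords')

stop_words = set(stopwords.words('english'))

# Keep the raw text for sentence splitting; clean only for frequency counting
sentences = sent_tokenize(text)
cleaned = re.sub(r'[^a-zA-Z\s]', '', text).lower()

word_frequencies = {}
for word in word_tokenize(cleaned):
    if word.isalpha() and word not in stop_words:
        word_frequencies[word] = word_frequencies.get(word, 0) + 1

sentence_scores = {sent: sum(word_frequencies.get(w.lower(), 0) for w in word_tokenize(sent))
                   for sent in sentences}

summary = ' '.join(nlargest(3, sentence_scores, key=sentence_scores.get))
print("Summary:", summary)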
In [10]: !pip install wordcloud

Defaulting to user installation because normal site-packages is not writeable


Collecting wordcloud
Downloading wordcloud-1.9.4-cp312-cp312-win_amd64.whl.metadata (3.5 kB)
Requirement already satisfied: numpy>=1.6.1 in c:\programdata\anaconda3\lib\site-pac
kages (from wordcloud) (1.26.4)
Requirement already satisfied: pillow in c:\programdata\anaconda3\lib\site-packages
(from wordcloud) (10.4.0)
Requirement already satisfied: matplotlib in c:\programdata\anaconda3\lib\site-packa
ges (from wordcloud) (3.9.2)
Requirement already satisfied: contourpy>=1.0.1 in c:\programdata\anaconda3\lib\site
-packages (from matplotlib->wordcloud) (1.2.0)
Requirement already satisfied: cycler>=0.10 in c:\programdata\anaconda3\lib\site-pac
kages (from matplotlib->wordcloud) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in c:\programdata\anaconda3\lib\sit
e-packages (from matplotlib->wordcloud) (4.51.0)
Requirement already satisfied: kiwisolver>=1.3.1 in c:\programdata\anaconda3\lib\sit
e-packages (from matplotlib->wordcloud) (1.4.4)
Requirement already satisfied: packaging>=20.0 in c:\programdata\anaconda3\lib\site-
packages (from matplotlib->wordcloud) (24.1)
Requirement already satisfied: pyparsing>=2.3.1 in c:\programdata\anaconda3\lib\site
-packages (from matplotlib->wordcloud) (3.1.2)
Requirement already satisfied: python-dateutil>=2.7 in c:\programdata\anaconda3\lib
\site-packages (from matplotlib->wordcloud) (2.9.0.post0)
Requirement already satisfied: six>=1.5 in c:\programdata\anaconda3\lib\site-package
s (from python-dateutil>=2.7->matplotlib->wordcloud) (1.16.0)
Downloading wordcloud-1.9.4-cp312-cp312-win_amd64.whl (301 kB)
Installing collected packages: wordcloud
Successfully installed wordcloud-1.9.4

WARNING: The script wordcloud_cli.exe is installed in 'C:\Users\Practical\AppData\Roaming\Python\Python312\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning,
use --no-warn-script-location.
In [11]: #SETA 2

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from collections import Counter
import matplotlib.pyplot as plt
from wordcloud import WordCloud

nltk.download('punkt')
nltk.download('stopwords')

text = """The rapid advancements in technology have led to the development of innovati

# Tokenize
words = word_tokenize(text)
sentences = sent_tokenize(text)

# Remove stopwords
stop_words = set(stopwords.words("english"))
filtered_words = [word for word in words if word.lower() not in stop_words]

# Word Frequency
word_freq = Counter(filtered_words)

# Plot Word Frequency
plt.figure(figsize=(10,5))
plt.bar(word_freq.keys(), word_freq.values())
plt.xticks(rotation=45)
plt.show()

# WordCloud
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(' '.join(filtered_words))
plt.figure(figsize=(10,5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Practical\AppData\Roaming\nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data] C:\Users\Practical\AppData\Roaming\nltk_data...
[nltk_data] Unzipping corpora\stopwords.zip.
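Plotting every filtered word makes the x-axis unreadable. An optional sketch (assuming word_freq and plt from the cell above) limits the bar chart to the most common words via Counter.most_common:

top_words = word_freq.most_common(15)   # keep only the 15 most frequent words
labels, counts = zip(*top_words)

plt.figure(figsize=(10, 5))
plt.bar(labels, counts)
plt.xticks(rotation=45)
plt.title("Top 15 words by frequency")
plt.tight_layout()
plt.show()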
In [13]: !pip install textblob

Defaulting to user installation because normal site-packages is not writeable


Collecting textblob
Downloading textblob-0.19.0-py3-none-any.whl.metadata (4.4 kB)
Requirement already satisfied: nltk>=3.9 in c:\programdata\anaconda3\lib\site-packag
es (from textblob) (3.9.1)
Requirement already satisfied: click in c:\programdata\anaconda3\lib\site-packages
(from nltk>=3.9->textblob) (8.1.7)
Requirement already satisfied: joblib in c:\programdata\anaconda3\lib\site-packages
(from nltk>=3.9->textblob) (1.4.2)
Requirement already satisfied: regex>=2021.8.3 in c:\programdata\anaconda3\lib\site-
packages (from nltk>=3.9->textblob) (2024.9.11)
Requirement already satisfied: tqdm in c:\programdata\anaconda3\lib\site-packages (f
rom nltk>=3.9->textblob) (4.66.5)
Requirement already satisfied: colorama in c:\programdata\anaconda3\lib\site-package
s (from click->nltk>=3.9->textblob) (0.4.6)
Downloading textblob-0.19.0-py3-none-any.whl (624 kB)
---------------------------------------- 0.0/624.3 kB ? eta -:--:--
---------------------------------------- 0.0/624.3 kB ? eta -:--:--
---------------------------------------- 624.3/624.3 kB 3.9 MB/s eta 0:00:00
Installing collected packages: textblob
Successfully installed textblob-0.19.0

In [14]: #SETA 3

from textblob import TextBlob

messages = [
    "I purchased headphones online. I am very happy with the product.",
    "I saw the movie yesterday. The animation was really good but the script was ok.",
    "I enjoy listening to music",
    "I take a walk in the park everyday"
]

for msg in messages:
    sentiment = TextBlob(msg).sentiment.polarity
    sentiment_label = "Positive" if sentiment > 0 else "Negative" if sentiment < 0 else "Neutral"
    print(f"Message: {msg}\nSentiment: {sentiment_label}\n")

Message: I purchased headphones online. I am very happy with the product.
Sentiment: Positive

Message: I saw the movie yesterday. The animation was really good but the script was ok.
Sentiment: Positive

Message: I enjoy listening to music
Sentiment: Positive

Message: I take a walk in the park everyday
Sentiment: Negative
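TextBlob labels the last, apparently neutral message as Negative; printing the raw polarity and subjectivity scores makes such borderline results easier to inspect. A small optional sketch, assuming messages from the cell above:

for msg in messages:
    blob = TextBlob(msg)
    print(f"{msg!r} -> polarity={blob.sentiment.polarity:.2f}, subjectivity={blob.sentiment.subjectivity:.2f}")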
In [15]: #SETA 4

import re
import nltk
from wordcloud import WordCloud

nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

# Load WhatsApp chat file
with open("WhatsAppChat.txt", "r", encoding="utf-8") as file:
chat_data = file.read()

# Tokenization
sentences = sent_tokenize(chat_data)
print("Tokenized Sentences:", sentences[:5])

# Stopword Removal
stop_words = set(stopwords.words("english"))
words = word_tokenize(chat_data)
filtered_words = [word for word in words if word.lower() not in stop_words]

# WordCloud
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(' '.join(filtered_words))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Practical\AppData\Roaming\nltk_data...
[nltk_data] Package stopwords is already up-to-date!

Tokenized Sentences: ['[12/02/24, 10:30 AM] John: Hey, how are you?', "[12/02/24, 10:31 AM] Mike: I'm good, just working on a project.", '[12/02/24, 10:35 AM] John: Nice!', 'Need any help?', '[12/02/24, 10:40 AM] Mike: Yeah, I need some suggestions on data visualization.']
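If lemmatized tokens are preferred for the word cloud, NLTK's WordNetLemmatizer can be applied on top of the stopword-filtered words. A minimal sketch, assuming filtered_words, WordCloud and plt from the cells above:

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(word.lower()) for word in filtered_words if word.isalpha()]

wordcloud = WordCloud(width=800, height=400, background_color='white').generate(' '.join(lemmas))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()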
In [18]: #SETB 1

import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

df = pd.read_csv(r"C:\Users\Practical\Desktop\instagram_global_top_1000.csv")


print(df.columns)

Index(['Country', 'Rank', 'Account', 'Title', 'Link', 'Category', 'Followers',
       'Audience Country', 'Authentic engagement', 'Engagement avg',
       'Scraped'],
      dtype='object')

In [19]: # Top 5 influencers from India


top_indian = df[df['Country'] == 'India'].nlargest(5, 'Followers')
print(top_indian[['Account', 'Followers']])

# Least followed account
least_followed = df.nsmallest(1, 'Followers')
print(least_followed[['Account', 'Followers']])

# WordCloud for categories
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(' '.join(df['Category'].dropna().astype(str)))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Empty DataFrame
Columns: [Account, Followers]
Index: []
Account Followers
747 yooncy1 2800000.0
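The empty DataFrame suggests the 'Country' filter value does not match how countries are stored in this CSV (different spelling, casing, or country codes). A quick, hedged check before filtering, assuming df from the cell above:

# Inspect how countries are actually recorded
print(df['Country'].dropna().unique()[:20])

# Case-insensitive fallback, assuming full country names rather than codes
top_indian = df[df['Country'].str.contains('india', case=False, na=False)].nlargest(5, 'Followers')
print(top_indian[['Account', 'Followers']])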
In [28]: #SETB 2

df = pd.read_csv(r"C:\Users\Practical\Desktop\covid_2021_1.csv", encoding='ISO-8859-1

from textblob import TextBlob

def analyze_sentiment(text):
    polarity = TextBlob(str(text)).sentiment.polarity
    return "Positive" if polarity > 0 else "Negative" if polarity < 0 else "Neutral"

df['Sentiment'] = df['comment_text'].apply(analyze_sentiment)
print(df['Sentiment'].value_counts())

Sentiment
Neutral 44657
Positive 12044
Negative 4243
Name: count, dtype: int64
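A bar chart of these counts makes the class imbalance easier to see at a glance. A small optional sketch, assuming df['Sentiment'] from the cell above and matplotlib already imported as plt:

df['Sentiment'].value_counts().plot(kind='bar', color=['grey', 'green', 'red'])
plt.title("Sentiment distribution of comments")
plt.xlabel("Sentiment")
plt.ylabel("Number of comments")
plt.show()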
In [29]: #SETB 3

df = pd.read_csv("INvideos.csv")

# Total views, likes, dislikes, comments
print(df[['views', 'likes', 'dislikes', 'comment_count']].sum())

# Least and Most Liked Videos
print(df.nsmallest(1, 'likes')[['title', 'likes']])
print(df.nlargest(1, 'likes')[['title', 'likes']])

---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[29], line 3
1 #SETB 3
----> 3 df = pd.read_csv("INvideos.csv")
5 # Total views, likes, dislikes, comments
6 print(df[['views', 'likes', 'dislikes', 'comment_count']].sum())

File C:\ProgramData\anaconda3\Lib\site-packages\pandas\io\parsers\readers.py:1026, i
n read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dt
ype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skip
footer, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, par
se_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst,
cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, q
uotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dial
ect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storag
e_options, dtype_backend)
1013 kwds_defaults = _refine_defaults_read(
1014 dialect,
1015 delimiter,
(...)
1022 dtype_backend=dtype_backend,
1023 )
1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)

File C:\ProgramData\anaconda3\Lib\site-packages\pandas\io\parsers\readers.py:620, in
_read(filepath_or_buffer, kwds)
617 _validate_names(kwds.get("names", None))
619 # Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
622 if chunksize or iterator:
623 return parser

File C:\ProgramData\anaconda3\Lib\site-packages\pandas\io\parsers\readers.py:1620, i
n TextFileReader.__init__(self, f, engine, **kwds)
1617 self.options["has_index_names"] = kwds["has_index_names"]
1619 self.handles: IOHandles | None = None
-> 1620 self._engine = self._make_engine(f, self.engine)

File C:\ProgramData\anaconda3\Lib\site-packages\pandas\io\parsers\readers.py:1880, i
n TextFileReader._make_engine(self, f, engine)
1878 if "b" not in mode:
1879 mode += "b"
-> 1880 self.handles = get_handle(
1881 f,
1882 mode,
1883 encoding=self.options.get("encoding", None),
1884 compression=self.options.get("compression", None),
1885 memory_map=self.options.get("memory_map", False),
1886 is_text=is_text,
1887 errors=self.options.get("encoding_errors", "strict"),
1888 storage_options=self.options.get("storage_options", None),
1889 )
1890 assert self.handles is not None
1891 f = self.handles.handle

File C:\ProgramData\anaconda3\Lib\site-packages\pandas\io\common.py:873, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
868 elif isinstance(handle, str):
869 # Check whether the filename is to be opened in binary mode.
870 # Binary mode does not support 'encoding' and 'newline'.
871 if ioargs.encoding and "b" not in ioargs.mode:
872 # Encoding
--> 873 handle = open(
874 handle,
875 ioargs.mode,
876 encoding=ioargs.encoding,
877 errors=errors,
878 newline="",
879 )
880 else:
881 # Binary mode
882 handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: 'INvideos.csv'
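The traceback only means INvideos.csv is not in the notebook's working directory. Checking the path first (or using an absolute path, as the earlier cells do) keeps the notebook running; a sketch with a hypothetical file location:

from pathlib import Path

# Hypothetical location of the downloaded dataset; adjust as needed
csv_path = Path(r"C:\Users\Practical\Desktop\INvideos.csv")

if csv_path.exists():
    df = pd.read_csv(csv_path)
    print(df[['views', 'likes', 'dislikes', 'comment_count']].sum())
    print(df.nsmallest(1, 'likes')[['title', 'likes']])
    print(df.nlargest(1, 'likes')[['title', 'likes']])
else:
    print(f"File not found: {csv_path}")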

In [31]: !pip install tweepy

Defaulting to user installation because normal site-packages is not writeable


Collecting tweepy
Downloading tweepy-4.15.0-py3-none-any.whl.metadata (4.1 kB)
Collecting oauthlib<4,>=3.2.0 (from tweepy)
Downloading oauthlib-3.2.2-py3-none-any.whl.metadata (7.5 kB)
Requirement already satisfied: requests<3,>=2.27.0 in c:\programdata\anaconda3\lib\s
ite-packages (from tweepy) (2.32.3)
Collecting requests-oauthlib<3,>=1.2.0 (from tweepy)
Downloading requests_oauthlib-2.0.0-py2.py3-none-any.whl.metadata (11 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\programdata\anaconda3
\lib\site-packages (from requests<3,>=2.27.0->tweepy) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\programdata\anaconda3\lib\site-pac
kages (from requests<3,>=2.27.0->tweepy) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\programdata\anaconda3\lib\si
te-packages (from requests<3,>=2.27.0->tweepy) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\programdata\anaconda3\lib\si
te-packages (from requests<3,>=2.27.0->tweepy) (2024.8.30)
Downloading tweepy-4.15.0-py3-none-any.whl (99 kB)
Downloading oauthlib-3.2.2-py3-none-any.whl (151 kB)
Downloading requests_oauthlib-2.0.0-py2.py3-none-any.whl (24 kB)
Installing collected packages: oauthlib, requests-oauthlib, tweepy
Successfully installed oauthlib-3.2.2 requests-oauthlib-2.0.0 tweepy-4.15.0
In [32]: #SETC 1

import tweepy
import pandas as pd

BEARER_TOKEN = "your_bearer_token_here"

client = tweepy.Client(bearer_token=BEARER_TOKEN)

# Search tweets
query = "Python"
tweets = client.search_recent_tweets(query=query, max_results=10, tweet_fields=["public_metrics"])

for tweet in tweets.data:
    print(f"Tweet: {tweet.text}")
    print(f"Likes: {tweet.public_metrics['like_count']}, Retweets: {tweet.public_metrics['retweet_count']}")

---------------------------------------------------------------------------
Unauthorized Traceback (most recent call last)
Cell In[32], line 12
10 # Search tweets
11 query = "Python"
---> 12 tweets = client.search_recent_tweets(query=query, max_results=10, tweet_fiel
ds=["public_metrics"])
14 for tweet in tweets.data:
15 print(f"Tweet: {tweet.text}")

File ~\AppData\Roaming\Python\Python312\site-packages\tweepy\client.py:1270, in Client.search_recent_tweets(self, query, user_auth, **params)
1178 """search_recent_tweets( \
1179 query, *, end_time=None, expansions=None, max_results=None, \
1180 media_fields=None, next_token=None, place_fields=None, \
(...)
1267 .. _Academic Research Project: https://developer.twitter.com/en/docs/project
s (https://developer.twitter.com/en/docs/projects)
1268 """
1269 params["query"] = query
-> 1270 return self._make_request(
1271 "GET", "/2/tweets/search/recent", params=params,
1272 endpoint_parameters=(
1273 "end_time", "expansions", "max_results", "media.fields",
1274 "next_token", "place.fields", "poll.fields", "query",
1275 "since_id", "sort_order", "start_time", "tweet.fields",
1276 "until_id", "user.fields"
1277 ), data_type=Tweet, user_auth=user_auth
1278 )

File ~\AppData\Roaming\Python\Python312\site-packages\tweepy\client.py:129, in BaseClient._make_request(self, method, route, params, endpoint_parameters, json, data_type, user_auth)
123 def _make_request(
124 self, method, route, params={}, endpoint_parameters=(), json=None,
125 data_type=None, user_auth=False
126 ):
127 request_params = self._process_params(params, endpoint_parameters)
--> 129 response = self.request(method, route, params=request_params,
130 json=json, user_auth=user_auth)
132 if self.return_type is requests.Response:
133 return response

File ~\AppData\Roaming\Python\Python312\site-packages\tweepy\client.py:98, in BaseClient.request(self, method, route, params, json, user_auth)
96 raise BadRequest(response)
97 if response.status_code == 401:
---> 98 raise Unauthorized(response)
99 if response.status_code == 403:
100 raise Forbidden(response)

Unauthorized: 401 Unauthorized
Unauthorized
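The 401 simply means the placeholder bearer token was rejected; the query only works with a valid token from a Twitter/X developer account. Wrapping the call in tweepy's own exception class lets the notebook continue without credentials; a small sketch, assuming client and BEARER_TOKEN from the cell above:

try:
    tweets = client.search_recent_tweets(query="Python", max_results=10,
                                         tweet_fields=["public_metrics"])
    for tweet in tweets.data:
        print(tweet.text, tweet.public_metrics['like_count'])
except tweepy.errors.Unauthorized:
    print("Invalid or missing bearer token - skipping the Twitter query.")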
In [33]: #SETC 2

import pandas as pd
import json

# Load JSON
with open("your_posts.json", "r", encoding="utf-8") as file:
data = json.load(file)

df = pd.DataFrame(data)

# Clean Data
df = df.dropna()

# Sentiment Analysis
from textblob import TextBlob

df["Sentiment"] = df["content"].apply(lambda x: "Positive" if TextBlob(str(x)).sentime
print(df["Sentiment"].value_counts())

---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[33], line 7
4 import json
6 # Load JSON
----> 7 with open("your_posts.json", "r", encoding="utf-8") as file:
8 data = json.load(file)
10 df = pd.DataFrame(data)

File C:\ProgramData\anaconda3\Lib\site-packages\IPython\core\interactiveshell.py:32
4, in _modified_open(file, *args, **kwargs)
317 if file in {0, 1, 2}:
318 raise ValueError(
319 f"IPython won't let you open fd={file} by default "
320 "as it is likely to crash IPython. If you know what you are doing, "
321 "you can use builtins' open."
322 )
--> 324 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: 'your_posts.json'

In [ ]:
