IDS Assignment Code With Output
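def lowerCaseArray(wordrow):
    """Return a lowercased copy of a list of words."""
    return [word.lower() for word in wordrow]

def safe_tokenize(text):
    """Tokenize with NLTK, falling back to whitespace splitting on failure."""
    try:
        return nltk.word_tokenize(text)
    except Exception:
        # e.g. LookupError when the NLTK tokenizer data has not been downloaded
        return text.split()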
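A short usage sketch of the two helpers above. It is a sketch under assumptions: the snake_case names are illustrative, and the NLTK import is moved inside the function (unlike the notebook) so the snippet runs whether or not NLTK and its tokenizer data are available.

```python
def lower_case_array(word_row):
    """Return a lowercased copy of a list of words."""
    return [word.lower() for word in word_row]


def safe_tokenize(text):
    """Tokenize with NLTK if possible, otherwise split on whitespace."""
    try:
        import nltk  # assumption: imported lazily so the fallback also covers a missing install
        return nltk.word_tokenize(text)
    except (ImportError, LookupError):
        # LookupError is what NLTK raises when tokenizer data is missing
        return text.split()


tokens = lower_case_array(safe_tokenize("Winter Is Coming"))
print(tokens)  # ['winter', 'is', 'coming']
```

For text containing punctuation the two paths differ: the NLTK tokenizer separates punctuation into its own tokens, while `split()` leaves it attached to the neighbouring word.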
# ✅ STEP 5: CONNECT TO REDDIT USING API (Replace with your own credentials)
reddit = praw.Reddit(
    client_id='YOUR_CLIENT_ID',  # ⬅️ Replace with your Reddit client ID
    client_secret='YOUR_CLIENT_SECRET',  # ⬅️ Replace with your Reddit client secret
    user_agent='reddit_text_analysis_app'
)
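Credentials written directly into a notebook end up in every export or printout of it. One common alternative, sketched here, is to read them from environment variables. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT` and the helper `load_reddit_credentials` are assumptions for illustration, not part of the original notebook.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")


def load_reddit_credentials(env=None):
    """Build the keyword arguments for praw.Reddit from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env.get("REDDIT_USER_AGENT", "reddit_text_analysis_app"),
    }


# Demonstration with a stand-in mapping instead of real secrets
demo = load_reddit_credentials(
    {"REDDIT_CLIENT_ID": "demo-id", "REDDIT_CLIENT_SECRET": "demo-secret"}
)
print(sorted(demo))  # ['client_id', 'client_secret', 'user_agent']

# In the notebook this would become:
# reddit = praw.Reddit(**load_reddit_credentials())
```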
    print("Hapaxes in 'datascience':")
    print(wordfreqs_cat1.hapaxes())
https://colab.research.google.com/drive/1zW5rp4IhdP19iFqE7Wy6-rM8CB6kGK93#scrollTo=sV9h6I2nS1Xb&printMode=true 1/6
3/25/25, 12:02 AM Untitled7.ipynb - Colab
else:
    print("⚠ No words to plot for 'datascience'.")
if data['gameofthrones']['all_words']:
    wordfreqs_cat2 = nltk.FreqDist(data['gameofthrones']['all_words'])
    plt.figure(figsize=(10, 5))
    # Histogram of how many distinct words occur 1, 2, ... times
    plt.hist(list(wordfreqs_cat2.values()), bins=range(1, 30))
    plt.title("Word Frequency Histogram - Game of Thrones")
    plt.xlabel("Frequency")
    plt.ylabel("Word Count")
    plt.show()
    print("Hapaxes in 'gameofthrones':")
    print(wordfreqs_cat2.hapaxes())
else:
    print("⚠ No words to plot for 'gameofthrones'.")
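A hapax is a word that occurs exactly once in the corpus. The sketch below reproduces the idea with the standard library's `collections.Counter` (which `nltk.FreqDist` builds on), so it runs without NLTK. The function names and the sample word list are made up for illustration.

```python
from collections import Counter


def hapaxes(words):
    """Return the words that occur exactly once, in first-seen order."""
    counts = Counter(words)
    return [word for word, n in counts.items() if n == 1]


def frequency_of_frequencies(words):
    """Map each occurrence count to the number of distinct words with that count."""
    counts = Counter(words)
    return Counter(counts.values())


sample = ["winter", "is", "coming", "winter", "is", "here"]
print(hapaxes(sample))  # ['coming', 'here']
print(dict(frequency_of_frequencies(sample)))
```

The second function computes the same distribution that the histogram above draws: the x-axis is how often a word occurs, and the bar height is how many distinct words occur that often.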
featuresets = []
for label in subreddits:
    for words in data[label]['wordMatrix']:
        features = extract_features(words, top_words)
        featuresets.append((features, label))
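The definitions of `extract_features` and `top_words` did not survive in this printout. The following is only a guess at their shape, assuming a simple bag-of-words presence feature per top word; the sample posts and word list are invented for the example.

```python
def extract_features(words, top_words):
    """Hypothetical feature extractor: one boolean per top word, True if the post contains it."""
    present = set(words)
    return {f"contains({w})": (w in present) for w in top_words}


# Invented stand-ins for the notebook's data
top_words = ["model", "dragon", "data"]
posts = {
    "datascience": [["data", "model", "pipeline"]],
    "gameofthrones": [["dragon", "queen"]],
}

featuresets = [
    (extract_features(words, top_words), label)
    for label, rows in posts.items()
    for words in rows
]
for features, label in featuresets:
    print(label, features)
```

A list of `(feature_dict, label)` pairs like this is the input format that NLTK's classifiers accept for training.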
Collecting nltk
Downloading nltk-3.9.1-py3-none-any.whl.metadata (2.9 kB)
Collecting praw
Using cached praw-7.8.1-py3-none-any.whl.metadata (9.4 kB)
Collecting matplotlib
Using cached matplotlib-3.10.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting click (from nltk)
Downloading click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting joblib (from nltk)
Downloading joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting regex>=2021.8.3 (from nltk)
Downloading regex-2024.11.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.5/40.5 kB 2.3 MB/s eta 0:00:00
Collecting tqdm (from nltk)
Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.7/57.7 kB 2.7 MB/s eta 0:00:00
Collecting prawcore<3,>=2.4 (from praw)
Using cached prawcore-2.4.0-py3-none-any.whl.metadata (5.0 kB)
Collecting update_checker>=0.18 (from praw)
Using cached update_checker-0.18.0-py3-none-any.whl.metadata (2.3 kB)
Collecting websocket-client>=0.54.0 (from praw)
Downloading websocket_client-1.8.0-py3-none-any.whl.metadata (8.0 kB)
Collecting contourpy>=1.0.1 (from matplotlib)
Downloading contourpy-1.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.4 kB)
Collecting cycler>=0.10 (from matplotlib)
Downloading cycler-0.12.1-py3-none-any.whl.metadata (3.8 kB)
Collecting fonttools>=4.22.0 (from matplotlib)
Downloading fonttools-4.56.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (101 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.9/101.9 kB 5.1 MB/s eta 0:00:00
Collecting kiwisolver>=1.3.1 (from matplotlib)
Downloading kiwisolver-1.4.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.2 kB)
Collecting numpy>=1.23 (from matplotlib)
Downloading numpy-2.2.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.0/62.0 kB 4.2 MB/s eta 0:00:00
Collecting packaging>=20.0 (from matplotlib)
Downloading packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
Collecting pillow>=8 (from matplotlib)
Downloading pillow-11.1.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (9.1 kB)
Collecting pyparsing>=2.3.1 (from matplotlib)
Downloading pyparsing-3.2.2-py3-none-any.whl.metadata (5.0 kB)
Collecting python-dateutil>=2.7 (from matplotlib)
Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting requests<3.0,>=2.6.0 (from prawcore<3,>=2.4->praw)
Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting six>=1.5 (from python-dateutil>=2.7->matplotlib)
Downloading six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0,>=2.6.0->prawcore<3,>=2.4->praw)
Downloading charset_normalizer-3.4.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0,>=2.6.0->prawcore<3,>=2.4->praw)
Downloading idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0,>=2.6.0->prawcore<3,>=2.4->praw)
Downloading urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0,>=2.6.0->prawcore<3,>=2.4->praw)
Downloading certifi-2025.1.31-py3-none-any.whl.metadata (2.5 kB)
Downloading nltk-3.9.1-py3-none-any.whl (1.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 27.2 MB/s eta 0:00:00
Using cached praw-7.8.1-py3-none-any.whl (189 kB)
Using cached matplotlib-3.10.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.6 MB)
Downloading contourpy-1.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (326 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 326.2/326.2 kB 16.9 MB/s eta 0:00:00
Downloading cycler-0.12.1-py3-none-any.whl (8.3 kB)
Downloading fonttools-4.56.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 53.5 MB/s eta 0:00:00
Downloading kiwisolver-1.4.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 56.1 MB/s eta 0:00:00
Downloading numpy-2.2.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.4/16.4 MB 54.2 MB/s eta 0:00:00
Downloading packaging-24.2-py3-none-any.whl (65 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.5/65.5 kB 4.8 MB/s eta 0:00:00
Downloading pillow-11.1.0-cp311-cp311-manylinux_2_28_x86_64.whl (4.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.5/4.5 MB 72.2 MB/s eta 0:00:00
Using cached prawcore-2.4.0-py3-none-any.whl (17 kB)
Downloading pyparsing-3.2.2-py3-none-any.whl (111 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 111.1/111.1 kB 7.8 MB/s eta 0:00:00
Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.9/229.9 kB 15.4 MB/s eta 0:00:00
Downloading regex-2024.11.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (792 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 792.7/792.7 kB 39.0 MB/s eta 0:00:00
Using cached update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Downloading websocket_client-1.8.0-py3-none-any.whl (58 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.8/58.8 kB 4.4 MB/s eta 0:00:00
Downloading click-8.1.8-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.2/98.2 kB 6.3 MB/s eta 0:00:00
Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 301.8/301.8 kB 20.6 MB/s eta 0:00:00
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 5.8 MB/s eta 0:00:00
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB 5.0 MB/s eta 0:00:00
Downloading six-1.17.0-py2.py3-none-any.whl (11 kB)
Downloading certifi-2025.1.31-py3-none-any.whl (166 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.4/166.4 kB 13.2 MB/s eta 0:00:00
Downloading charset_normalizer-3.4.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (143 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 143.9/143.9 kB 11.4 MB/s eta 0:00:00
Downloading idna-3.10-py3-none-any.whl (70 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.4/70.4 kB 5.1 MB/s eta 0:00:00
Downloading urllib3-2.3.0-py3-none-any.whl (128 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 128.4/128.4 kB 9.3 MB/s eta 0:00:00
Installing collected packages: websocket-client, urllib3, tqdm, six, regex, pyparsing, pillow, packaging, numpy, kiwisolver, joblib,
Attempting uninstall: websocket-client
Found existing installation: websocket-client 1.8.0
Uninstalling websocket-client-1.8.0:
Successfully uninstalled websocket-client-1.8.0
Attempting uninstall: urllib3
Found existing installation: urllib3 2.3.0
Uninstalling urllib3-2.3.0:
Successfully uninstalled urllib3-2.3.0
Attempting uninstall: tqdm
Found existing installation: tqdm 4.67.1
Uninstalling tqdm-4.67.1:
Successfully uninstalled tqdm-4.67.1
Attempting uninstall: six
Found existing installation: six 1.17.0
Uninstalling six-1.17.0:
Successfully uninstalled six-1.17.0
Attempting uninstall: regex
Found existing installation: regex 2024.11.6
Uninstalling regex-2024.11.6:
Successfully uninstalled regex-2024.11.6
Attempting uninstall: pyparsing
Found existing installation: pyparsing 3.2.1
Uninstalling pyparsing-3.2.1:
Successfully uninstalled pyparsing-3.2.1
Attempting uninstall: pillow
Found existing installation: pillow 11.1.0
Uninstalling pillow-11.1.0:
Successfully uninstalled pillow-11.1.0
Attempting uninstall: packaging
Found existing installation: packaging 24.2
Uninstalling packaging-24.2:
Successfully uninstalled packaging-24.2
Attempting uninstall: numpy
Found existing installation: numpy 1.26.4
Uninstalling numpy-1.26.4:
Successfully uninstalled numpy-1.26.4
Attempting uninstall: kiwisolver
Found existing installation: kiwisolver 1.4.8
Uninstalling kiwisolver-1.4.8:
Successfully uninstalled kiwisolver-1.4.8
Attempting uninstall: joblib
Found existing installation: joblib 1.4.2
Uninstalling joblib-1.4.2:
Successfully uninstalled joblib-1.4.2
Attempting uninstall: idna
Found existing installation: idna 3.10
Uninstalling idna-3.10:
Successfully uninstalled idna-3.10
Attempting uninstall: fonttools
Found existing installation: fonttools 4.56.0
Uninstalling fonttools-4.56.0:
Successfully uninstalled fonttools-4.56.0
Attempting uninstall: cycler
Found existing installation: cycler 0.12.1
Uninstalling cycler-0.12.1:
Successfully uninstalled cycler-0.12.1
Attempting uninstall: click
Found existing installation: click 8.1.8
Uninstalling click-8.1.8:
Successfully uninstalled click-8.1.8
Attempting uninstall: charset-normalizer
Found existing installation: charset-normalizer 3.4.1
Uninstalling charset-normalizer-3.4.1:
Successfully uninstalled charset-normalizer-3.4.1
Attempting uninstall: certifi
Found existing installation: certifi 2025.1.31
Uninstalling certifi-2025.1.31:
Successfully uninstalled certifi-2025.1.31
Attempting uninstall: requests
Found existing installation: requests 2.32.3
https://colab.research.google.com/drive/1zW5rp4IhdP19iFqE7Wy6-rM8CB6kGK93#scrollTo=sV9h6I2nS1Xb&printMode=true 4/6
3/25/25, 12:02 AM Untitled7.ipynb - Colab
Uninstalling requests-2.32.3:
Successfully uninstalled requests-2.32.3
Attempting uninstall: python-dateutil
Found existing installation: python-dateutil 2.8.2
Uninstalling python-dateutil-2.8.2:
Successfully uninstalled python-dateutil-2.8.2
Attempting uninstall: nltk
Found existing installation: nltk 3.9.1
Uninstalling nltk-3.9.1:
Successfully uninstalled nltk-3.9.1
Attempting uninstall: contourpy
Found existing installation: contourpy 1.3.1
Uninstalling contourpy-1.3.1:
Successfully uninstalled contourpy-1.3.1
Attempting uninstall: update_checker
Found existing installation: update-checker 0.18.0
Uninstalling update-checker-0.18.0:
Successfully uninstalled update-checker-0.18.0
Attempting uninstall: prawcore
Found existing installation: prawcore 2.4.0
Uninstalling prawcore-2.4.0:
Successfully uninstalled prawcore-2.4.0
Attempting uninstall: matplotlib
Found existing installation: matplotlib 3.10.1
Uninstalling matplotlib-3.10.1:
Successfully uninstalled matplotlib-3.10.1
Attempting uninstall: praw
Found existing installation: praw 7.8.1
Uninstalling praw-7.8.1:
Successfully uninstalled praw-7.8.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the sour
numba 0.60.0 requires numpy<2.1,>=1.22, but you have numpy 2.2.4 which is incompatible.
tensorflow 2.18.0 requires numpy<2.1.0,>=1.26.0, but you have numpy 2.2.4 which is incompatible.
Successfully installed certifi-2025.1.31 charset-normalizer-3.4.1 click-8.1.8 contourpy-1.3.1 cycler-0.12.1 fonttools-4.56.0 idna-3.1
WARNING: The following packages were previously imported in this runtime:
[PIL,certifi,charset_normalizer,cycler,dateutil,joblib,kiwisolver,nltk,regex,requests,six,update_checker,websocket]
You must restart the runtime in order to use newly installed versions.
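The resolver error in the log above reports that numba 0.60.0 and tensorflow 2.18.0 require `numpy<2.1`, while numpy 2.2.4 was installed. A minimal sketch of checking an installed version against such an upper bound follows. The helper names are illustrative, and the comparison only looks at the numeric major and minor components, so it is not a full version-specifier parser.

```python
from importlib import metadata


def major_minor(version):
    """Return (major, minor) as integers from a version string like '2.2.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def below_upper_bound(installed, upper_bound):
    """True if the installed version is strictly below the upper bound (major.minor only)."""
    return major_minor(installed) < major_minor(upper_bound)


print(below_upper_bound("2.2.4", "2.1"))   # False
print(below_upper_bound("1.26.4", "2.1"))  # True

try:
    installed = metadata.version("numpy")
    status = "ok" if below_upper_bound(installed, "2.1") else "conflicts with numpy<2.1"
    print("numpy", installed, status)
except metadata.PackageNotFoundError:
    print("numpy is not installed in this environment")
```

Of the two versions in the log, only numpy 1.26.4 (the one that was uninstalled) satisfies the bound.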
Hapaxes in 'datascience':
['31', '27', '"hey,', 'second', 'quick', 'call?', 'minute"', '(wrong', 'feedback!', 'r/datascience,', 'cryptocurrency.', 'emerging',
Hapaxes in 'gameofthrones':
['bastards,', 'victors', 'dispose', 'mountains', 'corpses', 'quickly?', 'episode,', 'single', 'body', 'ground.', 'humiliated', 'begga
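Many hapaxes in both lists still carry punctuation (for example `'"hey,'` and `'quickly?'`), which is what whitespace splitting produces. It also inflates the hapax count, because `'episode,'` and `'episode'` are counted as different words. A minimal sketch of stripping leading and trailing punctuation before counting is shown below. The helper name is an assumption, and this step is not part of the original notebook.

```python
import string


def strip_punctuation(tokens):
    """Remove leading/trailing ASCII punctuation and drop tokens that become empty."""
    cleaned = (tok.strip(string.punctuation) for tok in tokens)
    return [tok for tok in cleaned if tok]


raw = ['"hey,', 'call?', 'episode,', 'ground.', '...']
print(strip_punctuation(raw))  # ['hey', 'call', 'episode', 'ground']
```

Punctuation inside a token is left alone, so `'r/datascience,'` becomes `'r/datascience'`.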