Trends Merged
LAB 4
Spell Correction
Dataset link
https://www.kaggle.com/datasets/bittlingmayer/spelling?select=big.txt
In [ ]: # read the document which is "The Project Gutenberg EBook of The Adventures of S
with open('spell_corrector/big.txt', 'r') as file:
    document = file.read()
[('the', 79809), ('of', 40024), ('and', 38312), ('to', 28765), ('in', 22023),
('a', 21124), ('that', 12512), ('he', 12401)]
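The list above looks like the result of a word-frequency pass over the document. A minimal sketch of one way to produce such counts, assuming words are runs of lowercase letters and using `collections.Counter`; the `tokenize` helper and the sample text are hypothetical and stand in for the notebook's own cell, which is not visible here.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a word is a run of ASCII letters, compared case-insensitively.
    return re.findall(r"[a-z]+", text.lower())

# Stand-in text; in the notebook this would be the contents of big.txt.
sample = "The cat and the dog saw the bird. The bird saw a cat."
word_counts = Counter(tokenize(sample))

# Most frequent words first, as (word, count) pairs.
top_words = word_counts.most_common(3)
print(top_words)
```

Lowercasing before counting merges "The" and "the" into one entry, which matters later when guesses are looked up in the vocabulary.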
file:///E:/Edu/NITJ/NLP/Lab4/spell_check.html 1/3
2/18/25, 6:02 PM spell_check
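import string

letters = string.ascii_lowercase  # assumed alphabet; its definition is not visible in the printout

def wordsWithinEdits1(word):
    # All strings one delete, insert, or substitute away from word.
    # The split loop, substitution step, and return are reconstructed (the printout is cut off).
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    inserts = []
    deletes = []
    substitutes = []
    for left, right in splits:
        # Delete a character
        if right:
            deletes.append(left + right[1:])
        # Insert a character
        for c in letters:
            inserts.append(left + c + right)
        # Substitute a character
        if right:
            for c in letters:
                substitutes.append(left + c + right[1:])
    return set(deletes + inserts + substitutes)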
All guesses are: {'tgi', 'thij', 'gthi', 'phi', 'thji', 'thb', 'tqhi', 'uhi',
'tfi', 'thv', 'hi', 'tghi', 'thxi', 'khi', 'thoi', 'tha', 'hthi', 'thh', 'thk',
'tohi', 'ithi', 'othi', 'tuhi', 'tqi', 'zhi', 'thx', 'wthi', 'tfhi', 'thri', 'thyi',
'jhi', 'tdhi', 'tsi', 'txi', 'thia', 'tehi', 'tli', 'fhi', 'tahi', 'sthi', 'tzi',
'thiw', 'thqi', 'qthi', 'thmi', 'yhi', 'thj', 'tkhi', 'thiq', 'tho', 'thq',
'kthi', 'dhi', 'shi', 'thbi', 'tlhi', 'thfi', 'thpi', 'this', 'zthi', 'lthi', 'tdi',
'thci', 'tki', 'tvi', 'ghi', 'bhi', 'ahi', 'thy', 'tti', 'twhi', 'thii', 'thio',
'vhi', 'thc', 'mhi', 'tyhi', 'nhi', 'thz', 'pthi', 'thgi', 'thui', 'tihi', 'th',
'thi', 'ethi', 'cthi', 'thim', 'thf', 'thvi', 'tbi', 'thn', 'thki', 'ehi',
'thih', 'tphi', 'dthi', 'thsi', 'thai', 'tthi', 'tnhi', 'thie', 'tri', 'tci', 'thzi',
'tni', 'tai', 'thd', 'ohi', 'thix', 'vthi', 'ythi', 'thit', 'tht', 'thw', 'thin',
'thiy', 'tui', 'thp', 'tji', 'xthi', 'nthi', 'chi', 'xhi', 'thni', 'thip',
'thib', 'mthi', 'thil', 'thiv', 'thg', 'tei', 'thdi', 'thid', 'tzhi', 'tvhi', 'jthi',
'athi', 'thwi', 'thig', 'rhi', 'thiu', 'tjhi', 'tii', 'lhi', 'bthi', 'tmhi',
'uthi', 'thir', 'the', 'thhi', 'thif', 'ti', 'thu', 'tyi', 'twi', 'hhi', 'thic',
'toi', 'fthi', 'tchi', 'tbhi', 'thei', 'tshi', 'thiz', 'thti', 'thr', 'trhi',
'thik', 'ihi', 'thli', 'tpi', 'qhi', 'rthi', 'tmi', 'txhi', 'ths', 'thm', 'whi',
'thl'}
Guesses available in document: ['hi', 'this', 'thy', 'th', 'thin', 'chi', 'the',
'ti', 'toi']
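Once the guesses are filtered down to words that occur in the document, a natural next step is to pick a single correction. A minimal sketch, assuming the most frequent known guess is the likeliest intended word; `best_guess` is a hypothetical helper, and only the count for 'the' is taken from the frequency output in this lab (the other counts are invented for illustration).

```python
from collections import Counter

def best_guess(guesses, word_counts):
    # Keep only guesses that occur in the corpus vocabulary.
    known = [g for g in guesses if g in word_counts]
    if not known:
        return None
    # Assumption: the most frequent known guess is the likeliest correction.
    return max(known, key=lambda g: word_counts[g])

# 'the' uses the count printed earlier; the remaining counts are illustrative only.
word_counts = Counter({"the": 79809, "this": 4000, "thin": 160, "hi": 5})
guesses = {"hi", "this", "thy", "thin", "the", "thq", "tgi"}
print(best_guess(guesses, word_counts))
```

Ranking by raw frequency ignores which kind of edit produced the guess, so it favours very common words such as 'the' even when another candidate is a more plausible typo.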
2/18/25, 6:03 PM trends
Dataset link:
https://www.kaggle.com/datasets/dhruvildave/google-trends-dataset
In [ ]: import pandas as pd
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26955 entries, 0 to 26954
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 location 26955 non-null object
1 year 26955 non-null int64
2 category 26955 non-null object
3 rank 26955 non-null int64
4 query 26955 non-null object
dtypes: int64(2), object(3)
memory usage: 1.0+ MB
Out[ ]: location 0
year 0
category 0
rank 0
query 0
dtype: int64
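The two outputs above have the shape of `df.info()` followed by a per-column null count. A minimal sketch of that inspection step, assuming a CSV with the same five columns; the three rows below are a hypothetical sample, not data from the Kaggle file, and the real file path is not shown in the printout.

```python
import io
import pandas as pd

# Hypothetical three-row sample with the five columns listed by df.info() above.
csv_text = """location,year,category,rank,query
India,2020,News,1,example query one
India,2020,News,2,example query two
United States,2020,People,1,example query three
"""
df = pd.read_csv(io.StringIO(csv_text))

df.info()                        # column names, non-null counts, dtypes
null_counts = df.isnull().sum()  # missing values per column
print(null_counts)
```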
file:///E:/Edu/NITJ/NLP/Lab4/trends.html 1/7
Out[ ]: array([2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011,
2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020], dtype=int64)
plt.figure(figsize=(10, 5))
event_trend.plot(kind='line', marker='o', color='red')
plt.title('Searches Related to Disasters Over the Years')
plt.xlabel('Year')
plt.ylabel('Number of Searches')
plt.grid()
plt.show()
plt.figure(figsize=(10, 5))
event_trend.plot(kind='line', marker='o', color='red')
plt.title('Searches Related to Trump Over the Years')
plt.xlabel('Year')
plt.ylabel('Number of Searches')
plt.grid()
plt.show()
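Both plots above draw a series named `event_trend`, but the cell that builds it is not visible in the printout. A minimal sketch of one way to build such a series, assuming it counts the rows per year whose query mentions a keyword; `keyword_trend` and the sample rows are hypothetical.

```python
import re
import pandas as pd

def keyword_trend(df, keywords):
    # Case-insensitive match of any keyword inside the query text.
    pattern = "|".join(re.escape(k) for k in keywords)
    mask = df["query"].str.contains(pattern, case=False, na=False)
    # Number of matching rows per year, in chronological order.
    return df[mask].groupby("year").size().sort_index()

# Hypothetical sample rows for illustration only.
df = pd.DataFrame({
    "year": [2016, 2016, 2017, 2020, 2020],
    "query": ["Donald Trump", "Hurricane Matthew", "Trump inauguration",
              "Trump impeachment", "Australia fires"],
})
event_trend = keyword_trend(df, ["trump"])
print(event_trend)
```

Under this assumption the y-axis counts rows of the dataset that match, not raw search volume, since the dataset records ranks rather than search counts.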
plt.figure(figsize=(10, 5))
top_10_india.plot(kind='bar', color='red')
plt.title('Top Categories in India')
plt.xlabel('Category')
plt.ylabel('Number of Searches')
plt.xticks(rotation=45)
plt.grid()
plt.show()
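The bar chart above plots `top_10_india`, whose definition is not visible in the printout. A minimal sketch, assuming "top categories" means the categories with the most rows for India; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows for illustration only.
df = pd.DataFrame({
    "location": ["India", "India", "India", "United States"],
    "category": ["News", "News", "Movies", "News"],
})

# Assumption: rank categories by how many rows they have for India.
india = df[df["location"] == "India"]
top_10_india = india["category"].value_counts().head(10)
print(top_10_india)
```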
In [39]: # get top 10 searched queries in India for the year 2020
year = 2020
location = 'India'
# get top 10 searched queries in United States for the year 2020
year = 2020
location = 'United States'
top_10_queries_us = df[(df['year'] == year) & (df['location'] == location)]['que
print("\nTop 10 search queries in", year, "and", location + ":")
print(top_10_queries_us)
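The India and United States cells repeat the same filter with different values, so the lookup can be wrapped in one function. A minimal sketch, assuming the best-ranked rows should come first; `top_queries` and the sample rows are hypothetical, and sorting by `rank` is an assumption about how "top" is defined.

```python
import pandas as pd

def top_queries(df, year, location, n=10):
    # Rows for the requested year and location, best rank first.
    subset = df[(df["year"] == year) & (df["location"] == location)]
    return subset.sort_values("rank")["query"].head(n)

# Hypothetical sample rows for illustration only.
df = pd.DataFrame({
    "location": ["India", "India", "India", "United States"],
    "year": [2020, 2020, 2019, 2020],
    "rank": [2, 1, 1, 1],
    "query": ["query b", "query a", "query c", "query d"],
})
print(top_queries(df, 2020, "India"))
```

If the data holds one ranking per category, several rows share each rank value, so a per-category breakdown may be more informative than a single top-10 list.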
In [ ]: # get top words searched in the query column for the year 2020
year = 2020
top_words = df[df['year'] == year]['query'].str.split().explode().value_counts()
print("Top words searched in the query column for the year", year, ":")
print(top_words)
Top words searched in the query column for the year 2020 :
query
to 108
Coronavirus 94
How 89
de 82
2020 58
Kobe 48
Joe 48
Bryant 46
en 46
el 45
Name: count, dtype: int64
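The counts above treat "How" and "how" as different words and are dominated by function words such as "to", "de", "en", and "el". A minimal sketch of a normalised count, assuming lowercasing and a small hand-picked stop-word list; `STOP_WORDS`, `top_query_words`, and the sample queries are hypothetical.

```python
import pandas as pd

# Assumption: a tiny hand-picked stop-word list; a real one would be much longer.
STOP_WORDS = {"to", "how", "de", "en", "el", "the", "a"}

def top_query_words(queries, n=10):
    # Lowercase, split on whitespace, and put one word per row.
    words = queries.str.lower().str.split().explode()
    # Drop stop words before counting.
    words = words[~words.isin(STOP_WORDS)]
    return words.value_counts().head(n)

# Hypothetical sample queries for illustration only.
queries = pd.Series(["How to make mask", "Coronavirus tips", "coronavirus symptoms"])
top_words = top_query_words(queries)
print(top_words)
```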