Mlds5 Code
In [3]: train
Out[3]:
id label tweet
In [4]: test
Out[4]:
id tweet
1 of 7 10/10/24, 11:04
MLDS_A5_41233_Rutuja - Jupyter Notebook http://localhost:8888/notebooks/MLDS_A5_41233_Rutu...
In [5]: sns.displot(train['label'])
/home/fm-d/anaconda3/lib/python3.11/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
Out[6]: label
0 29720
1 2242
Name: count, dtype: int64
Out[7]: label
0 0.929854
1 0.070146
Name: count, dtype: float64
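The two outputs above show a strongly imbalanced label column. As a minimal sketch of how the proportions follow from the counts, the snippet below uses only the two counts printed in Out[6]; the variable names are illustrative and do not come from the notebook.

```python
# Class counts copied from the Out[6] output above.
label_counts = {0: 29720, 1: 2242}

total = sum(label_counts.values())
proportions = {label: n / total for label, n in label_counts.items()}

for label, share in proportions.items():
    print(f"label {label}: {share:.6f}")

# A classifier that always predicts the majority class would reach
# this accuracy on data with the same class balance.
majority_baseline = max(proportions.values())
print(f"majority-class baseline: {majority_baseline:.6f}")
```

With roughly 93% of tweets in class 0, plain accuracy is a weak yardstick here: a model can score about 0.93 without ever identifying a class 1 tweet.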
Out[8]:
id tweet
combi
Out[12]:
id tweet
count_words = tweets.str.findall(r'(\w+)').str.len()
print(count_words.sum())
681137
tweets = pd.Series(tweets)
tweets = tweets.str.lower()
stopwords_set = set(stopwords.words("english"))
tweets = tweets.apply(lambda x: " ".join(word for word in x.split() if word not in stopwords_set))
count_words = tweets.str.findall(r'(\w+)').str.len()
count_words = tweets.str.findall(r'(\w+)').str.len()
print(count_words.sum())
user 27008
love 4217
day 3471
happy 2630
amp 2433
time 1745
life 1719
today 1555
new 1546
like 1527
positive 1423
get 1406
thankful 1403
people 1331
bihday 1327
good 1313
cant 1239
one 1219
see 1136
fathers 1134
dont 1133
smile 1077
want 986
healthy 962
take 945
Name: count, dtype: int64
296750
v = tweets.str.split().tolist()
c = Counter(chain.from_iterable(v))
total_word = 0
for x, word in enumerate(tweets):
    num_word = len(word.split())
    total_word += num_word
print(total_word)
296750
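The cells above lowercase the tweets, drop stop words, and then count tokens with `Counter` and `chain`. The following is a small self-contained sketch of that same pipeline on made-up data. The toy tweets and the tiny stop-word set are assumptions for illustration only; the notebook itself uses the real tweet column and NLTK's English stop-word list.

```python
from collections import Counter
from itertools import chain

# Hypothetical toy tweets; the notebook's real data is not reproduced here.
toy_tweets = [
    "user I love this day",
    "Happy day to the people I love",
    "the day is new",
]

# Small illustrative stop-word set (an assumption, not NLTK's list).
toy_stopwords = {"i", "this", "to", "the", "is"}

# Lowercase each tweet and keep only the words that are not stop words.
cleaned = [
    " ".join(w for w in t.lower().split() if w not in toy_stopwords)
    for t in toy_tweets
]

# Flatten the per-tweet token lists into one stream and count each word.
tokens_per_tweet = [t.split() for t in cleaned]
word_counts = Counter(chain.from_iterable(tokens_per_tweet))

# Total number of tokens left after stop-word removal.
total_words = sum(len(tokens) for tokens in tokens_per_tweet)

print(word_counts.most_common(3))
print(total_words)
```

Summing the per-tweet lengths and summing the `Counter` values give the same total, which is a quick consistency check on the two counting approaches shown in the notebook.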
train_tfIdf = vectorizer_tfidf.fit_transform(X_train.astype('U'))
val_tfIdf = vectorizer_tfidf.transform(X_val.astype('U'))
print(vectorizer_tfidf.get_feature_names_out()[:5])
0.9303177937692755
0.9293982688497237
print(confusion_matrix(y_val, y_pred))
[[8911 5]
[ 672 1]]
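The confusion matrix above can be unpacked into per-class metrics. The sketch below assumes the scikit-learn layout for `confusion_matrix(y_val, y_pred)`, where rows are true labels and columns are predicted labels, and uses only the four numbers printed above; the variable names are illustrative.

```python
# Confusion matrix copied from the output above.
# Row 0: true label 0, row 1: true label 1 (scikit-learn convention).
tn, fp = 8911, 5
fn, tp = 672, 1

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Metrics for the minority class (label 1).
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"validation samples: {total}")
print(f"accuracy : {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall   : {recall:.4f}")
print(f"f1       : {f1:.4f}")
```

The accuracy computed this way is about 0.9294, consistent with the second score printed above, yet only 1 of the 673 class 1 tweets is recovered. The model behaves almost like a majority-class predictor, so recall or F1 on label 1 is the more informative measure for this dataset.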