Lab2.ipynb - Colaboratory

This document contains code that classifies emails from a word-count dataset (emails.csv) with a machine learning model. It loads the dataset and inspects its shape, data types, missing values and duplicates. It then splits the data into training (80%) and test (20%) sets, trains a support vector machine classifier (scikit-learn's SVC) on the training set, predicts labels on the test set, and measures the accuracy of those predictions at about 81%.

10/29/23, 10:57 PM Lab2.ipynb - Colaboratory

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

df = pd.read_csv('emails.csv')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5172 entries, 0 to 5171
Columns: 3002 entries, Email No. to Prediction
dtypes: int64(3001), object(1)
memory usage: 118.5+ MB
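df.info() reports 3001 int64 columns and a single object column. SVC accepts only numeric features, so that column has to be found and removed before training. The notebook later drops 'Email No.' for this reason. The following is a minimal sketch of how to locate non-numeric columns with pandas' select_dtypes. It uses a tiny hypothetical frame built from the first three rows and first three word columns of this notebook's df.head() output:

```python
import pandas as pd

# Hypothetical stand-in for emails.csv: first three rows and a few columns,
# copied from the df.head() output of this notebook.
df = pd.DataFrame({
    "Email No.": ["Email 1", "Email 2", "Email 3"],
    "the": [0, 8, 0],
    "to": [0, 13, 0],
    "ect": [1, 24, 1],
})

# df.info() only counts dtypes; select_dtypes names the non-numeric columns.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['Email No.']

# Dropping them leaves a purely numeric frame that SVC can be trained on.
numeric_df = df.drop(columns=non_numeric)
print(numeric_df.dtypes)
```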

df.shape

(5172, 3002)

df.head()

  Email No.  the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay  i…
0   Email 1    0   0    1    0    0   0    2    0    0  ...         0    0       0    0
1   Email 2    8  13   24    6    6   2  102    1   27  ...         0    0       0    0
2   Email 3    0   0    1    0    0   0    8    0    0  ...         0    0       0    0

df.isnull()

Email No. the to ect and for of a you hou ... connevey jay valued lay infrastructure military
0 False False False False False False False False False False ... False False False False False False
1 False False False False False False False False False False ... False False False False False False
2 False False False False False False False False False False ... False False False False False False
3 False False False False False False False False False False ... False False False False False False
4 False False False False False False False False False False ... False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... .
5167 False False False False False False False False False False ... False False False False False False
5168 False False False False False False False False False False ... False False False False False False
5169 False False False False False False False False False False ... False False False False False False
5170 False False False False False False False False False False ... False False False False False False
5171 False False False False False False False False False False ... False False False False False False

5172 rows × 3002 columns

df.isnull().sum()

Email No. 0
the 0
to 0
ect 0
and 0
..
military 0
allowing 0
ff 0
dry 0
Prediction 0
Length: 3002, dtype: int64
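The frame has 3002 columns, so pandas truncates the per-column listing (the `..` in the middle). Zeros in the visible rows therefore do not prove on their own that no values are missing. Summing the counts once more produces a single total for the whole frame. Here is a minimal sketch on a hypothetical frame with one gap:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with exactly one missing value.
df = pd.DataFrame({"the": [0, 8, np.nan], "to": [0, 13, 0]})

per_column = df.isnull().sum()         # one count per column, as in the notebook
total_missing = int(per_column.sum())  # a single number for the whole frame
print(total_missing)  # 1

# Show only the columns that actually contain gaps.
print(per_column[per_column > 0])
```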

df.duplicated().sum()
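This cell's output was not preserved in the printout. The cell runs before 'Email No.' is dropped. If those identifiers are unique, as the first rows (Email 1, Email 2, Email 3) suggest, df.duplicated() will report 0 even when two emails have identical word counts. The sketch below uses a hypothetical frame to contrast a check over all columns with a check restricted to the feature columns through the `subset` parameter:

```python
import pandas as pd

# Hypothetical frame: rows 0 and 2 share word counts but have different IDs.
df = pd.DataFrame({
    "Email No.": ["Email 1", "Email 2", "Email 3"],
    "the": [0, 8, 0],
    "ect": [1, 24, 1],
})

# Over all columns, the unique IDs make every row distinct.
print(df.duplicated().sum())  # 0

# Over the feature columns only, the repeated content shows up.
features = [c for c in df.columns if c != "Email No."]
print(df.duplicated(subset=features).sum())  # 1

# Optionally drop the repeats, keeping the first occurrence of each.
deduped = df.drop_duplicates(subset=features)
print(len(deduped))  # 2
```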

https://colab.research.google.com/drive/1KCznbsGxVrKTR0dg9xipgLeY4kz4iWr1#scrollTo=9e7040e0&printMode=true 1/2
df.drop(columns=['Email No.'], inplace=True)  # the text ID column is not a word-count feature

df['Prediction'].unique()

array([0, 1])
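`array([0, 1])` confirms that the target is binary. It does not show how the two labels are balanced, and balance matters when judging the 81% accuracy reported at the end. A model that always predicts the more common label already scores that label's share of the data. Here is a minimal sketch on hypothetical labels standing in for `df['Prediction']`:

```python
import pandas as pd

# Hypothetical labels: seven 0s and three 1s.
y = pd.Series([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])

# Share of each class.
proportions = y.value_counts(normalize=True)
print(proportions)

# Accuracy of always predicting the majority class: the bar a model must beat.
baseline_accuracy = proportions.max()
print(baseline_accuracy)  # 0.7
```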

y = df['Prediction']  # target label (0 or 1)

X = df.drop(columns=['Prediction'])  # features: all remaining word-count columns

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)  # 80/20 split; random_state makes the shuffle reproducible
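With `test_size=0.2`, scikit-learn rounds the test count up. For 5172 emails, the test set therefore holds ceil(0.2 × 5172) = 1035 rows and the training set holds 4137. The notebook's call does not stratify. Passing `stratify=y` (not used in the original) keeps the 0/1 ratio the same in both parts. The sketch below runs the split on placeholder arrays of the same length; the dummy labels are not the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with as many rows as emails.csv; values and labels are dummies.
n_samples = 5172
X = np.zeros((n_samples, 3))
y = np.array([0, 0, 1] * (n_samples // 3))  # exactly one third positive

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=101, stratify=y
)

# Test size is rounded up: ceil(0.2 * 5172) = 1035.
print(len(X_train), len(X_test))  # 4137 1035

# With stratify=y, both parts keep the one-third positive rate.
print(y_train.mean(), y_test.mean())
```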

from sklearn.svm import SVC  # sklearn.svm is a scikit-learn module; SVC (support vector classifier) is a class in it

classifier = SVC()  # classifier is an object (instance) of the SVC class, with default settings
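Calling `SVC()` with no arguments relies on scikit-learn's defaults. In current versions (0.22 and later), these are an RBF kernel, regularisation strength `C=1.0`, and `gamma="scale"`. The defaults can be inspected with `get_params`, as in this short sketch:

```python
from sklearn.svm import SVC

classifier = SVC()

# Inspect the hyperparameters the notebook's classifier is using.
params = classifier.get_params()
print(params["kernel"], params["C"], params["gamma"])  # rbf 1.0 scale
```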

classifier.fit(X_train, y_train)

SVC()
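The default RBF kernel is built on Euclidean distances. As a result, features with large ranges can dominate it; for example, the word "a" reaches 102 in the second email while many columns are 0 or 1. A common remedy, not used in this notebook, is to standardise the features inside a scikit-learn Pipeline so that the scaler is fitted on the training data only. The sketch below uses synthetic data from `make_classification` rather than emails.csv, so its accuracy says nothing about whether scaling helps on this dataset:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not emails.csv).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X[:, 0] *= 100  # exaggerate one feature's scale, like a very frequent word

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=101
)

# StandardScaler learns means/variances from X_train only, inside the pipeline.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
```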

y_pred = classifier.predict(X_test)

from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_pred)  # fraction of test emails whose label was predicted correctly

0.8106280193236715
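Accuracy is the number of correct predictions divided by the number of test emails. With 5172 rows and `test_size=0.2`, the test set holds 1035 emails. The score 0.8106280193236715 therefore corresponds to 839 correct and 196 incorrect predictions. Accuracy alone does not show which class the errors fall in, but a confusion matrix does; the notebook imports seaborn and matplotlib, which could be used to plot one. The sketch below uses hypothetical labels and then restates the notebook's score as a count:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels for six test emails.
y_test = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1])

# Accuracy = correct predictions / all predictions.
manual = np.mean(y_test == y_pred)
print(manual, accuracy_score(y_test, y_pred))  # both equal 4/6

# Rows are true labels (0, 1); columns are predicted labels (0, 1).
cm = confusion_matrix(y_test, y_pred)
print(cm)
# [[2 1]
#  [1 2]]

# The notebook's score expressed as a count out of its 1035 test emails.
n_test = 1035
n_correct = round(0.8106280193236715 * n_test)
print(n_correct, n_test - n_correct)  # 839 196
```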
