Loading The Dataset

This notebook loads an email dataset, cleans it by dropping the Email No. identifier column and checking for NULL values, and separates the features from the labels. It then splits the data into training and test sets, fits five machine learning models (K-Nearest Neighbors, Linear SVM, Polynomial SVM, RBF SVM, Sigmoid SVM) on the training set, and reports each model's accuracy on the test set. The Linear SVM achieved the highest accuracy at 97.55%, and the Sigmoid SVM the lowest at 62.37%.

Uploaded by

Divyani Chavan

In [1]: import pandas as pd
        import numpy as np
        import seaborn as sns
        import matplotlib.pyplot as plt
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC, LinearSVC
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn import metrics
        from sklearn import preprocessing

Loading the Dataset

First we load the dataset and find out the number of columns, rows, NULL values, etc.

In [2]: df = pd.read_csv('emails.csv')

In [3]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5172 entries, 0 to 5171
Columns: 3002 entries, Email No. to Prediction
dtypes: int64(3001), object(1)
memory usage: 118.5+ MB
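The facts that `df.info()` prints can also be pulled out as plain values, which is convenient for checks later in a notebook. The following is a minimal sketch on a small stand-in frame; the toy data and the helper name `summarize` are illustrative assumptions and are not part of the original notebook.

```python
import pandas as pd


def summarize(frame: pd.DataFrame) -> dict:
    """Return row count, column count, total NULL cells and dtype counts."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "nulls": int(frame.isna().sum().sum()),
        "dtypes": {str(k): int(v) for k, v in frame.dtypes.value_counts().items()},
    }


# Toy stand-in for emails.csv: an identifier column, word counts and a label.
toy = pd.DataFrame({
    "Email No.": ["Email 1", "Email 2", "Email 3"],
    "the": [0, 8, 0],
    "to": [0, 13, 0],
    "Prediction": [0, 0, 1],
})

summary = summarize(toy)
print(summary)
```

Applied to the real frame, `summarize(df)` should agree with the row and column counts shown in the `df.info()` output above.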

In [4]: df.head()

Out[4]:
   Email No.  the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay  infrastructure  military
0    Email 1    0   0    1    0    0   0    2    0    0  ...         0    0       0    0               0
1    Email 2    8  13   24    6    6   2  102    1   27  ...         0    0       0    0               0
2    Email 3    0   0    1    0    0   0    8    0    0  ...         0    0       0    0               0
3    Email 4    0   5   22    0    5   1   51    2   10  ...         0    0       0    0               0
4    Email 5    7   6   17    1    5   2   57    0    9  ...         0    0       0    0               0

5 rows × 3002 columns

In [5]: df.dtypes

Out[5]: Email No. object

the int64
to int64
ect int64
and int64
...
military int64
allowing int64
ff int64
dry int64
Prediction int64
Length: 3002, dtype: object

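Cleaning

The Email No. column is only an identifier (its dtype is object), so it is dropped before checking for NULL values.

In [6]: df.drop(columns=['Email No.'], inplace=True)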

In [7]: df.isna().sum()

Out[7]: the 0
to 0
ect 0
and 0
for 0
..
military 0
allowing 0
ff 0
dry 0
Prediction 0
Length: 3001, dtype: int64
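The email data contains no NULL values, so nothing further is needed here. For a frame that does contain them, the same two calls can be extended as sketched below. The toy frame, the deliberately missing value, and the drop-rows policy are all assumptions made for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Email No.": ["Email 1", "Email 2", "Email 3"],
    "the": [0, 8, 0],
    "to": [0, 13, None],   # one deliberately missing value
    "Prediction": [0, 0, 1],
})

# Drop the identifier without mutating the original frame.
cleaned = toy.drop(columns=["Email No."])

# Per-column NULL counts, then the names of columns that contain any NULL.
null_counts = cleaned.isna().sum()
columns_with_nulls = null_counts[null_counts > 0].index.tolist()
print(columns_with_nulls)

# One possible policy (an assumption): drop every row that contains a NULL.
complete_rows = cleaned.dropna()
print(complete_rows.shape)
```

Returning a new frame from `drop` instead of using `inplace=True` keeps the raw data available if a cell has to be re-run.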

In [8]: df.describe()

Out[8]:
               the           to          ect          and          for           of            a  ...
count  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  ...
mean      6.640565     6.188128     5.143852     3.075599     3.124710     2.627030    55.517401  ...
std      11.745009     9.534576    14.101142     6.045970     4.680522     6.229845    87.574172  ...
min       0.000000     0.000000     1.000000     0.000000     0.000000     0.000000     0.000000  ...
25%       0.000000     1.000000     1.000000     0.000000     1.000000     0.000000    12.000000  ...
50%       3.000000     3.000000     1.000000     1.000000     2.000000     1.000000    28.000000  ...
75%       8.000000     7.000000     4.000000     3.000000     4.000000     2.000000    62.250000  ...
max     210.000000   132.000000   344.000000    89.000000    47.000000    77.000000  1898.000000  ...

8 rows × 3001 columns

Separating the features and the labels

In [9]: X = df.iloc[:, :-1]  # independent variables (word counts)
        y = df.iloc[:, -1]   # dependent variable (Prediction)
        X.shape, y.shape

Out[9]: ((5172, 3000), (5172,))
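Positional slicing works because Prediction is the last column. Selecting the label by name is an alternative that does not depend on column order. A small sketch on a toy frame (the toy data is an assumption; the column name comes from the `df.info()` output):

```python
import pandas as pd

toy = pd.DataFrame({
    "the": [0, 8, 0, 5],
    "to": [0, 13, 0, 6],
    "Prediction": [0, 0, 1, 1],
})

label_column = "Prediction"
X = toy.drop(columns=[label_column])  # every remaining column is a feature
y = toy[label_column]

# Positional selection gives the same result when the label is the last column.
same_features = X.equals(toy.iloc[:, :-1])
same_labels = y.equals(toy.iloc[:, -1])
print(X.shape, y.shape, same_features, same_labels)
```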

Splitting the Dataset

Training and Test Set

In [10]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=
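To see what `test_size=0.15` means for 5172 rows, the split can be reproduced on placeholder arrays of the same length. In this sketch the seed `0` is a placeholder, because the original `random_state` value is not visible in the source, and `stratify=y` is an optional addition that the original call does not show.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 5172                  # row count reported by df.info()
X = np.zeros((n_samples, 3))      # placeholder features
y = np.arange(n_samples) % 2      # placeholder binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.15,
    random_state=0,   # placeholder seed; the original value is not visible
    stratify=y,       # optional: keeps the class ratio equal in both parts
)
print(X_train.shape, X_test.shape)
```

With these settings the test part holds 776 rows and the training part 4396, since scikit-learn rounds a fractional test size up.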

Machine Learning models

The following 5 models are used:

1. K-Nearest Neighbors
2. Linear SVM
3. Polynomial SVM
4. RBF SVM
5. Sigmoid SVM
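The four SVM variants differ in how they measure similarity between two feature vectors. The sketch below writes out the kernel formulas as documented by scikit-learn, using plain NumPy. The function names and the fixed `gamma=1.0` are illustrative assumptions (scikit-learn's `SVC` derives its default `gamma` from the data), and the notebook's linear model is `LinearSVC`, which learns a linear decision function directly without evaluating a kernel.

```python
import numpy as np


def k_linear(x, z):
    """Linear kernel: the plain dot product."""
    return float(np.dot(x, z))


def k_polynomial(x, z, gamma=1.0, coef0=0.0, degree=2):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    return float((gamma * np.dot(x, z) + coef0) ** degree)


def k_rbf(x, z, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x) - np.asarray(z)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def k_sigmoid(x, z, gamma=1.0, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, z> + coef0)."""
    return float(np.tanh(gamma * np.dot(x, z) + coef0))


x = np.array([1.0, 2.0])
z = np.array([3.0, 0.0])
print(k_linear(x, z), k_polynomial(x, z), k_rbf(x, x), k_sigmoid(x, z))
```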

In [11]: models = {
             "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=2),
             "Linear SVM": LinearSVC(random_state=8, max_iter=900000),
             "Polynomical SVM": SVC(kernel="poly", degree=2, random_state=8),
             "RBF SVM": SVC(kernel="rbf", random_state=8),
             "Sigmoid SVM": SVC(kernel="sigmoid", random_state=8)
         }
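The notebook imports `preprocessing` but fits every model on raw word counts. SVMs are not scale invariant, so one variation worth trying is to standardize the features inside a pipeline. The sketch below shows the mechanics on synthetic data only; it is an assumption about a possible extension, not something the original notebook does, and it makes no claim about the effect on the email results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; this is not the email dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0
)

# The scaler is fitted on the training split only and then reused,
# unchanged, when the pipeline predicts on the test split.
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_rbf.fit(X_train, y_train)
y_pred = scaled_rbf.predict(X_test)
print(y_pred.shape)
```

A pipeline like `scaled_rbf` could replace an entry in a dictionary of models, because it exposes the same `fit` and `predict` methods as a bare estimator.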

Fit and predict on each model

Each model is trained on the training set and then used to predict labels for the test set. The accuracy
score of each model is calculated from those predictions.

In [12]: for model_name, model in models.items():
             y_pred = model.fit(X_train, y_train).predict(X_test)
             print(f"Accuracy for {model_name} model \t: {metrics.accuracy_score(y_test, y_pred)}")

Accuracy for K-Nearest Neighbors model : 0.8878865979381443

Accuracy for Linear SVM model : 0.9755154639175257
Accuracy for Polynomical SVM model : 0.7615979381443299
Accuracy for RBF SVM model : 0.8182989690721649
Accuracy for Sigmoid SVM model : 0.6237113402061856
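Accuracy is the fraction of test emails whose predicted label equals the true label, so each score above corresponds to a whole number of correct predictions. The sketch below recovers those counts, assuming the test set holds `ceil(0.15 * 5172)` rows (the way scikit-learn sizes a fractional test split). The helper `accuracy` is an illustrative re-implementation, not the library function.

```python
import math

n_rows = 5172
n_test = math.ceil(0.15 * n_rows)   # assumed test-set size

# Scores copied from the output above.
reported = {
    "K-Nearest Neighbors": 0.8878865979381443,
    "Linear SVM": 0.9755154639175257,
    "Polynomical SVM": 0.7615979381443299,
    "RBF SVM": 0.8182989690721649,
    "Sigmoid SVM": 0.6237113402061856,
}


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


correct_counts = {name: round(acc * n_test) for name, acc in reported.items()}
for name, correct in correct_counts.items():
    print(f"{name}: {correct} of {n_test} test emails classified correctly")
```

Under that assumption the Linear SVM misclassifies 19 of 776 test emails, while the Sigmoid SVM misclassifies 292.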
