Scaling in One Range: 5172 Rows × 3002 Columns

The document outlines a data analysis process using a dataset of emails, where a KNN classifier is implemented to predict outcomes based on features extracted from the emails. It includes data preprocessing steps such as scaling and splitting the dataset, followed by model training and evaluation using accuracy metrics. Additionally, different SVM kernels are tested to compare their performance on the same prediction task.


In [1]:

import pandas as pd
import numpy as np

In [2]:
df=pd.read_csv("emails.csv")

In [3]:
df

Out[3]:

      Email No.   the  to  ect  and  for  of    a  you  hou  ...
0       Email 1     0   0    1    0    0   0    2    0    0  ...
1       Email 2     8  13   24    6    6   2  102    1   27  ...
2       Email 3     0   0    1    0    0   0    8    0    0  ...
3       Email 4     0   5   22    0    5   1   51    2   10  ...
4       Email 5     7   6   17    1    5   2   57    0    9  ...
...         ...   ...  ..  ...  ...  ...  ..  ...  ...  ...  ...
5167  Email 5168    2   2    2    3    0   0   32    0    0  ...
5168  Email 5169   35  27   11    2    6   5  151    4    3  ...
5169  Email 5170    0   0    1    1    0   0   11    0    0  ...
5170  Email 5171    2   7    1    0    2   1   28    2    0  ...
5171  Email 5172   22  24    5    1    6   5  148    8    2  ...

5172 rows × 3002 columns

In [5]:
df.isnull().sum()

Out[5]:
Email No. 0
the 0
to 0
ect 0
and 0
..
military 0
allowing 0
ff 0
dry 0
Prediction 0
Length: 3002, dtype: int64

In [6]:
df.shape

Out[6]:
(5172, 3002)

In [7]:
x=df.drop(['Email No.','Prediction'],axis=1)
y=df['Prediction']
x.shape

Out[7]:
(5172, 3000)

In [8]:
y.shape

Out[8]:
(5172,)

Scaling in one range


In [10]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

In [12]:
scaler=MinMaxScaler()
x_scale=scaler.fit_transform(x)
x_scale.shape

Out[12]:
(5172, 3000)
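
For context, MinMaxScaler rescales each feature to the [0, 1] range using (x - min) / (max - min) computed per column. A minimal sketch of the same idea on a toy word-count column (the values below are illustrative, not taken from the email data):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-column data to illustrate min-max scaling to [0, 1].
counts = np.array([[0], [8], [2], [102]])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(counts)

# Equivalent manual computation: (x - min) / (max - min)
manual = (counts - counts.min()) / (counts.max() - counts.min())
print(scaled.ravel())   # [0.         0.07843137 0.01960784 1.        ]
print(manual.ravel())   # same values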

In [14]:
# split arguments reconstructed: the original line was cut off; 1293 test rows = 25% of 5172
x_train, x_test, y_train, y_test = train_test_split(x_scale, y, test_size=0.25)

In [16]:
set(x.dtypes)

Out[16]:
{dtype('int64')}

In [18]:
import seaborn as sns
sns.countplot(x=y)

Out[18]:
<Axes: xlabel='Prediction', ylabel='count'>

KNN Classifier (Elbow Method)

In [20]:
from sklearn.neighbors import KNeighborsClassifier

In [21]:
k_values = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29]
accuracy_values = []

In [24]:
from tqdm.notebook import tqdm
from sklearn import metrics
for i in tqdm(range(len(k_values))):
    model = KNeighborsClassifier(n_neighbors=k_values[i])
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    accuracy = metrics.accuracy_score(y_test, y_pred)
    accuracy_values.append(accuracy)

0%| | 0/15 [00:00<?, ?it/s]

In [25]:
accuracy_values

Out[25]:
[0.9033255993812839,
0.8631090487238979,
0.8460943542150039,
0.8368136117556071,
0.8213457076566125,
0.8012374323279196,
0.7880897138437741,
0.7710750193348801,
0.7610208816705336,
0.7470997679814385,
0.7393658159319412,
0.7293116782675947,
0.7138437741686001,
0.7068832173240526,
0.6960556844547564]

In [27]:
import plotly.express as px
px.line(x=k_values, y=accuracy_values)

[Plotly line chart: accuracy_values (y, roughly 0.70 to 0.90) versus k_values (x, 1 to 29); accuracy falls steadily as k increases.]

In [28]:
optimal_k = -1
optimal_accuracy = -1
for i in list(zip(k_values, accuracy_values)):
    if i[1] > optimal_accuracy:
        optimal_k = i[0]
        optimal_accuracy = i[1]
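
An equivalent, more concise way to pick the best k is a single max over the zipped pairs; a small sketch using the same k_values and accuracy_values from above:

# Select the (k, accuracy) pair with the highest accuracy.
optimal_k, optimal_accuracy = max(zip(k_values, accuracy_values), key=lambda pair: pair[1])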

In [29]:
knn_model = KNeighborsClassifier(n_neighbors=optimal_k)

In [30]:
knn_model.fit(x_train, y_train)

Out[30]:
▾ KNeighborsClassifier

KNeighborsClassifier(n_neighbors=1)

In [31]:
y_pred = knn_model.predict(x_test)

In [32]:
print(metrics.classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.96      0.90      0.93       913
           1       0.79      0.91      0.85       380

    accuracy                           0.90      1293
   macro avg       0.88      0.91      0.89      1293
weighted avg       0.91      0.90      0.91      1293

In [47]:
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)

Out[47]:
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x265260408d0>

SVM
In [34]:
from sklearn.svm import SVC
svm_model = SVC(kernel='sigmoid')

In [35]:
svm_model.fit(x_train,y_train)

Out[35]:
▾ SVC

SVC(kernel='sigmoid')

In [40]:
from sklearn.metrics import accuracy_score
y_predict=svm_model.predict(x_test)
print("accuracy" ,accuracy_score(y_test,y_pred

accuracy 0.8561484918793504

In [41]:
svm_model = SVC(kernel='linear')

In [42]:
svm_model.fit(x_train,y_train)

Out[42]:
▾ SVC

SVC(kernel='linear')

In [43]:
y_predict=svm_model.predict(x_test)
print("accuracy" ,accuracy_score(y_test,y_pred

accuracy 0.9659706109822119

In [44]:
svm_model = SVC(kernel='rbf')
svm_model.fit(x_train,y_train)
y_predict=svm_model.predict(x_test)
print("accuracy" ,accuracy_score(y_test,y_pred

accuracy 0.9505027068832174

In [45]:
svm_model = SVC(kernel='poly')
svm_model.fit(x_train,y_train)
y_predict=svm_model.predict(x_test)
print("accuracy" ,accuracy_score(y_test,y_pred

accuracy 0.7548337200309359
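
Since the four kernel experiments above repeat the same fit-predict-score pattern, a compact sketch that loops over the kernels (assuming the same x_train, x_test, y_train, y_test from the split above) would be:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Fit and score one SVC per kernel on the same train/test split.
for kernel in ['sigmoid', 'linear', 'rbf', 'poly']:
    svm_model = SVC(kernel=kernel)
    svm_model.fit(x_train, y_train)
    y_predict = svm_model.predict(x_test)
    print(kernel, accuracy_score(y_test, y_predict))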

In [ ]:
