Assignment B 2 EmailClassification

Uploaded by

Mahesh Kadam

B.E. (COMP) Sinhgad Institute of Technology, Lonavala LP_III

Name of the Student: __________________________________ Roll No: ____
CLASS: - B. E. [COMP] Division: A, B, C Course: LP-III
Machine Learning
Assignment No. 02
EMAIL SPAM CLASSIFICATION
Marks: /10

Date of Performance: / /2023

2024 Sign with Date:

Title : Classify the email using the binary classification method

Objectives:
• To classify email using binary classification method.
• To analyse performance of KNN and SVM classifiers.

Outcomes:
• Predict the class of user.

PEOs, POs, PSOs and COs satisfied

PEOs: I, III POs: 1, 2, 3, 4, 5 PSOs: 1, 2 COs: 1

Problem Statement:
Classify the email using the binary classification method. Email Spam detection has two states:
a) Normal State – Not Spam, b) Abnormal State – Spam. Use K-Nearest Neighbors and Support
Vector Machine for classification. Analyze their performance.
Dataset link: The emails.csv dataset on Kaggle
https://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv

Theory:
K-Nearest Neighbors

KNN is a non-parametric, lazy learning algorithm. Non-parametric means it makes no
assumption about the underlying data distribution; in other words, the model structure is
determined from the dataset. This is helpful in practice, because most real-world datasets do
not follow theoretical mathematical assumptions. Lazy means there is no explicit training
phase that builds a model: the training data is simply stored, and all of it is used in the
testing phase. This makes training fast but testing slower and costlier in both time and
memory. In the worst case, KNN must scan every stored data point to classify a single query,
and all of the training data must be kept in memory.

1 | Department of Computer Engineering, SIT, Lonavala

How does the KNN algorithm work?

In KNN, K is the number of nearest neighbors, and it is the core deciding factor. K is
generally chosen to be odd when the number of classes is 2, so that the vote cannot tie.
When K=1, the algorithm is known as the nearest neighbor algorithm. This is the simplest
case. Suppose P1 is the point whose label needs to be predicted. First, you find the one
closest point to P1, and then the label of that nearest point is assigned to P1.

In the general case, you find the k closest points to P1 and then classify P1 by a majority
vote of its k neighbors. Each neighbor votes for its class, and the class with the most votes
is taken as the prediction. To find the closest points, you compute the distance between
points using a distance measure such as Euclidean distance, Hamming distance, Manhattan
distance or Minkowski distance.
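
The voting procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the scikit-learn implementation: the function names, the toy data and the idea of using word counts as the two features are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3):
    """Generalised distance: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def hamming(a, b):
    """Number of positions at which the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Label `query` by majority vote among its k nearest training points."""
    # "Lazy" learning: nothing is fitted in advance, the stored data is scanned here.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): [count of "free", count of "meeting"] per email.
points = [[5, 0], [4, 1], [6, 0], [0, 3], [1, 4], [0, 5]]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

print(knn_predict(points, labels, [5, 1], k=3))
print(knn_predict(points, labels, [0, 4], k=1, distance=manhattan))
```

Because every prediction sorts the whole training set, the cost of one query grows with the number of stored points, which is the testing-phase cost mentioned in the theory section.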

KNN Classifier Building in Scikit-learn

Generating Model

First, import the KNeighborsClassifier class and create a KNN classifier object by passing
the number of neighbors as an argument to KNeighborsClassifier(). Then, fit the model on the
training set using fit() and perform prediction on the test set using predict().

from sklearn.neighbors import KNeighborsClassifier

# Create a KNN classifier with k = 3 neighbors
model = KNeighborsClassifier(n_neighbors=3)

# Train the model using the training set
# (features and label are assumed to be already-encoded arrays)
model.fit(features, label)

# Predict output
predicted = model.predict([[0, 2]])  # 0: Overcast, 2: Mild
print(predicted)

Support Vector Machine Algorithm

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is
used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point in the
correct category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector Machine.

How does SVM work?

Linear SVM:

The working of the SVM algorithm can be understood by using an example. Suppose we have
a dataset that has two tags (green and blue) and two features, x1 and x2. We want a
classifier that can classify each pair (x1, x2) of coordinates as either green or blue.

[Figure: the green and blue points plotted in the x1-x2 plane]

Since this is a 2-D space, a straight line is enough to separate the two classes. However,
there can be multiple lines that separate them.

[Figure: several candidate lines separating the two classes]

The SVM algorithm finds the best line or decision boundary; this best boundary is called
a hyperplane. To do so, it finds the points from both classes that lie closest to the
boundary. These points are called support vectors. The distance between the support vectors
and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.
The hyperplane with the maximum margin is called the optimal hyperplane.
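
To make the idea of a margin concrete, the sketch below measures the margin of two hand-picked separating lines on a toy dataset and reports which one is wider. It is only an illustration under stated assumptions: the points, the +1/-1 label encoding and both candidate lines are hypothetical, and the code compares given lines rather than solving the SVM optimisation problem.

```python
import math


def signed_distance(w, b, point):
    """Signed distance from `point` to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def margin(w, b, points, labels):
    """Smallest distance from any point to the hyperplane.

    Labels are +1 or -1. The result is negative if the hyperplane
    puts at least one point on the wrong side.
    """
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))


# Toy 2-D data (hypothetical): label +1 for "blue", -1 for "green".
points = [[1, 1], [2, 1], [4, 4], [5, 4]]
labels = [-1, -1, 1, 1]

# Two candidate separating lines, each written as (w, b).
candidates = {
    "x1 = 3": ([1, 0], -3),
    "x1 + x2 = 5.5": ([1, 1], -5.5),
}
for name, (w, b) in candidates.items():
    print(name, round(margin(w, b, points, labels), 3))

best = max(candidates, key=lambda n: margin(*candidates[n], points, labels))
print("larger margin:", best)
```

Both lines separate the classes, but the points closest to the second line are farther from it, so it has the larger margin. An SVM searches over all possible hyperplanes for the one that maximizes this quantity.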

Support Vector Machine Classifier Building in Scikit-learn

from sklearn.svm import SVC  # "Support vector classifier"
from sklearn.metrics import confusion_matrix

# x_train, y_train, x_test and y_test are assumed to come from an earlier train/test split
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(x_train, y_train)

# Predict the test set result
y_pred = classifier.predict(x_test)

# Create the confusion matrix
cm = confusion_matrix(y_test, y_pred)
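
The problem statement asks for both classifiers to be trained and compared. A self-contained sketch of that comparison follows. It assumes scikit-learn is installed and uses synthetic data from make_classification as a stand-in for emails.csv, so the sample count, feature count and the resulting scores are illustrative only and say nothing about performance on the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the email data (assumption): 500 samples, 20 features,
# binary target where 1 plays the role of "spam" and 0 "not spam".
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hold out 25% of the samples for testing, keeping the class ratio the same.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "SVM (linear)": SVC(kernel="linear", random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    print(name, "accuracy:", round(results[name]["accuracy"], 3))
    print(results[name]["confusion_matrix"])
```

For the actual assignment, X and y would instead be built from the word-count columns and the label column of the Kaggle file.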

Evaluating Model
Accuracy can be computed by comparing actual test set values and predicted values.

# Model accuracy: how often is the classifier correct?
from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(y_test, y_pred))
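
Accuracy is only one of several quantities that can be read off the confusion matrix. The sketch below derives accuracy, precision, recall and the F-beta score from the four confusion-matrix counts in plain Python. The label lists are hypothetical and the helper names are chosen for this example; scikit-learn provides equivalent functions in sklearn.metrics.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # fraction of all emails labelled correctly
precision = tp / (tp + fp)           # of the emails flagged as spam, how many were spam
recall = tp / (tp + fn)              # of the actual spam emails, how many were caught

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
print("F1:", f_beta(precision, recall))
print("F2:", f_beta(precision, recall, beta=2))
```

A beta greater than 1 weights recall more heavily than precision, and a beta below 1 does the opposite. The precision and recall lines assume at least one predicted and one actual positive; a production version would guard against division by zero.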

Conclusion:
Thus, we implemented SVM and KNN classifiers using the Python scikit-learn library.

A. Write short answers to the following questions:

1. Explain the Confusion Matrix with Respect to Machine Learning Algorithms.
2. Explain the K Nearest Neighbor Algorithm.
3. Explain Support Vector Machine Algorithm.
4. Explain the following classification evaluation metrics.
a. Accuracy
b. Precision
c. Recall
d. AUC-ROC
e. F-beta score
