Assignment B 2 EmailClassification
Name of the Student: __________________________________ Roll No: ____
CLASS: - B. E. [COMP] Division: A, B, C Course: LP-III
Machine Learning
Assignment No. 02
EMAIL SPAM CLASSIFICATION
Marks: /10
Objectives:
• To classify emails using a binary classification method.
• To analyse the performance of KNN and SVM classifiers.
Outcomes:
• Predict the class (spam or not spam) of an email.
Problem Statement:
Classify the email using the binary classification method. Email Spam detection has two states:
a) Normal State – Not Spam, b) Abnormal State – Spam. Use K-Nearest Neighbors and Support
Vector Machine for classification. Analyze their performance.
Dataset link: The emails.csv dataset on Kaggle:
https://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv
Theory:
K-Nearest Neighbors
Suppose P1 is the point whose label needs to be predicted. First, find the k points closest to
P1, then classify P1 by a majority vote of those k neighbors: each neighbor votes for its own
class, and the class with the most votes is taken as the prediction. To find the closest points,
compute the distance between points using a distance measure such as Euclidean distance,
Hamming distance, Manhattan distance, or Minkowski distance.
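The voting scheme above can be sketched in a few lines of plain Python. This is a minimal illustration using Euclidean distance and made-up 2-D points labeled spam/not spam (the function name knn_predict and the sample points are hypothetical, not from the assignment):

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    # Euclidean distance from the query point to every training point
    dists = [math.dist(p, query) for p in train_points]
    # Indices of the k nearest neighbors
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # Each neighbor votes for its own class; majority wins
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (4, 4), (5, 5)]
labels = ["not spam", "not spam", "spam", "spam"]
print(knn_predict(points, labels, (1.5, 1.5)))  # neighbors are mostly "not spam"
```

In practice scikit-learn's KNeighborsClassifier does exactly this (with optimized neighbor search), as shown in the next section.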
Generating Model
First, import the KNeighborsClassifier module and create a KNN classifier object by passing
the number of neighbors as an argument to KNeighborsClassifier().
Then, fit the model on the training set using fit() and perform prediction on the test set
using predict().
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)    # fit on the training set
# Predict output
predicted = model.predict(X_test)
print(predicted)
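Putting the steps together, a minimal end-to-end sketch looks as follows. Since the Kaggle emails.csv file is not available here, synthetic word-count-like features from make_classification stand in for it (a hypothetical substitute; with the real dataset you would load it via pandas and drop the non-feature columns):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for the emails.csv feature matrix
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)          # train on the training split
y_pred = model.predict(X_test)       # predict labels for the test split
print("Test accuracy:", model.score(X_test, y_test))
```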
Support Vector Machine
Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms,
used for both classification and regression problems. However, it is primarily used for
classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point in the
correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we have
a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier
that can classify a pair (x1, x2) of coordinates as either green or blue.
Since this is a 2-D space, we can separate the two classes with a straight line. But there
can be multiple lines that separate these classes.
Hence, the SVM algorithm helps find the best line or decision boundary; this best boundary
is called a hyperplane. The SVM algorithm finds the points from each class that lie closest
to the boundary. These points are called support vectors. The distance between the support
vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this
margin. The hyperplane with the maximum margin is called the optimal hyperplane.
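The linear-SVM idea above can be sketched with scikit-learn's SVC. This toy example uses two clearly separated clusters standing in for the green and blue tags (the sample points are hypothetical); after fitting, support_vectors_ exposes the extreme points the algorithm chose:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (the "green" and "blue" tags)
X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The extreme points that define the maximum-margin hyperplane
print("Support vectors:\n", clf.support_vectors_)
print("Prediction for (2, 2):", clf.predict([[2.0, 2.0]])[0])
```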
Evaluating Model
Accuracy can be computed by comparing the actual test-set values with the predicted values.
from sklearn import metrics

print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
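To analyse the performance of the two classifiers as the problem statement asks, both can be trained on the same split and their accuracies compared. A minimal sketch, again using synthetic data as a hypothetical stand-in for emails.csv:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn import metrics

# Hypothetical stand-in for the emails.csv features
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

scores = {}
for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=3)),
                  ("SVM", SVC(kernel="linear"))]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    scores[name] = metrics.accuracy_score(y_test, y_pred)
    print(name, "Accuracy:", scores[name])
```

Which classifier wins depends on the data: KNN is sensitive to the choice of k and to feature scaling, while a linear SVM tends to do well on high-dimensional, sparse features like word counts.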
Conclusion:
Thus, we implemented SVM and KNN classifiers using the Python scikit-learn library.