Email Spam Detection

The document outlines a spam detection model built on a dataset of labeled emails, which are cleaned and then split into training and test sets. A K-Nearest Neighbors classifier (k = 5) is trained on the TF-IDF representation of the text and reaches an accuracy of approximately 92% on the test set. The model is evaluated with a confusion matrix and a classification report: it never flags a ham message as spam (spam precision and ham recall are both 1.00), but it catches only 41% of the spam messages (spam recall 0.41).

Uploaded by

eashithasaibezawada7

import pandas as pd
import numpy as np
import re
import string
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("C:/Users/dhamini_eashitha/Downloads/mail_data.csv",
encoding='latin-1')
df.columns = ['label', 'message']

df['label'] = df['label'].map({'ham': 0, 'spam': 1})  # Convert labels to binary (0 = ham, 1 = spam)
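`Series.map` with a dictionary returns `NaN` for any label that is not exactly `'ham'` or `'spam'` (for example a differently capitalized label or a stray header row), and it does so without any warning. A minimal check, sketched on a small made-up Series rather than the real CSV:

```python
import pandas as pd

# Made-up labels; "Category" mimics an unexpected value such as a leftover header row
labels = pd.Series(["ham", "spam", "Category", "ham"])

mapped = labels.map({"ham": 0, "spam": 1})  # values missing from the dict become NaN

unexpected = labels[mapped.isna()]
print("Unexpected labels:", unexpected.tolist())  # Unexpected labels: ['Category']
```

In the notebook itself, the equivalent guard would be `assert df['label'].notna().all()` right after the mapping.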

# Text Cleaning Function
def clean_text(text):
    text = text.lower()
    # Escape the punctuation so characters such as ] and \ are matched literally
    text = re.sub(f"[{re.escape(string.punctuation)}]", "", text)  # Remove punctuation
    text = re.sub(r"\d+", "", text)  # Remove numbers
    return text

df['message'] = df['message'].apply(clean_text)

print(df)

      label                                            message
0         0  go until jurong point crazy available only in ...
1         0                            ok lar joking wif u oni
2         1   free entry in a wkly comp to win fa cup final...
3         0        u dun say so early hor u c already then say
4         0  nah i dont think he goes to usf he lives aroun...
...     ...                                                ...
5567      1   this is the nd time we have tried contact u u...
5568      0               will ã¼ b going to esplanade fr home
5569      0   pity was in mood for that soany other suggest...
5570      0  the guy did some bitching but i acted like id ...
5571      0                          rofl its true to its name

[5572 rows x 2 columns]
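A quick way to see what `clean_text` does is to run it on a couple of made-up messages. Digits are deleted outright, so a token like "2nd" shrinks to "nd" (presumably what happened in row 5567 above), and runs of spaces are left behind. The extra spaces are harmless for the next step: `TfidfVectorizer`'s default `token_pattern` (`r"(?u)\b\w\w+\b"`) only keeps runs of two or more word characters, which also drops one-letter tokens such as "u" and "c". This sketch uses the vectorizer without stop-word removal to make the tokenization visible.

```python
import re
import string

from sklearn.feature_extraction.text import TfidfVectorizer


def clean_text(text):
    # Same steps as the notebook's clean_text: lowercase, strip punctuation, strip digits
    text = text.lower()
    text = re.sub(f"[{re.escape(string.punctuation)}]", "", text)
    text = re.sub(r"\d+", "", text)
    return text


# Made-up messages, not taken from the dataset
msg_a = clean_text("Call 09061701461 NOW!")
msg_b = clean_text("This is the 2nd time, u know")
print(repr(msg_a))  # 'call  now'  (digits removed, two spaces left behind)
print(repr(msg_b))  # 'this is the nd time u know'

# Default TfidfVectorizer tokenization (no stop-word removal in this sketch)
analyzer = TfidfVectorizer().build_analyzer()
print(analyzer(msg_a))  # ['call', 'now']
print(analyzer(msg_b))  # ['this', 'is', 'the', 'nd', 'time', 'know']  ("u" is dropped)
```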

# Splitting the dataset

X_train, X_test, y_train, y_test = train_test_split(df['message'],
df['label'], test_size=0.2, random_state=42)
# Convert text into numerical representation using TF-IDF
vectorizer = TfidfVectorizer(stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
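With its default settings (`smooth_idf=True`, `norm='l2'`), scikit-learn's `TfidfVectorizer` weights each term t by idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of fitted documents and df(t) is the number of those documents containing t, and then scales every row to unit Euclidean length. This is also why the vectorizer is fitted on `X_train` only and merely applied to `X_test`: the idf weights come from the fitted documents. The sketch below checks both properties on a tiny made-up corpus.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny made-up corpus, not taken from the dataset
docs = [
    "win a free prize now",
    "free entry to win cash",
    "are we meeting for lunch today",
]

vec = TfidfVectorizer(stop_words="english")  # same settings as the notebook
X = vec.fit_transform(docs)

# Document frequency: number of documents in which each vocabulary term appears
n_docs = X.shape[0]
df = np.asarray((X > 0).sum(axis=0)).ravel()

# Smoothed idf used by scikit-learn when smooth_idf=True (the default)
expected_idf = np.log((1 + n_docs) / (1 + df)) + 1
print(np.allclose(vec.idf_, expected_idf))  # True

# Every row is L2-normalized (norm='l2' by default)
row_norms = np.linalg.norm(X.toarray(), axis=1)
print(np.allclose(row_norms, 1.0))  # True
```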

# Train the K-Nearest Neighbors (KNN) Model

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train_tfidf, y_train)

KNeighborsClassifier()
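By default, `KNeighborsClassifier` measures distance with the Minkowski metric and p = 2, i.e. Euclidean distance, and predicts the majority label among the 5 nearest training messages. Because the TF-IDF rows are L2-normalized, the squared Euclidean distance between two rows a and b (each with at least one known term) equals 2 - 2·cos(a, b), so ranking neighbors by Euclidean distance gives the same order as ranking them by cosine similarity. A small check on made-up messages:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Made-up messages, not taken from the dataset
docs = [
    "win a free prize now",
    "free entry to win cash",
    "are we meeting for lunch today",
    "claim your free cash prize",
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)

dist = euclidean_distances(X)  # the distance KNeighborsClassifier uses by default
cos = cosine_similarity(X)

# For unit-length rows: ||a - b||^2 = 2 - 2 * cos(a, b)
print(np.allclose(dist ** 2, 2 - 2 * cos))  # True
```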

# Predictions
y_pred = model.predict(X_test_tfidf)

# Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test,
y_pred))

Accuracy: 0.9210762331838565
Classification Report:
               precision    recall  f1-score   support

           0       0.92      1.00      0.96       966
           1       1.00      0.41      0.58       149

    accuracy                           0.92      1115
   macro avg       0.96      0.70      0.77      1115
weighted avg       0.93      0.92      0.91      1115
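The report pins down the underlying confusion-matrix counts, even though they are not printed. The test set holds 966 ham and 149 spam messages; an accuracy of 0.9210762331838565 on 1115 messages means 1027 correct predictions, i.e. 88 errors; and a spam precision that rounds to 1.00 is only possible with zero false positives (even 149/150 would round to 0.99). The counts inferred this way are TN = 966, FP = 0, FN = 88, TP = 61. Recomputing the metrics from these inferred counts, together with the accuracy of a trivial "always ham" baseline:

```python
# Confusion-matrix counts inferred from the classification report (not printed in the output)
# Class 0 = ham, class 1 = spam
tn, fp, fn, tp = 966, 0, 88, 61
total = tn + fp + fn + tp

accuracy = (tp + tn) / total

precision_spam = tp / (tp + fp)
recall_spam = tp / (tp + fn)
f1_spam = 2 * precision_spam * recall_spam / (precision_spam + recall_spam)

precision_ham = tn / (tn + fn)
recall_ham = tn / (tn + fp)
f1_ham = 2 * precision_ham * recall_ham / (precision_ham + recall_ham)

# Baseline that predicts "ham" for every test message
baseline_accuracy = (tn + fp) / total

print(f"accuracy            {accuracy:.4f}")  # 0.9211
print(f"spam P/R/F1         {precision_spam:.2f} {recall_spam:.2f} {f1_spam:.2f}")  # 1.00 0.41 0.58
print(f"ham  P/R/F1         {precision_ham:.2f} {recall_ham:.2f} {f1_ham:.2f}")  # 0.92 1.00 0.96
print(f"always-ham baseline {baseline_accuracy:.4f}")  # 0.8664
```

So the 92% accuracy is only about 5.5 percentage points above the always-ham baseline, and the confusion-matrix heatmap plotted next should show the 88 missed spam messages in the Actual Spam / Predicted Ham cell.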

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6,4))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
xticklabels=['Ham', 'Spam'], yticklabels=['Ham', 'Spam'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
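`confusion_matrix(y_test, y_pred)` puts the actual classes on the rows and the predicted classes on the columns, in sorted label order (0 = ham, 1 = spam), which is what the `ylabel('Actual')` and `xlabel('Predicted')` labels in the plot assume. For a binary problem the four cells can be unpacked with `.ravel()`. A toy check with made-up labels, chosen so that the matrix is not symmetric:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 0 = ham, 1 = spam
y_true = [0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [2 1]]

# Row-major order: (actual 0, pred 0), (actual 0, pred 1), (actual 1, pred 0), (actual 1, pred 1)
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 3 1 2 1
```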
# Example Prediction
# Note: unlike the training messages, this sample is not passed through clean_text()
sample_email = ["Congratulations! You've won a free iPhone. Click here to claim your prize."]
sample_email_tfidf = vectorizer.transform(sample_email)
prediction = model.predict(sample_email_tfidf)
print("Spam" if prediction[0] == 1 else "Ham")

Ham
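With the default uniform weights, `KNeighborsClassifier.predict_proba` returns the fraction of the k = 5 nearest neighbors that belong to each class, and `predict` picks the class with the largest fraction, so a message is labeled spam only when at least 3 of its 5 neighbors are spam. That majority rule is one plausible reason why the obvious spam sample above is printed as Ham and why spam recall is only 0.41. The sketch below uses made-up one-dimensional data in place of TF-IDF vectors to show the vote fractions, and applies a hypothetical lower spam threshold of 0.2 (one spam neighbor out of five); the threshold is an illustrative assumption, not part of the original notebook.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up 1-D feature values standing in for TF-IDF vectors: 0 = ham, 1 = spam
X_train = np.array([[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.5]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

x_new = np.array([[0.68]])
proba = knn.predict_proba(x_new)[0]  # fraction of the 5 neighbors per class (columns follow knn.classes_)
print(proba)                  # [0.6 0.4]  -> 2 of the 5 neighbors are spam
print(knn.predict(x_new)[0])  # 0  (majority vote says ham)

# Hypothetical lower threshold: flag as spam if at least 1 of the 5 neighbors is spam
spam_threshold = 0.2
print(int(proba[1] >= spam_threshold))  # 1
```

In the notebook, the same comparison would be made on `model.predict_proba(sample_email_tfidf)[0, 1]`. Lowering the threshold generally trades spam precision (and ham recall) for higher spam recall, so any particular value would need to be checked on the test set.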
