Spam Email Classifier

Uploaded by

moammadmc23048


Email Spam Filtering

Using Machine Learning

Project Overview

Under the guidance of
Miss Naina Devi

Kavya Jaiswal
Mohammad Afzal
(2301220140044, 2301220140049)
What is Email Spam Filtering?
• Email spam filtering is a technique used to detect and block unwanted or
malicious emails (spam) from entering a user’s inbox.
Why is it Important?
• It reduces security risks, improves productivity, and prevents exposure to
harmful content.
Problem Statement
The Challenge:
• As email volume grows, reliably
distinguishing legitimate (ham) emails
from spam becomes crucial.
Need for Automation:
• Manual email filtering is inefficient; machine
learning offers a scalable, automated
solution.
Technologies Used
• Python: Primary programming language for data processing
and model training.
• Scikit-learn: For machine learning algorithms (Logistic
Regression).
• Natural Language Processing (NLP): Techniques like TF-IDF for
text feature extraction.
• pandas, NumPy: For data handling and manipulation.
• Matplotlib, Seaborn: For data visualization.
• Streamlit: An open-source Python framework that lets data
scientists and AI/ML engineers deliver interactive data apps in
only a few lines of code.
Brief Intro of Key Technologies
Logistic Regression:
• A supervised learning algorithm for binary
classification tasks, predicting whether an email is
spam or ham.
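As a rough illustration of how logistic regression turns features into a spam/ham decision, the sketch below applies the sigmoid function to a weighted sum of features. The weights, bias, and the three features are made up for illustration and are not taken from the project's trained model.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_spam_probability(features, weights, bias):
    """Logistic regression: sigmoid of the weighted feature sum plus bias."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


# Hypothetical weights for three made-up features, e.g. the presence of
# "prize", the presence of "claim", and the presence of "meeting".
weights = [2.0, 1.5, -1.0]
bias = -1.0

# An email containing "prize" and "claim" but not "meeting".
features = [1.0, 1.0, 0.0]

p = predict_spam_probability(features, weights, bias)
label = "spam" if p >= 0.5 else "ham"  # 0.5 is a common default threshold
print(f"P(spam) = {p:.3f} -> {label}")
```

During training, the weights and bias are fitted so that these probabilities match the labelled examples as closely as possible.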
TF-IDF (Term Frequency-Inverse Document Frequency):
• A technique that converts text into numerical values by
weighting each word by how often it appears in an email
(term frequency) and discounting words that appear in
many emails (inverse document frequency), so that
distinctive terms carry more weight in the model.
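To make the weighting concrete, here is a minimal sketch of the textbook formula tf(t, d) × ln(N / df(t)) in plain Python. The three example messages are invented, and the formula is the basic variant: scikit-learn's TfidfVectorizer by default uses a smoothed IDF and L2-normalises each row, so its numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Invented example messages (not from the project's dataset).
docs = [
    "claim your free prize now",
    "call me when you are free",
    "your meeting is at four",
]
scores = tf_idf(docs)

# "prize" occurs in one message only, so it outweighs "free",
# which occurs in two of the three messages.
print(scores[0]["prize"], scores[0]["free"])
```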
Methodology
Data Collection:
• Email data collected with labels: "spam" or "ham".
Data Preprocessing:
• Cleaning, handling missing values, and label encoding.
Feature Extraction:
• TF-IDF Vectorizer to convert text into features.
Model Training:
• Logistic Regression model trained on the extracted
features.
Model Evaluation:
• Accuracy evaluated on test data.
Prediction:
• The system predicts whether a new email is spam or
ham.
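The steps above can be sketched with scikit-learn roughly as follows. This is an illustrative outline, not the project's actual code: the tiny in-memory dataset, the 75/25 split, and the helper name classify are assumptions made for the example, and a real run would load the labelled email data instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Data collection: a toy stand-in for the labelled dataset (invented rows).
raw = [
    ("spam", "You have won a free prize, call now to claim"),
    ("spam", "Urgent! Your account won cash, reply to claim"),
    ("spam", "Free ringtone offer, text YES to subscribe"),
    ("spam", "Claim your reward today, limited time offer"),
    ("spam", "Winner! Call this number to collect your prize"),
    ("spam", "Cheap loans approved instantly, apply now"),
    ("spam", "Congratulations, you were selected for a cash reward"),
    ("spam", None),  # missing text, removed during preprocessing
    ("ham", "Are we still meeting for lunch tomorrow?"),
    ("ham", "I will be home late, start dinner without me"),
    ("ham", "Thanks for your help with the report"),
    ("ham", "Can you send me the notes from class?"),
    ("ham", "Running late, see you at four"),
    ("ham", "Happy birthday! Hope you have a great day"),
    ("ham", "Let me know when you reach home"),
    ("ham", "The meeting has moved to Monday morning"),
]

# Preprocessing: drop missing values, clean whitespace, encode labels.
rows = [(label, text.strip()) for label, text in raw if text]
texts = [text for _, text in rows]
targets = [1 if label == "spam" else 0 for label, _ in rows]

X_train, X_test, y_train, y_test = train_test_split(
    texts, targets, test_size=0.25, stratify=targets, random_state=42
)

# Feature extraction and model training in a single pipeline.
model = make_pipeline(TfidfVectorizer(lowercase=True), LogisticRegression())
model.fit(X_train, y_train)

# Evaluation: accuracy on the held-out test data.
test_accuracy = accuracy_score(y_test, model.predict(X_test))


def classify(message: str) -> str:
    """Prediction: label a new message as 'spam' or 'ham'."""
    return "spam" if model.predict([message])[0] == 1 else "ham"


print(f"test accuracy: {test_accuracy:.2f}")
print(classify("Call now to claim your free prize"))
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is learned from the training split only, so the test data does not leak into feature extraction.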
Application of Spam Filtering
• Personal Use: Automatic filtering of spam in
email accounts.
• Enterprise Use: Enhances corporate security
by preventing phishing attacks and spam
emails.
• Email Service Providers: Used by Gmail,
Outlook, and other email services to reduce
spam for users.
Advantages of the Model
• High Accuracy: Achieved approximately 96% accuracy on training
and test data.
• Automation: Reduces manual effort in filtering out spam.
• Scalability: Can handle large volumes of email data.
• Efficiency: Quick predictions using machine learning techniques.
Disadvantages and Limitations
Limited Feature Extraction:
• TF-IDF doesn’t capture word context (e.g., meaning or
sequence of words).
Imbalance Issue:
• If the dataset is imbalanced, the model's predictions may be
biased toward the majority class.
Static Learning:
• The model doesn’t adapt to new types of spam unless retrained
periodically.
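The imbalance issue is one reason accuracy alone can be misleading. The sketch below uses made-up numbers (95 ham and 5 spam messages, and a degenerate classifier that always answers "ham") to show how precision and recall for the spam class expose a problem that accuracy hides.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive="spam"):
    """Precision and recall for the positive class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical imbalanced test set and a classifier that always says "ham".
y_true = ["ham"] * 95 + ["spam"] * 5
y_pred = ["ham"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)

# Accuracy looks high, yet no spam message is caught (recall is zero).
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

One possible mitigation, not described in the original slides, is to weight the classes during training, for example with the class_weight="balanced" option of scikit-learn's LogisticRegression.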
Future Scope
Advanced NLP Techniques:
• Using models like Word2Vec or BERT to
better understand the context of emails.
Improved Models:
• Experimenting with Random Forests, SVMs,
or deep learning models (e.g., LSTM).
Real-time Spam Detection:
• Deploying the model in real-time email
systems for dynamic spam filtering.
Multiclass Classification:
• Extending beyond spam and ham to detect
promotional, social, and update emails.
Test Cases
Case 1:
• WINNER!! As a valued network customer you have been selected to receive £900 prize reward! To claim
call 09061701461. Claim code KL341. Valid 12 hours only.
Case 2:
• Thanks for your subscription to Ringtone UK your mobile will be charged £5/month Please confirm by
replying YES or NO. If you reply NO you will not be charged
Case 3:
• Hi. Wk been ok - on hols now! Yes on for a bit of a run. Forgot that i have hairdressers appointment at
four so need to get home n shower beforehand. Does that cause prob for u?
Case 4:
• I've been searching for the right words to thank you for this breather. I promise i wont take your help for
granted and will fulfil my promise. You have been wonderful and a blessing at all times.
Conclusion
• The project successfully demonstrated how
machine learning can be applied to email spam
filtering.
• Logistic Regression combined with TF-IDF yielded
a high-accuracy model.
• The project lays the groundwork for future
enhancements, such as using advanced NLP and
deploying the system in real-time environments.
