
Uploaded by

Rahul

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

198 views8 pages

Spam Detection in Email Using Machine Le

Uploaded by

Rahul

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

Spam Detection in Email using Machine Learning

IT19154404 R. A. Shehan Sanjula

Department of Computer Systems Engineering
Sri Lanka Institute of Information Technology
Malabe, Sri Lanka
[email protected]

Abstract—In today's world, email is used in almost every industry, from business to education. Emails can be categorized into two categories: ham and spam. Junk emails, also known as spam messages, are emails designed to harm recipients by wasting their time and computing resources and by stealing their valuable information. It is estimated that spam emails are increasing at a rapid rate. One of the most important and prominent spam prevention techniques is email filtering. Naive Bayes, Decision Trees, Neural Networks, and Random Forests are among the methods researchers have used for this purpose. In this project, I examine the Logistic Regression machine learning model for spam filtering in email by categorizing messages into appropriate groups. This study also compares the techniques based on accuracy, precision, recall, etc. The accuracy level for this project was around 97%. Towards the end, these insights, future research directions, and challenges are outlined.

Keywords—Machine Learning, Logistic Regression, Spam Email Filtering, TfidfVectorizer, Random State, Deployment

I. INTRODUCTION
In the modern era of information technology, information sharing is easier than ever. Users can exchange information on a variety of platforms from anywhere across the globe. Among all information sharing mediums, email has become the easiest, cheapest, and most rapid method of transmitting information in the world. Because of their simplicity, however, emails are vulnerable to a variety of attacks, the most popular and destructive of which is spam [1]. Aside from wasting recipients' time and resources, emails that are not related to their interests may contain malicious content in the form of attachments or URLs, which may compromise the host system's security [2]. The term spam refers to any irrelevant and unwanted messages or emails sent by an attacker to a significant number of recipients by email or any other means of communication [2].

Therefore, the security of the email system requires a great deal of attention. Spam emails may contain viruses, remote access Trojans (RATs), and other Trojans. Attackers often use this technique to lure users towards online services. They can send spam emails with attachments that have multiple file extensions or with links to malicious, spamming websites, which can result in data and financial fraud and identity theft [3, 4]. Many email providers allow users to create keyword-based rules that serve as filters for email messages. Even so, this method is not very practical because it is difficult, and users do not want to customize their email filters, which leaves their accounts open to spammers [4].

A. The importance of Spam Detection in Email using Machine Learning
IoT has become a part of our daily lives over the last few decades and is growing rapidly. The emergence of IoT is leading to increased spam email problems. In order to detect and filter spam and spammers, researchers have proposed a variety of spam detection methods. Currently, spam email detection methods mainly fall into two categories: those based on behaviour patterns and those based on semantic patterns. Each type of approach has its drawbacks and limitations. Since the advent of the Internet and increased communication around the globe, spam emails have grown significantly [5]. Through the Internet, spammers can send spam from anywhere in the world while hiding their identities. Spam mail still dominates the internet despite all the antispam tools and techniques available. These attacks most commonly involve malicious emails containing links to malicious websites that can compromise the victim's personal information. Spam emails can also occupy the memory or capacity of servers, slowing down their response times. Organizations therefore carefully evaluate the tools available to battle spam in their environment, so that they can accurately detect spam emails and contain the growing problem. Whitelists and blacklists, mail header analysis, keyword checking, and spam detection are some of the popular mechanisms for analyzing incoming emails [6].

B. Solution Proposed
According to researchers, 40% of social networks have accounts that are used for spam [7]. By sending hidden links in the text, spammers target specific segments, review pages, or fan pages to promote pornographic or other product sites from fraudulent accounts. The same kinds of harmful emails are sent to the same kinds of individuals or organizations on a regular basis. Investigating these features allows such emails to be detected more reliably. Artificial intelligence (AI) can be used to differentiate between spam and non-spam emails [8].

Headers, subjects, and bodies of the messages can be used to extract feature information for this solution. The messages can then be grouped into spam and ham based on their nature. Learning-based classifiers are now commonplace in spam detection. Learning-based classification assumes that spam emails have a set of specific features that distinguish them from legitimate emails. Many factors make identifying spam with learning-based models more complex, including the subjectivity of spam, concept drift, language problems, processing overhead, and text latency. My proposed method classifies emails as spam or ham with an accuracy of about 97%, which is a strong result compared with the existing systems reviewed in the next section.

II. RELATED WORK
Email spam is defined as unsolicited, fake bulk email sent from any account or automated system. Spam emails are becoming more widespread and have become a major issue over the last decade. Typically, spambots (computerized applications that crawl the Internet for email addresses) are used to collect the email IDs to which spam is sent. Machine learning has recently been playing a vital role in the detection of spam emails. Kaur and Verma presented a supervised approach with feature selection for email spam detection [9]. They introduced the knowledge discovery process for spam detection systems. Their survey also addresses feature selection based on N-grams. An N-gram model is a predictive method that, given the previous N - 1 terms in a sentence or text corpus, predicts the probability of the next word [9, 10]. They compare non-machine-learning techniques (signatures, blacklists and whitelists, and mail header checking) with machine learning techniques (Naive Bayes, Support Vector Machine, multilayer perceptron Neural Network) for detecting spam emails. When all these supervised machine learning algorithms are evaluated on precision, recall, and accuracy, false positives are generated at a high rate, depending on the dataset.

DeBarr and Wechsler introduced another spam filtering system that uses Random Forest algorithms to categorize spam emails and active learning to refine the categorization [11]. In their approach, each email is divided into two sections. For that, they used email messages in RFC 822 (Internet message) format [11]. To train on the dataset, the researchers used Support Vector Machine, Random Forest, Naive Bayes, and KNN [11]. However, because the approach depends solely on the term frequency and inverse document frequency of the features of each email, the accuracy of the model stops at 95.2%.

Takhmiri and Haroonabadi [12] use a fuzzy Decision Tree and the Naive Bayes algorithm to provide a new method for detecting spam. They extract spam behaviour patterns using the baked voting algorithm, because in the real world clear-cut features do not exist. According to the research, the decision trees use fuzzy Mamdani rules to classify spam and ham email. They then apply the Naive Bayes classifier [12, 13] to the dataset. Finally, the baking approach is applied by separating the votes into smaller portions. This method gives them an optimum weight to improve accuracy. The study used a dataset of 1000 emails, of which 350 (35%) were spam and 650 (65%) were ham [14], which is a rather small dataset.

To identify emails as junk mail or ham, Verma and Sofat used the supervised machine learning method ID3 to construct the decision trees of the study [15]. Further, a hidden Markov model was used to calculate the probability of many events occurring at the same time [16]. The proposed approach classifies every email as spam or valid by calculating the total probability of each email from previously classified email phrases. This study uses the Enron dataset, which contains 5172 emails in total [15, 16]. Of these, 2086 were spam and the rest were legitimate. Using the feature set gathered from the Enron dataset, their algorithm can classify emails as spam or ham. Using the fitness function from the Scikit-Learn package in the suggested model, they got an 11% error. On the given dataset, their model had an accuracy rate of 89%.

III. DATASET
In this project, I am using a dataset obtained from Kaggle. I have named it “mail_data.csv”. The dataset contains more than 5500 raw email records in CSV format. I will discuss the basic properties of the dataset in the methodology section, along with how I used it in my machine learning project: feature extraction, cleanup, removing redundant values, filling in missing values, etc.

IV. METHODOLOGY
The primary objective of this product is to differentiate spam from the ham emails that we receive daily, so that it can decide more effectively which emails reach the inbox and which go to the spam folder. I will be using the Logistic Regression model to build this project, because it is a simple and well-established model for binary classification problems such as this one. Related work has already been discussed above. My plan is to customize the code so that it can be used to its full potential.

Figure 1: Logical Flow

Following the flow above, we start with raw spam and ham email data, which will be used to train the machine learning model. The raw data cannot be used directly, however; it first needs to be preprocessed. During preprocessing, the text data is converted into numbers, since a machine learning model only works with numerical input. Following that, I will split the data into training and test data, which will be used to train and evaluate the model. Once I have done that, I will feed the training data into the Logistic Regression model. The trained model will eventually predict whether a mail is spam or ham by analyzing its contents.

A. Integrating Machine Learning into the project
We already have the Kaggle dataset, so let's explore how I'll use it in my Machine Learning project. I will start by importing the dependencies (the libraries and functions) we will be using.
Figure 2: Importing dependencies
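The original figure is an image and its code is not reproduced in this text. A minimal sketch of what such an import cell might contain, assuming the pandas and scikit-learn names mentioned elsewhere in this paper (train_test_split, TfidfVectorizer, Logistic Regression, accuracy_score), is:

```python
# Hypothetical reconstruction of the import cell; the exact list used by the
# author is not given in the text.
import pandas as pd                                           # load and clean the CSV data
from sklearn.model_selection import train_test_split          # split into training and test data
from sklearn.feature_extraction.text import TfidfVectorizer   # text -> numerical feature vectors
from sklearn.linear_model import LogisticRegression           # the classifier
from sklearn.metrics import accuracy_score                    # evaluation metric
```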
Model selection, feature extraction, data splitting, and the metrics import are all done through the scikit-learn (sklearn) library. As our algorithm always expects numerical input (integers or floats), we must insert a feature extraction layer in the middle to convert the words to numbers.

There are a couple of methods of doing this: TfidfVectorizer, CountVectorizer, and Word Embedding. Counting words is good, but can we do better? The problem with simple word counts is that some words, such as “the” and “and”, appear repeatedly without adding any meaningful information. The word embedding technique converts a word into a vector that describes where the word resides within a higher dimensional space. When two words have similar meanings, the cosine distance between their vectors is small, so they lie close to one another. That is more than this project needs, however. This is where TfidfVectorizer comes into play. In addition to counting each word, the vectorizer downscales words that appear across multiple documents or sentences.

Figure 3: Data collection and Pre-Processing

The next step is loading the dataset into a pandas data frame. The raw data is shown in the above figure. The dataset contains null and missing values, which would pose a problem. This issue is resolved in the next step by converting them into empty strings.

Figure 4: Replace the null values with null string

Let's find out how many columns and rows we have in our dataset.

Figure 5: Checking the number of rows and columns

As you can see, this is a fairly large dataset: it contains 5572 emails. We can also see that the category column holds two labels. Therefore, in our next step, we need to encode those labels. In this case, I will label spam as 0 and ham as 1.

Figure 6: Label Encoding

Now, I provide the message data and the labels separately to the machine learning model. It's like giving X-axis and Y-axis values. The features, or message data, will be the input, and the category, or target column, will be the output. For this purpose, let's make two variables, X and Y.

Figure 7: Separating the data as text and label

The next part is the most important, since we use one set of data to train our model and another set to evaluate it. In other words, part of X will be our training data, and the other part will be our test data. The same applies to Y. For this, we take advantage of the train_test_split function we imported above. A total of 80% of the 5572 emails will be used as training data; the remaining 20% will be used as test data. By fixing the random state, I can be sure that train_test_split returns the same split every time, which makes the results reproducible.
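The preprocessing and splitting steps described above can be sketched as follows. This is an illustration under stated assumptions, not the author's code: a small inline data frame stands in for mail_data.csv, the column names Category and Message are assumed, and the value random_state=3 is arbitrary.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv("mail_data.csv"); column names are assumptions.
raw_mail_data = pd.DataFrame({
    "Category": ["ham", "spam", "ham", "spam", "ham",
                 "ham", "spam", "ham", "ham", "spam"],
    "Message": [
        "Are we still meeting for lunch today?",
        "WINNER! Claim your free prize now",
        None,  # a missing value, as can occur in the raw data
        "Urgent: your account needs verification, click here",
        "Please send me the report by Friday",
        "Thanks for the help yesterday",
        "Free entry to win cash, reply now",
        "See you at the lecture",
        "Can you call me later?",
        "Cheap loans approved instantly",
    ],
})

# Replace null values with empty strings.
mail_data = raw_mail_data.fillna("")

# Number of rows and columns.
print(mail_data.shape)

# Label encoding as in the text: spam -> 0, ham -> 1.
mail_data["Category"] = mail_data["Category"].map({"spam": 0, "ham": 1})

# Separate the input text (X) from the target labels (Y).
X = mail_data["Message"]
Y = mail_data["Category"]

# 80% training data, 20% test data; a fixed random_state makes the split repeatable.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=3
)
```

With the real dataset, the only change needed should be replacing the inline data frame with a call to pd.read_csv on the CSV file, assuming its columns carry the same names.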

Figure 8: Splitting the data into training data and test data

The next step is feature extraction, in which we convert the text values to feature vectors (meaningful numerical values) that the Logistic Regression model can understand. max_df is used to remove terms that appear too frequently. The parameter stop_words = “english” ignores English words that add little meaning to a sentence. In the next step, fit_transform converts the text to feature vectors. Since the data frame still holds values of the object data type, these must eventually be converted to integers.

Figure 9: Feature extraction and transform text data into feature vectors

Figure 10: Displaying the transformed data

Now I am going to train my model. This requires importing the Logistic Regression model. Next, I will feed the model the training data (the feature vectors and their labels).

Figure 11: Training the model

B. Evaluating the model
A trained model must be evaluated before we proceed to build a predictive system. An array called prediction_on_training_data stores the values predicted by the trained model. We then compare the predicted values with the true values. Here I will utilize the accuracy_score function. We need to provide two parameters: the “true” values, which are Y_train, and the predicted values, prediction_on_training_data.

Figure 12: Evaluating the training data and prediction

Our model has been evaluated on the training data, so let's try it with the test data as well. Sometimes a model can overfit, scoring well on the data it was trained on but poorly on unseen data. Therefore, I am testing my model with test data as well as training data.

Figure 13: Evaluating the test data and prediction

Now that I am confident in my model, let's build a predictive system. This can be achieved by submitting a sample email to the model, which can then predict whether it is spam or not. Here are some examples based on emails I selected from my dataset. The value of the label is predicted using the predict function.

Figure 14: Building a Predictive system
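The feature extraction, training, evaluation, and prediction steps above can be sketched end to end as follows. This is a hedged illustration, not the author's code: the toy messages are invented, the max_df value of 0.9 is an assumed setting, and only the names prediction_on_training_data, Y_train, accuracy_score, fit_transform, and predict come from the text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy data standing in for the real split; labels follow the text: 0 = spam, 1 = ham.
X_train = [
    "Congratulations you won a free prize claim now",
    "Urgent offer win cash today click the link",
    "Free entry to win a luxury holiday call now",
    "Cheap loans approved instantly click here",
    "Are we still meeting for lunch tomorrow",
    "Please send me the project report by Friday",
    "Thanks for your help with the assignment",
    "See you at the lecture this afternoon",
]
Y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = ["Claim your free cash prize now", "Can you send the lecture notes"]
Y_test = [0, 1]

# Convert text to TF-IDF feature vectors. max_df=0.9 is an assumed value that
# drops terms appearing in more than 90% of the training messages.
feature_extraction = TfidfVectorizer(max_df=0.9, stop_words="english", lowercase=True)
X_train_features = feature_extraction.fit_transform(X_train)  # learn vocabulary, then transform
X_test_features = feature_extraction.transform(X_test)        # reuse the learned vocabulary

# Train the Logistic Regression model on the training features and labels.
model = LogisticRegression()
model.fit(X_train_features, Y_train)

# Evaluate on the training data and then on the held-out test data.
prediction_on_training_data = model.predict(X_train_features)
accuracy_on_training_data = accuracy_score(Y_train, prediction_on_training_data)

prediction_on_test_data = model.predict(X_test_features)
accuracy_on_test_data = accuracy_score(Y_test, prediction_on_test_data)

print("Accuracy on training data:", accuracy_on_training_data)
print("Accuracy on test data:", accuracy_on_test_data)

# Predictive system: vectorize a new message with the same vectorizer, then predict.
input_mail = ["Win a free holiday, click the link to claim"]
input_features = feature_extraction.transform(input_mail)
prediction = model.predict(input_features)
print("Ham mail" if prediction[0] == 1 else "Spam mail")
```

Note that the vectorizer is fitted on the training messages only and then reused for the test messages and for new input; fitting it again on the test data would change the vocabulary and make the feature columns incompatible with the trained model.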
C. Local Deployment on Ubuntu LTS 20.04.1 x64

Figure 15: The Python script developed by me
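The script in the figure is not reproduced in this text. As a rough, hypothetical sketch of what a Flask wrapper around such a model could look like, the following trains a tiny stand-in model at import time and exposes a single JSON endpoint. The route name /predict, the JSON field message, and the toy training data are all assumptions made for illustration.

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny stand-in model; a real deployment would load the model trained on the full dataset.
train_messages = [
    "Win a free prize now click here",
    "Cheap loans approved instantly",
    "Claim your cash reward today",
    "Are we meeting for lunch tomorrow",
    "Please send me the project report",
    "Thanks for your help yesterday",
]
train_labels = [0, 0, 0, 1, 1, 1]  # 0 = spam, 1 = ham

vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
model = LogisticRegression()
model.fit(vectorizer.fit_transform(train_messages), train_labels)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Classify the 'message' field of a JSON request body as spam or ham."""
    payload = request.get_json(silent=True) or {}
    message = str(payload.get("message", ""))
    features = vectorizer.transform([message])
    label = int(model.predict(features)[0])
    return jsonify({"label": "ham" if label == 1 else "spam"})
```

Saved as app.py, a script like this could be started with the Flask command-line runner (for example `flask --app app run`), which by default serves on 127.0.0.1 port 5000.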

To deploy my model as a web application on Ubuntu, I developed a Python script (app.py). When the script runs, Flask (a Python-based web app framework) is imported to serve the model as a web application. In the following figure, we can see that the Flask server has successfully been started on http://127.0.0.1:5000.
Figure 16: Local Deployment on Ubuntu 20.04.1 x64

D. Commercializing the product on the Internet

Figure 17: Hosting the commercial product on the internet

As a commercial product, the web application is hosted on the internet. You can find it at https://spam-email-filtering-system.shehansanjula.dev
The GitHub repository is at https://github.com/ShehanSanjula/Spam-Email-Filtering-System-Public
V. RESULTS
The system produces the following results. Let's test the system by sending a spam email.

Figure 18: Checking the content of the mail to see if it's spam

The application works as expected. Let's see if it works when a ham mail is used.

Figure 19: Determining if the mail is Ham mail

In both cases, the application accurately identified the emails.

VI. CONCLUSION
Researchers have become increasingly interested in spam detection and filtering over the last two decades. Several studies have been conducted in this area because of its substantial impact on a variety of areas, such as consumer behavior or fake reviews. In the study, lessons learned from each machine learning category are compared with previous approaches. Additionally, spam filters find it challenging to evaluate features from multiple angles, including temporal, writing style, semantic, and statistical ones. Models are trained primarily on balanced datasets, while self-learning models are not feasible. Deep fakes are another challenge facing spam detection systems. According to the findings of this study, most proposed spam email detection techniques are based on supervised machine learning. This project provides an in-depth analysis of the Logistic Regression algorithm and some future directions for searching for and detecting spam email.

ACKNOWLEDGEMENT
I thank Dr Lakmal Rupasinghe, the lecturer in charge of Machine Learning for Cyber Security - IE4092, Ms Chethana Liyanapathirana, the senior lecturer, Ms Laneesha Ruggahakotuwa, the assistant lecturer, and all associated lecturers and instructors of the Sri Lanka Institute of Information Technology, for granting me the opportunity to conduct this Machine Learning Project Report with guidance. This work was supported in part by the Research Groups Faculty of Computing, Department of Computer Systems Engineering under Grant Machine Learning for Cyber Security - IE4092.
REFERENCES
[1] H. Faris, A. M. Al-Zoubi, A. A. Heidari et al., “An intelligent system for spam detection and identification of the most relevant features based on evolutionary random weight networks,” Information Fusion, vol. 48, pp. 67–83, 2019.
[2] S. O. Olatunji, “Extreme Learning machines and Support Vector Machines models for email spam detection,” in Proceedings of the 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), IEEE, Windsor, Canada, April 2017.
[3] A. Alghoul, S. Al Ajrami, G. Al Jarousha, G. Harb, and S. S. Abu-Naser, “Email classification using artificial neural network,” International Journal for Academic Development, vol. 2, 2018.
[4] M. A. Ferrag, L. Maglaras, S. Moschoyiannis, and H. Janicke, “Deep learning for cyber security intrusion detection: approaches, datasets, and comparative study,” Journal of Information Security and Applications, vol. 50, Article ID 102419, 2020.
[5] H. Bhuiyan, A. Ashiquzzaman, T. Islam Juthi, S. Biswas, and J. Ara, “A survey of existing e-mail spam filtering methods considering machine learning techniques,” Global Journal of Computer Science and Technology, vol. 18, 2018.
[6] T. Vyas, P. Prajapati, and S. Gadhwal, “A survey and evaluation of supervised machine learning techniques for spam e-mail filtering,” in Proceedings of the 2015 IEEE international conference on electrical, computer and communication technologies (ICECCT), IEEE, Tamil Nadu, India, March 2015.
[7] A. K. Jain and B. B. Gupta, “A novel approach to protect against phishing attacks at client side using auto-updated white-list,” EURASIP Journal on Information Security, vol. 2016, no. 1, p. 9, 2016.
[8] A. Subasi, S. Alzahrani, A. Aljuhani, and M. Aljedani, “Comparison of decision tree algorithms for spam E-mail filtering,” in Proceedings of the 2018 1st International Conference on Computer Applications & Information Security (ICCAIS), IEEE, Riyadh, Saudi Arabia, April 2018.
[9] H. Faris, I. Aljarah, and B. Al-Shboul, “A hybrid approach based on particle swarm optimization and random forests for e-mail spam filtering,” in Proceedings of the International Conference on Computational Collective Intelligence, Springer, Halkidiki, Greece, September 2016.
[10] N. Sutta, Z. Liu, and X. Zhang, “A study of machine learning algorithms on email spam classification,” in Proceedings of the 35th International Conference, ISC High Performance 2020, vol. 69, pp. 170–179, Frankfurt, Germany, 2020.
[11] Z. Guo, Y. Shen, A. K. Bashir et al., “Robust spammer detection using collaborative neural network in Internet of thing applications,” IEEE Internet of Things Journal, vol. 8, 2020.
[12] Y. Dou, G. Ma, P. S. Yu, and S. Xie, “Robust spammer detection by nash reinforcement learning,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, Virtual Event CA, USA, July 2020.
[13] M. H. Arif, J. Li, M. Iqbal, and K. Liu, “Sentiment analysis and spam detection in short informal text using learning classifier systems,” Soft Computing, vol. 22, no. 21, pp. 7281–7291, 2018.
[14] N. Kumar and S. Sonowal, “Email spam detection using machine learning algorithms,” in Proceedings of the 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), pp. 108–113, Coimbatore, India, 2020.
[15] A. J. Saleh, A. Karim, B. Shanmugam et al., “An intelligent spam detection model based on artificial immune system,” Information, vol. 10, no. 6, p. 209, 2019.
[16] W. Peng, L. Huang, J. Jia, and E. Ingram, “Enhancing the naive bayes spam filter through intelligent text modification detection,” in Proceedings of the 2018 17th IEEE International Conference on Trust, Security And Privacy In Computing And Communications/12th IEEE International Conference on Big Data Science And Engineering (TrustCom/BigDataSE), IEEE, New York, NY, USA, August 2018.
