E-Mail Spam Detection Using Machine Learning KNN

This document discusses using a K-Nearest Neighbor (KNN) machine learning model to detect email spam. It notes that Google's email spam filter is highly effective, allowing only about 1 message in 1,000 through. Several machine learning approaches can be used for spam detection, and KNN has become prominent in recent years. The paper examines how spam classification algorithms decide whether an email is spam, treating the task as a text classification problem with two labels: spam and not spam.

2022 5th International Conference on Contemporary Computing and Informatics (IC3I)

E-mail Spam Detection Using Machine Learning – KNN
2022 5th International Conference on Contemporary Computing and Informatics (IC3I) | 979-8-3503-9826-7/22/$31.00 ©2022 IEEE | DOI: 10.1109/IC3I56241.2022.10072628

Ajay Reddy Yeruva
Independent Researcher
1326 Hopyard Road, Apt #62, Pleasanton, CA, USA, 94566

Deepika Kamboj
Assistant Professor, Department of Information Technology
KIET Group of Institutions, Muradnagar, Ghaziabad, UP.

Dr. Poorna Shankar
MCA Department, Indira College of Engineering and Management
Pune, Maharashtra, India.

Upendra Singh Aswal
Associate Professor, Department of Computer Science and Engineering
Graphic Era Deemed to be University, Dehradun, Uttarakhand, India

Dr. A Kakoli Rao
Professor and HOD, Department of CSE
Lloyd Institute of Engineering and Technology, Greater Noida, Uttar Pradesh, India.

Somu C S
Assistant Professor, Department of Computer Science and Engineering
Rajalakshmi Institute of Technology, Chennai, Tamilnadu, India.

Abstract--Email is a primary mode of communication for software developers because of its convenience, and a spam filter is required to keep that communication efficient. This paper concentrates on spam-filtering software. It discusses how the machine learning model that Google updated on Collab can recognize and block almost all spam and phishing emails; the filter is so efficient that only one message out of one thousand is allowed to pass through. Several machine learning approaches can be used to identify spam, but in recent years the "KNN" method has become increasingly prominent. We examine how spam classification algorithms operate and how they arrive at their decisions. The task of deciding whether an email should be classified as spam or not is referred to as "spam detection."

Keywords: E-mail, Spam Classification, Machine Learning – KNN.

I. INTRODUCTION

Natural Language Processing (NLP) underlies both the way voice-activated devices respond to questions and the preliminary screening that decides whether a message is spam. NLP converts text into insights that can then be put to use. It is one of the most difficult areas of AI research because text is highly contextual: text must be transformed before machines can process it, and feature extraction has to be broken down into steps.

Classification problems can be divided into two primary groups: those with only two classes, known as binary classification problems, and those with three or more classes (multi-class classification problems). In binary classification, a label can take only one of two values. For example, a patient's condition can be categorized as either cancerous or noncancerous, and a financial transaction is either legitimate or not, with no room for ambiguity. When there are more than two label categories, multi-class classification should be used. One example is dividing movie reviews into three categories: positive, negative, and neutral.

One of the most common problems in natural language processing is the classification of strings. Examples include the automatic classification of emails as spam or non-spam, and the categorization of movies and news pieces into genres. This paper focuses on the first example, spam classification, and studies it in further detail.

Problem Description:

In this article, the authors look at the process of deciding whether an email is spam or not and try to understand it. This is called Spam Detection, and it is a binary classification problem.

The motivation is simple: if anonymous and unwanted emails can be identified, spam messages can be kept out of the user's inbox, which improves the user's experience.

Figure 1. Emails are sent through a spam detector. If an email is detected as spam, it is sent to the spam folder, else to the inbox. (Image Source: Ramya Vidyalaya [5]).

Spam, the unsolicited commercial email message, has become a significant problem on the internet. The act of sending unwanted commercial electronic messages is
known as "spamming," and those who engage in it are called "spammers." Spammers trawl the internet for email addresses, searching chat rooms and using malicious software [1] to harvest them. Spam makes it harder for people to make good use of resources such as time, space, and bandwidth. Spam can take many forms. The volume of spam currently flooding computer networks places a significant strain on server resources such as storage space, communication speed, and processing time, as well as on user attention [2]. More than seventy-seven percent of all email traffic worldwide can be attributed to spam [3], and the situation is getting worse each year.

II. RELATED WORK

This section discusses work related to email, spam, and junk mail.

In recent years, the number of researchers working on spam filtering has grown significantly. This section presents reviews comparable to those previously published in this field, in order to describe the unresolved issues and to explain how they differ from the present review. Lueg [4] carried out a brief survey to assess whether information filtering and information retrieval technologies can be applied to spam detection in a rational and principled manner. The objective was to make it as easy as possible to build a practical spam-filtering method.

Today, many people use e-mail to communicate in business, in their personal lives, and in their professional lives. In 2018, an estimated 296 billion emails were sent, which breaks down to an average of 130 emails per person, every day.

Spam grows more widespread as more people use the internet and send more emails than ever before. Historically, spam has made up more than fifty percent of all email traffic, and every day fraud is still responsible for the loss of millions of dollars.

On the other hand, as shown in the graph that follows, the volume of such emails has fallen dramatically since 2016, because anti-spam tools have improved consistently in recent years.

Fig 2. Percentage of emails marked as Spam (Source: Statista)

Interest in email spam filtering is increasing rapidly in the global research community. In this section, we present similar reviews from the literature in this domain, so as to articulate the issues that are yet to be addressed and to highlight the differences from our current review. Lueg [4 - 5] presented a brief survey exploring whether information filtering and information retrieval technology can be applied to email spam detection in a logical, theoretically grounded manner, in order to facilitate the introduction of spam filtering techniques.

Fig. 2. Pictorial Representation of the Structure of this paper. E.G. Dada et al. Heliyon 5 (2019).

A wide variety of commercially available technology can be used to detect spam emails. Broadly, these techniques fall into five categories, which algorithms use to determine whether or not an email is spam.

A. Methods for Sorting Data According to the Content of Its Records:

The words and phrases that an email contains, the number of times they appear, and how they are distributed throughout the message all play a role in determining whether the email is considered spam.

B. Techniques for the Filtration of Spam Derived from Previous Experience:

The algorithms that classify incoming emails as spam or non-spam are trained on the content of emails that have been classified in the past.

C. Heuristic or Rule-Based Approaches to Spam Filtering:

These approaches use a predetermined set of rules, typically written as regular expressions, to assign a score to each individual email message. The score an email obtains determines whether the system considers it spam.
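
The rule-based scoring described in subsection C can be sketched as follows. This is only an illustration: the patterns, the weights, and the threshold `SPAM_THRESHOLD` are assumptions chosen for the example and do not come from the paper or from any particular spam filter.

```python
import re

# Hypothetical rules: (compiled pattern, weight). Patterns and weights are
# illustrative assumptions, not rules taken from the paper.
RULES = [
    (re.compile(r"\bfree\b", re.IGNORECASE), 2.0),
    (re.compile(r"\bwinner\b", re.IGNORECASE), 2.5),
    (re.compile(r"click here", re.IGNORECASE), 1.5),
    (re.compile(r"\$\d+"), 1.0),
]
SPAM_THRESHOLD = 3.0  # assumed cut-off score


def spam_score(text: str) -> float:
    """Sum the weights of every rule whose pattern occurs in the text."""
    return sum(weight for pattern, weight in RULES if pattern.search(text))


def is_spam(text: str) -> bool:
    """Flag the message when its total score reaches the threshold."""
    return spam_score(text) >= SPAM_THRESHOLD
```

A message is flagged once the weights of its matching rules add up to the threshold, so a single weak signal (for example, a dollar amount alone) is not enough to mark it as spam.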

D. Methods for Identifying Spam Based on Similarity to Previous Messages:

The attributes of each newly received email are used to build a vector in a multidimensional space, so that every email is represented by a point. The KNN algorithm then finds the labeled spam and non-spam points that lie closest to each new data point and classifies it accordingly.

E. Adaptive spam filtering technology that can respond to changes in its environment:

Incoming emails are sorted according to their scores, and messages that appear to be spam are separated from those that are legitimate.

III. PROPOSED LEARNING TECHNIQUES

K-Nearest Neighbor (KNN) is one of the most popular spam detection algorithms, and this article discusses its use in a content-based filtering system. Algorithms based on k-NN are widely used for classification problems. The overall architecture of KNN is shown in figure 3.1. The process is implemented in five steps [12].

a. Training - Testing Phase:

Fig 3.2. Shows the training data phase

Fig 3.3. Shows the Proposed New E-mail Classification

Consider the current global situation as an illustration of an atypical pattern, in which many companies and industries are suffering severe setbacks. The epidemic that spread all over the world is a good example of an extraordinary event that nobody could have anticipated. When something like this happens, whether as a result of a natural disaster or some other event, it affects not only trading prices but also stock price charts, businesses, and companies [13].

b) Data Set:

Sample datasets are useful for assessing a spam filter's performance, and many open-source datasets are publicly available. The following two are popular because they contain many emails: first, the 2006 Enron corpus, which is 55% spam; second, the Trec 2007 dataset, created in 2007, which is 67% spam. The dataset is divided into a "train" and a "test" subset using a train/test split. Make sure that both sets have the same total number of emails and that there is an equal number of spam and "ham" emails.

Fig 3.4. Shows the training data phase in view of percentage.

c) Pre-processing of E-mail content:

Tokenization should be the primary focus of pre-processing. Tokenization breaks a lengthy email into its component words, reducing it to a sequence of symbols called tokens. These tokens are taken from the various parts of the email, such as the body text, the header, the subject, and the attachments.

Email senders can include inline images in their messages. Emails that contain no text and only images as attachments are classified as spam. This was a difficult task before the Tesseract OCR library, whose development Google has sponsored, became available as open source and simplified the procedure. The library automatically extracts words from images, with a precision that varies depending on the level of accuracy required. Even so, both Times New Roman and the Captcha test the limits of a computer's comprehension of written human language [14].

Fig 3.5. Shows the target data set
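
The tokenization step described in subsection c) can be sketched as below. The regular expression used to define a token and the helper names `tokenize` and `tokenize_email` are assumptions made for illustration; the paper does not specify its tokenizer, and headers and attachments are left out to keep the sketch short.

```python
import re
from collections import Counter
from typing import List

TOKEN_PATTERN = re.compile(r"[a-z0-9']+")  # assumed definition of a "word"


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def tokenize_email(subject: str, body: str) -> Counter:
    """Count the tokens drawn from both the subject line and the body."""
    return Counter(tokenize(subject) + tokenize(body))
```

The resulting token counts are the raw material for the feature extraction step that follows pre-processing.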

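The train/test split mentioned in the Data Set subsection can be sketched in plain Python as follows. The function name `stratified_split`, the 20% default test fraction, and the fixed seed are assumptions for the example; the sketch keeps the spam/ham proportions the same in both subsets, which is one way to satisfy the balance requirement stated there.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, test_fraction=0.2, seed=0):
    """Split (sample, label) pairs so each label keeps its share in both subsets."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test += [(s, label) for s in group[:n_test]]
        train += [(s, label) for s in group[n_test:]]
    return train, test
```

Splitting each class separately avoids the situation where a random split leaves the test subset with very few spam messages.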

d) Feature Extraction and Selection:

Once pre-processing is complete, the focus shifts to building a vocabulary of significant terms [6-11], in which every occurrence of each term is recorded in its own column. These features fall into several further classes, including the following:

Content features include the number of unique words, the number of unique meanings of those words, the presence or absence of a bag of words labelled "adult material," and so on.

Account features include the sender's age, the total number of recipients, the number of replies received, and the URL of the sender's website.

Note that the URLs themselves are not altered; only the way they are written changes. For example, "HTTP google" stands for "https://www.google.com/". This is commonly referred to as "normalization." Less weight is placed on the recipient's age, the sender's date of birth, the account's age, and the sender's sexual orientation.

Stop-word removal, noise removal, and stemming are three effective ways to reduce the size of this large feature set. The Porter Stemmer is a well-known stemming algorithm. Stemming typically involves:

• Removing suffixes (-ed, -ing, -ful, -ness, etc.)

• Removing prefixes (un-, re-, pre-, etc.)

Figure 3.6. Shows the List of Stop Words

Consider the example below.

Fig 3.7. Shows the sample E-mail list and its label.

A. Classification of Machine Learning Techniques

a) KNN (K-Nearest Neighbor) Implementation:

The K-Nearest Neighbor classification algorithm is comparable to Nearest Neighbor. However, rather than returning only the single closest instance, it takes into account the K instances that are closest to the new one. K-NN assigns a class to each new instance according to the class that occurs most frequently among those K nearest instances. The value of K is usually treated as a tuning hyperparameter. The Hit-and-Try (trial-and-error) method can be used for tuning: new values of K are generated arbitrarily and their effect on the overall performance of the model is assessed [15].

The first 80% of the data is used to train the model, and the remaining 20% is used to validate it. The data set was taken from Kaggle.

The Euclidean distance can be used to find the closest instance. For two feature vectors x and y with n components, it is

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (1)$$

The K-NN algorithm can be applied to this task with the Scikit-learn library.

Fig 3.8. Shows the sample E-mail list and its label.

This section discusses the performance of the proposed K-NN algorithm and how productively the model can be used. It only takes one accidentally deleted essential message for a user to begin questioning whether the time and effort spent on spam filtering is worthwhile. We therefore have an obligation to ensure that our algorithm reaches the highest level of precision that is technically feasible. However, some academics contend that accuracy is not the only essential parameter to consider when evaluating the performance of spam filtering [16].

TABLE 3.1. SHOWS THE POSITIVE AND NEGATIVE PREDICTION OF KNN ALGORITHM.

To evaluate the performance of our spam classification model, the authors use the confusion matrix shown below to compare it against four distinct criteria.

Fig 3.9 Confusion Matrix of Spam and Ordinary mail.

TABLE 3.2 SHOWS THE ACCURACY OF PROPOSED KNN ALGORITHM.

Data          Support   Precision   Recall   F1-Score
Ham           949       0.99        0.93     0.96
Spam          166       0.71        0.93     0.80
avg / total   1115      0.94        0.93     0.94
Accuracy      0.931835650224216

IV. CONCLUSION

This research examines the application of machine learning strategies to spam filtering. Recent classification methods used to sort messages into spam or ham are reviewed, along with how various strategies can be combined with machine learning classifiers to tackle spam. Researchers have investigated how spam has evolved over time in order to trick detection systems. This study also looks at public datasets and performance indicators that can be used to evaluate spam filters. The difficulties that machine learning algorithms encounter when combating spam were highlighted, and a number of machine learning approaches were compared and contrasted. The KNN algorithm was offered as a solution to the challenges that still remain in spam filtering; it has an accuracy rate of 93.18%.

REFERENCES

[1] Srivastava, A., Singh, A., Joseph, S.G., Borole, Y.D., Singh, H.K., WSN-IoT Clustering for Secure Data Transmission in E-Health Sector using Green Computing Strategy, 2021 9th International Conference on Cyber and IT Service Management, CITSM 2021, 2021.
[2] Shrivastava, A.; Ranga, J.; Narayana, V.N.S.L.; Chiranjivi; Borole, Y.D., Green Energy Powered Charging Infrastructure for Hybrid EVs, 2021 9th International Conference on Cyber and IT Service Management, CITSM 2021, DOI: 10.1109/CITSM52892.2021.9589027.
[3] M. Awad, M. Foqaha, Email spam classification using hybrid approach of RBF neural network and particle swarm optimization, Int. J. Netw. Secur. Appl. 8 (4) (2016).
[4] D.M. Fonseca, O.H. Fazzion, E. Cunha, I. Las-Casas, P.D. Guedes, W. Meira, M. Chaves, Measuring, characterizing, and avoiding spam traffic costs, IEEE Int. Comp. 99 (2016).
[5] Kaspersky Lab Spam Report, 2017, 2012, https://www.securelist.com/en/analysis/204792230/Spam_Report_April_2012 (visited on May 15, 2017).
[6] E.M. Bahgat, S. Rady, W. Gad, An e-mail filtering approach using classification techniques, in: The 1st International Conference on Advanced Intelligent System and Informatics (AISI2015), November 28-30, 2015, Springer International Publishing, BeniSuef, Egypt, 2016, pp. 321–331.
[7] C.P. Lueg, From spam filtering to information retrieval and back: seeking conceptual foundations for spam filtering, Proc. Assoc. Inf. Sci. Technol. 42 (1) (2005).
[8] Emmanuel Gbenga Dada, Joseph Stephen Bassi, Machine learning for email spam filtering: review, approaches, and open research problems.
[9] Loredana Fire, Camelia Lemnaru, Spam Detection Filter using KNN Algorithm and Resampling.
[10] Anurag Shrivastava; Chinmaya Kumar Nayak; R. Dilip; Soumya Ranjan Samal; Sandeep Rout; Shaikh Mohd Ashfaque, Automatic robotic system design and development for vertical hydroponic farming using IoT and big data analysis, Materials Today: Proceedings, 2021-07, DOI: 10.1016/j.matpr.2021.07.294.
[11] Anurag Shrivastava; Rajneesh Sharma; Mohit Kumar Saxena; V. Shanmugasundaram; Moti Lal Rinawa; Ankit, Solar energy capacity assessment and performance evaluation of a standalone PV system using PVSYST, Materials Today: Proceedings, 2021.
[12] Chawla, P., Chana, I. & Rana, A., A novel strategy for automatic test data generation using soft computing technique. Front. Comput. Sci. 9, 346–363 (2015). https://doi.org/10.1007/s11704-014-3496-9
[13] G. Dubey, A. Rana and N. K. Shukla, "User reviews data analysis using opinion mining on web," 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), Greater Noida, India, 2015, pp. 603-612, doi: 10.1109/ABLAZE.2015.7154934.
[14] Dash, Y., Dubey, S. K., & Rana, A., Maintainability prediction of object oriented software system by using artificial neural network approach. International Journal of Soft Computing and Engineering (IJSCE), 2012, 2(2), 420-423.
[15] S. Gupta, A. Rana and V. Kansal, "Comparison of Heuristic techniques: A case of TSP," 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 2020, pp. 172-177, doi: 10.1109/Confluence47617.2020.9058211.
[16] Priyanka Chawla, Inderveer Chana, Ajay Rana, Cloud-based automatic test data generation framework, Journal of Computer and System Sciences, Volume 82, Issue 5, 2016, Pages 712-738, ISSN 0022-0000, https://doi.org/10.1016/j.jcss.2015.12.001.
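
As an illustration of the KNN implementation described in Section III-A, the nearest-neighbour vote with Euclidean distance can be sketched in plain Python. This is a minimal sketch, not the authors' code (the paper states that Scikit-learn was used); the helper names and the toy feature vectors are made up for the example.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    nearest = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up feature vectors: (count of "free", count of "meeting").
train_vectors = [(3, 0), (4, 1), (5, 0), (0, 2), (1, 3), (0, 4)]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
```

Calling `knn_predict(train_vectors, train_labels, (4, 0), k=3)` votes among the three closest points. The trial-and-error tuning of K mentioned in the text amounts to repeating such predictions on held-out data for several values of `k` and keeping the value that performs best.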

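The F1-scores in Table 3.2 can be cross-checked from the reported precision and recall, since F1 is their harmonic mean. The sketch below assumes the table is read as precision 0.99 and recall 0.93 for ham, and precision 0.71 and recall 0.93 for spam; because those inputs are already rounded, the recomputed values only approximately match the table's 0.96 and 0.80.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall as read from Table 3.2.
ham_f1 = f1_score(0.99, 0.93)
spam_f1 = f1_score(0.71, 0.93)
```

The much lower F1-score for spam comes from its lower precision: some legitimate messages are being labelled as spam, which is exactly the failure the text warns about when it says accuracy alone is not sufficient.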
