Spam Detection with Logistic Regression

Volume 6, Issue 9, September – 2021 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Spam Message Detection Using Logistic Regression

NIKHIL KUDUPUDI (1), SHILPA NAIR (2)
(1) U.G. Student, School of Engineering, Ajeenkya DY Patil University, Pune, India - 412105
(2) U.G. Student, School of Engineering, Ajeenkya DY Patil University, Pune, India - 412105

Abstract:- Internet use is increasing day by day, and so is the number of spammers who persistently target people with fraudulent emails and SMS. Email and SMS are among the most important and most widely used means of communication; 2.4 billion messages are sent every second. With such a volume of emails and messages being exchanged, some see an opportunity to fill other people's inboxes with preposterous messages that reduce internet speed and plunder personal data. However, recent advancements in technology make it possible to address such problems. With the help of Natural Language Processing (NLP) and Machine Learning, we can quickly detect spam messages. NLP is one of the crucial areas of research in machine learning applications. In this paper, we propose a model that classifies emails into the categories Spam or Ham.

Keywords:- Spam-Detector, Natural Language Processing, Logistic Regression.

I. INTRODUCTION

Technology is advancing at a high rate. A few decades ago, the only means of communication was the letter, which gave way to the telegram; today communication takes various forms such as email, phone calls, and SMS. An average person sends 72 messages per day, and texting is the most common cell phone activity. Almost 300 billion emails are exchanged per day, and half of them are spam. 'Spam mail' is undesired, unwanted email sent to many recipients, which just fills up their inboxes. Most of these messages contain product-buying links, which could consume our personal data, or other links and attachments. Carelessness on the part of some users can cause significant damage to their personal data. Spam not only fills your inbox with junk but also adds to email traffic: spam messages accounted for 45.1% of email traffic in March 2021. In short, such mails can be frustrating and dangerous at the same time.

Inboxes are 85% filled with spam, because of which valuable and important emails are ignored. Many researchers are developing techniques to solve such problems and secure communication. Unsolicited emails are termed 'Spam', while important and valuable ones are termed 'Ham'.

Many techniques have been developed to classify spam and ham mails. One such technique uses Natural Language Processing and Machine Learning. With the help of text classification methods like stemming, lemmatization, vectorization, etc., it is possible to classify the mails and train a model that can detect unwanted mails.

In this study, we present a model that classifies emails and messages as either spam or ham. Performance evaluation metrics such as accuracy were considered in evaluating the proposed study. The results obtained from experiments confirmed that the proposed approach achieved high accuracy.

II. LITERATURE SURVEY

In paper [1] (Omay, 2010), the author recounted the history and explained the concept of logistic regression. He also described types of logistic regression such as binary, multinomial, and ordinal logistic regression, but covered binary logistic regression in detail. The primary purpose of that paper is to assess the influence of a combination of independent variables on a dependent variable. For this, the author conducted a study on 200 students from Ankara University, with critical thinking as the dependent (target) variable. The author found that an increase of one unit in scientific thinking led directly to a 14.4 percent increase in critical thinking, and a rise of one unit in epistemological belief resulted in a 4.9 percent increase in high critical thinking.

In paper [2] (Lei, 2018), the author Liu Lei showed how logistic regression could be used quickly and efficiently to detect breast cancer. He applied a logistic regression model to a breast cancer dataset. The most accurate results, with an accuracy of 96.5%, were obtained when 'Maximum Texture' and 'Maximum Perimeter' were chosen as inputs to the model. In contrast, the accuracy was 90.48% with 'Mean Texture' and 'Mean Radius' as inputs. Therefore, choosing a better feature combination gives more accurate results.

In paper [3] (Radulescu, M.Dinsoreanu, & R.Potolea, 2014), the main goal is to detect spam comments. This was achieved by considering unclear comments with increased punctuation marks, new lines, stop words, non-ASCII characters, capital letters, and offensive words, and converting them into vectors to classify them as spam or non-spam comments. Next, they added the word duplication ratio, as spam comments tend to have repeated words, and the stop words ratio, which is the count of stop words divided by the total count of words in the comment. This increased the accuracy of classification. Finally, they added

IJISRT21SEP728 www.ijisrt.com 815

post-comment similarity and topic similarity to remove comments unrelated to the specific context. The authors also showed that a decision tree classifier works better with their spam detection model.

The authors of paper [4] (Qaiser, Shahzad, & Ali, 2018) explained what Term Frequency-Inverse Document Frequency (TF-IDF) is and how it works. They also discussed the strengths and weaknesses of TF-IDF and how to overcome them. First, they collected data from different domains and removed stop words; then they applied TF-IDF to the processed data and displayed the results, which showed keywords and their TF-IDF values for the different domains. The top keywords from the '.biz', '.com', '.edu' and '.org' domains were 'parts', 'presidential', 'years' and 'Marketing' respectively.

The authors of paper [5] (Sjarif, Nila, & Amir, 2019) used Term Frequency-Inverse Document Frequency and Random Forest to detect spam messages. The data was collected from the UCI Machine Learning Repository. Before applying TF-IDF, they did some preprocessing such as removing stop words, since these messages contain special symbols, pronouns, and prepositions, which do not help in spam identification. After applying TF-IDF, the authors used multiple classification algorithms and found that Random Forest gave better accuracy, precision and F-measure than the other classification algorithms.

In paper [6] (Pandey & Yadav, 2020), the authors proposed a model in which deep neural networks are exploited for detecting spam mails using TensorFlow. This model uses a linguistic approach, demonstrating the advantage of neural networks. The paper also surveyed various publicly available datasets and noted the basic structure of the model. The authors also identified plenty of open research problems related to spam filters.

The sole purpose of a spam filter is to sort incoming data into unwanted (Spam) or wanted (Ham). Many researchers have come up with various types of filters. The model proposed in [7] (Shankar, 2018) uses Natural Language Processing and Naïve Bayes. This Bayesian spam filter is trained, and a database is maintained to store and track the spam and ham messages. The messages are split into tokens, and messages can be analyzed once the token database has been created by the filter. The model also introduces a threshold counter that helps to maintain the spam filter's efficiency.

Different spam classification methods are used to classify data into groups [8] (Emmanuel, Gbengadada, & Joseph, 2016). These include Random Decision Tree, probabilistic methods, Support Vector Machine, Artificial Neural Networks, etc. These classification techniques have been shown in the literature to be useful for spam mail filtering when combined with a content-based filtering strategy that recognizes specific features (keywords frequently used in spam emails). The likelihood for each feature in the email is determined by the frequency with which these features appear in emails, which is then compared to a threshold value. Spam is defined as email messages that exceed a specified number of recipients.

In [9] (Pragna & .RamaBai, 2019), the dataset was subjected to a series of experiments based on Natural Language Processing (NLP) principles such as label encoding, tokenization, stemming, stop word removal, and feature generation before being passed to an ensemble approach, a voting classifier. All of the trials in the model correctly categorize the data set. The algorithms used in that study produced good accuracy results. However, the Support Vector Classifier, with an accuracy of 98.49 percent, was the best predictor of spam messages among the numerous trials conducted. Other methods have a comparable level of precision, with a variance of around 3%.

III. MATERIALS AND METHODS

Dataset
The main aim of our project was to detect spam messages accurately. For this, we took the "SMS Spam Collection Dataset" from Kaggle.com. The dataset contains 5574 messages tagged as either legitimate (ham) or spam, of which 4825 are legitimate messages and 747 are spam messages. The text messages were compiled from various accessible research sources: 425 spam messages were manually selected from the "Grumbletext" website; 3375 messages were chosen at random from the National University of Singapore SMS Corpus (NSC); 450 ham messages were collected from "Caroline Tag's" Ph.D. thesis; and 1324 messages were gathered from "SMS Spam Corpus v.0.1 Big", of which 1002 were legitimate messages and 322 were spam messages (together with the 425 Grumbletext messages, this gives the 747 spam messages in the dataset).

Packages
To work on our project, we imported different packages. The "pandas" package was imported to read the dataset and to convert categorical data into indicator variables (0 and 1) using the "get_dummies" function. The "nltk" package was used to get "stopwords" and "PorterStemmer" for text processing, together with "TfidfVectorizer" (which is provided by "sklearn") for vectorization. The "re" package (Regular Expression Operations) was also used for processing text data. The "sklearn" package was imported to get the "train_test_split" function and the "LogisticRegression" class. "train_test_split" was used to split the data into training and testing sets, while "LogisticRegression" was used for the prediction model. The "seaborn" and "matplotlib" packages were imported to plot the confusion matrix of our final result. The "joblib" package was imported to save the model and reuse it to make predictions without repeating every step.

IV. PROPOSED ANALYSIS APPROACH AND RESULTS

1. Data Preprocessing
The dataset had five columns, three of which had no values, and none of the columns had a proper name. We removed those three columns as they were of no use and gave the other two columns proper names. The column with
"spam/ham" categorical values was converted into numeric values, as machine learning algorithms work well with numeric data. This was done using the "get_dummies" function of the "pandas" package, which converts a given column into two or more new columns of 0's and 1's based on the categorical values present in the original column.

Label encoding refers to converting values into numeric form so that they are machine-readable. After conversion, machine learning algorithms can decide how to operate with those labels. This is an essential step for supervised machine learning.

1.1. Stop Words Removal
For the machine to understand, analyze and apply Natural Language Processing to the data, the texts (emails in the dataset) should be readable. Machines do not understand human language, so we need to preprocess the data to make it understandable by machines. To keep the data clean, we need to clear out useless words from the dataset. Such words are known as 'stopwords'.

Some common examples of stopwords are 'is', 'are', 'a', 'as', etc. Stopword removal is commonly used in NLP and even in text mining to eliminate useless information.

1.2. Stemming
Stemming refers to reducing a word to its root form, mostly by removing the suffix. It shrinks the vocabulary space, which in turn helps to speed up processing. It is one more method of normalizing sentences for machines.

Fig 1: Stemming

1.3. TF-IDF
Now we need to convert the text data into vectors, as the machine learning algorithm works only on numeric data. For this, we use Term Frequency-Inverse Document Frequency (TF-IDF).

Term Frequency (TF) measures the frequency of a word in a document. It is found by dividing the number of occurrences of a word by the total number of words in that document. Suppose we want to find the TF of the word 'Health', which occurs 20 times in a document 1000 words long. The TF of 'Health' in that document will be 20/1000 = 0.02. Term Frequency alone does not give a good picture, as some insignificant words might occur many times in a document without carrying much weight. Term Frequency treats every word equally, but every word has a different significance; Inverse Document Frequency (IDF) is used to tackle this issue. IDF helps reduce the weight of terms that are very common across a set of documents. IDF is calculated by taking the log of the total number of documents divided by the number of documents in which the specific term is present. Suppose a word A1 is present in 10 documents out of 100 and a word A2 is present in 60 documents out of 100; using base-10 logarithms, the IDF of A1 and A2 will be log(100/10) = 1 and log(100/60) = 0.22 respectively. TF-IDF is obtained by multiplying Term Frequency (TF) and Inverse Document Frequency (IDF).

2. Implementation of Algorithm
Once cleaning and preprocessing of the dataset are done, we can use the "train_test_split" function to divide the dataset into training and testing data. To fit the model on the training data and predict whether a text is spam or not, we need to import the Logistic Regression algorithm and performance metrics from the "scikit-learn" library. In our project, we used the Logistic Regression algorithm for classification. Logistic Regression is a predictive modeling algorithm that models probabilities for classification problems with two or more possible outcomes. Logistic Regression is similar to Linear Regression, except that instead of a straight line we get an S-shaped curve whose output is mapped to either 0 or 1. To get this S-shaped curve, Logistic Regression uses the sigmoid function, which gives probabilities between 0 and 1. In our model, Logistic Regression tells us whether a message is spam or not: an output of 1 means spam, and an output of 0 means ham.

Fig 2: Logistic Regression

Suppose you get the following message on your phone:
"CONGRATULATIONS!! Your email address has won a lottery sum of USD 2,500,000.00. To claim your prize, please contact our office via email [email protected] or call +44 704 675 12446"
Here the keywords are [lottery, prize, office, email].
The given weight vector is $w = [0.3, 0.3, -0.1, -0.04]^T$.
The probability that the email is spam will be:

$x = [1, 1, 1, 2]^T$
$w^T x = 0.3 \cdot 1 + 0.3 \cdot 1 - 0.1 \cdot 1 - 0.04 \cdot 2 = 0.42 > 0$
$\Pr(y = 1 \mid x) = \sigma(w^T x) = \frac{1}{1 + e^{-0.42}} = 0.603$
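
The arithmetic in the worked examples (the TF and IDF values from Section 1.3 and the lottery-message probability) can be checked with a few lines of Python. This is a minimal sketch and not the authors' code: the function names are illustrative, the base-10 logarithm is inferred from the figures quoted in the text, and the 0.5 cut-off used to turn the probability into a label is an assumption.

```python
import math


def term_frequency(term_count, total_words):
    """TF: occurrences of a term divided by the document length."""
    return term_count / total_words


def inverse_document_frequency(total_docs, docs_with_term):
    """IDF with a base-10 log (assumed; it matches the paper's figures)."""
    return math.log10(total_docs / docs_with_term)


def sigmoid(z):
    """Logistic (sigmoid) function, mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def spam_probability(weights, features):
    """Return (score, probability), where score is the dot product w^T x."""
    score = sum(w * x for w, x in zip(weights, features))
    return score, sigmoid(score)


# Section 1.3: 'Health' occurs 20 times in a 1000-word document;
# A1 appears in 10 of 100 documents, A2 in 60 of 100.
tf_health = term_frequency(20, 1000)
idf_a1 = inverse_document_frequency(100, 10)
idf_a2 = inverse_document_frequency(100, 60)

# Lottery message: keyword counts for [lottery, prize, office, email].
w = [0.3, 0.3, -0.1, -0.04]
x = [1, 1, 1, 2]
score, prob = spam_probability(w, x)

print(f"TF(Health) = {tf_health:.2f}")
print(f"IDF(A1) = {idf_a1:.2f}, IDF(A2) = {idf_a2:.2f}")
print(f"w^T x = {score:.2f}, Pr(spam) = {prob:.3f}")
# The 0.5 threshold below is an assumption, not stated in the paper.
print("label:", "spam" if prob >= 0.5 else "ham")
```

Run as written, the sketch reproduces the values quoted in the text: 0.02 for the term frequency, 1 and 0.22 for the two IDF values, 0.42 for the score and 0.603 for the spam probability. Library implementations of TF-IDF may use a different logarithm base or smoothing, so their weights need not match these hand-computed numbers.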

Fig 3: Working of Model

To test the accuracy of our model, an accuracy score metric is used. This metric compares the predicted results with the actual results. After running the code, we got 96% accuracy. We also plotted a heat map to get an idea of how our predicted values compare with the actual values.

Fig 4: Heat Map

V. CONCLUSION AND FUTURE SCOPE

In this study, we looked into the general applications of spam detection using NLP. We also reviewed the step-by-step process of the algorithm and how it classifies mail into spam and ham. The dataset we used in this paper was publicly available, and a performance metric was implemented to check the model's accuracy. In the future, we can use neural network and deep learning models to predict whether a given message is spam or not. Deep learning works very well for natural language processing; however, it requires a vast amount of data to give accurate results and to outperform traditional machine learning algorithms. Since Natural Language Processing is a relatively underdeveloped area of research, further enhancements can be made to the proposed system for spam detection and email filtering in the field of online security.

REFERENCES

[1]. Omay, C. (2010). Logistic Regression: Concept and application. 2-3.
[2]. Lei, L. (2018). Research on Logistic Regression Algorithm of Breast Cancer Diagnose Data by Machine Learning. ICRIS, (pp. 3-4).
[3]. Radulescu, C., M.Dinsoreanu, & R.Potolea. (2014). Identification of Spam Comments using Natural Language Processing Techniques. ICCP.
[4]. Qaiser, Shahzad, & Ali, R. (2018). Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents. International Journal of Computer Applications, 25-29.
[5]. Sjarif, Nila, & Amir, N. (2019). SMS Spam Message Detection using Term Frequency-Inverse Document Frequency and Random Forest Algorithm. Procedia Computer Science, 509-515.
[6]. Pandey, S., & Yadav, R. (2020). Email Spam Detection using Machine Learning and Deep Learning. IJRASET.
[7]. Shankar, S. (2018). Advanced Detection of Spam And Email Filtering using NLP algorithms. IJARIT.
[8]. Emmanuel, Gbengadada, & Joseph. (2016). Machine learning for email spam filtering: review, approaches and open research problems. Heliyon.
[9]. Pragna, B., & .RamaBai, M. (2019). Spam Detecting using NLP Techniques. IJRTE.
