SMS Spam Detection and Classification Using NLP Thesis

This document outlines the methodology for an SMS spam detection and classification project. The methodology includes collecting an SMS dataset with spam and non-spam messages, preprocessing the data, extracting patterns using FP-Growth algorithm, classifying messages as spam or non-spam using Naive Bayes and J48 algorithms, and evaluating performance to balance fitting and generalization. The expected result is a model that can accurately identify spam messages.

Uploaded by

oyeyemidare1

METHODOLOGY

SMS Spam Detection and Classification to

Combat Abuse in Telephone Network

Supervised By: Dr. Adebola Ojo

Submitted By: Oyeyemi Dare Azeez (192509)

2023
OVERVIEW
• Introduction
• Problem Statement
• Literature Review
• Aim and Objectives
• Methodology
• Dataset
• SMS Spam Detection Phases
• Data Pre-processing with BERT Model
• Feature Extraction and Selection
• SMS Message Spam Classification
• Performance Evaluation
• Expected Result
• References
INTRODUCTION
• Many people now consider their mobile phones a kind of devoted companion, and practically everyone has a
mobile phone, whether a smartphone or not, with the ability to send and receive text messages.

• According to Shirani-Mehr (2013), SMS is a text-based medium that enables mobile phone users to
share short text messages (usually limited to 160 7-bit characters) and has become one of the most widely used
methods for individuals to communicate electronically.

• In Nigeria, more people use SMS messages than emails to communicate because SMS does not require an internet
connection and is quick and easy. SMS has become a multi-million service in the telecommunications industry
due to the explosive growth of mobile devices and the millions of people who send messages daily (National
Bureau of Statistics, 2019).

• The downside of the rise in mobile users and the low cost of SMS text messages is that mobile phones
receive more unsolicited bulk messages, particularly advertisements, which has led to the SMS spam problem.
INTRODUCTION cont’d
• SMS spam nonetheless endangers mobile users' privacy through phishing and fraud on a daily basis.

• Keyword filters have been the most common strategy used to distinguish between spam and non-spam (ham)
messages, together with techniques based on Statistical Learning Theory, Artificial Neural Networks (ANNs),
and Support Vector Machines (SVMs) (Suleiman & Al-Naymat, 2017).

• In numerous experiments with different classification algorithms, some perform well on specific training datasets
while performing poorly on others for no apparent reason (Rathi & Pareek, 2013).

• Numerous spam filtering techniques are in use, but no single strategy can be guaranteed to be 100% effective
at eradicating spam. Applying text mining techniques to SMS can nevertheless improve the effectiveness of
detecting and classifying spam messages to combat telephone abuse (Rathi & Pareek, 2013).

• The objective of this research is to propose an alternative method to address the problem of SMS spam message
identification and classification utilizing Naive Bayes, C4.5(J48), and Frequent Pattern (FP)-Growth Algorithm.
PROBLEM STATEMENT
• An unsolicited message sent to a mobile phone user is usually regarded as spam. The
problem occurs when a mobile user does not want to receive a particular text, or texts
from a particular type of ID (Joe & Shim, 2010).

• SMS text is less formal than standard document text because of its character limit (at most
160 7-bit characters), which makes messages difficult to classify as spam or ham.

• Nigeria’s network providers offer an SMS service called Do Not Disturb (DND) (Nigerian
Communications Commission, NCC). There is no denying the efficiency of this DND
solution, but because it also prevents ham messages from reaching the target device,
it cannot ensure the complete elimination of spam issues.

• Numerous spam filtering models are in use, but these techniques have experienced
overgeneralization and overfitting issues.

• "Is the present model good enough to distinguish between SMS spam and non-spam?"
LITERATURE REVIEW
TOPIC & AUTHOR | SOURCE | CONTRIBUTION TO KNOWLEDGE | SHORTCOMING
Choudhary and Jain, 2009. A novel approach to detect spam and smishing SMS using machine learning techniques. | International Journal of E-Services and Mobile Applications | Explored and analyzed patterns for SMS spam classification. | Limited to only abbreviation patterns in SMS.
Nurulhuda Firdaus Mohd Azmi, 2012. Filtering spam message using Term frequency-inverse document frequency (TF-IDF) and Random Forest Algorithm. | International Journal of Computer Science and Information Security | Method for filtering spam messages using TF-IDF and the Random Forest Algorithm. | Performance of the algorithm varies based on the features used in the data set.
Sethi, G., & Bhootna, V., 2014. Automated SMS classification and spam analysis using topic modeling. | International Journal of Computer Science and Information Technologies (IJCSIT) | Utilized a Bayesian filter in developing an Android application that can detect spam SMS. | The method's accuracy is based on only two specific factors: sensitivity and specificity.
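
The two measures named in the last shortcoming can be made concrete with a short sketch. It treats "spam" as the positive class and computes sensitivity (share of spam caught) and specificity (share of ham left alone) from predicted and true labels. The function names and the toy labels are illustrative assumptions, not taken from the reviewed papers.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_pred, positive="spam"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Toy labels (made up for illustration only).
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "ham", "spam", "ham"]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Neither measure alone says how many flagged messages are truly spam, which is one reason to report them together with other metrics.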
AIM & OBJECTIVES
AIM
This research aims to use data mining techniques to detect and classify SMS spam in order to combat abuse in the
telephone network.

OBJECTIVES
• To implement an alternative approach to the problem of SMS spam message detection and classification using
Naïve Bayes, C4.5(J48), and Frequent Pattern (FP)-Growth Algorithm.

• To design a model that balances fitting and generalization challenges in detecting and classifying anomalies
in SMS spam detection.

• To evaluate, across a variety of datasets, which data mining methods achieve the highest classification and
prediction accuracy for SMS spam.
METHODOLOGY
• The methodology outlines the overall structure of the workflow of this research.

• In this study, data mining techniques and machine learning algorithms are utilized for the analysis, detection,
and classification of the dataset.
METHODOLOGY cont’d
• Dataset
o Data is gathered from numerous sources to create a sizeable dataset of spam and ham SMS messages,
which serves as the model's input.

o The spam dataset was obtained from the Knowledge Discovery and Data Mining (KDD) machine learning
repository.

o The dataset contains 50,795 English raw text messages (711 continuous input attributes and 2 nominal
class label target attributes) with tag labels either as non-spam (ham) or spam.
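
As a rough illustration of working with tagged messages like these, the sketch below parses labelled lines and counts the class distribution. The tab-separated "label, then text" layout, the function name, and the sample messages are all assumptions made for this example; the actual file layout of the dataset is not stated here.

```python
from collections import Counter


def parse_labelled_lines(lines):
    """Parse 'label<TAB>text' lines into (text, label) pairs.

    The tab-separated layout is an assumption for this sketch.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        label, sep, text = line.partition("\t")
        label = label.strip().lower()
        if not sep or label not in {"ham", "spam"}:
            raise ValueError(f"unrecognised line: {line!r}")
        rows.append((text.strip(), label))
    return rows


# Made-up sample lines, for illustration only.
sample = [
    "ham\tSee you at lunch\n",
    "spam\tWIN a FREE prize now\n",
    "\n",
    "ham\tCall me later\n",
]
rows = parse_labelled_lines(sample)
class_counts = Counter(label for _, label in rows)
print(class_counts)
```

Checking the class counts early shows how imbalanced spam and ham are, which affects how accuracy figures should be read later.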
METHODOLOGY cont’d
• SMS Spam Detection Phase
o This phase involves preprocessing, pattern extraction and selection, and classification (Han, Kamber,
& Pei, 2013).

o The activities will be carried out using WEKA data mining software.

• Cleaning and Preprocessing

o Involves turning the collected dataset (unstructured data) into more structured data (Han, Kamber,
& Pei, 2013).
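
A minimal sketch of this step, assuming that "more structured" means a list of normalized word tokens per message. It lowercases the text, keeps alphanumeric runs, and drops a few common words. The tiny stop-word list and the function name are hypothetical, and this is plain Python, not WEKA's filters or the BERT-based preprocessing named in the overview.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration.
STOPWORDS = {"a", "an", "the", "to", "is", "and", "of", "for", "your", "in", "on"}


def preprocess(message):
    """Turn one raw SMS into a list of lowercase tokens without stop words."""
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


tokens = preprocess("WINNER!! Claim your FREE prize to 80086 now")
print(tokens)
```

Digits are kept on purpose in this sketch, since short codes and phone numbers may carry signal in spam messages.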

• Pattern Extraction and Selection

o Frequent Pattern (FP) Growth Algorithm will be used during this phase for pattern selection in
detecting and classifying spam SMS.
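
To show what this phase computes, here is a compact pure-Python sketch of FP-Growth over tokenized messages: it builds a prefix tree of frequent tokens and recursively mines conditional pattern bases, without generating candidate itemsets. The names (Node, build_tree, fp_growth, min_support) and the toy messages are assumptions for illustration; the study itself relies on WEKA, whose implementation differs.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {item: c for item, c in freq.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, count in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset(pattern): support} for all frequent patterns."""
    header, freq = build_tree(transactions, min_support)
    patterns = {}
    for item, support in freq.items():
        pattern = suffix | {item}
        patterns[pattern] = support
        # Conditional pattern base: the prefix path above every node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_support, pattern))
    return patterns


# Toy tokenized messages (made up for illustration).
messages = [["free", "win", "prize"], ["free", "win"], ["free", "call"], ["meet", "lunch"]]
patterns = fp_growth([(m, 1) for m in messages], min_support=2)
for pattern, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(pattern), support)
```

Patterns that are frequent in spam but rare in ham are the natural candidates to keep as features for the classification phase.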

• SMS Message Spam Classification

o The Naïve Bayes and C4.5 (J48) algorithms will be used for spam classification.
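
The Naïve Bayes side of this phase can be sketched as a multinomial model with add-one (Laplace) smoothing over message tokens. This is a simplified stand-in written in plain Python, not WEKA's NaiveBayes classifier, and it does not cover C4.5 (J48). Function names and training messages are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (made up for illustration).
train = [
    (["win", "free", "prize"], "spam"),
    (["free", "cash", "now"], "spam"),
    (["see", "you", "at", "lunch"], "ham"),
    (["are", "you", "free", "for", "lunch"], "ham"),
]
model = train_nb(train)
print(predict_nb(model, ["win", "cash", "prize"]))
print(predict_nb(model, ["see", "you", "lunch"]))
```

Working in log space keeps the product of many small probabilities from underflowing on longer messages.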
EXPECTED RESULT
• The expected model is designed to identify anomalies in SMS and categorize the SMS as either spam or
ham messages.

• The outcome will achieve a balance between the issues of overfitting and overgeneralization in identifying
and classifying abnormalities in SMS spam detection.
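
One way to check the balance described above is to compare accuracy on training data with accuracy on held-out data under k-fold cross-validation: a large gap suggests overfitting. The sketch below assumes at least k labelled messages and uses a deliberately overfit "memorizer" model to make the gap visible. All names and the toy data are hypothetical and not part of the planned WEKA workflow.

```python
def k_fold_indices(n, k):
    """Split range(n) into k near-equal contiguous folds (assumes n >= k)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy(predict, rows):
    """Fraction of (message, label) rows that predict() labels correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def generalization_gap(train, data, k=5):
    """Return mean train accuracy, mean held-out accuracy, and their gap."""
    train_acc, test_acc = [], []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        test_rows = [data[i] for i in fold]
        train_rows = [row for i, row in enumerate(data) if i not in held_out]
        predict = train(train_rows)
        train_acc.append(accuracy(predict, train_rows))
        test_acc.append(accuracy(predict, test_rows))
    mean_train = sum(train_acc) / len(train_acc)
    mean_test = sum(test_acc) / len(test_acc)
    return mean_train, mean_test, mean_train - mean_test


def memorizer(train_rows):
    """Deliberately overfit model: recalls seen messages, else guesses 'ham'."""
    seen = dict(train_rows)
    return lambda message: seen.get(message, "ham")


# Toy data (made up): distinct messages with alternating labels.
data = [
    ("win a prize", "spam"), ("lunch today?", "ham"),
    ("free cash now", "spam"), ("call me later", "ham"),
    ("claim reward", "spam"), ("see you soon", "ham"),
]
train_mean, test_mean, gap = generalization_gap(memorizer, data, k=3)
print(train_mean, test_mean, gap)
```

Any trained classifier can be passed in place of `memorizer`, as long as the training function returns a callable that maps a message to a label.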
REFERENCES
• Chuprat, S., Sarkan, H. M., Yahya, Y., & Sam, S. M. (2019). SMS Spam Message Detection using Term Frequency-Inverse
Document Frequency and Random Forest Algorithm.

• Gupta, M., Bakliwal, A., Agarwal, S., & Mehndiratta, P. (2018). A Comparative Study of Spam SMS Detection Using Machine
Learning Classifiers.

• Han, J., Pei, J., & Yin, Y. (2000). Mining Frequent Patterns Without Candidate Generation.

• Han, J., Kamber, M., & Pei, J. (2013). Data Mining: Concepts and Techniques, 3rd Edition.

• Joe, I., & Shim, H. (2010). An SMS Spam Filtering System Using Support Vector Machine.

• Kumar, R. K., Poonkuzhali, G., & Sudhakar, P. (2012). Comparative Study on Email Spam Classifier using Data
Mining Techniques.

• Nagwani, N. K. (2017). A Bi-Level Text Classification Approach for SMS Spam Filtering and Identifying Priority Messages.

• Rathi, M., & Pareek, V. (2013). Spam Mail Detection Through Data Mining - A Comparative Performance Analysis.

• Shirani-Mehr, H. (2013). SMS Spam Detection Using Machine Learning Approach.
REFERENCES cont’d
• Suleiman, D., & Al-Naymat, G. (2017). SMS Spam Detection Using H2O Framework. Procedia Computer Science.

• Qian, Wang, Han Xue, and Wang Xiaoyu. (2009). Studying Of Classifying Junk Messages Based On The Data Mining.

Websites Visited
• https://ics.uci.edu/ml/solutions/spam-messages
• http://www.esp.uem.es/jmgomez/SMSspamcorpus
• https://www.softwaretestinghelp.com/fp-growth-algorithm-data-mining/
• http://abcnews.go.com/blogs/technology/2012/08/69-of-mobile-phone-users-get-text-spam/
• http://archive kdd.org/datasets/download/SMS+Spam+Collection
• https://www.researchgate.net/publication/269651895_Spam_Mail_Detection
• https://medium.com/@easpex/pitfalls-of-using-fp-growth-algorithm-in-weka
