.Net Library for SMS Spam Detection using Machine Learning: A Cross Platform Solution

Conference Paper · March 2018
DOI: 10.1109/IBCAST.2018.8312266

Syed Sarmad Ali
School of Computer Science & Engineering, Beihang University (BUAA), Beijing, China
and Dept. of Computer Science & IT, The University of Lahore, Lahore, Pakistan
[email protected]

Junaid Maqsood
Department of Computer Science, Carleton University, Ottawa, Canada
[email protected]

Proceedings of the International Bhurban Conference on Applied Sciences & Technology (IBCAST), Islamabad, Pakistan, January 2018

Abstract: Short Message Service (SMS) is nowadays the most used means of communication in the electronic world. While much research exists on email spam detection, far less is known about spam sent through SMS. This may be because spam is much less frequent in short messages than in email. This paper presents different ways of analyzing SMS spam and a new pre-processing approach for building a dataset of spam messages. The dataset was then used with different algorithms to find the best-performing one in terms of both accuracy and recall. The Random Forest algorithm was then implemented in a real-world library written in C# for cross-platform .Net development. The library can use a prebuilt model to classify a new dataset into spam and ham.

Keywords: spam filter; SMS; detection; machine learning; classification; clustering; algorithms; C# library; online detection

I. INTRODUCTION

Short Message Service (SMS) occupies a large share of our communication and has become an essential part of daily human activity. According to [1], SMS has itself become a multi-billion industry. It is now a matter of seconds for anyone to connect with others using SMS. With recent advancements in technology and strong competition among cellular companies, the cost of sending an SMS has been reduced to almost nothing. With current cellular packages, users get close to unlimited SMS and the ability to send messages worldwide at low cost. Along with these improvements, the short message service has also come to be used for marketing and other unwanted services. To keep the quality of the service in check, proper steps must be taken to prevent spam. Spam can be described as unwanted content that is sent in bulk to a large number of users. The purpose of spam is either to draw users toward a specific marketing scheme or simply to scam them.

Even today the quantity of spam SMS is much lower than that of spam email, but it is still enough to be misleading. For 2010-2012, it is reported [2] that about 90% of emails worldwide were spam, while the figure is much lower for SMS. In Asia, about 30% of all messages were spam [2]. Because the percentage is comparatively small, there have been more advances in catching and blocking email spam, and very few studies are available on doing the same for SMS.

With the recent growth in mobile users driven by advancements in smartphones, the popularity of SMS has increased. This has led various groups to create tools and techniques for spamming users' mobile phones in order to achieve their desired outcome. A better understanding is needed of how machine learning algorithms can sort out and filter this spam. There has been a lack of a proper dataset, as well as a lack of studies on different algorithms and clustering techniques for this specific problem.

In this paper, a new tool is created based on the .Net framework. The resulting tool is a cross-platform library project that can take an already normalized dataset, map it onto its internal model, and show on real examples which messages are ham and which are spam.

II. LITERATURE STUDY

In 2013, Houshmand [1] applied different machine learning algorithms to the SMS spam classification problem and then analyzed and compared their output to gain an understanding of how to filter SMS spam. The author used the database from the UCI machine learning repository, explained in [3][4]. A subset of the SMS messages consists of arbitrarily selected ham messages. Each record in the dataset consists of the message label followed by the message string. Methods such as SVM and Naïve Bayes were applied to the samples, which were first pre-processed and then had features extracted. Finally, the best classifier was compared against the dataset discussed in [4]. Matlab was used for feature extraction and data analysis, and the different algorithms were then applied using the Python scikit-learn library.

In 2011, Tiago et al [5] studied this issue, combining several smaller datasets with their own collection to create a better dataset for academic studies. They created a new collection of 4827 ham and 757 spam SMS and donated this dataset to the community for further analysis. This was a remarkable step towards finding a solution to stop spam.

In 2012, Coskun and Giura, from a research institution in New York City, performed an experiment [6] to classify the spam-ham dataset using a similarity measure. They created an algorithm capable of performing a block-match analysis on a stream of messages to find similarity among them. Their hypothesis was that if many messages are similar to previously sent messages, then the stream is essentially a collection of spam and should be blocked. It was in fact a smart, independent way of classifying spam without using any prior knowledge base. Internally they used a Counting Bloom Filter, which was capable of finding true similarity. Another interesting thing about their
study was that they used YouTube comments to test their algorithm, on the assumption that comments on videos are quite similar to short messages: their number of characters is about the same and they are just as full of spam. The proposed algorithm had a 100% detection rate when a stream of similar messages was flowing, but when fewer messages matched, its accuracy was considerably lower.

In 2013, Shirani-Mehr [7] studied the possibility of classifying SMS spam with different machine learning algorithms using the UCI spam dataset. The research consisted of testing five different algorithms on the exact same dataset without any pre-processing. He found that the algorithms were able to classify the spam dataset effectively and concluded that Multinomial Naïve Bayes was the best among them, while SVM ranked 2nd.

In 2015, Akbari et al [8] did what the 2013 study lacked. They studied different post-processing techniques on the dataset created by Tiago et al [5] to find whether any post-processing technique might help classify the dataset better. They tried different wrapper algorithms for better classification. They found that Gentle Boost, using R, detected spam in the complete dataset with an accuracy of 98.3%, while Naïve Bayes resulted in 97.6% detection.

There may be numerous ways of classifying the SMS spam dataset. Many researchers have already studied different algorithms and different boosting filters. In this study we investigate and formulate pre-processing techniques for the dataset. Furthermore, we test the optimized dataset on different machine learning algorithms and different clustering techniques, to find whether the original dataset after pre-processing gives better results than previous studies found.

III. GOALS AND OBJECTIVES

This study investigates the role of the dataset as a whole in the different classification techniques for SMS spam detection. In machine learning classification, two main concepts exist: the first is the dataset and the other is the algorithm. The dataset is a key factor in making a model capable. If the dataset is not balanced or contains duplicates, the resulting model will have difficulty producing an optimal solution when given a real-world problem. There is also the problem that the classifier might become too specific to the dataset, making it appear to work well when it does not generalize. That problem is usually addressed either by using 10-fold cross validation or by using separate training and testing sets. In this study, 5-fold cross validation is used because of time limitations. In comparison, the previous study by Houshmand [1] uses the Naïve Bayes algorithm with 10-fold cross validation. Many algorithms take a long time to process the dataset; more on that in Section IV.

Pre-processing is the key factor under discussion here. Most of the papers presented previously use the data as is or try only feature selection approaches. They do not try different string conversion techniques. This paper discusses the conversion of strings to a normalized vector distribution and its effect on the performance of the classifiers. After pre-processing, eight different classifiers are applied to the dataset to compare their performance.

After obtaining results from the different classifiers, the T-Test and Friedman's test are used to determine whether a valid difference exists among the algorithms and to find which algorithm performed statistically better than the others.

Moreover, a new tool library was built on the best resulting algorithm. The library was written in C# for cross-platform .Net development. With the launch of C# 6, this library can be used on any platform, whether web based, desktop application, Android, or iPhone, or deployed to a live stream server for on-the-fly detection. The library is currently only capable of using an already normalized dataset. Future work will make it work with the actual text, so that it can be used to visualize results as a combination of words is typed. Similarly, more research will be conducted on dynamically changing the model based on specific inputs within the C# application, and on adding functions that allow a choice of algorithm model and model size.

IV. METHODOLOGY

A. Dataset

The first part of this study consisted mostly of analyzing the available datasets and applying a specific pre-processing technique to see whether it makes any difference to the end result of the classification algorithms. We started with the most generic and biggest dataset available [5], presented to the community by Tiago et al. The dataset contained a large number of SMS messages, but further analysis showed that it had a class imbalance problem: there were almost 7 times more instances of the ham class than of spam. To solve this problem, we decided to use another dataset [7], presented by Dublin Institute of Technology, which contained only a collection of 1353 spam messages. With a simple program written in C#, we were able to merge the two datasets and format them into a CSV. The end result was a comma-separated file containing the class attribute in the first field and the actual text message in the second. It contained 2098 spam instances and 4808 ham.

B. String to Normalized Distribution and Feature Selection

The dataset created contained the actual message strings, which are not easy for an automated machine learning algorithm to classify. To transform it into numeric normalized data, WEKA [9] was used. The CSV file was converted to an ARFF file, which is the file format used by the WEKA tool for any kind of classification. To make the dataset more readable for machine learning algorithms, we used the StringToWordVector filter, which maps each combination of words to a vector of numbers, each entry reporting a number instead of a word. We configured the filter so that the
resultant data was normalized and contained the number of times a specific word was used within a message. After applying this filter, the resultant dataset contained a total of 1047 attributes. At this stage a feature selection approach can be introduced. After applying the InfoGain filter with the ranker and the judge, 353 attributes remained. These were the ones that were used at least 2 times within the messages and that had a rank threshold of at least 0.025. To address the remaining class imbalance, we then added a re-sampling filter and gave more weight to the imbalanced class. The resultant dataset had 4419 instances with 353 attributes, divided into 1944 spam and 2475 ham.

C. Different Classification Algorithms using WEKA

Once a proper dataset was available, it was divided into 2 datasets. The first, 90% of the original, was called Spam-Weka-Dataset.arff and was used to train and test the classifiers using 5-fold cross validation and eight different classification algorithms in the WEKA tool. The other, 10% of the original, was named Spam-Library-Dataset.arff and was made to test the actual accuracy of the final library. Note that the two datasets did not share any instances, meaning they were independent of each other.

Once the datasets were complete, eight different classifiers were used on them (results in Section V). The classifiers are listed below.

Naïve Bayes (NB)
Multinomial Naïve Bayes (NB-M)
Support Vector Machine (SVM)
K nearest neighbors (K-NN)
Updatable Naïve Bayes (NB-MU)
Decision Trees (J48)
Cost sensitive Naïve Bayes (NB-C)
Random Forest (RF)
Table 1: List of Algorithms

Three clustering algorithms were also used after un-labelling the dataset; they did not produce any significant result.

Farthest First (FF)
Simple K Means (SKM)
X-Means (XM)
Table 2: List of Clustering Algorithms

V. RESULT

The dataset created for WEKA was used for classification with eight different classifiers. After pre-processing of the dataset, Random Forest showed the most promising results for SMS spam detection. This was done using almost 350 features. SVM and Naïve Bayes also gave good results but were not the best for this experiment.

The first diagram is a bar chart of the F-Measures of all eight algorithms. F-Measure is used because it looks at the bigger picture and combines both precision and recall.

Fig 1: Bar chart of F-Measures of all eight algorithms.

The F-Measure clearly shows that Random Forest led the other algorithms by a very high margin. Houshmand [1] found the Naïve Bayes algorithm to be more accurate, with good speed as well, in their analysis. SVM also did remarkably well, but the other algorithms, while retaining speed, did not retain accuracy. Random Forest was quite slow in terms of model creation and testing but gave good results.

To further test the algorithms, more results were generated, including the ROC curve, which we prefer because of its neat display and because it gives a brief overview of how the whole algorithm works and its level of correctness on the dataset in question. The ROC curves of almost all the algorithms were in the same range, with Random Forest as the exception because it gave a really smooth curve with almost .98 AUC.

Fig 2: ROC Curve

As is clearly visible, Random Forest (RF) had an exceptionally better curve than the others. This was found to be due to the nature of the dataset itself, normalized in the way it was, along with the specific features selected.

A. Friedman's Test and the T-Test

To better check the accuracy of the algorithms, a T-Test was conducted among them using percentage of correctness as the mean measure and an alpha of 5%. The test result showed a significant difference within the different classifiers and also
decision tree was suggested to be the best classifier for this particular data. This is the result for data preprocessed by the two filters previously mentioned.

Paired T-Test (two-tailed)
Dataset   NB      NB-M    NB-MU   SVM     K-NN    NB-C    J48      RF
Main      93.91   93.62   93.35   96.6*   94.8    93.8    96.04*   98.85*
Table 3: T-Test Result (* means significant difference)

SVM and J48 did significantly better than the baseline, which was Naïve Bayes (NB), but even this test suggested that Random Forest (RF) had the highest percentage of correctness and that this was significantly better than the others.

Folds   NB   NB-M   NB-MU   SVM   K-NN   NB-C   J48   RF
1       5    2      1       7     3      4      6     8
2       5    2      1       7     3      4      6     8
3       3    2      4       7     5      1      6     8
4       2    3      4       6     5      1      7     8
5       3    1      2       6     7      4      5     8
R       18   10     12      33    23     14     30    40
Table 4: Friedman's Test

Important observations are as follows:

H0: There is no significant difference
H1: There is a significant difference
Chi Square: 27.333
Alpha: 0.05
Decision Rule: 14.067
(Null hypothesis rejected as Chi Square > Decision Rule)

To further test this significance, Friedman's test was carried out among the eight classifiers. RF being ranked best (rank 8 in Table 4) in all of the 5 folds made it quite easy for the test to determine the significance of these differences. Our null hypothesis, that there is no difference among any of them, was rejected by Friedman's test, which clearly indicates a significant difference among the classifiers. Later on, the Wilcoxon test showed that Random Forest (RF) performed the best in terms of F-Measure ranking and had the highest accuracy and recall values.

Reason of Random Forest Success - Analysis

One reason why the Random Forest classifier was so successful in our experiments may be that, as [10] suggests, RF is recognized as an effective classifier that provides estimates of which variables are important in the classification. RF is also equipped with methods for balancing error in data sets with unbalanced class populations.

DeBarr, D. et al [11] proposed that Random Forest has been successful because the strengths of the technique include feature selection and the consideration of numerous feature subsets.

B. Clustering Algorithms

After testing the different classification approaches, another point of interest was to study the different clustering algorithms to see whether any of them can separate out the spam without knowledge of the class. To do this, the dataset was converted to one without the class label. The class label was then used to study the accuracy of the clusters and to visualize them via color. The same process was followed for the three clustering algorithms mentioned before and, as previously stated, they did not give any new results. The reason lies in the nature of this problem. The two classes might not differ in terms of attributes but differ more in terms of the patterns among them. These patterns are better found by a classifier than by a clustering algorithm.

The first algorithm tested was SKM, which, like the others, did not find any proper clusters. It did create two clusters, but one was just ham while the other was a mixture of ham and spam. No separate spam cluster was found.

Fig 3: Result of First Tested Algorithm

The next algorithm was X-Means, which gave the same results as SKM and did not produce any proper findings.

Fig 4: Result of X-Mean Algorithm

Farthest First, on the other hand, made both clusters a combination of both classes.

Fig 5: Result of the Farthest First

This suggests that clustering may not be able to determine the difference between the two classes, as they both have similar attributes and are derived from a string attribute. There is no specific difference among them. It is more a pattern that differentiates them in terms of their classes. Spam is more likely to have a specific pattern or a specific number of words, or specific usage of words like sell, buy, now, or some sort of urgency or a sale of a good, as the majority of its usage indicates marketing.
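
The comparison described in this subsection, clustering the unlabelled data and then checking each cluster against the held-back class label, can be sketched in a few lines of Python. This is only an illustration, not the WEKA procedure used in the study; the function names `cluster_class_counts` and `purity` and the toy assignments are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_class_counts(cluster_ids, labels):
    """Count how many instances of each class fall into each cluster."""
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        counts[cluster][label] += 1
    return counts


def purity(counts):
    """Fraction of instances that belong to the majority class of their cluster."""
    total = sum(sum(c.values()) for c in counts.values())
    majority = sum(max(c.values()) for c in counts.values())
    return majority / total


# Hypothetical toy data shaped like the SKM outcome described above:
# cluster 0 is pure ham, cluster 1 mixes ham and spam.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 1]
labels = ["ham", "ham", "ham", "ham", "ham", "spam", "spam", "spam"]

counts = cluster_class_counts(cluster_ids, labels)
print(dict(counts[0]))  # class counts inside cluster 0
print(dict(counts[1]))  # class counts inside cluster 1
print(purity(counts))   # overall purity of the clustering
```

A cluster whose counts are dominated by one class would indicate a useful separation; a mixed cluster, as reported for SKM and X-Means, would not.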
C. Random Forest 100 vs 500 trees

Having found the most accurate algorithm for spam detection on this particular dataset, the next step was to analyze the actual flow of execution of the Random Forest algorithm on the dataset. For this paper the accuracy, recall and F-measure were used, and different graphs were generated to understand the flow of execution in terms of instances.

Starting with a generic Random Forest setting of 100 randomly generated trees, each built from nine randomly selected attributes, we obtained the following flow graphs.

Fig 6: Analysis flow of execution of the Random Forest algorithm

The precision started out low, but after about 80 instances it began to grow and kept growing until it reached a constant state.

Fig 7: Further Results of the Algorithm

Recall is most of the time the inverse of precision, and so, unsurprisingly, the recall graph was the opposite of the precision graph: it started at the top and, as more instances went through the algorithm, it decreased to a certain level.

Fig 8: Graph showing the decreased level

The F-measure shows a mix of both precision and recall. It had a zig-zag rise at the start, just like the precision graph, but after reaching a certain level it fell in a way similar to the recall. It was also interesting to study the same graphs for a different version of the random forest, generated using 500 trees and nine randomly selected attributes. The difference was in when the rise or fall happens; the structure of the graphs remains the same.

Fig 9: Graph showing the difference in length with previous setting
All three graphs gave the same result as with the previous settings; the difference was the length in terms of total instances. The 500-tree model was almost 8 times bigger and 5 times slower than the 100-tree model. After studying the variety of algorithms and the different flavors of Random Forest, the 100-tree RF was selected for the next step, which is the actual library creation and testing. This will be a remarkable improvement in terms of actual deployment over what was previously done.
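
For reference, the precision, recall, and F-measure values discussed in this subsection follow the standard definitions. The Python sketch below assumes the balanced (F1) form of the F-measure, since the paper does not state a weighting; the function names and the counts are hypothetical and are not taken from the experiments.

```python
def precision(tp, fp):
    """Share of instances predicted as spam that really are spam."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual spam instances that were predicted as spam."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for the spam class (illustration only).
tp, fp, fn = 90, 10, 30

p = precision(tp, fp)
r = recall(tp, fn)
f = f_measure(p, r)
print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

Because it is a harmonic mean, the F-measure is never larger than the arithmetic mean of precision and recall and is pulled towards the smaller of the two.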
VI. .NET LIBRARY

First, the actual Random Forest model was saved using the WEKA tool as a model file. Next, a way to predict unknown instances was worked out, using this generated model to classify new instances. As we already had our second dataset, which comprised 10% of the original, the only thing left was to find a way to use the model within the library.

To use the model, the best choice was the WEKA Java library. But as it was written and compiled in Java, it required a Java virtual machine to be used and recompiled. To turn it into a .Net module, another component called IKVM was used, which is a mixed Java and .Net module that converts a Java library file into a .Net dll library. It runs the .jar file in a JVM and, instead of generating an executable file for the Windows platform, generates a dll library file for the Windows platform.

Having a .Net WEKA library was a perk, as it can be used from any .Net language. C# 6 was the best choice for creating a library that wraps this .Net library because it can, if wanted, be reprocessed into a cross-platform application. The C# language requires Visual Studio 2015 to make it cross platform, as previous versions do not support it.

After processing the weka.jar file through IKVM, the project needed to include all the various Sun-Java files along with the weka.dll file. In total, 30 libraries were required by the project to process a model file. The algorithm used within the resultant library is as follows:

Fig 10: The Algorithm used in the experiment.

In the Process Prediction method, the result can be processed in any way required. In the example project it was simply matched against the original class value to test the results.

After some further analysis of the code and the model, it was clear that the model was working well for the dataset. Note that the dataset used here is the second divided one, Spam-Library-Dataset.arff, which holds 10% of the original data. This is data that the model was not trained on and had no knowledge of. The evaluation object used in the code passes an instance from this dataset to the model and then saves the output class. The class is returned as a double value; in this specific case 0 = spam while 1 = ham. This is because of the arrangement of the attribute values within the ARFF file.

Below is the result computed for all the instances within that 10% dataset through the resultant C# 6 library. It is stated as a confusion matrix, and the performance of the decision tree model on this dataset was clearly remarkable. The accuracy for this dataset was 98.64%.

A      B      Classified As
191    5      A = Spam
1      245    B = Ham
Table 5: Result computed for all the instances
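
The accuracy quoted for Table 5 can be rechecked from the four counts. The Python sketch below is an illustration only and is not the C# library code: it tallies a confusion matrix from double-valued class outputs (0 = spam, 1 = ham, as described above) and computes the accuracy. The helper names are hypothetical, and the label lists are synthetic, built only to reproduce the counts in Table 5.

```python
SPAM, HAM = 0.0, 1.0  # class values as described in the text: 0 = spam, 1 = ham


def confusion_matrix(actual, predicted):
    """Return m[actual][predicted] with index 0 = spam and index 1 = ham."""
    m = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        m[int(a)][int(p)] += 1
    return m


def accuracy(m):
    """Correctly classified instances divided by all instances."""
    correct = m[0][0] + m[1][1]
    total = sum(sum(row) for row in m)
    return correct / total


# Synthetic label lists that reproduce the counts of Table 5:
# 196 actual spam (191 predicted spam, 5 predicted ham) and
# 246 actual ham (1 predicted spam, 245 predicted ham).
actual = [SPAM] * 196 + [HAM] * 246
predicted = [SPAM] * 191 + [HAM] * 5 + [SPAM] * 1 + [HAM] * 245

m = confusion_matrix(actual, predicted)
print(m)                     # [[191, 5], [1, 245]]
print(f"{accuracy(m):.2%}")  # 98.64%
```

With 436 of 442 instances classified correctly, the result agrees with the 98.64% reported above.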

A. Implication for developer

Our work raises the question of what the .NET library has to offer developers. The answer is a single library file that can simply be included in any desktop or web-based application, most likely a server that handles communication. Furthermore, the application is not just for SMS but for any communication platform.

VII. CONCLUSION

This research studied the SMS spam classification problem. The first step was to find multiple datasets, which were then combined into one after removing duplicates. The resulting dataset was processed through the StringToWordVector filter to create a normalized distribution of all the strings containing words. This was done without using any stop words. After that, feature selection approaches were used to create a more suitable dataset with almost 350 attributes. To solve the class imbalance problem, data resampling was applied and the data was reduced to a smaller version by favoring the imbalanced class. Eight different classification techniques were then studied on the improved dataset, with Random Forest giving the most accurate classification results. The generated model was then used within an externally created C# library to classify data it had no previous knowledge of. It achieved 98.64% accuracy on the dataset it was not trained on, through the C# library using the Weka model.

VIII. FUTURE WORK

For future work, the library can be improved to make it more adaptable to its environment, so that during execution, when new messages are received, it can classify them and, if the predictions are strong, add those instances to its training set and re-train. This might help it learn new patterns of spam as they emerge. The library can also be optimized to run on a server on the fly. This would enable cellular companies to classify a message before it is sent. It can also help detect users who send a large number of spam messages and block them to ensure a better quality of service for all other users.

REFERENCES
[1] Mehar, H.S. 2013. SMS Spam Detection using Machine Learning Approach. International Journal of Information Security Science 2.
[2] Wikipedia-Docs, https://en.wikipedia.org/wiki/Mobile_phone_spam
[3] SMS Spam Collection Data Set from UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
[4] SMS Spam Collection v.1, "http://www.dt.fee.unicamp.br/~tiago/smsspamcollection"
[5] Tiago A. Almeida, José María G. Hidalgo, and Akebo Yamakami. 2011. Contributions to the study of SMS spam filtering: new collection and results. In Proceedings of the 11th ACM symposium on Document engineering (DocEng '11). ACM, New York, NY, USA, 259-262. DOI=10.1145/2034691.2034742
[6] B. Coskun and P. Giura, "Mitigating sms spam by online detection of repetitive near-duplicate messages," in Communications (ICC), 2012 IEEE International Conference on. IEEE, 2012, pp. 999-1004.
[7] H. Shirani-Mehr, "SMS Spam Detection using Machine Learning Approach", (unpublished) http://cs229.stanford.edu/proj2013/ShiraniMehr-SMSSpamDetectionUsingMachineLearningApproach.pdf
[8] Akbari, F.; Sajedi, H., "SMS spam detection using selected text features and Boosting Classifiers," in Information and Knowledge Technology (IKT), 2015 7th Conference on, pp. 1-5, 26-28 May 2015. doi: 10.1109/IKT.2015.7288782.
[9] Sarah Jane Delany, Mark Buckley, and Derek Greene. 2012. SMS spam filtering. Expert Syst. Appl. 39, 10 (August 2012), 9899-9908. DOI=http://dx.doi.org/10.1016/j.eswa.2012.02.053

research article on the forensic
14 pages
Journal Paper
No ratings yet
Journal Paper
7 pages
Enhancing Email Security with Naïve Bayes Spam Detection.docx Fully edited
No ratings yet
Enhancing Email Security with Naïve Bayes Spam Detection.docx Fully edited
64 pages
Research Paper Emaildetection
No ratings yet
Research Paper Emaildetection
6 pages
Email Filter For Spam Mail: A Review
No ratings yet
Email Filter For Spam Mail: A Review
5 pages
Spam PDF
No ratings yet
Spam PDF
4 pages
46_ijme...Mech Engg..Research Paper-1
No ratings yet
46_ijme...Mech Engg..Research Paper-1
10 pages
SMS Classification: Conjoint Analysis of Multinomial Naive Bayes Application
No ratings yet
SMS Classification: Conjoint Analysis of Multinomial Naive Bayes Application
9 pages
Application of Natural Languag
No ratings yet
Application of Natural Languag
32 pages
Information: Malicious Text Identification: Deep Learning From Public Comments and Emails
No ratings yet
Information: Malicious Text Identification: Deep Learning From Public Comments and Emails
19 pages
A hybrid machine learning approach for spam and malware
No ratings yet
A hybrid machine learning approach for spam and malware
14 pages
Spam Message Detection Using Logistic Regression
No ratings yet
Spam Message Detection Using Logistic Regression
4 pages
Computers 12 00196 v3
No ratings yet
Computers 12 00196 v3
25 pages
Spam Mail Detection Using Machine Learning
No ratings yet
Spam Mail Detection Using Machine Learning
5 pages
Presentation 3
No ratings yet
Presentation 3
13 pages
Hybrid Spam Filtering For Mobile Communication: Ji Won Yoon, Hyoungshick Kim, Jun Ho Huh
No ratings yet
Hybrid Spam Filtering For Mobile Communication: Ji Won Yoon, Hyoungshick Kim, Jun Ho Huh
14 pages
E-Mail Spam Filtering
No ratings yet
E-Mail Spam Filtering
7 pages
Email Spam Detection Using Machine Learning
No ratings yet
Email Spam Detection Using Machine Learning
2 pages
Email Spam PDF
No ratings yet
Email Spam PDF
5 pages
Final ppt
No ratings yet
Final ppt
51 pages
Email Spam Classification Using Hybrid Approach of RBF Neural Network and Particle Swarm Optimization
No ratings yet
Email Spam Classification Using Hybrid Approach of RBF Neural Network and Particle Swarm Optimization
12 pages
Analyzing Privacy and Security in Cloud Computing Environments
No ratings yet
Analyzing Privacy and Security in Cloud Computing Environments
6 pages
Chatanya
No ratings yet
Chatanya
43 pages
Spam Detection in Email Using Machine Le
No ratings yet
Spam Detection in Email Using Machine Le
8 pages
Fortified
No ratings yet
Fortified
40 pages
Review (2) - Machine Learning For SPAM Detection 2023
No ratings yet
Review (2) - Machine Learning For SPAM Detection 2023
13 pages
Anchalora
No ratings yet
Anchalora
29 pages
Jebin 2
No ratings yet
Jebin 2
22 pages
Synopsis Email Spam
No ratings yet
Synopsis Email Spam
9 pages
Spam Detection Using BERT
No ratings yet
Spam Detection Using BERT
6 pages
87_95_Ensemble-Machine-Learning-Algorithm-for-Telegram-Spam-Detection
No ratings yet
87_95_Ensemble-Machine-Learning-Algorithm-for-Telegram-Spam-Detection
10 pages
Analysis of Spam Email Filtering Through Naive Bayes Algorithm Across Different Datasets
No ratings yet
Analysis of Spam Email Filtering Through Naive Bayes Algorithm Across Different Datasets
4 pages
A Multi-Layer Architecture For Spam-Detection System: Conference Paper
No ratings yet
A Multi-Layer Architecture For Spam-Detection System: Conference Paper
9 pages
Ijresm V6 I9 3 2
No ratings yet
Ijresm V6 I9 3 2
5 pages
The Infinite Bit: An Inside Story of Digital Technology
From Everand
The Infinite Bit: An Inside Story of Digital Technology
Arvind Padmanabhan
No ratings yet
Data Mining: Fundamentals and Applications
From Everand
Data Mining: Fundamentals and Applications
Fouad Sabry
No ratings yet
Group Method of Data Handling: Fundamentals and Applications for Predictive Modeling and Data Analysis
From Everand
Group Method of Data Handling: Fundamentals and Applications for Predictive Modeling and Data Analysis
Fouad Sabry
No ratings yet
Learning Edition Via Blue Prism Digital Exchange (DX) Install Blue Prism Install Blue Prism License
No ratings yet
Learning Edition Via Blue Prism Digital Exchange (DX) Install Blue Prism Install Blue Prism License
2 pages
2020 Convocation List Camera Ready Current Students
No ratings yet
2020 Convocation List Camera Ready Current Students
10 pages
DFA Minimization Based On Hopcroft's Algorithm Along With Slides On Brzozowski's Algorithm Also Found in Lecture 5
No ratings yet
DFA Minimization Based On Hopcroft's Algorithm Along With Slides On Brzozowski's Algorithm Also Found in Lecture 5
40 pages
Chap 1 Database
No ratings yet
Chap 1 Database
38 pages
Call Center Simulation Modeling: Methods, Challenges, and Opportunities
No ratings yet
Call Center Simulation Modeling: Methods, Challenges, and Opportunities
10 pages
Modelling
No ratings yet
Modelling
10 pages
Securing The Internet of Things (Iot) : A Security Taxonomy For Iot
No ratings yet
Securing The Internet of Things (Iot) : A Security Taxonomy For Iot
7 pages
LUCINO Research 2 1
No ratings yet
LUCINO Research 2 1
15 pages
1 s2.0 S0016328723002136 Main
No ratings yet
1 s2.0 S0016328723002136 Main
20 pages
Research On Evolution of EA Sports
No ratings yet
Research On Evolution of EA Sports
28 pages
PUBH3305 Qualitative Assessment 2024 Final
No ratings yet
PUBH3305 Qualitative Assessment 2024 Final
5 pages
10 - Osakuade Joseph Oluwatayo
No ratings yet
10 - Osakuade Joseph Oluwatayo
7 pages
Comparative Study On Deductive and Inductive Argument: Fec-32 Logical Reasoning Slot-2 1 Semester
No ratings yet
Comparative Study On Deductive and Inductive Argument: Fec-32 Logical Reasoning Slot-2 1 Semester
13 pages
Technoculture Theory
No ratings yet
Technoculture Theory
19 pages
1 3 Textbook
No ratings yet
1 3 Textbook
5 pages
Composition and Rhetoric
No ratings yet
Composition and Rhetoric
118 pages
A Project Report Onbrand Awareness of Vijaya Dairy Products Brand
No ratings yet
A Project Report Onbrand Awareness of Vijaya Dairy Products Brand
4 pages
Critical Literacy Practice - Applications of Critical Theory in Diverse Settings - (2015)
No ratings yet
Critical Literacy Practice - Applications of Critical Theory in Diverse Settings - (2015)
224 pages
Veena Rani
No ratings yet
Veena Rani
84 pages
Sample Proposal FINAL
No ratings yet
Sample Proposal FINAL
13 pages
CHN 2 Midterms
No ratings yet
CHN 2 Midterms
71 pages
Senior Thesis Nyu
100% (3)
Senior Thesis Nyu
7 pages
Re Structuring Mercato Undergraduate The
100% (1)
Re Structuring Mercato Undergraduate The
62 pages
Sistem Informasi Kredit Program (SIKP)
No ratings yet
Sistem Informasi Kredit Program (SIKP)
11 pages
Lkti Efektifitas Pengenalan Pariwisata Melalui Aplikasi Geoguessr Terhadap Pembangkitan Ekonomi Kreatif Jawa Timur
No ratings yet
Lkti Efektifitas Pengenalan Pariwisata Melalui Aplikasi Geoguessr Terhadap Pembangkitan Ekonomi Kreatif Jawa Timur
8 pages
Finance Homework Sample
100% (1)
Finance Homework Sample
6 pages
Life Grand Map
No ratings yet
Life Grand Map
1 page
A Study On Advertising of HUL
100% (1)
A Study On Advertising of HUL
60 pages
FREE STUDY ABROAD TUTORIAL 1 & 2 - ACADEMIC CV
No ratings yet
FREE STUDY ABROAD TUTORIAL 1 & 2 - ACADEMIC CV
6 pages
Difference Between Literature Review and Research Proposal
No ratings yet
Difference Between Literature Review and Research Proposal
7 pages
O'Brien (2009)
No ratings yet
O'Brien (2009)
16 pages
Download Full Handbook of Environmental Economics Volume 1 Environmental Degradation and Institutional Responses 1st Edition Karl-Göran Mäler PDF All Chapters
100% (8)
Download Full Handbook of Environmental Economics Volume 1 Environmental Degradation and Institutional Responses 1st Edition Karl-Göran Mäler PDF All Chapters
40 pages
Research Instrument in Thesis Writing
100% (3)
Research Instrument in Thesis Writing
4 pages
Research
No ratings yet
Research
23 pages
UNIT III SCIENTIFIC VALUES
No ratings yet
UNIT III SCIENTIFIC VALUES
22 pages