A Framework For SMS Spam and Phishing Detection in Malay Language: A Case Study

Article  in  International Review on Computers and Software · January 2014

3 authors:

Cik Feresa Mohd Foozy, Universiti Tun Hussein Onn Malaysia
Rabiah Ahmad, Technical University of Malaysia Malacca
Mohd Faizal Abdollah, Technical University of Malaysia Malacca

All content following this page was uploaded by Cik Feresa Mohd Foozy on 23 July 2018.

International Review on Computers and Software (I.RE.CO.S.), Vol. 9, N. 7
ISSN 1828-6003 July 2014

A Framework for SMS Spam and Phishing Detection


in Malay Language: a Case Study

Cik Feresa Mohd Foozy, Rabiah Ahmad, Faizal M. A.

Abstract - Short Message Service (SMS) spam and SMS phishing have been increasing nowadays, especially in the Malay language, which is the first language of Malaysia. Many SMS spam filtering approaches have been proposed for other languages, but not yet for the Malay language, and we are the first to propose one. In addition, this paper analyses several SMS spam filtering frameworks as a basis for our SMS spam and phishing detection framework. From the analysis, the chosen framework has been enhanced for Malay SMS spam and phishing. The enhancement is made in the classification phase, where our framework proposes dual classification. Classification 1 classifies the SMS into ham and scam SMS. In classification 2, the scam SMS is classified again into SMS spam and SMS phishing. After the dual classification phase is completed, the Malay SMS is examined using the Naive Bayes and J48 supervised machine learning techniques. The result shows high accuracy in detecting Malay SMS ham, spam and phishing. Copyright © 2014 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Detection, Filtering, Phishing, Security, SMS, Spam

I. Introduction

SMS is one of the alternative communication services nowadays. Since SMS is reasonably priced per message in many countries, it attracts many users who do not have mobile data or Internet access to use this service as a communication tool and for sending information such as advertising, marketing, news and so on. There are tools and software for sale to filter SMS. However, SMS spam and phishing attacks are still increasing. Spam and SMS phishing are two different types of attack. A spam message contains advertising [1] and marketing. A phishing message, however, tricks users by announcing that the user is a winner or will get a free gift. These phishing tactics are meant to make the user respond to the message. According to [2], SMS phishing messages can silently charge fees to the mobile device owner if the user responds to the SMS. Moreover, from the reply to an SMS phishing message, phishers can get more information about the mobile device, such as the mobile device version, contact number and so on.

Boodae [3] found that mobile device users are three times more likely to enter a web-based phishing attack than desktop users, and a security report by Lookout [4] found that three out of ten Lookout users click on an unsafe link per year using a mobile device. Recently, a variety of mobile applications have been developed and downloaded onto mobile devices by many users. This is one of the reasons phishers have changed their attack strategy by embedding malware URL links into the SMS for the user to click, or asking the user to reply to the SMS as acceptance of the rules and regulations to download or install the mobile applications.

Moreover, if the user clicks or replies to the trigger SMS, the malware will connect with the mobile device, and the mobile device owner may lose money and information privacy. Joe et al. [5] found that unwanted incoming SMS makes respondents feel that the SMS violates their personal privacy. SMiShing usually influences the SMS recipient to enter a fraudulent contest, make money transactions into a bank account, pay bills, respond to fraudulent advertising and so on. Because SMiShing attacks have many disadvantages, a few studies on SMS spam filtering have been proposed. Traditionally, phishers sent SMiShing attacks in high volume. Nowadays, however, phishers prefer to send SMiShing attacks in small quantities to observe the security level of the mobile device; thus it is possible to detect SMS phishing based on the SMS quantity and the distribution of the message sending pattern.

In addition, SMS language or abbreviated words are frequently used instead of proper language when sending SMS because of the limited text length available to write the message. This SMS composition feature on mobile phones makes users compress the message with abbreviations, as long as the recipient can read the message. However, an SMS with too many abbreviated words will be detected as spam by some SMS filtering tools.

In this paper we propose a framework to detect Malay SMS spam and phishing using classification techniques. A Malay SMS corpus has been collected from contributors such as web sites, friends, family and unknown respondents. The text was collected using transcription and our online form.


This paper is organized as follows. Section 2 reviews the literature on SMS phishing detection. Section 3 explains the pre-processing of the SMS datasets, the development of the features selection and the experiments to identify the accuracy of the features selection using the data mining tool WEKA. Section 4 shows the results and findings of the experiment, and finally the conclusion and future work are given.

II. Related Work

Mobile device phishing attacks have been categorized by Dunham et al. [6] into four types, including Bluetooth phishing, Short Message Service (SMS) phishing and Voice over IP phishing (vishing). An example of a Bluetooth phishing attack is discussed in [6]: the Bluetooth phishing attack works when the user connects to a Wi-Fi hotspot, and the attacker can steal the data while the user is connected to the Wi-Fi. A vishing attack targets the customer service of an organization to get the private information of customers.

Mobile web application phishing is similar to desktop web phishing. However, the solutions for this attack are mostly based on the client-server method and depend on the mobile architecture itself, because a mobile device has limitations in CPU processing, battery life and memory size; thus the existing solutions use antivirus or anti-phishing software developed specifically for lightweight mobile devices. In this paper, we therefore focus on solutions to SMS spam and phishing attacks, which have increased recently.

II.1. SMS Corpus Collection Method

Several methods can be used to collect a text corpus. According to Table I, corpora of a variety of sizes have been collected. Most of the SMS collected are in English, and the text contributors are unknown contributors, known contributors and participants. SMS from unknown contributors are usually taken from software and websites. SMS from known contributors come from family and friends. A few studies do not mention the SMS collection process clearly. However, Hård af Segerstad [8], Ogle [9], Fairon and Paumier [10], Choudhury [11], Herring and Zelenkauskaite [12], and Bach and Gunnarsson [13] state the method used to collect the SMS.

SMiShing attacks have been discussed in [6] and [39]. According to K. Dunham [6], SMiShing is a new tactic to spread malware by adding a URL link to an SMS and influencing the recipient to click on the URL.

In addition, O. Salem et al. [39] identify that SMiShing attacks on mobile devices are currently on the rise, and Xu et al. [40] agree that SMS spamming is a serious attack on SMS nowadays. J. W. Yoon et al. [41] and H. Peizhou et al. [42] study content-based SMS filtering, and Q. Xu et al. [40] propose non-content-based SMS filtering.

For this paper, Malay SMS phishing and spam are classified based on generic features: Total words [43], [44], [45], [46]; Number of Character bi-grams [43], [45]; Number of Character tri-grams [43], [45]; Average number of length [44], [46]; and Average number of word [44], [46]. We also add features for this study: advertisement, contest, urgent SMS, asking for money, asking to respond to the SMS, telephone, URL and announcing that the user gets a free gift or wins. These criteria are based on spam and phishing attacks. In total, 14 features are applied in this study.

II.2. Malay Language Detection Framework

The Malay language has been applied as a case study in several research areas. However, information security detection studies are still lacking, especially for SMS spam and phishing, which, based on the literature review, has not yet been examined. Examples of Malay detection research areas are speech detection [47], language detection [48], [7] and email spam detection [49].

The speech detection work by Lim et al. [47] aims to produce high-quality synthetic speech in the Malay language. In language detection, Tsai et al. [48] aimed to reduce redundancy in English, Malay and Chinese documents. The framework has three different preprocessing steps suited to the three languages, followed by a detection process consisting of Sentence Segmentation, Sentence Level and Novel Rate computation.

A. Kwee et al. [7] proposed English and Malay language detection; the processes for Malay language detection consist of Language Translation, Stop Word Removal, Word Stemming and Detection.

In the email spam detection research area, T. Subramaniam et al. [49] applied the Malay language as a case study using Bayesian filtering techniques. The results show that this technique successfully classifies spam and non-spam Malay language email with 96% accuracy. The framework consists of Collection of emails, Tokenization, Features reduction, Features selection, Training and Testing.

Thus, in this paper, the Malay language is examined as a case study for SMS spam and phishing detection. The basic detection process is applied in this framework, and an enhancement with dual classification and two additional new features, advertisement and contest, is proposed. The framework and results are discussed further in the next section.

II.3. SMS Filtering and Detection Framework

Generally, a filtering method consists of tokenization, lemmatization and stop word removal, representation and a classifier [50].
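As an illustration of these generic filtering steps, the following Python sketch tokenizes an SMS, removes stop words and builds a small feature representation. It is a minimal sketch under stated assumptions, not the implementation used in this study: the stop-word list, the regular expressions, the example message and all function names are hypothetical, lemmatization is omitted, and the average word length is only one possible reading of the average-length feature.

```python
import re

# Hypothetical stop-word list for illustration only; not the list used in the study.
STOP_WORDS = {"dan", "yang", "di", "ke", "ini", "itu"}

# Hypothetical patterns for two of the content features (URL and telephone number).
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\b\d{7,11}\b")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(sms):
    """Lowercase the message and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(sms.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def char_ngrams(text, n):
    """Return the overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def represent(sms):
    """Map one SMS to a dictionary of numeric and boolean features."""
    tokens = remove_stop_words(tokenize(sms))
    total_chars = sum(len(t) for t in tokens)
    return {
        "total_words": len(tokens),
        "char_bigrams": len(char_ngrams(sms, 2)),
        "char_trigrams": len(char_ngrams(sms, 3)),
        # Assumption: "average length" is read as mean token length.
        "avg_word_length": total_chars / len(tokens) if tokens else 0.0,
        "has_url": bool(URL_RE.search(sms)),
        "has_phone": bool(PHONE_RE.search(sms)),
    }


if __name__ == "__main__":
    # Made-up example message, not taken from the corpus.
    print(represent("Menang hadiah di www.contoh.com"))
```

The resulting dictionary could then be passed to any classifier; the remaining content features (advertisement, contest, urgent SMS and so on) would need keyword lists that the paper does not publish, so they are left out here.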


TABLE I
LITERATURE ON TEXT MESSAGING CORPUS
References | Text Size | Text Language | Text Contributor | Text Collection Method
Pietrini [14] | 500 | Italian | 15-35 years old | Unknown
Schlobinski et al. [15] | 1500 | German | Students | Unknown
Shortis [16] | 207 | English | 1 male student, friends and family | Transcript
Doring [17] | 1000 | German | 200 participants | Unknown
Hård af Segerstad [18] | 1152 | Swedish | 112 from an anonymous webpage, 252 messages forwarded from volunteers and 788 from family and friends | From webpage, volunteers, family and friends
Kasesniemi and Rautiainen [19] | 7800 | Finnish | Teenagers (13-18 years old) | Transcript
Grinter and Eldridge [20] | 477 | English | 10 teenagers (15-16 years old) | Transcript
Thurlow and Brown [21] | 544 | English | 135 freshmen | Transcript
Ogle [9] | 97 | English | Nightclubs | Subscribe SMS promotion of nightclub
Yijue How and Kan [22] | 10117 | English | ?003 respondents | Transcript
Fairon and Paumier [10] | 30000 | French | 166 university students | Forward
Choudhury [11] | 1000 | English | 3,200 contributors | Search the SMS from the website
Rich Ling [23] | 867 | Norwegian | Randomly | Transcript
Rettie (2007) | 278 | English | 31 contributors | Unknown
Zic Fuchs and Tudman Vuković [24] | 6000 | Croatian | University students, family and friends | Unknown
Gibbon and Kul [25] | 292 | Polish | University students and friends | Unknown
Deumert and Oscar Masinyana [26] | 312 | English, isiXhosa | 22 young adults | Transcript, Forward
Hutchby and Tanna [27] | 1250 | English | 30 young professionals (20-35 years old) | Transcript
Walkowska [28] | 1700 | Polish | 200 contributors | Forward, Software
Herring and Zelenkauskaite [12] | 1452 | Italian | Audiences of an iTV program | Online SMS archives
Tagg [29] | l062ri | English | 16 family and friends | Transcript
Elvis [30] | 600 | English, French, etc. | 72 university students and lecturers | Forward
Barasa [31] | 2730 | English, Kiswahili, etc. | 84 university students and 37 young professionals | Forward
Bach and Gunnarsson [13] | 3152 | Swedish, English, etc. | 11 contributors | Software
Bodomo [32] | 853 | English, Chinese | 87 youngsters | Transcript
Liu and Wang [33] | 85870 | Chinese | Real volunteers | Unknown
Sotillo [34] | 6629 | English | 59 participants | Software
Dürscheid and Stark [35] | 23987 | German, French, etc. | 2,627 volunteers | Forward
Lexander [36] | 496 | French | 15 young people | Transcript
Elizondo [37] | 357 | English | 12 volunteers | Transcript
Chen and Kan [38] | 71000 | English and Mandarin Chinese | University students | Unknown

However, there are multiple frameworks for spam filtering and detection, such as the following:

• Tools
Cormack et al. [45] filtered SMS using five types of spam filter tools. Before the tools were applied to the SMS, the data was pre-processed to produce four (4) main features. Moreover, for real-time detection on the SMS protocol, Rafique et al. [51] applied a Hidden Markov Model (HMM) in an architecture consisting of a sniffer, feature extraction, a classifier and a rules decision. Rafique and Farooq [52] also apply HMM on the byte-level distribution of SMS, with four processes: identifying a suitable representation of an SMS, building spam models, classification and determining the spam. In addition, the SMS-Watchdog detection scheme by Yan et al. [53] has three processes: monitoring, anomaly detection and alert handling using SMS services.

• Content-Based Filtering
J. W. Yoon et al. [41] proposed a hybrid framework that implements a content-based technique with a challenge-response scheme. The SMS is classified into ham, spam and uncertain; the challenge-response then classifies the uncertain SMS into ham or spam by matching the sender

response. Gómez Hidalgo et al. [54] proposed content-based SMS filtering for English and Spanish SMS spam using Bayesian filtering, consisting of preprocessing, feature selection and learning.

• Machine Learning
Xiang et al. [55] proposed the Support Vector Machine technique to filter mobile spam. Moreover, Cai et al. [56] improved the spam filter using the traditional balanced Winnow algorithm, applying a pre-processor, feature selection, text representation and a Winnow algorithm module. The independent mobile device filtering by Taufiq Nuruzzaman et al. [44] applied several processes: data set and running environment, feature extraction, vector creation, a filtering process using Naive Bayes or SVM, and updating the filtering system. Yadav [46], [57] had three processes in their SMS filtering: a Bayesian filtering algorithm, a mobile application and a synchronization service on the server.

• Matching Pattern
Wu et al. [58] proposed an SMS filtering flow consisting of SMS screening, Bayesian learning, keyword SMS and Pinyin fuzzed keyword matching. Moreover, the Chinese SMS filtering by Jie et al. [59] has pre-processing, features selection, modeling and a classifier. In addition, the framework of Najadat et al. [60] involves the processes of data collection, pre-processing, text mining, testing, evaluation metrics and implementation.

• Artificial Immune System
T. M. Mahmoud and A. M. Mahfouz [61] applied an artificial immune system method to filter SMS spam, containing an analysis engine, word tokenization, stop words, a dataset, training and an AIS engine. Chaminda et al. [62] proposed a hybrid solution of neural network and Bayesian filtering, where the SMS filtering process comprises a sender identification module, spam folder, SMS content extractor, tokenizer, Bayesian filter, categorization, training and inbox.

• Cryptography
In the cryptography area, Saxena [63] proposed a secure SMS protocol for SMS transmission and a cryptographic algorithm in the SIM card. The processes of the framework are requesting to send an SMS and authenticating the sending of the SMS. In addition, Pereira et al. [64] proposed a lightweight cryptography algorithm to mitigate SMS security issues, with protocols providing encryption, authentication and signature services. Choi [65] applied the Common Public Key Cryptography technique for SMS communication efficiency, which consists of initialization to authenticate, encrypt or decrypt, and communication phases for sending SMS.

In summary, the basic architecture of these frameworks comprises data collection, pre-processing, features selection, training and testing, which are mostly applied in SMS studies. These studies show that SMS spam filtering research has already applied various filtering and detection techniques to different SMS languages, except for the Malay language. In addition, one of the differences between the frameworks is the technique applied, yet the results are still good.

A variety of frameworks have been presented for spam filtering and detection. However, a framework for both SMS spam and phishing attacks is not yet available. Thus, the proposed SMS spam and phishing framework enhances the generic framework of SMS spam filtering by [50]. The enhanced framework has dual classification for the Malay SMS language.

The reason for dual classification is to verify that the SMS collection has been classified correctly. After the first classification is done, the second classification process classifies the scam SMS into spam and phishing. The framework is discussed further in the next section.

III. Methodology

This section explains the development of the Malay SMS spam and phishing detection framework. The SMS spam and phishing detection focuses on features based on previous studies. As mentioned before, this study is the first to collect Malay SMS for detecting spam and phishing. Thus, there are no SMS spam and phishing datasets available in the Malay language, and the datasets need to be prepared for this study.

For datasets preparation, Malay SMS were collected from websites, unknown respondents, friends and family. The proposed framework is based on Guzella and Caminhas [50], which has four (4) main steps in filtering spam messages: tokenization, lemmatization, representation and classifier.

In this framework, the four main steps are applied. However, an additional classifier is added, called the dual classifier, in the Malay SMS spam and phishing detection framework.

We need a dual classifier rather than a single classification process because we collect ham and scam SMS from websites, friends, family and unknown respondents. Some respondents have basic knowledge about SMS spam and phishing, and some do not know anything about these attacks.

After the Malay SMS collection is done, the ham and scam SMS go through tokenization, lemmatization and stop word removal, representation and classification 1.

After the result of classification 1 is obtained, the second classification process classifies the scam SMS into SMS spam and phishing. A similar method was applied by J. W. Yoon et al. [41] to classify uncertain SMS into the spam and ham classes. Figure 1 and the steps below list the process of developing the Malay SMS spam and phishing datasets and detection:

i. Collect ham and scam SMS from websites, friends, family and respondents and do the first classification.
ii. Tokenization.


iii. Lemmatization.
iv. Representation of strings into nominal datasets.
v. Features selection.
vi. Second classification of scam SMS into spam and phishing.
vii. Examine the resulting Malay SMS datasets using the Naive Bayes and J48 techniques.

[Figure: flow diagram with the stages Incoming Ham and Scam SMS, Tokenization, Lemmatization and stop word removal, Representation (Features Selection), Classifier 1, Scam SMS, Classifier 2, Spam or Phishing SMS]

Fig. 1. SMS Spam and Phishing Detection in Malay Language Framework

III.1. Malay SMS Corpus Collection Method

As a preliminary study in collecting a Malay SMS corpus, Malay SMS were collected using methods from Table I. The SMS collection methods are website, personal SMS forwarding, transcription and online form. The SMS contributors are respondents, websites, family and friends. After the SMS collection is done, all SMS are transcribed into Microsoft Office Excel 2007 for the tokenization, lemmatization and representation processes.

III.2. Tokenization

Tokenization is the process of dividing a sentence into words. Tokenization is done to count the words for the features selection and classification processes. Fig. 2 is an example of SMS tokenization. In total, 179 SMS were tokenized, and the total number of words after tokenization is 21694.

Fig. 2. After SMS Tokenization Process

III.3. Lemmatization

Lemmatization is the process of grouping words with the same meaning as well as removing redundancy and noise. However, SMS usually contain many abbreviated words. It is difficult to group the variety of words with a similar meaning; for example, in Malay SMS abbreviations, the words Thank You can be typed as TQ, thank Q, thanks or tengkiu. Thus, in this study, the occurrences of all words in the SMS are counted, and such variants are identified as different words.

The SMS word occurrences are counted using a Java program to identify the unique words in the Malay SMS collection. There are 80? words in the SMS after lemmatization.

III.4. Features Selection

The SMS representation in this paper applies features based on previous studies. The features are Total words, Number of Character bi-grams, Number of Character tri-grams, Average number of length, and Average number of word.

For this study, additional features based on spam and phishing characteristics are also included: Advertisement or announcement, Contest, Malicious URL, Telephone Number, Winning or Free gift, SMS asking for help to get money, and SMS asking to respond or subscribe to services.

III.5. Classification

Two classification processes are proposed in this framework. The raw data collection is classified into SMS ham and SMS scam. After classification process 1, the classification result shows high accuracy. The second classification process then classifies the SMS scam into SMS spam and phishing. Dual classification is done because this is the first study to propose a framework for detecting both SMS spam and phishing. Thus, to ensure good classification accuracy, this dual classification is proposed, and the results are discussed in the next section.

IV. Analysis and Findings

An experiment was carried out to examine 179 SMS of the ham, spam and phishing classes using WEKA, a data mining tool, to test the classification accuracy, true positive and false positive on the Malay spam and phishing corpus using Naive Bayes and J48. These techniques were applied to test the classification accuracy rate because they are among the well-known supervised methods in machine learning. Table II is the classification result for 41 SMS ham and 82 SMS scam. The result shows that Naive Bayes and J48 both reach 100%. Table III is the classification result for the scam SMS, classified into 41 SMS phishing and 41 SMS spam. The result also shows that Naive Bayes and J48 reach 100%. The final result for ternary classification of 41 SMS ham, 41 SMS phishing and 41 SMS spam shows 100% accuracy.
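The dual classification evaluated here can be sketched as two classifiers applied in sequence: the first separates ham from scam, and the second splits scam into spam and phishing. The Python sketch below is illustrative only and rests on several assumptions: it uses a small hand-written multinomial Naive Bayes over word counts as a stand-in for the WEKA Naive Bayes and J48 classifiers, the class names `NaiveBayes` and `DualClassifier` are hypothetical, and the training messages are invented rather than taken from the Malay SMS corpus.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class DualClassifier:
    """Stage 1: ham versus scam. Stage 2: scam into spam or phishing."""

    def fit(self, docs, labels):
        stage1_labels = ["ham" if y == "ham" else "scam" for y in labels]
        self.stage1 = NaiveBayes().fit(docs, stage1_labels)
        scam = [(d, y) for d, y in zip(docs, labels) if y != "ham"]
        self.stage2 = NaiveBayes().fit([d for d, _ in scam],
                                       [y for _, y in scam])
        return self

    def predict(self, tokens):
        if self.stage1.predict(tokens) == "ham":
            return "ham"
        return self.stage2.predict(tokens)


# Invented toy messages (already tokenized), for illustration only.
TRAIN = [
    ("jumpa esok di pejabat".split(), "ham"),
    ("terima kasih atas bantuan".split(), "ham"),
    ("promosi jualan murah hari ini".split(), "spam"),
    ("jualan hebat diskaun promosi".split(), "spam"),
    ("tahniah anda menang hadiah balas sekarang".split(), "phishing"),
    ("anda menang wang tunai hantar nombor akaun".split(), "phishing"),
]

if __name__ == "__main__":
    docs = [d for d, _ in TRAIN]
    labels = [y for _, y in TRAIN]
    model = DualClassifier().fit(docs, labels)
    print(model.predict("anda menang hadiah".split()))
```

Training the second stage only on scam messages mirrors the description above, in which classification 2 sees only the SMS that classification 1 has labelled as scam.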

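The parameters reported in Tables II to IV can be read as per-class rates plus an overall percentage. The sketch below shows one way to compute them, assuming the usual definitions (true positive rate as the share of a class that is recovered, false positive rate as the share of other messages wrongly assigned to that class); the function names are hypothetical, and the label lists are constructed for illustration, not taken from the experiment output.

```python
def per_class_rates(actual, predicted, label):
    """Return (true positive rate, false positive rate) for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == label and p == label for a, p in pairs)
    fn = sum(a == label and p != label for a, p in pairs)
    fp = sum(a != label and p == label for a, p in pairs)
    tn = sum(a != label and p != label for a, p in pairs)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


def percent_correct(actual, predicted):
    """Percentage of messages whose predicted class equals the actual class."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


if __name__ == "__main__":
    # A perfect prediction over 41 ham and 82 scam labels, as in classification 1.
    actual = ["ham"] * 41 + ["scam"] * 82
    predicted = list(actual)
    print(per_class_rates(actual, predicted, "ham"))
    print(percent_correct(actual, predicted))
```

Under these definitions, a true positive rate of 1 together with a false positive rate of 0 for every class is equivalent to 100% of the messages being correctly classified.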

TABLE II
BINARY CLASSIFICATION 1 RESULT FOR MALAY SMS HAM AND SCAM
Parameter | Naive Bayes (Ham / Scam) | J48 (Ham / Scam)
True Positive | 1 / 1 | 1 / 1
False Positive | 0 / 0 | 0 / 0
Correctly Classified | 100% | 100%
Incorrectly Classified | 0% | 0%

TABLE III
BINARY CLASSIFICATION 2 RESULT FOR MALAY SMS SPAM AND PHISHING
Parameter | Naive Bayes (Phishing / Spam) | J48 (Phishing / Spam)
True Positive | 1 / 1 | 1 / 1
False Positive | 0 / 0 | 0 / 0
Correctly Classified | 100% | 100%
Incorrectly Classified | 0% | 0%

TABLE IV
TERNARY CLASSIFICATION RESULT FOR MALAY SMS HAM, SPAM AND PHISHING
Parameter | Naive Bayes (Ham / Phishing / Spam) | J48 (Ham / Phishing / Spam)
True Positive | 1 / 1 / 1 | 1 / 1 / 1
False Positive | 0 / 0 / 0 | 0 / 0 / 0
Correctly Classified | 100% | 100%
Incorrectly Classified | 0% | 0%

A higher percentage (%) of the correctly classified parameter is better. A True Positive of 1 shows that the datasets are classified correctly, and a False Positive of 0 means that none of the SMS are in the wrong class.

V. Conclusion

SMS phishing is still on the rise. However, only SMS spam corpora are available on the Internet. Based on studies of spam and phishing in the information security area, spam and phishing attacks have different definitions and understanding.

Thus, we took the initiative to collect SMS phishing in the Malay language as an alternative mitigation to overcome SMS spam and phishing in the Malay language.

Based on the results, we show that this corpus was successfully classified into three classes, SMS ham, spam and phishing, using dual classification techniques.

In conclusion, applying classification with machine learning techniques can also detect and filter scam SMS.

Many techniques can be applied in this research area, and this technique is one of the established techniques for SMS filtering and detection, since classification using Naive Bayes and J48 takes only 0 seconds and achieves 100% accuracy.

Acknowledgements

The authors would like to thank Universiti Teknikal Malaysia Melaka (UTeM), Universiti Tun Hussein Onn Malaysia (UTHM) and the Ministry of Higher Education Malaysia for supporting this research. This research is funded by MOHE under the Long Research Grant Scheme LRGS/2011/FTMK/TK01/1 R00002.

References

[1] S. S. Chandran and S. Murugappan, "Spam detection and elimination of messages from twitter," International Review on Computers and Software, vol. 8, pp. 2438-2443, 2013.
[2] F-Secure, "Mobile Threat Report Q3 2012," F-Secure Labs, 2012.
[3] M. Boodae, "Mobile Users Three Times More Vulnerable to Phishing Attacks," in Trusteer, vol. 2012, ed, 2011.
[4] I. Lookout, "Lookout Mobile Threat Report August 2011," 2011.
[5] I. Joe and H. Shim, "An SMS Spam Filtering System Using Support Vector Machine," in Future Generation Information Technology, vol. 6485, T.-h. Kim, et al., Eds., ed: Springer Berlin Heidelberg, 2010, pp. 577-584.
[6] K. Dunham, "Chapter 6 - Phishing, SMishing, and Vishing," in Mobile Malware Attacks and Defense, D. Ken, Ed., ed Boston: Syngress, 2009, pp. 125-196.
[7] A. Kwee, et al., "Sentence-Level Novelty Detection in English and Malay," in Advances in Knowledge Discovery and Data Mining, vol. 5476, T. Theeramunkong, et al., Eds., ed: Springer Berlin Heidelberg, 2009, pp. 40-51.
[8] Y. Hård af Segerstad, Use and Adaptation of Written Language to the Conditions of Computer-Mediated Communication: University of Gothenburg, 2002.
[9] T. Ogle, "Creative Uses of Information Extracted from SMS Messages," Undergraduate, Computer Science, The University of Sheffield, 2005.
[10] Cédrick Fairon and S. Paumier, "A Translated Corpus of 30,000 French SMS," in In Proceedings of Language Resources and Evaluation, 2006.
[11] M. Choudhury, et al., "Investigation and modeling of the structure of texting language," Int. J. Doc. Anal. Recognit., vol. 10, pp. 157-174, 2007.
[12] S. C. Herring and A. Zelenkauskaite, "Symbolic capital in a virtual heterosexual market: abbreviation and insertion in Italian iTV SMS," Written Communication, vol. 26, pp. 5-31, 2009.
[13] C. Bach and J. Gunnarsson, "Extraction of trends in SMS text," Masters thesis, Lund University, 2010.
[14] D. Pietrini, "X'6:-(?": The sms and the triumph of informality and ludic writing," Italienisch, vol. 46, pp. 92-101, 2001.
[15] P. Schlobinski, et al., "Simsen. Eine Pilotstudie zu sprachlichen und kommunikativen Aspekten in der SMS-Kommunikation," Networx 22. Online-Publikationen zum Thema Sprache und Kommunikation im Internet, 2001.
[16] T. Shortis, "'New Literacies' and Emerging Forms: Text Messaging on Mobile Phones," presented at the International Literacy and Research Network Conference on Learning, 2001.
[17] N. Doring, "1 bread, sausage, 5 bags of apples I.L.Y" - communicative functions of text messages (SMS)," Zeitschrift für Medienpsychologie 3, 2002.
[18] Y. Hård af Segerstad, Use and Adaptation of Written Language to the Conditions of Computer-Mediated Communication: University of Gothenburg, 2002.
[19] E.-L. Kasesniemi and P. Rautiainen, "Mobile culture of children and teenagers in Finland," in Perpetual contact, ed: Cambridge University Press, 2002, pp. 170-192.
[20] R. Grinter and M. Eldridge, "Wan2tlk?: everyday text messaging," presented at the Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Ft. Lauderdale, Florida, USA, 2003.
[21] C. a. A. B. Thurlow, "Generation Txt? The sociolinguistics of young people's text messaging," Discourse Analysis Online 1(1), 2003.
[22] Yijue How and M.-Y. Kan, "Optimizing Predictive Text Entry for Short Message Service on Mobile Phones," presented at the In Proceedings of HCII, 2005.
[23] Rich Ling and N. S. Baron, "Text Messaging and IM: Linguistic Comparison of American College Data," 2007.


Authors' information

Cik Feresa Mohd Foozy is currently working with Universiti Tun Hussein Onn Malaysia (UTHM), Malaysia. Feresa holds a Master's degree in Computer Science (Information Security) from Universiti Teknologi Malaysia, Malaysia and a Bachelor's degree in Information Technology and Multimedia from Universiti Tun Hussein Onn Malaysia (UTHM), Malaysia. She is currently pursuing her PhD at the Universiti Teknikal Malaysia Melaka, Malaysia.
