Future Generation Computer Systems: Sandhya Mishra Devpriya Soni
Article info

Article history:
Received 15 July 2019
Received in revised form 20 February 2020
Accepted 7 March 2020
Available online 12 March 2020

Keywords:
Smishing
Phishing
Text messaging
Mobile security
Machine learning
SMS

Abstract

Smartphones’ popularity and their constant connectivity to the World Wide Web have made these devices vulnerable to phishing and smishing attacks. Phishing is the practice of sending malicious emails to users. Smishing is a combination of SMS and phishing in which attackers send SMS messages containing malicious content to the victim. This content sometimes includes links which redirect the user to websites containing malicious applications and user interfaces. Researchers have proposed various methods in past years to detect smishing, but we still lack a method that significantly avoids false-positive results, i.e. falsely categorizing a message as malicious when it is genuine. Hence, we have proposed a model called ‘Smishing Detector’ to identify smishing messages while reducing false-positive results at every possible step. The proposed method consists of four modules, namely, SMS Content Analyzer, URL Filter, Source Code Analyzer and APK Download Detector. SMS Content Analyzer analyzes the text message contents. The Naive Bayes classification algorithm is used to identify the malicious contents and keywords present in the text message. URL Filter inspects the URL to identify malicious features. Source Code Analyzer examines the source code of the website to identify harmful code embedded in it. The form tag and URL domain present in the source code are also inspected in this module. APK Download Detector identifies whether any malicious file is downloaded while invoking the URL. User consent taken while downloading the file is also inspected in this module. Finally, we have developed a prototype of the proposed system which has been validated with experiments on SMS datasets. In this paper, we demonstrate the results of each module separately as well as the final results. The results of the experiments show an overall accuracy of 96.29%. We have compared this model with other models proposed by various researchers and found that it covers more security aspects than the other models.
© 2020 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.future.2020.03.021
0167-739X/© 2020 Elsevier B.V. All rights reserved.
804 S. Mishra and D. Soni / Future Generation Computer Systems 108 (2020) 803–815
false-positive results. We have done the content analysis of the smishing SMS and have also analyzed the behavior of the URL included in it. We have analyzed the source code of the URL to check its malicious behavior and re-directions.

What we propose in this paper:

• A method that can detect phishing initiated through text messaging.
• An embedded APK Download Detector module which verifies the behavior of the URL included in the SMS.
• A Source Code Analyzer module that inspects the legitimacy of the source code and its re-directions.
• A user interface which lets the user select or skip the various steps involved in detecting a smishing SMS if the user is already aware of the legitimacy of the text message.
• A prototype of the system, developed to experiment with a real-time application of the system; the results of the model are verified using SMS datasets.

The remainder of this paper is arranged as follows. A summary of the research work done by other authors in this field is given in Section 2. Our proposed system is presented in Section 3. The architecture and flowchart of the model are depicted in Sections 3.1 and 3.2 respectively. Section 4 explains the results of the experiments conducted on the prototype of the proposed system. A comparison of the suggested system with other related works is presented in Section 5. The conclusion of the proposed work is given in Section 6.

2. Related work

In recent years, researchers have focused more on smishing due to its popularity in mobile attacks. Some researchers have proposed models to detect smishing messages [10,11,13–17]. Some spam detection models which used similar approaches are also discussed here [18,19]. Many researchers have discussed smishing strategies and smishing detection approaches to bring awareness among users and researchers [20–23]. This section gives an overview of the research work presented by other authors in the context of smishing. As smishing is a category of phishing and similar techniques are used for phishing detection, we have also discussed some phishing detection approaches [12,24–28].

Joo et al. [13] suggested a system called ‘S-Detector’ for identifying smishing messages. S-Detector contains four components, namely, SMS monitor, SMS analyzer, SMS determinant, and Database. This model inspects the URL to analyze whether an .apk file is downloaded or not. If an executable file is downloaded, the SMS is categorized as smishing. They have used the Naive Bayes classification algorithm to analyze the contents of the messages. Another study [14] suggested a rule-based method for identifying smishing messages. The authors identified nine rules to filter smishing messages from legitimate messages. Further, these rules were trained using different classification algorithms like Decision Tree, RIPPER, and PRISM. The performance evaluation of the approach shows a true negative rate of more than 99%. Sonowal et al. [11] proposed a model named SmiDCA for detecting smishing messages using a machine learning approach. The proposed model extracted 39 features from smishing messages using a correlation algorithm. Machine learning algorithms were used and experiments were conducted on different datasets. Experimental evaluation of SmiDCA displayed an accuracy of 96.40% using the Random Forest classifier. Goel et al. [10] proposed a smishing detection system called ‘smishing classifier’ for identifying smishing messages. This system inspects the contents of the SMS and SMS keywords using a Naïve Bayesian classifier. The proposed framework verifies the existence of a URL in the SMS and also inspects the mobile number of the sender. Further, login page appearance and download of an APK file are also evaluated in this model. A. Kang et al. [15] discussed various types of phishing and smishing attacks. They have proposed a URL validation test to check the authenticity of the URL included in the message. They have also discussed the smishing box approach as a security measure against downloaded applications during a smishing attack. In another work, the author [16] used a content-based approach to detect smishing messages. A machine learning algorithm is used for identifying the most frequently used keywords in smishing SMS. Further, the appearance of the login page and downloading of the .apk file are also evaluated in this model to inspect the maliciousness of the URL. In a recent work, Ankit et al. [17] proposed a feature-based model for detecting smishing messages using a machine learning approach. The proposed model extracted some features of smishing messages and analyzed the results using 5 machine learning algorithms on the same dataset. Experimental evaluation of this model displayed an accuracy of 98.74%.

Some authors have proposed spam SMS detection models using similar approaches. Yadav et al. [18] presented ‘SMSAssassin’, a spam SMS detection model
using the Naive Bayes algorithm and Support Vector Machine (SVM). The authors have used crowd sourcing to keep the spam database updated. Users having the SMSAssassin application on their mobile device can use the spam messages identified on their device to update the Spam Keyword Frequency list to enhance the performance of the system. The authors have also presented [19] the design and implementation of SMSAssassin, which performs machine learning-based SMS content filtering to distinguish spam messages. They have designed and implemented an Android application which categorizes and handles different kinds of messages, which makes SMS management simple for the user. This application also offers the user an interface to select the notification preference.

Some researchers have discussed phishing and smishing detection approaches to bring awareness among users and researchers. Foozy et al. [20] proposed a taxonomy of phishing detection on a mobile device. They have stated and elaborated various phishing techniques like Bluetooth phishing, SMS phishing, voice phishing, and mobile application phishing. The researchers also compared and evaluated various phishing detection techniques. Hossain et al. [21] elaborated various types of phishing attacks. They have also discussed various phishing mitigation techniques and best policies which should be followed to prevent phishing attacks on mobile devices. Their work is aimed at bringing more security awareness to mobile users. Diksha et al. [22] presented a detailed study of phishing attacks. They have discussed various mobile phishing attacks conducted by attackers and their countermeasures. They have also discussed the taxonomy of phishing solutions, techniques used by authors, and various challenges involved in the detection of phishing attacks, and they have elaborated various phishing datasets. In a recent work, Sandhya et al. [23] discussed various smishing attacks performed by attackers. The author has also suggested some policies which can be adopted by the user to deal with smishing attacks. Various techniques and methodologies used for mitigating smishing attacks are also discussed in that paper.

In a study to detect phishing websites, the author [24] proposed a model called ‘PhiDMA’, incorporating five layers. They have also implemented a sample of this system which offers an interface for visually impaired persons. Experimental results have shown an accuracy of 92.72%. Wu et al. [25] proposed a phishing detection technique called ‘MobiFish’ that detects phishing attacks on mobile devices. For mobile applications, they have designed ‘AppFish’ and for mobile web pages, they have designed ‘WebFish’. The proposed method examines the URL, the IP address of the URL and the HTML source code of the website to detect the maliciousness of the webpage. Zhang et al. [26] proposed a system that uses the features of phishing URLs, such as hosted features and lexical features, to inspect the URL. They have used a machine learning classifier to detect phishing sites on the basis of the selected features. They have shown high accuracy of their proposed system by experimenting on multiple datasets. Mohammad et al. [27] suggested a phishing website classification system. They have investigated 17 efficient features for the detection of phishing websites and developed a new rule for each feature. They have considered the frequency of each feature in their datasets to identify the most popular feature in the detection of phishing websites.

A system proposed in [28], called CANTINA, is a content analysis method to detect phishing websites using TF–IDF measures. The TF–IDF score is calculated for each word in a document and then the five words with the highest values are selected. A lexical signature is formed on the basis of the selected words, which is fed into a search engine. If the domain name of the N top search results matches the domain name of the current web page, the authors declare it a legitimate website, otherwise a phishing website. Sophie et al. [12] suggested a heuristic approach for detecting phishing websites. Based on the characteristics of phishing URLs and web pages, they have defined 20 heuristic tests. They have also developed an active anti-phishing toolbar called Phishark. From the experimental results, they have shown that the combination of URL-based features and HTML-based features is very effective in distinguishing legitimate websites from phishing websites.

The existing methods proposed by other researchers used rule-based approaches [10,13,14,16] and heuristic approaches [11,12] to detect smishing attacks. Rule-based approaches included some rules which covered the flow of smishing attacks and their re-directions. But rule-based approaches do not cover the many variations of smishing attacks. Heuristic approaches analyze the contents of the text message. Applying machine learning algorithms to SMS contents alone does not detect the changing smishing trends. We strongly feel that it is necessary to analyze the behavior of the URL present in the text message for the detection of smishing.

3. Proposed work

3.1. Architecture of the proposed system

The proposed model is a smishing detector which is composed of 4 modules, namely, SMS Content Analyzer, URL Filter, Source Code Analyzer and APK Download Detector, as depicted in Fig. 3.

SMS Content Analyzer verifies the presence of a URL, self answering link (SAL), phone number and email id in the SMS. Messages containing a URL or SAL are transferred to URL Filter. Messages containing an email id and phone number are processed for a blacklist check. Then messages are forwarded for text pre-processing. Text pre-processing is conducted to convert the text into a form that can be used for text analysis. Text pre-processing includes removing all punctuation and special strings, converting each word to lower case, splitting the text into tokens, stemming (i.e. converting each word to its root form) and preparing a word vector corpus. Then we categorize the messages on the basis of the keywords present in them. Keywords contained in the message are classified using TfidfVectorizer and a Naive Bayes classifier. TfidfVectorizer converts each word in the message to a feature index in the feature vector matrix. In each vector, the numbers represent the TF–IDF score of each word selected as a feature. These feature vectors are used as input to the Naive Bayes classifier. The Naive Bayes classifier predicts the result based on the learning from the feature vector matrix.

URL Filter first converts the short URL to the long URL. Then, it looks for the URL in the blacklist. A URL found in the blacklist is categorized as smishing. In the next step, URL Filter verifies four features of the URL, namely, the age of domain, presence of an @ tag, presence of a hyphen and number of dots present in the URL, to check the authenticity of the URL. If at least three of these four features (i.e. 75% or more) indicate a malicious URL, we categorize the message as smishing. If fewer than three do, we pass the URL on to Source Code Analyzer.

Source Code Analyzer verifies the presence of any form tag in the source code. If a form tag is present in the source code, Source Code Analyzer compares the domain of the request URL in the source code with the domain of the actual URL invoked. If the domains differ, the message is classified as smishing. On the other hand, if the domain is the same, the URL is transferred to APK Download Detector.

APK Download Detector checks whether invoking the URL would download any file. This check is carried out by the proposed system without visiting the website. The base name of the actual URL is extracted to assess whether it contains the .apk extension. This process does not involve invoking the URL or downloading the APK file. It also
an application of any legitimate website like Flipkart or Pinterest. Hence, if user consent is taken, we regard the message as legitimate. We are also checking for malicious file download after re-direction of the page.

3.3. Popularity of features used in the system

When a URL is detected in the message, we use some features of phishing websites in order to detect the phishing attack. Some features commonly used by attackers in designing phishing websites are identified. To find out which feature is most popular in designing phishing websites, we calculated the number of appearances of each feature in the phishing dataset [29]. The percentage of appearance, i.e. the frequency of each feature used in our system, is shown in Table 1. The results showed that ‘Difference in domain’ is the most popular feature in designing phishing websites. Form Tag, Age of Domain and APK Download also appeared in many of the phishing websites, with frequencies of 96.5%, 93.5% and 56.4% respectively. That is why we have tried to handle the features with higher percentages with utmost care.

Table 1
Popularity of features used in designing phishing websites.

Feature                    Frequency in %
Long URL                   47.2
Age of Domain              93.5
Presence of hyphen (-)     29.4
Number of dots in URL      10.2
Form tag in source code    96.5
Difference in domain       98.3
APK Download               56.4

4. Implementation results and evaluation

In this section, we describe the implementation details and evaluation results of the proposed system. A prototype of the Smishing Detector is developed using Python in Jupyter Notebook. The system is developed in four parts, namely, SMS Content Analyzer, URL Filter, Source Code Analyzer and APK Download Detector. Further, these four modules are integrated to get a final prototype of the whole system. The system is finally evaluated using the dataset. The dataset is collected from the research work contributed by Almeida [30], a contribution to the study of SMS spam filtering. This dataset contains a total of 5574 messages, of which 4827 are ham messages and 747 are spam messages. As per our research, no smishing dataset is publicly available until now. But a smishing message is a spam message that strives to steal sensitive data from the user. So, we have extracted some smishing messages from the spam dataset, because smishing messages are a subset of spam messages. Also, we have extracted 284 smishing images from pinterest.com [9]. We have added these messages to the dataset after converting them into text form. Our final dataset is a total of 5858 messages, which contains 538 smishing messages and 5320 ham messages. Smishing detection is a binary classification problem in which a message is either smishing or legitimate.

Fig. 8 depicts that the model has four interfaces: SMS Content Analyzer, URL Filter, Source Code Analyzer and APK Download Detector. SMS Content Analyzer displays the phone number of the sender and the text message received. It provides three buttons for the convenience of the user. If the user is already aware of the genuineness of the message, then the user can tap the NEXT button to skip the verification of the particular message and go to the next message, or the user can tap the GO BACK button to go to the previous screen. On the other hand, if the user taps the VERIFY button, the message is evaluated by the SMS Content Analyzer, and the system displays the result and redirects the user to the URL Filter screen. The URL Filter and Source Code Analyzer screens provide the user with the option of skipping that particular step by tapping the SKIP button, as shown in Figs. 8(b) and 8(c). The APK Download Detector module can be skipped by tapping the NEXT button, which redirects the user to the next message. Hence, the user can navigate to the next message either from the SMS Content Analyzer module or from the APK Download Detector module.

Fig. 9(a) shows a legitimate message identified by the proposed system. This interface provides three buttons: READ, NEXT and GO BACK. The message is recognized as genuine by the model; hence, the user is permitted to access the message by clicking the READ button, or the user can tap the NEXT button to go to the next message. The GO BACK button navigates the user to the previous screen. Fig. 9(b) shows a smishing message recognized by the system. As the message is categorized as smishing by the model, the DELETE button deletes the message without navigating to the inbox, and the NEXT button allows the user to move to the next message.

SMS Content Analyzer analyzes the text content of the SMS. Here, we check for the presence of an email id, phone number and URL in the message. We segregated the messages based on their contents. Fig. 10 shows the result of the SMS content analysis. Out of 5858 messages, 5122 messages are segregated and declared as legitimate messages in this stage, as they do not contain any of the potentially malicious contents like an email id, phone number, or URL.

Text pre-processing is done on messages containing an email id and phone number. Text pre-processing is conducted to convert the text into a form that can be used for text analysis. Text pre-processing includes removing all punctuation and special strings, converting each word to lower case, splitting the text into tokens, stemming (i.e. converting each word to its root form) and preparing a word vector corpus. Fig. 11 shows the results of the text pre-processing module.

After text pre-processing, we categorize the messages on the basis of the keywords included in them. Keywords contained in the message are classified using TfidfVectorizer and machine learning classifiers. TfidfVectorizer converts each word in the message to a feature index in the feature vector matrix, which is used as input to the machine learning classifiers. The machine learning classifier predicts the result based on the learning from the feature vector matrix. We have used three machine learning algorithms for classification and comparison purposes, namely, Naive Bayes, Random Forest, and Decision Tree. We have evaluated our dataset for keyword classification using these three algorithms, and we get the best results using the Naive Bayes algorithm for this particular dataset.

The model evaluated the performance of three well-known machine learning classifiers, as depicted in Table 2. Fig. 12 shows the performance of the machine learning algorithms. In this figure, the x-axis displays the machine learning algorithms, and the y-axis displays the performance of the algorithms. For the dataset used, Naive Bayes gives the best performance and Random Forest gives the lowest performance.

Table 2
Performance of machine learning classifiers.

Classifiers                 Precision   Recall   F1-Score   Accuracy (%)
Naive Bayes Classifier      0.93        0.92     0.92       91.6
Random Forest classifier    0.88        0.82     0.83       82.3
Decision Tree classifier    0.91        0.88     0.89       88.2

Source Code Analyzer analyzes the presence of any form tag in the source code of the URL. It fetches the actual URL of the short URL provided. Then it accesses the source code of the
Fig. 11. Results of SMS text pre-processing done on messages containing email id and phone number.
actual URL without actually invoking the URL. Now, the system searches for a form tag in the whole source code. The result is displayed according to the search outcome obtained. The result of the procedure is depicted in Fig. 13.

APK Download Detector checks for any file download without invoking the URL. Fig. 14 shows the whole procedure of performing an APK Download Check. The first line shows the URL, which is a short URL. The second line shows the result of fetching the actual link address from the short URL. The third line shows the extracted file name of the file downloaded from the link without taking user consent. If the file is downloaded without user consent, the link is declared malicious and hence the message is categorized as SMISHING.

We have evaluated our dataset with 5-fold cross-validation, where we partition the data into 5 equally sized subsets or folds. In the first iteration, the first subset is used to test the model and the remaining 4 subsets are used to train the model. In the second iteration, the 2nd subset is used as the testing set while the rest serve as the training set. This process is repeated until each of the 5 folds has been used as the testing set. 5-fold cross-validation thus uses 80% of the data for training and the remaining 20% for testing in each iteration. After 5 iterations, our system gave a final accuracy of 96.29%. Cross-validation results are depicted in Table 3.

The evaluation of the prototype shows that the model gives an accuracy of 91.6% using the Naive Bayes classifier in the SMS Content Analyzer module. Further, the model shows a final accuracy of 96.29% after evaluating all four modules. Hence, through our model evaluation, it is concluded that this system is efficient in detecting smishing messages received on mobile devices, thereby protecting users from probable threats.

As we have followed a flow-based approach consisting of a set of diverse rules, the attacker needs to breach each rule in a module to reach the next stage in that module. Since our system has four modules, it is almost impossible for the attacker to predict the whole flow of the system, to circumvent each of the diverse rules applied in each module and to reach the final decision stage. Hence, this model is resilient against an attacker who is constantly trying to invent different techniques to circumvent the smishing detection system.
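The later stages of this flow can be illustrated with a minimal sketch of the two URL-level rules described for Source Code Analyzer and APK Download Detector. This is not the prototype's code: the function and class names are hypothetical, the page source is assumed to have been fetched already, domains are compared as exact host names, user consent is passed in as a boolean because the text does not specify how it is detected, and a page without any form tag is assumed to pass on to the APK check.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urljoin, urlparse


class FormActionCollector(HTMLParser):
    """Collects the action attribute of every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A form without an action submits back to the page itself.
            self.actions.append(dict(attrs).get("action") or "")


def form_posts_to_other_domain(page_url, html_source):
    """True if any form submits to a host other than the page's own host."""
    collector = FormActionCollector()
    collector.feed(html_source)
    page_host = urlparse(page_url).hostname
    for action in collector.actions:
        # Resolve relative actions against the page URL before comparing.
        target_host = urlparse(urljoin(page_url, action)).hostname
        if target_host != page_host:
            return True
    return False


def links_to_apk(url):
    """True if the base name of the URL path ends with .apk (no request made)."""
    return basename(urlparse(url).path).lower().endswith(".apk")


def classify_url(page_url, html_source, user_consent_taken):
    """Apply the rules in order; the first rule that fires decides (assumed order)."""
    if form_posts_to_other_domain(page_url, html_source):
        return "smishing"
    if links_to_apk(page_url) and not user_consent_taken:
        return "smishing"
    return "legitimate"
```

For example, under these assumptions `classify_url("http://shop.example/app.apk", "", False)` is classified as smishing, while the same URL with `user_consent_taken=True` is regarded as legitimate, which mirrors the false-positive reduction argued for in the text.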
Fig. 12. Performance of Machine learning algorithms for SMS keyword classification.
Fig. 13. Results of Form Tag check in the HTML Source code by Source Code Analyzer.
Table 3
Cross-validation results of the Proposed Approach.
Table 4
Comparison of our proposed system with the existing researches.

Approaches used
  S-Detector [13]:      Rule-based flowchart
  Rule-Based [14]:      Rule-based
  SmiDCA [11]:          Heuristic-based feature selection
  Feature-Based [17]:   Machine learning classification of SMS features
  Proposed System:      Rule-based flowchart

Classification approach
  S-Detector [13]:      Naive Bayes
  Rule-Based [14]:      Decision Tree, RIPPER and PRISM
  SmiDCA [11]:          Random Forest, Decision Tree, AdaBoost and SVM
  Feature-Based [17]:   Logistic Regression, Neural Network, Random Forest, Naive Bayes and SVM
  Proposed System:      Naive Bayes

Dataset used
  S-Detector [13]:      Not specified
  Rule-Based [14]:      Almeida [30] spam dataset manually filtered to make a new smishing dataset of 5169 messages, of which 362 are smishing messages and 4807 are ham.
  SmiDCA [11]:          Almeida [30] spam dataset of 5574 messages, of which 747 are spam messages and 4827 are ham.
  Feature-Based [17]:   Almeida [30] spam dataset manually filtered to make a new smishing dataset of 5169 messages, of which 362 are smishing messages and 4808 are ham.
  Proposed System:      Almeida [30] spam dataset manually filtered to make a new smishing dataset of 5858 messages, of which 538 are smishing messages and 5320 are ham.

Security requirement                               S-Detector [13]   Rule-Based [14]   SmiDCA [11]   Feature-Based [17]   Proposed System
Keywords classification                            ✓                 ✓                 ✓             ✓                    ✓
Presence of URL                                    ✓                 ✓                 ✓             ✓                    ✓
Presence of phone no. and email id in the message  ✓                 ✓                 ✓             ✓                    ✓
Phone no. and email id in blacklist                X                 X                 X             X                    ✓
URL in blacklist                                   X                 X                 X             X                    ✓
Check for login page                               X                 X                 X             X                    ✓
Difference in URL domain                           X                 X                 X             X                    ✓
APK download                                       ✓                 X                 X             X                    ✓
APK download after re-direction                    ✓                 X                 X             X                    ✓
User consent while downloading APK                 X                 X                 X             X                    ✓
5. Comparative analysis

We analyzed the existing research related to smishing and compared the related works based on the security measures. Table 4 presents a comparative analysis of recent research and the proposed system for smishing detection.

The existing smishing detection systems proposed by other authors and the proposed system all check for the presence of a URL when an SMS is received, and they also do a content analysis of the text message received. But content analysis alone cannot determine the maliciousness of a message unless we further check the maliciousness of the URL involved.

Verification of the presence of a self answering link, phone number and email id is not done in the smishing detection system proposed by Joo et al. [13]. They have analyzed the URL for download of any APK file but have not checked for the presence of any login page which prompts the user to fill in user credentials. Ankit et al. [14] have followed a rule-based smishing detection system considering the contents of the message, but they have not analyzed the behavior of the URL. SmiDCA, proposed by Sonowal et al. [11], is a heuristics-based system considering the contents of an SMS and the keywords present in it. This system does not check for the maliciousness of the URL, i.e. whether it downloads a malicious file or prompts the user to fill in user credentials in a form provided. As per our study, smishing is a threat in which an attacker sends an SMS to the user, and that SMS includes links to user interfaces, malicious applications, or web pages that prompt users to enter their credentials. Hence, we cannot classify a message as smishing if the link contained in it is legitimate, even if the message contains malicious keywords. Hence, we strongly sense the need for checking the behavior of the URL further to analyze the maliciousness of the link.
Smishing Classifier proposed by Goel et al. [10] has done sim- CRediT authorship contribution statement
ilar work but they have not eliminated the false-positive results.
If an APK is downloaded, they have classified it a smishing but an Sandhya Mishra: Conceptualization, Methodology, Software,
APK can be the legitimate app of a shopping website like Flipkart. Data curation, Writing - original draft, Visualization, Investiga-
Similarly, login page and self-answering link can be legitimate tion. Devpriya Soni: Conceptualization, Supervision, Validation,
too in case of a message received from Facebook or Amazon Writing - review & editing.
asking the user to log in to their website for availing some offers.
Hence, we need to further check the legitimacy of the website References
to avoid false-positive results. If a URL or Self Answering Link
[1] Mobile Internet - Statistics & Facts, Retrieved from https://www.statista.
is not present in a message, keywords threshold alone cannot
com/topics/779/mobile-internet/.
determine the maliciousness because as per our study, a message
[2] Number of mobile phone users in India from 2013 to 2019, Retrieved from
without any link, mobile number or E-mail ID cannot cause any https://www.statista.com/statistics/558610/number-of-mobile-internet-
malicious activity even if it has high threshold keywords present user-in-india/.
in it. Verification for the User Consent while downloading APK [3] Global mobile statistics, part a: Mobile subscribers; handset market share;
file performed in the proposed system is not done by any of the mobile operators, 2014, Retrieved from https://mobiforge.com/research-
authors till now. It considerably helps to detect the malicious file analysis/global-mobile-statistics-2014-part-a-mobile-subscribers-
handset-market-share-mobile-operators/.
downloading on a mobile device. This check also helps to reduce
[4] Daily SMS Mobile Usage Statistics. https://www.smseagle.eu/2017/03/06/
false-positive results. sms-mobilestatistics-2/. (Accessed 7 June 2017).
[5] L. Kessem, Rogue Mobile Apps, Phishing, Malware and Fraud, 2012, Re-
6. Conclusion and future work trieved from https://blogs.rsa.com/rogue-mobile-apps-phishing-malware-
and-fraud.
[6] https://cmap.amp.vg/web/b3lknalklab1f.
Smishing is a critical attack on mobile devices that is on the rise in this mobile era. Hence, this paper proposed an efficient model titled Smishing Detector to detect and block smishing attacks. The proposed model uses SMS content analysis and URL inspection to distinguish smishing messages from legitimate messages. SMS Content Analyzer is the module that analyzes the contents of the message. URL Filter, Source Code Analyzer and APK Download Detector are the modules that inspect the behavior of the URL contained in the message. We have also developed a prototype of the system using the tkinter package. This prototype provides user-friendly buttons to skip some of the steps involved in the system. Machine learning algorithms are used to classify messages on the basis of smishing keywords. The Naive Bayes classifier shows the best accuracy for keyword classification in our proposed system for the particular dataset involved. After integrating all four modules, the final prototype showed an accuracy of 96.29% in our experiments.
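
To make the keyword classification step concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier with Laplace smoothing, written in plain Python. It is not the implementation used in the prototype: the class name `NaiveBayesTextClassifier`, the whitespace tokenizer, the smoothing value and the four toy training messages are all assumptions made for illustration, and the 96.29% figure reported above cannot be reproduced from it.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training messages
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example only.
TEXTS = [
    "you won a free prize claim now",
    "urgent verify your account now",
    "are we meeting for lunch today",
    "see you at home tonight",
]
LABELS = ["smishing", "smishing", "legitimate", "legitimate"]

clf = NaiveBayesTextClassifier().fit(TEXTS, LABELS)

if __name__ == "__main__":
    print(clf.predict("claim your free prize"))
    print(clf.predict("see you at lunch"))
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and the smoothing term keeps a single unseen keyword from driving a class probability to zero.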

A comparison of our model with existing models showed that this system covers more security requirements than other proposed models. This system is expected to deliver more effective security against attacks in terms of identifying smishing messages and preventing false-positive results. The practical implementation of this work can be integrated with the Android platform and used as an application to detect smishing messages. This application can identify a smishing message when it is received, and the message can then be discarded or saved based on the result.

In future, we plan to incorporate more techniques into the proposed model in order to prevent more intelligent and versatile threat methods. The system currently does not verify the genuineness of the application downloaded in the APK Download Detector module. Hence, to ensure application security, we also plan to embed a malware detector in the APK Download Detector to identify malicious apps in our future work. This will involve further research to provide security against personal information leakage and to detect downloaded malicious applications.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
S. Mishra and D. Soni / Future Generation Computer Systems 108 (2020) 803–815 815
Sandhya Mishra is currently pursuing her Ph.D. in Mobile Security at Jaypee Institute of Information Technology, Noida, India. She received her Master of Computer Applications from Guru Gobind Singh Indraprastha University, Delhi, India. Her research interests include cyber security, mobile security, smishing and phishing detection, web security, social media networks and machine learning.

Dr. Devpriya Soni is presently working as an Associate Professor at Jaypee Institute of Information Technology, Noida, India. She received her Ph.D. degree from Maulana Azad National Institute of Technology (MANIT), Bhopal. She is a board member for conference proceedings and a member of review committees for journals. She is a Remote Center Coordinator for ISTE workshops organized by IIT Bombay and has conducted several workshops as RC coordinator. She is also recognized as the Aakash Project Coordinator for the remote center of IIT Bombay and guides many undergraduate-level projects for the Aakash tablet. She also serves as a reviewer of several Ph.D. theses and was invited as a jury member and examiner at Skema University (Lille, France). She has guided several projects at undergraduate and postgraduate level. Her research interests include mobile security, mobile application development, information retrieval and data mining, software engineering and data science.