A Smishing Detection Method Based on SMS Contents Analysis and URL Inspection Using Google Engine and VirusTotal
ISSN: 0067-2904
Abstract
Smishing is the delivery of phishing content to mobile users via the short message
service (SMS). SMS gives cybercriminals a new way to reach mobile end users with phishing
messages, mobile malware, and online scams that appear to come from a trusted brand.
This paper proposes a new method for detecting smishing by combining two detection
methods. The first is uniform resource locator (URL) analysis, which employs a novel
combination of the Google engine and VirusTotal. The second examines SMS content to
extract efficient features and classifies messages as ham or smishing based on the
keywords they contain, using four well-known classifiers: support vector machine (SVM),
random forest (RF), adaptive boosting (AdaBoost), and extreme gradient boosting
(XGBoost). The best results of the proposed method were 98.5%, 96.9%, 93.1%, and
95.05% in terms of accuracy, precision, detection rate, and F1-score, respectively.
Furthermore, the proposed method outperformed state-of-the-art methods, showing that it
is effective in detecting smishing messages.
Detection of Smishing Messages Based on Message Analysis and Website Inspection Using a
Combination of the Google Engine and VirusTotal
Abstract
A smishing message is a phishing message sent to mobile phone users via the short message
service. Short messages allow cybercriminals to reach mobile phone users with phishing messages,
mobile malware, and online scams that appear to come from trusted sources. This research
proposes a new method for detecting smishing messages by combining two detection methods. The
first method is uniform resource locator (URL) analysis using the Google engine and VirusTotal,
and the second method examines the content of short messages to extract effective features and
the keywords present in the message, and classifies the messages as harmless or smishing using
four well-known algorithms: support vector machine
Mahmood and Hameed Iraqi Journal of Science, 2023, Vol. 64, No. 10, pp: 5376- 5391
1. Introduction
Phishing is a harmful attack used to gain access to online users' sensitive financial or
private data through illegitimate websites that appear authentic. Social engineering
techniques are commonly used in phishing attacks to divert users to malicious websites.
Typically, an e-mail that appears to come from a trusted source encourages users to change
their login information by clicking a hyperlink [1]. Phishing uses deceptive techniques to
trick internet users into disclosing personal information, including usernames, passwords,
credit card details, and bank account information, by making them believe the website is
legitimate [2].
As shown in Figure 1 [3], mobile phone usage has been rising. This has led to an increase
in information crime, one form of which is smishing. Smishing is a type of spam that
significantly affects many users' everyday lives: users waste a lot of time dealing with
spam, which lures them in but may include unanticipated dangerous attachments that can badly
compromise the user's system [4]. A smishing SMS, for example, informs the recipient that
they have won a prize or a sum of money, or that they need to resolve an issue with their bank
card or electronic account. Short message service (SMS) is one of the most popular
communication methods [5].
Figure 1: Number of smartphone users (in millions), 2016 to 2026 [3]
Attackers prefer text messages for targeting victims because they can reach a large number of
people with a low-cost SMS subscription. These messages typically contain a link to malware or
to phishing websites that ask the user for sensitive information. In the malware case, the
malware is downloaded to the user's mobile device and then performs malicious operations on
the device [6].
The unstructured nature of SMS text and the nonlinearity involved in interpreting it make
distinguishing between phishing and legitimate SMS a challenging task. This paper proposes
smishing detection models based on checking the legitimacy of Uniform Resource Locators
(URLs) and analyzing SMS content using a variety of machine learning algorithms. The main
contributions of this paper are as follows:
• Proposing a new method that combines the Google engine and VirusTotal to examine the
authenticity of URLs in SMS.
• Examining text messages to extract several features capable of distinguishing smishing
messages from legitimate SMS by adopting TF-IDF with a new strategy.
• Applying different machine learning algorithms to evaluate the performance of the proposed
smishing detection method.
2. Related work
Researchers have proposed several approaches to combat smishing attacks, including
content-based, URL behavior analysis, and heuristic techniques. Some of these works are
discussed below.
Mishra and Soni [6] presented an approach that combines URL behavior analysis and message
content for smishing detection. The system uses SMS content analysis, a machine learning
classifier, and an examination of URL behavior to classify phishing SMS. In the first phase,
the content of the text messages is filtered to detect the presence of email IDs, phone numbers,
or URLs. Term frequency-inverse document frequency (TF-IDF) was used to weight the words, and
a OneVsRest classifier was used to classify the smishing messages. The benefit of the URL
analysis is that it detects Android Application Package (APK) downloads; at the same time, the
source code is inspected to check whether a form tag exists.
Joo et al. [7] proposed a smishing detection system to inspect and block phishing SMS. The
system checks whether a URL is present in the message and includes four parts: an SMS
monitor, an analyzer, a determinant, and a database. The researchers applied a Naïve Bayes
(NB) classifier to distinguish phishing SMS from legitimate ones.
Jain and Gupta [9] proposed content-based filtering with a rule-based approach. The
researchers implemented nine rules and three algorithms for message classification: Repeated
Incremental Pruning to Produce Error Reduction (RIPPER), Decision Tree (DT), and PRISM. The
results were positive, and the system can detect zero-day attacks.
Goel and Jain [10] suggested a smishing detection model in which NB is used to distinguish
smishing messages from legitimate ones. The messages were converted to a standard format
using text normalization techniques, and the system also checked URLs, phone numbers, and APK
downloads. However, the URL blacklist used in this model is of limited effectiveness because
malicious URLs change frequently.
Jain and Gupta [11] introduced a heuristic-based algorithm for smishing detection that uses
feature selection and machine learning algorithms. The system selects ten features by
analyzing the content of the messages and then classifies the messages using classification
algorithms.
Sonowal [13] combined content-based feature extraction with four correlation algorithms for
ranking features, namely Spearman's rank correlation, Pearson correlation, point-biserial
correlation, and Kendall rank correlation. The system achieved 98.40% accuracy with the
AdaBoost classifier.
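Ranking features by their correlation with the class label, the idea behind the approach in [13], can be sketched as follows. This is a minimal illustration with hypothetical feature names and toy data, not Sonowal's implementation: it computes only the Pearson coefficient, which, for a binary 0/1 label, coincides with the point-biserial correlation (Spearman's coefficient is the same formula applied to ranks).

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(features: Dict[str, List[float]], labels: List[int]) -> List[str]:
    """Order feature names by |r| with the 0/1 label (point-biserial for a binary label)."""
    scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
    return sorted(scores, key=lambda name: -scores[name])


# Hypothetical toy data: four messages, label 1 = smishing, 0 = ham.
features = {
    "has_url": [1, 1, 0, 0],
    "length": [5, 3, 4, 6],
    "has_money": [1, 0, 0, 0],
}
labels = [1, 1, 0, 0]
print(rank_features(features, labels))  # ['has_url', 'has_money', 'length']
```

The absolute value is used so that features strongly associated with either class rank highly.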
Another smishing detection model, introduced by Mishra and Soni [14], consists of a domain
checking phase and an SMS classification phase. The first phase verifies the authenticity of
the URL in the SMS in order to detect phishing, and the second phase processes the text content
of the messages by extracting discriminant features. The work used the Backpropagation (BP)
algorithm, RF, NB, and DT for message classification, and the system achieved 97.93%
accuracy.
Ulfath et al. [15] suggested a content-based model. They developed an automated system
able to differentiate smishing messages from legitimate ones. The work has multiple steps,
including feature extraction, feature selection, and classification with Extreme Gradient
Boosting (XGBoost), RF, Classification and Regression Tree (CART), SVM, and AdaBoost. SVM
outperformed the other classifiers, giving the best result with the minimum number of features.
Shravasti and Chavan [16] proposed a smishing detection model based on artificial
intelligence. The model begins with preprocessing and the extraction of effective features
such as term function, URL, email address, mobile number, number of characters, and currency
symbol. Classification techniques, namely a Long Short-Term Memory (LSTM) recurrent model,
K-Nearest Neighbors (KNN), Stochastic Gradient Descent (SGD), DT, NB, and RF, are then used to
distinguish smishing messages from legitimate ones. In this model, LSTM achieved the best
accuracy, 95.11%.
This paper attempts to overcome the limitations of previous works by proposing a new
method for URL inspection that combines two inspection techniques, the Google engine and
VirusTotal, followed by SMS classification.
Table 1 compares the proposed method with other smishing detection methods from several
perspectives. The criteria are: the Google engine, which verifies domain names; VirusTotal,
which determines whether the URLs in an SMS are malicious; APK download checking, which
examines the contents of downloaded files; content analysis for feature extraction, and
feature selection, both of which are considered because they affect smishing detection; and
heuristic methods, which depend on features that distinguish smishing from legitimate SMS.
Table 1: Comparison of the Proposed Model with Some Smishing Detection Models in the Literature
Technique               [6]   [8]   [11]   [14]   [15]   [17]   Proposed method
Google engine            X     X     X      ✓      X      X      ✓
VirusTotal               X     X     X      X      X      ✓      ✓
Content analysis         ✓     ✓     X      X      ✓      ✓      ✓
APK download checking    ✓     X     X      ✓      X      X      X
Feature selection        X     ✓     ✓      X      ✓      X      ✓
Heuristic                X     X     ✓      ✓      X      X      X
3. Preliminary concepts
The following subsections provide background on chi-square and on the machine learning
algorithms used: SVM, RF, adaptive boosting (AdaBoost), and XGBoost.
3.1 Chi-square
The chi-square (χ²) test is used in statistics to determine whether two events are independent.
Events X and Y are considered independent when Eq. (1) is satisfied [19]. Chi-square measures
how well the observed data match the expected data, as described in Eq. (2).
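For reference, the standard textbook forms of these quantities are given below; the notation may differ from the paper's Eqs. (1) and (2). The last line is the 2×2 term-class form commonly used for text feature selection, where, for a term t and class c, A is the number of class-c documents containing t, B the number of other documents containing t, C the number of class-c documents without t, D the number of other documents without t, and N = A + B + C + D.

```latex
\begin{aligned}
P(X \cap Y) &= P(X)\,P(Y) \\
\chi^2 &= \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \\
\chi^2(t, c) &= \frac{N\,(AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)}
\end{aligned}
```

Here O_i and E_i are the observed and expected counts of the i-th cell; a larger χ²(t, c) indicates a stronger dependence between the term and the class.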
The main concept of SVM can be summarized in two points. First, it uses a nonlinear kernel
function that represents the inner product in the feature space; this corresponds to a nonlinear
mapping of the data from the input space into a potentially high-dimensional feature space, so
that a linear algorithm can be used to analyze the nonlinear properties of the samples in that
feature space. Second, it applies the structural risk minimization principle from statistical
learning theory by seeking the optimal hyperplane with the greatest margin between the two
classes [23].
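The first point, a kernel acting as an inner product in a feature space, can be checked numerically. The sketch below uses the textbook degree-2 polynomial kernel k(x, z) = (x·z)² and its explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²) for 2-D inputs; this kernel is chosen for illustration only and is an assumption, not necessarily the kernel used in the experiments.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Degree-2 homogeneous polynomial kernel, computed entirely in input space."""
    return dot(x, z) ** 2


def poly2_feature_map(x: Sequence[float]) -> List[float]:
    """Explicit map phi: R^2 -> R^3 such that phi(x) . phi(z) == (x . z)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


x, z = [1.0, 2.0], [3.0, -1.0]
print(poly2_kernel(x, z))                               # 1.0
print(dot(poly2_feature_map(x), poly2_feature_map(z)))  # ~1.0 (floating-point rounding)
```

The kernel gives the feature-space inner product without ever building φ(x), which is what lets a linear max-margin classifier work in very high-dimensional (even infinite-dimensional, for the RBF kernel) feature spaces.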
The AdaBoost algorithm generates a set of weak learners by maintaining a collection of weights
over the training data and adaptively adjusting them after each weak learning cycle. The weights
of training samples misclassified by the current weak learner are increased, while the weights
of correctly classified samples are decreased [26, 27].
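The reweighting step can be made concrete with the standard discrete AdaBoost update, assumed here because the exact variant is not specified in this section: a weak learner with weighted error ε receives the vote α = ½ ln((1 − ε)/ε); misclassified samples are multiplied by e^α, correctly classified ones by e^−α, and the weights are renormalized.

```python
import math
from typing import List, Tuple


def adaboost_reweight(weights: List[float], correct: List[bool]) -> Tuple[float, List[float]]:
    """One round of discrete AdaBoost reweighting.

    weights: current sample weights (summing to 1)
    correct: whether the current weak learner classified each sample correctly
    Returns the learner's vote alpha and the renormalized sample weights.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep log((1 - err) / err) finite
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples are scaled by e^alpha, correctly classified ones by e^-alpha.
    scaled = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]


alpha, new_w = adaboost_reweight([0.25] * 4, [True, True, True, False])
print(round(alpha, 4), [round(w, 4) for w in new_w])  # 0.5493 [0.1667, 0.1667, 0.1667, 0.5]
```

With one of four equally weighted samples misclassified, that sample's weight rises from 0.25 to 0.5 while the others fall to 1/6, so the next weak learner concentrates on the hard example.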
detector model categorizes each SMS $s_i$ as ham or smishing based on an analysis of the
behaviour of any URL in the SMS and of the SMS contents. To detect smishing, it is necessary to
identify discriminative features that distinguish smishing from ham; to support the machine
learning algorithms, a set of $n$ features $F = \{f_1, f_2, \ldots, f_n\}$ is extracted from $\mathbb{S}$.
A new regular expression is proposed to describe the URL search pattern. The proposed
regular expression, which can effectively extract URLs from SMS, is
(http[s]?\S+)|(HTTP[s]?\S+)|(www.\S+)|(WWW.\S+). For each message $s_i \in \mathbb{S}$, the
presence of a URL is checked. If no URL is present, the message is passed to the content
analysis phase; otherwise, the URL is extracted and inspected by the Google search engine and
the VirusTotal API. Algorithm 1 summarizes the URL inspection phase.
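The URL extraction and routing step can be sketched with Python's re module, assuming the pattern reads (http[s]?\S+)|(HTTP[s]?\S+)|(www.\S+)|(WWW.\S+) as given in the text. The routing function and its return labels are hypothetical names used only for illustration.

```python
import re
from typing import List

# "http"/"https" or "HTTP"/"HTTPs", or "www"/"WWW" plus any one character (the dot is
# unescaped), each followed by a run of non-whitespace characters.
URL_PATTERN = re.compile(r"(http[s]?\S+)|(HTTP[s]?\S+)|(www.\S+)|(WWW.\S+)")


def extract_urls(sms: str) -> List[str]:
    """Return every URL-like substring of the SMS, in order of appearance."""
    return [m.group(0) for m in URL_PATTERN.finditer(sms)]


def next_phase(sms: str) -> str:
    """Hypothetical router: messages without a URL skip URL inspection."""
    return "url_inspection" if extract_urls(sms) else "content_analysis"


print(extract_urls("Your parcel is on hold, pay at http://bit.ly/abc123 today"))
# ['http://bit.ly/abc123']
print(next_phase("See you at 5"))  # content_analysis
```

Because \S+ is greedy, trailing punctuation attached to a URL (for example a final full stop) is captured as part of it; a production version might strip it before the URL is looked up.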
The first inspection of the URL is performed with the Google engine. To validate the URL, its
domain name is extracted. In addition, the Natural Language Toolkit (NLTK) is used, through
TextBlob, to extract all nouns in the message. The extracted nouns and domain name are searched
with the Google engine, and the top five search results are compared with the extracted domain
name and nouns. The second inspection is performed with the VirusTotal API, which analyses the
behaviour of the URLs in $s_i$. VirusTotal is a web service that analyzes URLs and files to
detect suspicious or malicious content; it reports whether a URL is malicious by checking it
against the URL databases maintained by antivirus companies such as Bitdefender and Kaspersky.
If the URL's domain is not found among the top Google search results, or VirusTotal declares the
URL malicious, the message is considered smishing. Otherwise, the message is passed to the next
phase, content analysis.
2: Extract the domain name from the URL and save it as Domname
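The inspection rule of this phase can be sketched as follows, under stated assumptions: the caller supplies the URLs of the top five Google results and the number of VirusTotal engines that flag the URL (no network calls are made), and the Google check is simplified to matching the extracted domain, ignoring the noun comparison. The function names are hypothetical. `vt_url_id` shows how the VirusTotal v3 API identifies a URL (its unpadded URL-safe base64 encoding, used in GET /api/v3/urls/{id}); the malicious count would come from the report's `last_analysis_stats`.

```python
import base64
from typing import Iterable
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Host part of a URL, lower-cased, without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to populate the host
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def vt_url_id(url: str) -> str:
    """URL identifier for the VirusTotal v3 /urls/{id} endpoint (base64url, no padding)."""
    return base64.urlsafe_b64encode(url.encode()).decode().strip("=")


def inspect_url(url: str, top_result_urls: Iterable[str], vt_malicious_count: int) -> str:
    """Hypothetical decision rule combining the Google and VirusTotal checks.

    top_result_urls: URLs of the top-5 Google results (supplied by the caller)
    vt_malicious_count: number of VirusTotal engines flagging the URL as malicious
    """
    domain = extract_domain(url)
    found_on_google = any(extract_domain(u) == domain for u in top_result_urls)
    if not found_on_google or vt_malicious_count > 0:
        return "smishing"
    return "content_analysis"  # URL looks legitimate: continue with content analysis


print(extract_domain("https://www.paypal.com/login"))                       # paypal.com
print(inspect_url("http://paypa1-secure.xyz/a", ["https://www.paypal.com/"], 0))  # smishing
```

Either check alone is enough to flag a message, so the combined detector catches the union of what the two techniques find.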
4.2.1 Preprocessing
The first important step in SMS content analysis is preprocessing, which prepares the message
for analysis. Preprocessing involves the following steps:
1. Token identification: the message is divided into tokens delimited by spaces.
2. Stopword exclusion: stopwords are removed from the set of tokens identified in the
previous step, producing a list of keywords. In addition, all punctuation is removed.
3. Stem generation: the tokens are stemmed to reduce them to their root form, which increases
the frequency of shared terms. For example, the words "studying" and "studied" are both
converted to "study".
4. Currency symbols, numbers, phone numbers, email IDs, and URLs are converted to specific
words, as shown in Table 2, so that feature extraction can process them effectively and their
weight in the messages increases.
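A minimal sketch of these four steps in pure Python follows. The placeholder words (urlword, emailword, phoneword, currencyword, numberword), the regular expressions, and the short stopword list are hypothetical stand-ins, since the actual mapping is defined in Table 2. The step-4 substitutions are applied first so that punctuation removal does not break URLs and email addresses apart; stemming is a pluggable function (for example, NLTK's `PorterStemmer().stem`).

```python
import re
from typing import Callable, List

# Hypothetical placeholder mapping (step 4); order matters: URLs and emails before
# phone numbers, and phone numbers (7+ digits) before generic numbers.
SUBSTITUTIONS = [
    (re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE), " urlword "),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), " emailword "),
    (re.compile(r"\+?\d{7,}"), " phoneword "),
    (re.compile(r"[£$€]"), " currencyword "),
    (re.compile(r"\d+"), " numberword "),
]

# Tiny illustrative stopword list; a real pipeline would use a full list.
STOPWORDS = {"a", "an", "and", "at", "is", "of", "or", "the", "to", "you", "your"}


def preprocess(sms: str, stem: Callable[[str], str] = lambda t: t) -> List[str]:
    text = sms.lower()
    for pattern, placeholder in SUBSTITUTIONS:      # step 4
        text = pattern.sub(placeholder, text)
    text = re.sub(r"[^\w\s]", " ", text)            # remove punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]  # steps 1 and 2
    return [stem(t) for t in tokens]                # step 3 (identity by default)


print(preprocess("WINNER! Claim £900 now at www.win.com or call 09061701461"))
# ['winner', 'claim', 'currencyword', 'numberword', 'now', 'urlword', 'call', 'phoneword']
```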
Machine learning algorithms have been extensively studied in SMS classification. Four
well-known classification algorithms are used in this paper for detecting smishing: SVM, RF,
AdaBoost, and XGBoost. Algorithm 3 outlines the SMS classification process.
4. Experimental results
In this paper, we used the SMS Spam Collection dataset from the UCI Machine Learning
Repository [29]. This dataset contains 5572 messages, of which 4825 are labelled "ham"
(legitimate SMS) and 747 spam. In addition, 120 phishing SMS from Pinterest were employed [30].
Since the smishing dataset is not published as text, the Pinterest smishing images were
converted to text, and all smishing messages were extracted from the SMS Spam Collection
dataset, producing a dataset of 867 smishing and 4825 ham messages. Stratified 3-fold
cross-validation is used to evaluate the proposed model: the dataset is split into three folds,
each containing the same proportion of messages with each label. One fold serves as the test
set and the other two folds as the training set, and the process is repeated until every fold
has been used once as the test set.
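The stratified 3-fold procedure described above can be sketched in pure Python: the indices of each label are shuffled and dealt round-robin across the folds, so every fold keeps roughly the same smishing/ham ratio. The function name and seed are illustrative; scikit-learn's `StratifiedKFold` provides equivalent behaviour.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[str], k: int = 3,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs with per-label proportions preserved."""
    rng = random.Random(seed)
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal each label's messages round-robin, so every fold gets about 1/k of them.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j in range(k) if j != i for idx in folds[j])
        splits.append((train, test))
    return splits


labels = ["ham"] * 4825 + ["smishing"] * 867
for train, test in stratified_k_fold(labels):
    print(len(train), len(test), sum(labels[i] == "smishing" for i in test))
# 3794 1898 289
# 3795 1897 289
# 3795 1897 289
```

With this dataset each test fold holds 289 smishing messages and 1608 or 1609 ham messages.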
Accuracy (Acc), precision (P), detection rate (DR), and F1-score were used to evaluate the
performance of the proposed smishing model. The experiments were carried out on a PC with an
Intel Core 7 Duo 2.90 GHz processor, 8 GB RAM, and the 64-bit Microsoft Windows 10 operating
system. Python 3.9 with PyCharm was used as
Finally, the number of trees, the maximum depth of a tree, and the learning rate were set to
5000, 5, and 0.01, respectively.
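The four evaluation measures can be computed from a confusion matrix, taking smishing as the positive class: accuracy = (TP + TN)/(TP + FP + TN + FN), precision = TP/(TP + FP), detection rate = TP/(TP + FN), and F1 = 2·P·DR/(P + DR). The counts in the usage example are hypothetical.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, detection rate (recall of the smishing class), and F1-score."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + detection_rate
    f1 = 2 * precision * detection_rate / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "detection_rate": detection_rate, "f1": f1}


# Hypothetical confusion-matrix counts for one test fold (smishing = positive class).
m = detection_metrics(tp=90, fp=10, tn=880, fn=20)
print({k: round(v, 4) for k, v in m.items()})
# {'accuracy': 0.97, 'precision': 0.9, 'detection_rate': 0.8182, 'f1': 0.8571}
```

When scores are averaged over the three folds, the averaged F1-score is generally not identical to the F1-score computed from the averaged precision and detection rate.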
Table 3 compares the effect of combining the Google engine with VirusTotal for URL inspection
against the Google engine alone, as used in [14], and VirusTotal alone, as used in [17].
Combining the two techniques increases the number of detected smishing messages. This
reflects the benefit of combining the Google engine and VirusTotal: the Google engine detects
smishing messages that VirusTotal cannot detect, and vice versa.
Table 3: The Number of Smishing Messages Detected by Google Engine, VirusTotal, and the
Proposed URL Inspection
URL inspection technique        No. of detected smishing messages
Google engine in [14]           140
VirusTotal in [17]              145
Proposed URL inspection         156
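If the proposed inspection flags a message exactly when at least one of the two techniques flags it, its 156 detections form the union G ∪ V of the Google-engine set G and the VirusTotal set V, and inclusion-exclusion gives the implied overlap and the number of messages found by only one technique. This is an inference from Table 3 under that assumption, not a separately reported result.

```latex
\begin{aligned}
|G \cap V| &= |G| + |V| - |G \cup V| = 140 + 145 - 156 = 129 \\
|G \setminus V| &= |G \cup V| - |V| = 156 - 145 = 11 \\
|V \setminus G| &= |G \cup V| - |G| = 156 - 140 = 16
\end{aligned}
```

Under this assumption, 11 messages are caught only by the Google engine and 16 only by VirusTotal, which is the complementarity described above.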
To demonstrate the effectiveness of UTF-IDF in the feature extraction process, we compared the
accuracy obtained using UTF-IDF, which splits the dataset into two sets (smishing and ham),
with that obtained using standard TF-IDF, which operates on the entire dataset. Figures 2–4
depict the accuracy of UTF-IDF against TF-IDF for unigrams, bigrams, and a combination of
unigrams and bigrams.
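For reference, the standard TF-IDF baseline over unigrams and bigrams can be sketched as below. It uses the unsmoothed weight tf(t, d) · ln(N / df(t)), one of several common variants (scikit-learn's `TfidfVectorizer`, for example, smooths the IDF by default). The UTF-IDF variant is not sketched, because its exact formula is not given in this section.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n_max: int = 2) -> List[str]:
    """All n-grams for n = 1..n_max (unigrams plus bigrams by default)."""
    grams: List[str] = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs: List[List[str]], n_max: int = 2) -> List[Dict[str, float]]:
    """Standard TF-IDF over the whole collection: tf(t, d) * ln(N / df(t))."""
    grams = [ngrams(doc, n_max) for doc in docs]
    df = Counter(g for doc in grams for g in set(doc))  # documents containing each term
    n_docs = len(docs)
    vectors = []
    for doc in grams:
        counts = Counter(doc)
        vectors.append({g: (c / len(doc)) * math.log(n_docs / df[g]) for g, c in counts.items()})
    return vectors


docs = [["win", "cash", "now"], ["call", "me", "now"]]
vecs = tfidf(docs)
print(round(vecs[0]["win"], 4), vecs[0]["now"])  # 0.1386 0.0
```

A term that occurs in every message (here "now") gets zero weight, whatever its label.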
In most cases, UTF-IDF gives better accuracy than TF-IDF. This is because, when the data are
divided by message type and the frequency of each term is calculated, the importance of each
feature is preserved relative to the message type, and the feature weights are determined by
the content of the dataset for each label. Furthermore, the results show that the chi-square
feature selection method has a positive impact on the performance of the classification
algorithms.
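Chi-square feature selection for text is commonly computed from a 2×2 table of document counts per term: with A and B the numbers of smishing and ham messages containing the term, C and D the numbers of smishing and ham messages without it, and N = A + B + C + D, the score is N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). The sketch below scores every term this way and keeps the top k; this is a standard formulation assumed for illustration, and the toy messages are hypothetical.

```python
from typing import Dict, List, Set


def chi_square(A: int, B: int, C: int, D: int) -> float:
    """Chi-square score of a term for a class from 2x2 document counts.

    A: class documents containing the term   B: other documents containing it
    C: class documents without the term      D: other documents without it
    """
    n = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return n * (A * D - C * B) ** 2 / denom if denom else 0.0


def select_k_best(docs: List[Set[str]], labels: List[int], k: int) -> List[str]:
    """Keep the k terms with the highest chi-square w.r.t. the smishing class (label 1)."""
    vocab: Set[str] = set().union(*docs)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    scores: Dict[str, float] = {}
    for term in vocab:
        A = sum(1 for d, y in zip(docs, labels) if y == 1 and term in d)
        B = sum(1 for d, y in zip(docs, labels) if y == 0 and term in d)
        scores[term] = chi_square(A, B, n_pos - A, n_neg - B)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [{"win", "cash"}, {"win", "prize"}, {"hi", "mum"}, {"hi", "dinner"}]
labels = [1, 1, 0, 0]
print(select_k_best(docs, labels, 2))  # ['hi', 'win']
```

The score is symmetric in the two classes, so a term that marks ham ("hi") is kept alongside one that marks smishing ("win"); both help the classifier separate the classes.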
Figure 4: The Accuracy Results of the Combination of Unigram and Bigram UTF-IDF Against
TF-IDF (accuracy per classifier).
To confirm the experimental results, the proposed model is compared with previous research
[15], as reported in Table 4. The results reveal that the proposed method outperforms [15] in
all measures. Moreover, the proposed model uses fewer features than [15] while still
outperforming it. This indicates that the proposed model discriminates better between smishing
and ham, which we attribute to the stronger discriminative power of its extracted features. We
therefore conclude that the proposed model can effectively detect phishing SMS.
The proposed smishing detection model can be evaluated further by plotting the receiver
operating characteristic (ROC) curve and calculating the area under the ROC curve (AUC), which
measures how well the classifier separates the two classes. Figure 5 depicts the ROC curves and
AUC of SVM and XGBoost. SVM and XGBoost were chosen because, both in [15] and in the proposed
smishing detection, SVM performed best and XGBoost performed worst.
The ROC results show that the proposed smishing model discriminates better between smishing
and ham: the AUC of SVM in the proposed model is 0.9907, whereas that of SVM in [15] is 0.9894.
Likewise, the AUC of XGBoost in the proposed model is 0.9836, compared with 0.9773 in [15]. We
attribute this to the extracted features of the proposed model, which distinguish smishing
from ham more effectively than those of [15].
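The AUC equals the probability that a randomly chosen smishing message receives a higher classifier score than a randomly chosen ham message, with ties counted as one half. The quadratic-time sketch below computes it directly from that definition on toy scores; for large datasets a rank-based computation, or scikit-learn's `roc_auc_score`, would be preferable.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of random positive > score of random negative), ties counted 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Toy scores: label 1 = smishing, 0 = ham.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))  # 1.0
print(roc_auc([0.9, 0.4, 0.8, 0.3], [1, 1, 0, 0]))  # 0.75
```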
Figure 5: The ROC Curve Results. (a) SVM ROC of Both [15] and the Proposed Detection
Model, (b) XGBoost ROC of Both [15] and the Proposed Detection Model
5. Conclusion
Smartphones' popularity and their constant connection to the World Wide Web make these devices
vulnerable to smishing, a serious attack on mobile devices. This paper introduces a security
model that combines different analysis methods to detect malicious content in SMS. The model
consists of investigating malicious URLs and analyzing SMS content. The Google search engine
was used with VirusTotal to verify URLs and determine their malicious intent; this combination
inspects URLs more effectively than either the Google search engine or VirusTotal alone. The
crucial part of content analysis is separating smishing from ham messages, which is
accomplished by extracting the essential features and selecting the relevant ones. Four
machine learning algorithms were used in this paper: SVM, RF, AdaBoost, and XGBoost. SVM
outperformed the other algorithms, with an accuracy of 0.985229, owing to its effectiveness in
high-dimensional feature spaces. Furthermore, the proposed model outperforms existing work in
the field.
For future work, a mobile application for detecting smishing and protecting smartphones could
be developed. In addition, smishing messages are far fewer than ham messages, resulting in a
class imbalance problem, which could be addressed either by acquiring more smishing messages
or by employing other techniques. Furthermore, feature extraction is a crucial component of
smishing detection, and deep learning is an option for this purpose.
Conflicts of Interest
The authors declare no conflict of interest.
References
[1] I. Qabajeh, F. Thabtah, and F. Chiclana, “A recent review of conventional vs. automated
cybersecurity anti-phishing techniques,” Comput. Sci. Rev., vol. 29, pp. 44–55, 2018, doi:
10.1016/j.cosrev.2018.05.003.
[2] S. O. Folorunso, F. E. Ayo, K. K. A. Abdullah, and P. I. Ogunyinka, “Hybrid vs ensemble
classification models for phishing websites,” Iraqi J. Sci., vol. 61, no. 12, pp. 3387–3396, 2020,
doi: 10.24996/ijs.2020.61.12.27.
[3] Statista, “Number of Smartphone Users from 2016 to 2026,” 2022. [Online]. Available:
https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/
[Accessed: 23 Feb. 2022].
[4] S. M. Hameed and M. B. Mohammed, “Spam Filtering Approach based on Weighted Version
of Possibilistic c-Means,” Iraqi J. Sci., vol. 58, no. 2C, pp. 1112–1127, 2017, doi:
10.24996/ijs.2017.58.2c.15.
[5] S. M. Hameed and Z. H. Ali, “SMS Spam Detection Based on Fuzzy Rules and Binary Particle
Swarm Optimization,” Int. J. Intell. Eng. Syst., vol. 14, no. 2, pp. 314–322, 2021, doi:
10.22266/ijies2021.0430.28.
[6] S. Mishra and D. Soni, “A Content-Based Approach for Detecting Smishing in Mobile
Environment,” in SSRN Electronic Journal, pp. 986–993, 2019, doi: 10.2139/ssrn.3356256.
[7] J. W. Joo, S. Y. Moon, S. Singh, and J. H. Park, “S-Detector: an enhanced security model for
detecting Smishing attack for mobile computing,” Telecommun. Syst., vol. 66, no. 1, pp. 29–
38, 2017, doi: 10.1007/s11235-016-0269-9.
[8] G. Sonowal and K. S. Kuppusamy, “SMIDCA: An anti-smishing model with machine learning
approach,” Comput. J., vol. 61, no. 8, pp. 1143–1157, 2018, doi: 10.1093/comjnl/bxy039.
[9] A. K. Jain and B. B. Gupta, “Rule-Based Framework for Detection of Smishing Messages in
Mobile Environment,” in Procedia Computer Science, vol. 125, pp. 617–623, 2018, doi:
10.1016/j.procs.2017.12.079.
[10] D. Goel and A. K. Jain, “Smishing-classifier: A novel framework for detection of smishing attack
in mobile environment,” vol. 828, Springer Singapore, 2018, doi: 10.1007/978-981-10-8660-1_38.
[11] A. K. Jain and B. B. Gupta, “Feature based approach for detection of smishing messages in the
mobile environment,” J. Inf. Technol. Res., vol. 12, no. 2, pp. 17–35, 2019, doi:
10.4018/JITR.2019040102.
[12] A. K. Jain, S. K. Yadav, and N. Choudhary, “A novel approach to detect spam and smishing
SMS using machine learning techniques,” Int. J. E-Services Mob. Appl., vol. 12, no. 1, pp. 21–
38, 2020, doi: 10.4018/IJESMA.2020010102.
[13] G. Sonowal, “Detecting Phishing SMS Based on Multiple Correlation Algorithms,” SN
Comput. Sci., vol. 1, no. 6, pp. 1–9, 2020, doi: 10.1007/s42979-020-00377-8.
[14] S. Mishra and D. Soni, “DSmishSMS-A System to Detect Smishing SMS,” Neural Comput.
Appl., vol. 45, 2021, doi: 10.1007/s00521-021-06305-y.
[15] R. E. Ulfath, I. H. Sarker, M. J. M. Chowdhury, and M. Hammoudeh, “Detecting Smishing
Attacks Using Feature Extraction and Classification Techniques,” Lect. Notes Data Eng.
Commun. Technol., vol. 95, pp. 677–689, 2022, doi: 10.1007/978-981-16-6636-0_51.
[16] S. S. Shravasti, “Smishing Detection: Using Artificial Intelligence,” Int. J. Res. Appl. Sci. Eng.
Technol., vol. 9, no. 8, pp. 2218–2224, 2021, doi: 10.22214/ijraset.2021.37737.
[17] A. Ghourabi, “SM-Detector: A security model based on BERT to detect SMiShing messages in
mobile environments,” Concurr. Comput. Pract. Exp., vol. 33, no. 24, pp. 1–15, 2021, doi:
10.1002/cpe.6452.
[18] A. K. Jain, B. B. Gupta, and K. Kaur, “A content and URL analysis-based efficient approach to
detect smishing SMS in intelligent systems,” pp. 1–25, Jul. 2022, doi: 10.1002/int.23035.
[19] A. K. Uysal and S. Gunal, “A novel probabilistic feature selection method for text
classification,” Knowledge-Based Syst., vol. 36, pp. 226–235, 2012, doi:
10.1016/j.knosys.2012.06.005.
[20] I. El Naqa and M. J. Murphy, “Machine Learning in Radiation Oncology,” in Machine Learning
in Radiation Oncology, 2015, pp. 3–11, doi: 10.1007/978-3-319-18305-3.
[21] A. Dey, “Machine Learning Algorithms: A Review,” Int. J. Comput. Sci. Inf. Technol., vol.
7, no. 3, pp. 1174–1179, 2016, doi: 10.21275/ART20203995.
[22] Y. Yao et al., “K-SVM: An effective SVM algorithm based on K-means clustering,” J. Comput.,
vol. 8, no. 10, pp. 2632–2639, 2013, doi: 10.4304/jcp.8.10.2632-2639.
[23] Y. Wang, F. Zhang, and L. Chen, “An approach to incremental SVM learning algorithm,” Proc.
- ISECS Int. Colloq. Comput. Commun. Control. Manag. CCCM 2008, vol. 1, no. 1, pp. 352–
354, 2008, doi: 10.1109/CCCM.2008.163.
[24] J. Ali, R. Khan, N. Ahmad, and I. Maqsood, “Random forests and decision trees,” IJCSI Int. J.
Comput. Sci. Issues, vol. 9, no. 5, pp. 272–278, 2012.
[25] Z. Jin, J. Shang, Q. Zhu, C. Ling, W. Xie, and B. Qiang, “RFRSF: Employee Turnover
Prediction Based on Random Forests and Survival Analysis,” Lect. Notes Comput. Sci.
(including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 12343 LNCS, pp.
503–515, 2020, doi: 10.1007/978-3-030-62008-0_35.
[26] R. Wang, “AdaBoost for Feature Selection, Classification and Its Relation with SVM, A
Review,” in Physics Procedia, vol. 25, pp. 800–807, 2012, doi: 10.1016/j.phpro.2012.03.160.
[27] C. Tu, H. Liu, and B. Xu, “AdaBoost typical Algorithm and its application research,” in
MATEC Web of Conferences, vol. 139, 2017, doi: 10.1051/matecconf/201713900222.
[28] B. Pan, “Application of XGBoost algorithm in hourly PM2.5 concentration prediction,” in IOP
Conference Series: Earth and Environmental Science, vol. 113, no. 1, 2018, doi:
10.1088/1755-1315/113/1/012127.
[29] T. A. Almeida, J. M. G. Hidalgo, and A. Yamakami, “Contributions to the study of SMS spam
filtering: New collection and results,” DocEng 2011 - Proc. 2011 ACM Symp. Doc. Eng., pp.
259–262, 2011, doi: 10.1145/2034691.2034742.
[30] Pinterest, “Smishing dataset,” 20 Nov. 2018. [Online]. Available:
https://in.pinterest.com/seceduau/smishing-dataset/ [Accessed: 15 Jan. 2021].