Classification of Shopify App User Reviews Using Novel Multi Text Features
Corresponding authors: Arif Mehmood ([email protected]) and Gyu Sang Choi ([email protected])
This work was supported in part by the Ministry of Trade, Industry and Energy (MOTIE, South Korea) through the Industrial Technology
Innovation Program under Grant 10063130, and in part by the National Research Foundation of Korea (NRF) Grant funded by the Korean
Government (MSIT) under Grant NRF-2019R1A2C1006159.
ABSTRACT App stores usually allow users to give reviews and ratings that are used by developers to resolve
issues and make plans for their apps. In this way, these app stores collect large amounts of data for analysis.
However, several challenges related to data redundancy and volume must first be addressed, which can be done
using machine learning. This study performs experiments on a dataset that contains reviews for
Shopify apps. To overcome the aforementioned limitations, we first categorize user reviews into two groups,
i.e., happy and unhappy, and then perform preprocessing on the reviews to clean the data. At a later stage,
several feature engineering techniques, such as bag-of-words, term frequency-inverse document frequency
(TF-IDF), and chi-square (Chi2), are used singly and in combination to preserve meaningful information.
Finally, the random forest, AdaBoost classifier, and logistic regression models are used to classify the reviews
as happy or unhappy. The performance of our proposed pipeline was evaluated using average accuracy,
precision, recall, and f1 score. The experiments reveal that a combination of features can improve the
performance of machine learning models; in this study, logistic regression outperforms the other models and
achieves an 83% true acceptance rate when combined with TF-IDF and Chi2.
INDEX TERMS Feature engineering, feature extraction, feature selection, machine learning, review
classification, text mining.
machine learning algorithms, such as naïve Bayes (NB), random forest (RF), decision tree (DT), support vector machine (SVM), and logistic regression (LR). In previous work, researchers tried to solve different problems during app review analysis. This study solves the classification problem for Shopify app reviews on the basis of the ratings given by app users and performs a comparative analysis between tree-based ensemble and linear models.

More formally, the machine learning algorithms take the users' reviews as input and then analyze these reviews to predict whether the users are happy or not. This work does not investigate other types of information, such as user properties, app names, and app descriptions. We intentionally limited the inputs (i.e., user reviews and rating scores) to keep the problem definition simple. We used a dataset obtained from Kaggle, on which preprocessing with the natural language toolkit (NLTK) [7] was performed to clean the reviews. The preprocessing steps for cleaning the reviews included tokenization, punctuation removal, lower-case conversion, removal of numeric values, and stopword removal. Finally, a stemming technique was used to get the root form of each feature in the reviews. Rating scores from users (1 to 5) were used to create two classes, i.e., happy and unhappy, where users who gave rating scores of 3 or above were assigned to the happy class and the rest to the unhappy class. More details regarding this scheme can be found in section IV(B).

After preprocessing, two different text feature extraction techniques, bag-of-words (BoW) and term frequency-inverse document frequency (TF-IDF), were deployed to extract high-level features. Later, the chi-square (Chi2) feature selection technique was used to select the most important and least redundant features for fitting several machine learning models.

Prior to fitting the models, the data was split into two parts, i.e., training and testing, in a 70% to 30% ratio. Finally, several machine learning classifiers, i.e., the AdaBoost classifier (AC), LR, and RF, were used to classify the reviews as happy or unhappy. RF and AC are both tree-based ensemble models, while LR is a statistical method for solving classification problems [8]. For this study, accuracy, precision, recall, and f1 score are used as the performance evaluation metrics. The key points of this study are as follows:
• Categorization of happy and unhappy users on the basis of reviews and ratings
• Preprocessing techniques to clean text reviews for efficient learning of models, i.e., stemming, stopword removal, lower-case conversion, and punctuation and numeric value removal
• Feature engineering techniques, i.e., BoW, TF-IDF, and Chi2
• Machine learning models, i.e., RF, AC, and LR
• Comparative analysis of the performance of the learning models with respect to the feature engineering techniques

The rest of this paper is organized as follows: Section II presents related work. Section III describes the material and methods used in this study. Section IV contains the proposed methodology, and Section V contains the results and discussion. Finally, Section VI concludes the paper with possible directions for future research.

II. RELATED WORK
As mentioned above, data classification is an area explored by many data scientists. Researchers have done much work in the text classification domain, using different approaches and introducing new techniques. In this section, we discuss previous work on app review classification and analysis.

The study [9] works on app review classification using ensemble algorithms and techniques. The dataset used in the study was previously examined in [3]; it contains reviews from Apple's App Store and the Google Play app store. In [9], the authors used NB, SVM, LR, and a neural network (NN) in various combinations for classification. They built three ensemble algorithms, A, B, and C. In ensemble A, four classifiers (NB, SVM, LR, and NN) were grouped for the final prediction; in ensemble B, three classifiers (SVM, LR, and NN) were grouped; and in ensemble C, two classifiers (NB and SVM) were grouped. The best performers among these individual and ensemble algorithms were LR and NN. This study also uses ensemble models, such as RF and AC, which combine a number of base learners (decision trees) to make the final prediction.

In another study [4], text analysis was performed for mobile app feature requests. The authors designed MARA (mobile app review analyzer), a prototype for the automatic retrieval of mobile app feature requests from online reviews. MARA takes review content as input for feature request mining. The feature request mining algorithm uses a set of linguistic rules defined to support the identification of sentences that indicate such requests. A linear discriminant analyzer model was used to identify topics that can be associated with these requests in user reviews. They used true positives (TP), false positives (FP), true negatives (TN), false negatives (FN), precision, recall, and the Matthews correlation coefficient as evaluation metrics to check the accuracy of the algorithm.

Researchers analyze app reviews to help app developers find out whether their customers are happy or not, which is also a goal of this study. In study [10], researchers tried to help mobile app developers by analyzing user reviews to categorize information that is important for app maintenance and evolution. For classification purposes, they deduced a taxonomy of user review categories that are relevant to app maintenance. The authors merged three techniques: natural language processing, text analysis, and sentiment analysis. By merging these techniques, they achieved desirable results in terms of precision and recall (a precision of 74% and a recall of 73%). They also applied these techniques individually to classify user reviews.

In another study [11], the authors tried to extract the values of comparison scores of sentiment reviews using different feature extraction techniques, such as word2vec, word2doc, and TF-IDF, with SVM, NB, and decision tree algorithms. In [11], the authors also used grid search algorithms for parameter optimization of the machine learning algorithms and feature extraction methods.

TABLE 2. Sample of data from dataset.
LR performs significantly better in the case of classification and is usually preferred by researchers when there is a binary classification problem. Study [12] used LR for

TABLE 3. Number of samples corresponding to each rating score.

$$W_{i,j} = TF_{i,j} \times \log\left(\frac{N}{DF_{t}}\right) \quad (1)$$
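To make equation (1) concrete, the following is a minimal sketch of the weight computation on a toy corpus, assuming the log-scaled form of the inverse document frequency; the sample reviews and variable names are illustrative only.

```python
import math
from collections import Counter

# Illustrative preprocessed reviews; the real corpus comes from the
# Shopify dataset after the preprocessing described in section IV(B).
reviews = [
    "great app easy use",
    "app crash support slow",
    "great support great app",
]

N = len(reviews)                          # total number of documents
tokenized = [r.split() for r in reviews]

# Document frequency DF_t: the number of reviews containing term t.
df = Counter()
for tokens in tokenized:
    df.update(set(tokens))

def tfidf_weights(tokens):
    """W_ij = TF_ij * log(N / DF_t), following equation (1)."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

for i, tokens in enumerate(tokenized):
    print(i, tfidf_weights(tokens))
```

Note how a term that appears in every review (here "app") receives a weight of zero, since its IDF factor is log(1) = 0.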
3) CHI2
Chi2 is the most common feature selection method, and it is mostly used on text data [21]. In feature selection, we use it to check whether the occurrence of a specific term and the occurrence of a specific class are independent. More formally, for a given document D, we estimate the following quantity for each term and rank the terms by their score. Chi2 finds this score using equation (2):

$$\chi^2(D, t, c) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}} \frac{(N_{e_t e_c} - E_{e_t e_c})^2}{E_{e_t e_c}} \quad (2)$$

FIGURE 4. After extracting 2552 examples from each rating score.
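In practice, this ranking can be performed with scikit-learn's chi2 scorer. The following is a minimal sketch with illustrative inputs (labels 1 = happy, 0 = unhappy) and an assumed k; it is not the study's exact configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Illustrative reviews and labels (1 = happy, 0 = unhappy).
reviews = ["great app easy use", "app crash support slow",
           "great support great app", "bad update broke store"]
labels = [1, 0, 1, 0]

# TF-IDF features are non-negative, as chi2 requires.
X = TfidfVectorizer().fit_transform(reviews)

# Score each term against the class labels and keep the k highest-scoring.
selector = SelectKBest(chi2, k=3)
X_selected = selector.fit_transform(X, labels)
print(X_selected.shape)
```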
TABLE 4. Machine learning algorithms and their hyperparameters.

during the training of the model. By using these two hyperparameters, we achieved good results with RF in this study.

For LR, the logistic function takes the form

$$F(v) = \frac{L}{1 + e^{-m(v - v_0)}}$$

where
• e is the natural logarithm base (also known as Euler's number);
• v0 is the x-value of the sigmoid midpoint;
• L is the curve's maximum value;
• m is the steepness of the curve.
For values of v in the domain of real numbers from −∞ to +∞, the S-curve of the logistic function is obtained, with the graph of F approaching L as v approaches +∞ and approaching zero as v approaches −∞. This study used the liblinear algorithm for optimization because it works well on small datasets, whereas "sag" and "saga" are faster for large ones. The second parameter used in this study with LR is "multi_class," set to the "ovr" value because it is good for binary classification. The third parameter is "C," the inverse regularization strength: it is inversely proportional to the regularization parameter lambda, and tuning it reduces the chance of overfitting the model [39]. LR models were used in this study because LR is better for binary classification and is also effective for categorizing text [3], [40].
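Assuming scikit-learn (which the parameter names "liblinear," "multi_class," and "C" suggest), the LR configuration described above can be written as the following sketch; the C value shown is a placeholder, not necessarily the tuned value from Table 4.

```python
from sklearn.linear_model import LogisticRegression

# LR configured as described above. The exact C value used in the study is
# listed in Table 4; 1.0 (the scikit-learn default) is a placeholder here.
lr = LogisticRegression(solver="liblinear", multi_class="ovr", C=1.0)
```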
IV. METHODOLOGY
In this section, we formulate the problem and the assumptions and then describe the method and the details of the techniques we used to solve the user classification problem.

A. PROBLEM STATEMENT
Assume that a company launches an app that serves its customers over the internet. Customers use this app and face some issues related to the app, which make them uncomfortable. Customers want the company to solve these issues, so the company gives its customers the option to give reviews about the services or about any issues related to the app. Many customers give their reviews about the company's products or services. The company then analyzes these reviews to find the good and the bad points and tries to determine whether each customer is happy or not, which is very helpful for its business strategy. In other words, the problem is to predict whether the customer is satisfied or not. As described in section I, to solve this problem, we use the text features of reviews and rating scores as input.

B. PROPOSED METHODOLOGY
This study uses different techniques to solve the classification problem, as shown in Figure 5.

FIGURE 5. Methodology diagram: green represents the data flow, while light blue represents the techniques and methods.
Figure 5 illustrates the steps for solving the user classification problem. First, the data goes through the preprocessing phase. As discussed in section III, the dataset has a rating from 1 to 5 for each review. The study places these ratings into two classes, happy and unhappy: ratings equal to or greater than 3 are assigned to the happy class, and ratings less than 3 are assigned to the unhappy class, as shown in Table 5. A minimal sketch of this labeling step follows the table caption.

TABLE 5. Number of samples corresponding to each rating score.
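The rating-to-class conversion can be sketched as follows, assuming pandas and hypothetical column names; the Kaggle dataset's actual schema may differ.

```python
import pandas as pd

# Hypothetical column names; the Kaggle Shopify dataset has its own schema.
df = pd.DataFrame({
    "review": ["Love this app, great support!", "Does not work at all."],
    "rating": [5, 1],
})

# Ratings of 3 or above -> happy; below 3 -> unhappy (Table 5).
df["class"] = (df["rating"] >= 3).map({True: "happy", False: "unhappy"})
print(df)
```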
In the next step of the experiment, the review texts are cleaned by performing preprocessing. Tokenization is applied to the reviews, and then numeric values are removed, because numeric digits usually have no influence on the meaning of the text. Next, letters are converted to lower case and punctuation is removed from the text reviews, because punctuation is not valuable for text analysis [12]. Sentences might be more readable because of punctuation, but it is difficult for a machine to differentiate punctuation from other characters; for that reason, punctuation was removed from the text during preprocessing. A stemming technique was then applied to the reviews to get the root form of each word using the PorterStemmer library [41]. At the end of preprocessing, stopwords are removed from the text reviews because they create confusion in text analysis. Table 6 shows a sample extracted from the prepared dataset, and Table 7 shows the results after preprocessing the sample data. A minimal sketch of these steps is given below.

TABLE 6. Prepared dataset sample after conversion of rating into target classes.

TABLE 7. Preprocessing of sample reviews.
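The following sketch implements the steps above with NLTK, assuming the English stopword list; the exact ordering of the steps and the regular expressions are illustrative.

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)      # tokenizer models
nltk.download("stopwords", quiet=True)  # stopword list

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(review: str) -> str:
    review = review.lower()                    # lower-case conversion
    review = re.sub(r"\d+", " ", review)       # remove numeric values
    review = re.sub(r"[^a-z\s]", " ", review)  # remove punctuation
    tokens = word_tokenize(review)             # tokenization
    tokens = [t for t in tokens if t not in stop_words]  # stopword removal
    return " ".join(stemmer.stem(t) for t in tokens)     # stemming

print(preprocess("This app crashed 3 times today, very disappointing!"))
```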
After preprocessing, we split the dataset into two subsets, training and testing, in a ratio of 70% to 30%: 70% of the data is used for training and 30% for testing. Feature engineering techniques (see section III(C)) were performed on both the training and testing sets to select and extract the important features from the text reviews. The BoW and TF-IDF techniques, which are commonly used in text classification, where the frequency of each word is used as a feature for training a classifier [42], were used for feature extraction. Three reviews were used as sample data (Table 8) to apply the BoW and TF-IDF techniques, and the results are shown in Tables 9 and 10 below. A sketch of the split and vectorization steps follows the table captions.

TABLE 8. Sample of reviews.

TABLE 9. Results of BoW technique on preprocessed sample data.

TABLE 10. Results of TF-IDF technique on preprocessed sample data.
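The split and the two feature extraction steps can be sketched as follows with scikit-learn; the stemmed sample texts and the stratified split are illustrative assumptions rather than the study's data.

```python
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# `texts` holds the preprocessed reviews; `labels` the happy/unhappy targets.
texts = ["great app easi use", "app crash support slow",
         "great support great app", "bad updat broke store"]
labels = ["happy", "unhappy", "happy", "unhappy"]

# 70% / 30% train/test split, as used in the study.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.30, random_state=42, stratify=labels)

bow = CountVectorizer()    # BoW: raw term frequencies
tfidf = TfidfVectorizer()  # TF-IDF: weighted term frequencies

# Fit the vocabularies on the training split only, then transform both splits.
X_train_bow, X_test_bow = bow.fit_transform(X_train), bow.transform(X_test)
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
```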
Table 9 shows the frequency of each feature in the sample data, and Table 10 shows the weight of each feature in the sample data. The Chi2 feature selection technique was applied to the results of BoW and TF-IDF to select the important features from the data. After feature engineering, the machine learning models were trained on the important features extracted by the feature engineering techniques and were tuned with different hyperparameters, as mentioned in Table 4. After model training, the test data was passed to the trained models to evaluate the performance of each learning model. The sketch below continues the previous one.
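Continuing the previous sketch, feature selection and model training might look as follows; k and the hyperparameter values are placeholders rather than the settings of Table 4.

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectKBest, chi2

# Keep the k best TF-IDF features according to their Chi2 scores.
k = min(5, X_train_tfidf.shape[1])
selector = SelectKBest(chi2, k=k)
X_train_sel = selector.fit_transform(X_train_tfidf, y_train)
X_test_sel = selector.transform(X_test_tfidf)

# Hyperparameter values here are placeholders; see Table 4 for those used.
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "AC": AdaBoostClassifier(n_estimators=100, random_state=42),
    "LR": LogisticRegression(solver="liblinear", multi_class="ovr"),
}
for name, model in models.items():
    model.fit(X_train_sel, y_train)
    print(name, model.score(X_test_sel, y_test))
```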
C. EVALUATION CRITERIA
After all these steps, we come to the prediction phase. This study used several evaluation metrics: accuracy, f1 score, recall, and precision. These evaluation parameters are commonly used to evaluate machine learning models [43]. This study also used confusion matrices to evaluate the performance of the algorithms; a confusion matrix is a table that is mostly used to describe the performance of a classifier on test data. It is also known as an error matrix and allows visualization of the performance of an algorithm.

1) ACCURACY
The accuracy score is used to measure the prediction correctness for labels or target classes. Its highest value is 1, and its lowest value is 0.

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \quad (7)$$

For binary classification, accuracy can also be calculated in terms of positives and negatives as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (8)$$

where
• True Positives (TP): the model predicted happy (the user is happy), and the actual value is also happy.
• True Negatives (TN): the model predicted unhappy (the user is not happy), and the actual value is also unhappy.
• False Positives (FP): the model predicted happy, but the actual value is unhappy (also known as a "Type I error").
• False Negatives (FN): the model predicted unhappy, but the actual value is happy (also known as a "Type II error").

2) RECALL
Recall measures the completeness of our classifiers. Recall is the number of true positives divided by the number of true positives plus the number of false negatives. The highest value is 1, and the lowest value is 0.

$$\text{Recall} = \frac{TP}{TP + FN} \quad (9)$$

3) PRECISION
Precision measures the exactness of our classifiers. Precision is the number of true positives divided by the number of true positives plus the number of false positives. The highest value is 1, and the lowest value is 0.

$$\text{Precision} = \frac{TP}{TP + FP} \quad (10)$$

4) F1 SCORE
The f1 score conveys the balance between precision and recall; in other words, the f1 score is the harmonic mean of precision and recall. Like the other scores, the f1 score ranges from 0 to 1.

$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (11)$$

This study used the metrics defined above to evaluate the performance of all the algorithms. We compare the algorithms' accuracies and propose the best performer on the Shopify data. These metrics can be computed as in the sketch below.
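With scikit-learn, the four metrics and the confusion matrix can be computed as follows, continuing the earlier sketch and treating "happy" as the positive class.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# `y_test` holds the true labels and `y_pred` the predictions of a trained
# model from the previous sketch; "happy" is treated as the positive class.
y_pred = models["LR"].predict(X_test_sel)

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, pos_label="happy"))
print("recall   :", recall_score(y_test, y_pred, pos_label="happy"))
print("f1 score :", f1_score(y_test, y_pred, pos_label="happy"))
print(confusion_matrix(y_test, y_pred, labels=["happy", "unhappy"]))
```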
FIGURE 12. Comparison between f1 scores of machine learning models using all feature engineering techniques.

these techniques are helpful in improving the accuracy of the learning models. All the learning models achieve their desired results because the feature selection technique gives us the important features from among the extracted features, which is an effective way to increase the accuracy of learning models. The performance of the learning models without the feature selection technique (Chi2), i.e., with the simple feature extraction techniques BoW and TF-IDF alone, is also acceptable. In the comparison between the learning models, LR wins the race in all situations for solving the user classification problem. One can deploy the Wilcoxon signed-rank test to further validate that the performance of our proposed model is statistically significant; a sketch of such a check follows.
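Such a check might be sketched as follows with SciPy, using hypothetical per-fold accuracy scores; the values shown are illustrative, not results from the study.

```python
from scipy.stats import wilcoxon

# Hypothetical per-fold accuracies for LR and the next-best model; a small
# p-value would indicate that LR's advantage is statistically significant.
lr_scores = [0.83, 0.82, 0.84, 0.83, 0.85]
rf_scores = [0.80, 0.79, 0.81, 0.80, 0.82]

stat, p_value = wilcoxon(lr_scores, rf_scores)
print(f"statistic={stat:.3f}, p-value={p_value:.4f}")
```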
VI. CONCLUSION
This study, for the very first time to the best of our knowledge, exploits the use of different machine learning approaches to solve the user review classification problem based on different feature engineering techniques, such as BoW, TF-IDF, and Chi2. The classifiers (RF, LR, and AC) were trained on text reviews to predict a user's review as happy or unhappy for Shopify apps only. The comparative analysis reveals that LR outperformed the other models when TF-IDF and Chi2 were used together. We end the conclusion by pointing out that the results and conclusions of our experiments are based on a single dataset (the Shopify app dataset), which had never been used before for classification purposes, and these algorithms have not yet been tested on other datasets; it is possible that our results are specific to the dataset used. Our future work entails testing deep learning models on different text and categorical datasets for the purpose of user review classification.

REFERENCES
[1] S. R. Das and M. Y. Chen, "Yahoo! For Amazon: Sentiment extraction from small talk on the Web," Manage. Sci., vol. 53, no. 9, pp. 1375–1388, Sep. 2007.
[2] J. Chevalier and D. Mayzlin, "The effect of word of mouth on sales: Online book reviews," Nat. Bureau Economic Res., Cambridge, MA, USA, Tech. Rep. w10148, 2003.
[3] E. Guzman and W. Maalej, "How do users like this feature? A fine grained sentiment analysis of app reviews," in Proc. IEEE 22nd Int. Requirements Eng. Conf. (RE), Aug. 2014, pp. 153–162.
[4] C. Iacob and R. Harrison, "Retrieving and analyzing mobile apps feature requests from online reviews," in Proc. 10th Work. Conf. Mining Softw. Repositories (MSR), May 2013, pp. 41–44.
[5] W. Maalej and H. Nabil, "Bug report, feature request, or simply praise? On automatically classifying app reviews," in Proc. IEEE 23rd Int. Requirements Eng. Conf. (RE), Aug. 2015, pp. 116–125.
[6] T. Pranckevičius and V. Marcinkevičius, "Comparison of naive Bayes, random forest, decision tree, support vector machines, and logistic regression classifiers for text reviews classification," Baltic J. Mod. Comput., vol. 5, no. 2, pp. 221–232, 2017.
[7] E. Loper and S. Bird, "NLTK: The natural language toolkit," 2002, arXiv:cs/0205028. [Online]. Available: https://arxiv.org/abs/cs/0205028
[8] J.-H. Xue and D. M. Titterington, "Comment on 'On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes,'" Neural Process. Lett., vol. 28, no. 3, pp. 169–187, Dec. 2008.
[9] E. Guzman, M. El-Haliby, and B. Bruegge, "Ensemble methods for app review classification: An approach for software evolution (N)," in Proc. 30th IEEE/ACM Int. Conf. Automat. Softw. Eng. (ASE), Nov. 2015, pp. 771–776.
[10] S. Panichella, A. Di Sorbo, E. Guzman, C. A. Visaggio, G. Canfora, and H. C. Gall, "How can I improve my app? Classifying user reviews for software maintenance and evolution," in Proc. IEEE Int. Conf. Softw. Maintenance Evol. (ICSME), Sep. 2015, pp. 281–290.
[11] S. M. Isa, R. Suwandi, and Y. P. Andrean, Optimizing the Hyperparameter of Feature Extraction and Machine Learning Classification Algorithms. London, U.K.: The Science and Information Organization, 2019.
[12] F. Rustam, I. Ashraf, A. Mehmood, S. Ullah, and G. Choi, "Tweets classification on the base of sentiments for US airline companies," Entropy, vol. 21, no. 11, p. 1078, Nov. 2019.
[13] V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston, "Random forest: A classification and regression tool for compound classification and QSAR modeling," J. Chem. Inf. Comput. Sci., vol. 43, no. 6, pp. 1947–1958, Nov. 2003.
[14] F. F. Bocca and L. H. A. Rodrigues, "The effect of tuning, feature engineering, and feature selection in data mining applied to rainfed sugarcane yield modelling," Comput. Electron. Agricult., vol. 128, pp. 67–76, Oct. 2016.
[15] J. Heaton, "An empirical analysis of feature engineering for predictive modeling," in Proc. SoutheastCon, Mar. 2016, pp. 1–6.
[16] S. C. Eshan and M. S. Hasan, "An application of machine learning to detect abusive Bengali text," in Proc. 20th Int. Conf. Comput. Inf. Technol. (ICCIT), Dec. 2017, pp. 1–6.
[17] X. Hu, J. S. Downie, and A. F. Ehmann, "Lyric text mining in music mood classification," Amer. Music, vol. 183, no. 5,049, pp. 2–209, 2009.
[18] B. Yu, "An evaluation of text classification methods for literary study," Literary Linguistic Comput., vol. 23, no. 3, pp. 327–343, Sep. 2008.
[19] S. Robertson, "Understanding inverse document frequency: On theoretical arguments for IDF," J. Document., vol. 60, no. 5, pp. 503–520, Oct. 2004.
[20] W. Zhang, T. Yoshida, and X. Tang, "A comparative study of TF*IDF, LSI and multi-words for text classification," Expert Syst. Appl., vol. 38, no. 3, pp. 2758–2765, Mar. 2011.
[21] P. Meesad, P. Boonrawd, and V. Nuipian, "A chi-square-test for word importance differentiation in text classification," in Proc. Int. Conf. Inf. Electron. Eng., 2011, pp. 110–114.
[22] S. Bird, "NLTK-Lite: Efficient scripting for natural language processing," in Proc. 4th Int. Conf. Natural Lang. Process. (ICON), 2005, pp. 11–18.
[23] S. B. Kotsiantis, I. Zaharakis, and P. Pintelas, "Supervised machine learning: A review of classification techniques," Emerg. Artif. Intell. Appl. Comput. Eng., vol. 160, pp. 3–24, 2007.
[24] G. Biau and E. Scornet, "A random forest guided tour," TEST, vol. 25, no. 2, pp. 197–227, Jun. 2016.
[25] L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123–140, Aug. 1996.
[26] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002.
[27] J. Benediktsson and P. Swain, "Consensus theoretic classification methods," IEEE Trans. Syst., Man, Cybern., vol. 22, no. 4, pp. 688–704, 1992.
[28] L. K. Hansen and P. Salamon, "Neural network ensembles," IEEE Trans. Pattern Anal. Mach. Intell., no. 10, pp. 993–1001, Oct. 1990.
[29] L. I. Kuncheva, "That elusive diversity in classifier ensembles," in Proc. Iberian Conf. Pattern Recognit. Image Anal. Berlin, Germany: Springer, 2003, pp. 1126–1138.
[30] R. E. Schapire, "A brief introduction to boosting," in Proc. IJCAI, vol. 99, 1999, pp. 1401–1406.
[31] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, Aug. 1997.
[32] D. S. Palmer, N. M. O'Boyle, R. C. Glen, and J. B. O. Mitchell, "Random forest models to predict aqueous solubility," J. Chem. Inf. Model., vol. 47, no. 1, pp. 150–158, Jan. 2007.
[33] Y. Zhang, H. Zhang, J. Cai, and B. Yang, "A weighted voting classifier based on differential evolution," Abstract Appl. Anal., vol. 2014, 2014.
[34] J. Belanich and L. E. Ortiz, "On the convergence properties of optimal AdaBoost," 2012, arXiv:1212.1108. [Online]. Available: https://arxiv.org/abs/1212.1108
[35] I. Sevilla-Noarbe and P. Etayo-Sotos, "Effect of training characteristics on object classification: An application using boosted decision trees," Astron. Comput., vol. 11, pp. 64–72, Jun. 2015.
[36] F. Elorrieta, S. Eyheramendy, A. Jordán, I. Dékány, M. Catelan, R. Angeloni, J. Alonso-García, R. Contreras-Ramos, F. Gran, and G. Hajdu, "A machine learned classifier for RR Lyrae in the VVV survey," Astron. Astrophys., vol. 595, p. A82, Nov. 2016.
[37] R. Zitlau, B. Hoyle, K. Paech, J. Weller, M. M. Rau, and S. Seitz, "Stacking for machine learning redshifts applied to SDSS galaxies," Monthly Notices Roy. Astronomical Soc., vol. 460, no. 3, pp. 3152–3162, 2016.
[38] A. Mayr, H. Binder, O. Gefeller, and M. Schmid, "The evolution of boosting algorithms," Methods Inf. Med., vol. 53, no. 6, pp. 419–427, 2014.
[39] G. Grégoire, "Multiple linear regression," in European Astronomical Society, vol. 66. Les Ulis, France: EDP Sciences, 2014, pp. 45–72.
[40] F. Sebastiani, "Machine learning in automated text categorization," ACM Comput. Surv., vol. 34, no. 1, pp. 1–47, 2002.
[41] B. Issac and W. J. Jap, "Implementing spam detection using Bayesian and Porter stemmer keyword stripping approaches," in Proc. IEEE Region Conf. (TENCON), Jan. 2009, pp. 1–5.
[42] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, "Bag of tricks for efficient text classification," 2016, arXiv:1607.01759. [Online]. Available: https://arxiv.org/abs/1607.01759
[43] Y. J. Huang, R. Powers, and G. T. Montelione, "Protein NMR recall, precision, and F-measure scores (RPF scores): Structure quality assessment measures based on information retrieval statistics," J. Amer. Chem. Soc., vol. 127, no. 6, pp. 1665–1674, 2005.
[44] P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang, "SQuAD: 100,000+ questions for machine comprehension of text," 2016, arXiv:1606.05250. [Online]. Available: https://arxiv.org/abs/1606.05250
[45] S. Mathanker, P. Weckler, T. Bowser, N. Wang, and N. Maness, "AdaBoost classifiers for pecan defect classification," Comput. Electron. Agricult., vol. 77, no. 1, pp. 60–68, 2011.
[46] A. C. Tan and D. Gilbert, "Ensemble machine learning on gene expression data for cancer classification," Appl. Bioinf., vol. 2, no. 3, pp. S75–S83, 2003.

FURQAN RUSTAM received the M.C.S. degree from the Department of Computer Science, Islamia University of Bahawalpur, Pakistan, in October 2017. He is currently pursuing the master's degree in computer science with the Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology (KFUEIT), Rahim Yar Khan, Pakistan. He is also serving as a Research Assistant with the Fareed Computing and Research Center, KFUEIT. His recent research interests are related to data mining, mainly working on machine learning and deep learning-based IoT, and text mining tasks.

ARIF MEHMOOD received the Ph.D. degree from the Department of Information and Communication Engineering, Yeungnam University, South Korea, in November 2017. He is currently working as an Assistant Professor with the Department of Computer Science and IT, The Islamia University of Bahawalpur, Pakistan. His recent research interests are related to data mining, mainly working on AI and deep learning-based text mining, and data science management technologies.

MUHAMMAD AHMAD is currently an Assistant Professor with the Department of Computer Engineering, Khwaja Fareed University of Engineering and Information Technology, Pakistan. He is also associated with the Research Group, Advanced Image Processing Research Lab (AIPRL), the First Hyperspectral Imaging Lab, Pakistan. He is also associated with the University of Messina, Messina, Italy, as a Research Fellow. He has authored a number of research articles in reputed journals and conferences. He is a regular reviewer for Springer Nature journals, NCAA, the IEEE (TIE, TNNLS, TGRS, TIP, GRSL, GRSM, JSTARS, TMC, the TRANSACTIONS ON MULTIMEDIA, ACCESS, COMPUTERS, SENSORS, the TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING), MDPI journals, Optik, Measurement Science and Technology, IET journals, and the Transactions on Internet and Information Systems. His current research interests include machine learning, computer vision, remote sensing, hyperspectral imaging, and wearable computing.

SALEEM ULLAH was born in Ahmedpur East, Pakistan, in 1983. He received the B.Sc. degree in computer science from Islamia University Bahawalpur, Pakistan, in 2003, the M.I.T. degree in computer science from Bahauddin Zakariya University, Multan, in 2005, and the Ph.D. degree from Chongqing University, China, in 2012. From 2006 to 2009, he worked as a Network/IT Administrator in different companies. From August 2012 to February 2016, he worked as an Assistant Professor with Islamia University Bahawalpur. He has been working as an Associate Professor with the Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, since February 2016. He has almost 13 years of industry experience in the field of IT. He is also an active researcher in the fields of ad hoc networks, congestion control, and security.

DOST MUHAMMAD KHAN received the M.Sc. degree (Hons.) in computer science from Bahauddin Zakariya University (BZU), Multan, in 1990, and the Ph.D. degree from the School of Innovative Technologies and Engineering (SITE), University of Technology, Mauritius (UTM), in 2013. He is currently working as an Assistant Professor with the Department of Computer Science and IT, The Islamia University of Bahawalpur, Pakistan. His areas of research are data mining and data mining techniques, multiagent systems (MAS), object-oriented database systems, and formal methods in software engineering.

GYU SANG CHOI received the Ph.D. degree from the Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA, in 2005. He was a Research Staff Member with the Samsung Advanced Institute of Technology (SAIT) for Samsung Electronics, from 2006 to 2009. Since 2009, he has been a Faculty Member with the Department of Information and Communication, Yeungnam University, South Korea. His research areas include non-volatile memory and storage systems.