Journal of Management Analytics

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/tjma20

Medical Internet of things using machine learning algorithms for lung cancer detection

Kanchan Pradhan & Priyanka Chawla

To cite this article: Kanchan Pradhan & Priyanka Chawla (2020): Medical Internet of things using
machine learning algorithms for lung cancer detection, Journal of Management Analytics, DOI:
10.1080/23270012.2020.1811789

To link to this article: https://doi.org/10.1080/23270012.2020.1811789

Published online: 31 Aug 2020.

Medical Internet of things using machine learning algorithms for lung cancer detection
Kanchan Pradhan and Priyanka Chawla *

Lovely Professional University, CSE, Phagwara, India


(Received 25 February 2020; revised 3 July 2020; accepted 14 August 2020)

This paper empirically evaluates several machine learning algorithms that can be
adapted for lung cancer detection linked with IoT devices. It reviews nearly 65
papers that use machine learning algorithms to predict different diseases. The
analysis focuses on the algorithms used for detecting these diseases in order to
identify gaps that point toward future improvements in lung cancer detection in
medical IoT. Each technique is analyzed step by step, and its overall drawbacks
are pointed out. The review also examines the type of data used for predicting
each disease, whether benchmark or manually collected. Finally, research
directions are identified on the basis of the existing methodologies. These
directions should help future researchers detect cancer patients accurately at an
early stage.
Keywords: disease prediction; lung cancer; machine learning algorithms; internet of
things

1. Introduction
Over the past decades, cancer research has developed continuously. Many studies
have implemented models for recognizing cancer early, before symptoms appear
(Zhong & Song, 2019). With the introduction of new models in clinical practice,
large volumes of cancer data have been gathered and made freely accessible to the
medical research community. However, predicting the disease accurately remains a
significant challenge for physicians. The US NLST reported a significant 20%
decrease in deaths from lung cancer. Following this result, the Centers for
Medicare and Medicaid Services paved the way for national lung cancer screening in
the USA by providing Medicare coverage for it (Al-Anni, Hou, Abdu-aljabar, & Xiang,
2017). The IoT is “a global information society infrastructure that enables
sophisticated services by connecting (physical and virtual) things based on
present and developing communication and information technologies”. In view of its
full potential, IoT is one of the most important technological advances of the
current period.
One of the most significant ways to reduce deaths from lung cancer (Alahmari et
al., 2018; Cirujeda et al., 2016; Emaminejad et al., 2016) is early prediction
(Alanni, Hou, Azzawi, & Xiang, 2019; Luo et al., 2019). Early detection needs an
accurate and reliable diagnosis process by which surgeons can differentiate benign
from malignant tumors (Li, Xiang, et al., 2018; Ma, Wang, Zou, & Yan, 2017; Wu et
al., 2019). For this purpose, pathological examinations and monitoring tests are
performed. To judge whether lung cancer is present, screening tests, consisting of
smoking history, sputum examinations, physical tests, spiral CT scans, and chest
X-rays, give doctors some early information (Al-Kadi & Watson, 2008; Park, Lee,
Weiss, & Motai, 2016). It is important to know the pathological stage of lung
cancer because it can be used to predict a patient's prognosis and allows
specialists to provide appropriate treatment. Nevertheless, determining the
clinical stage of lung cancer (Hawkins et al., 2014; Kumar, Sankar, Clausi, Taylor,
& Wong, 2019) generally consumes considerable time and money, because a pathology
report must be obtained (Okada et al., 2012; Zamani, Rezaeieh, & Abbosh, 2015).

*Corresponding author. Email: [email protected]

© 2020 Antai College of Economics and Management, Shanghai Jiao Tong University

The key intention of IoT is to make the surroundings smarter by providing the
required data, historical or real-time, and applying computational intelligence
automatically to take smart decisions. Many existing contributions report
techniques that can enable early detection and prognosis (Arunkumar &
Ramakrishnan, 2019; Pati, 2019). Data mining generally comprises many approaches,
such as association rule mining, NN, and DT, and each method evaluates the
information in a different way (Yu, Ni, Dan, & Xu, 2012; Zhang, Qi, et al., 2019).
Information related to lung cancer taken from IoT devices is used for
understanding and managing difficult environments, allowing greater automation,
efficiency, accuracy, wealth generation, productivity, and better decision making
(Das et al., 2019). In these environments, a significant challenge is the timely
processing of huge amounts of data to deliver highly reliable and accurate
observations and decisions so that IoT can fulfill its promise.
The key contributions of the current survey are as follows.

• To present a review of the development of different machine learning algorithms
  for predicting various types of diseases, in order to draw conclusions about
  lung cancer prediction.
• To analyze the challenges of each machine learning algorithm, the environments
  used by each contribution, and the datasets used, which contain patients' health
  records related to the disease.
• To offer a systematic review of different machine learning algorithms,
  evaluating their capability and performance in predicting lung cancer, and thus
  to detect lung cancer with IoT integration.
• To suggest multiple integrative models that synthesize the existing research
  activities in the field of disease prediction, and to identify gaps, in order to
  overcome the existing challenges.

2. Literature works on various disease prediction using machine learning algorithms


2.1. Literal contributions on lung cancer detection
In 2009, Tan, Chen, and Xia (2009) examined the viability of combining AdaBoost
with decision stumps, which serve as weak classifiers, and evaluated the approach
for predicting lung cancer at an early stage. In 2010, Kim, Koh, and Park (2010)
proposed a DT framework for identifying the development of lung cancer through
occupational exposures. In 2014, Zięba, Tomczak, Lubicz, and Świątek (2014)
proposed a boosted SVM model for resolving imbalanced data issues. The solution
merged the advantages of ensemble models for imbalanced data with those of
cost-sensitive SVMs. An oracle-based technique was then presented to extract
decision rules from the boosted SVM. In 2015, Engchuan and Chan (2015) suggested a
pathway activity transformation approach for multi-class data, named AFS, which
showed high classification power. In 2016, Azzawi, Hou, Xiang, and Alanni (2016)
introduced the GEP technique for predicting lung cancer from microarray data. The
model used two gene selection approaches to extract the important lung cancer
genes and proposed several GEP-based prediction approaches. In 2016, Petousis,
Han, Aberle, and Bui (2016) modeled a group of DBNs and analyzed them to gain
insight into how longitudinal information can be used to assist lung cancer
monitoring decisions. In 2017, Lynch, Abdollahi et al. (2017) applied several
supervised learning methods to the SEER database to classify lung cancer patients
by survival, including linear regression, GBM, SVM, DT, and a custom ensemble. In
2019, Petousis et al. (2019) suggested a new technique for learning a POMDP that
optimized lung cancer detection by improving specificity. A Bayesian network was
trained on the NLST data, and inverse reinforcement learning was employed to find
the reward function on the basis of experts' decisions. In 2019, ALzubi et al.
(2019) suggested a WONN-MLB ensemble for lung cancer in big data. In the feature
selection phase, the required attributes were chosen using an integrated
Newton-Raphson MLMR to reduce classification time. The boosted WONN ensemble
classification model was then used to classify patients with the selected
attributes, which enhanced accuracy and reduced the FPR.
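
The combination of AdaBoost with decision stumps mentioned above for Tan, Chen, and Xia (2009) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a generic discrete AdaBoost written in plain Python, and the function names (`train_stump`, `adaboost`, `predict`) and the toy data in the usage note are assumptions made for illustration only.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump.

    A stump predicts `polarity` when row[feature] >= threshold, else -polarity.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                pred = [polarity if row[j] >= t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] >= t else -polarity

def adaboost(X, y, rounds=5):
    """Train discrete AdaBoost; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # weight of this weak learner
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

As a usage example on hypothetical data, `adaboost([[1], [2], [3], [4], [5], [6]], [1, 1, -1, -1, 1, 1], rounds=3)` yields a model that labels all six training points correctly, although no single stump can separate them.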

2.2. Literal contributions on other diseases


In 2009, Akay (2009) developed a new SVM-based model combined with feature
selection for diagnosing breast cancer. Çınar, Engin, Engin, and Ziya Ateşçi
(2009) designed a classifier-based expert system for detecting the disease while
it is still confined to the organ, so that an informed decision can be reached
without biopsy by using a few selected features. A further aim was to examine the
association among smoking, prostate cancer, and BMI. To classify the data, ANNs
trained with the LM, SCG, and BFGS algorithms and SVMs with polynomial, linear,
and radial basis kernel functions were employed. Anand and Suganthan (2009)
explored multi-class classification of microarray samples. In contrast to
classifying two kinds of cancer from a gene expression dataset, multi-class
classification of more than two cancer types is much harder. To make the most of
the selected genes, class-wise optimized genes were used with an OVA-SVM model.
The final prediction was made from the probability scores of all the classifiers,
and three different classifiers were used to estimate the probability from the
decision value. In 2009, Oztekin, Delen, and (James)Kong (2009) aimed to improve
outcome prediction for combined heart–lung transplantation by introducing an
integrated data-mining approach. A large dataset was used to develop machine
learning models and to extract the significant predictive variables. A
consolidated set of factors, drawn from machine learning methods such as DT,
logistic regression, and NN, from traditional predictive approaches, and from
common-sense-based interaction variables, was created to build Cox regression
models for heart–lung transplantation. Tang, Jiang, Wu, Shen, and Yu (2009)
introduced a new gene programming approach based on equivalence and on the
probability density function of every gene for the class of interest relative to
the others; LKT-SVM was then employed to classify the disease. In 2010, Barakat,
Bradley, and Barakat (2010) applied machine learning and data mining approaches to
the diagnosis and prognosis of diabetes. SVMs were used for diagnosis, and an
additional module was used to turn the black-box SVM model into an intelligible
representation of the SVM's classification decision.
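
The one-versus-all (OVA) scheme described above for Anand and Suganthan (2009), in which the final class is chosen from the probability scores of the per-class classifiers, can be sketched as follows. The review does not say how the probabilities were estimated from the decision values, so this sketch assumes a plain logistic (sigmoid) mapping as a stand-in; the function name `ova_predict` and the class labels are hypothetical.

```python
import math

def sigmoid(z):
    """Map a real-valued decision value to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def ova_predict(decision_values):
    """Pick the class whose one-versus-all classifier is most confident.

    decision_values: dict mapping each class label to the decision value
    returned by that class's binary (class vs. rest) classifier.
    Returns (predicted_label, probability_scores).
    """
    scores = {label: sigmoid(v) for label, v in decision_values.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical decision values for three cancer types.
label, scores = ova_predict({"type_A": -1.2, "type_B": 0.4, "type_C": -0.3})
```

Here `label` is `"type_B"`, the only class with a positive decision value. Because the sigmoid is monotonic, this assumed mapping selects the same class as taking the largest raw decision value; a calibrated mapping fitted per classifier could change the ranking.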
In 2016, Yin, Zeng, Chen, and Fan (2016) reviewed healthcare services and the
different intelligent models used for IoT applications. In contrast, our paper
examines the health system from the point of view of enabling IoT-based frameworks
in medical systems and industry by employing new strategies. In 2014, Xu, He, and
Li (2014) surveyed IoT in industry, including the key enabling technologies, the
major IoT applications in industry, and the different application models and
problems. IoT enabled industries to develop very effectively. In 2014, Boyi Xu et
al. (2014) presented an IoT framework for emergency medical services to show how
IoT data can be collected. Experimental results showed that IoT is workable in a
distributed heterogeneous data environment, with data held in a particular format
and stored in the cloud. In 2018, Yang and Xu (2018) presented interdisciplinary
research on IoT-enabled healthcare, including approaches from software
engineering, design, data science, and behavioral science. Li, Xu, and Zhao (2018)
presented the 4G and 5G systems required to support IoT operation and security.
Their paper reviews recent research on 5G IoT, the key enabling technologies, and
the main research trends and challenges in 5G IoT. Aceto, Persico, and Pescapé
(2020) described the main technologies and paradigms related to Healthcare 4.0 and
discussed their main application scenarios. The benefits of Industry 4.0
technology and novel cross-disciplinary problems were also discussed. In 2010,
Yuan, Li, Guan, and Xu (2010) proposed an accurate internet traffic classification
method that classified internet traffic into broad application categories
according to the network flow parameters obtained from the packet headers. An
optimized feature set was obtained by means of several classifier selection
methods. In 2020, Akman, Karaman, and Kuzey (2020) used SVM (Support Vector
Machine) and neural networks to examine the effect of visa policies on bilateral
trade, using export data from Turkey for 2000–2014. In 2019, Chi-Hsien and
Nagasawa (2019) built a machine learning model to help reduce research obstacles.
The study analyzed Chinese luxury consumption behavior; Chinese consumers
contributed 33% of the global luxury market in 2018 and acted as a growth engine
for that market. In 2019, Lu (2019) conducted a broad survey of AI and deep
learning over the period 1961–2018. The study provides a valuable reference for
researchers and practitioners through a multi-angle systematic analysis of AI,
from the underlying mechanisms to practical applications, from fundamental
algorithms to industrial achievements, and from the current status to future
trends.
In 2017, Qi et al. (2017) discussed another transformation of the Internet, the
Internet of Things (IoT). IoT is quickly gaining ground as a research topic in
many academic and industrial disciplines, particularly in healthcare. Prompted
notably by the rapid proliferation of wearable devices and smartphones, the
authors provided a systematic survey of state-of-the-art IoT-enabled PHS. The work
discusses current research on IoT-enabled PHS and the key enabling technologies.
In 2014, Fan, Yin, Xu, Zeng, and Wu (2014) presented an ontology-based automating
design methodology (ADM) for smart rehabilitation systems in IoT. Ontology helps
computers further understand symptoms and medical resources, which assists in
creating a rehabilitation strategy and reconfiguring medical resources according
to patients' specific requirements quickly and automatically. Meanwhile, IoT
provides an effective platform to interconnect all the resources and offers fast
data communication. Preliminary experiments and clinical trials provided valuable
information on the feasibility, rapidity, and effectiveness of the method.
In 2011, Choi, Yeo, Kwon, and Kim (2011) introduced a machine learning algorithm,
named L1 SVM, in which the L1 penalty was applied to the SVM with a reject option.
A fast and reliable optimization technique for this SVM was introduced for
evaluating gene expression data. Chen, Yang, Liu, and Liu (2011) presented an
RS-SVM model to diagnose breast cancer. In RS-SVM, RS reduction was used as a
feature selection tool to eliminate redundant features, and SVM was then used to
improve diagnostic accuracy. In 2011, Chen and Lin (2011) identified relevant
genes and then used them to classify cancers with SVM or BPNN, naming the
technique SVST. Tong and Schierz (2011) implemented a hybrid GANN approach, which
gave special importance to feature selection and operated on unprocessed
microarray data. In this hybrid approach, the fitness value of the GA was based on
the number of samples correctly classified by a standard ANN. Lee et al. (2011)
developed a new prediction technique using clinical and genetic data and compared
the misclassification rates of various techniques. The analysis showed that the
KNN model performed best. Capriotti and Altman (2011) suggested a new machine
learning technique for predicting cancer-causing missense variants. An SVM model
was trained on 3163 cancer-causing variants and the same number of neutral
polymorphisms. Chen, Liu, Yang, Liu, and Wang (2011) offered a hybrid algorithm
called LFDA-SVM that combined a new feature extraction approach with a
classification model for detecting hepatitis. LFDA was used as the feature
extraction tool to reduce the dimensionality, and the SVM classifier was then
applied to improve accuracy. Mohabatkar, Beigi, and Esmaeili (2011) introduced an
approach for predicting proteins from features derived from Chou's pseudo-amino
acid composition, using SVM as a robust algorithm. Sattlecker, Baker, Stone, and
Bessant (2011) developed an ensemble approach based on a set of SVMs. In addition,
various ensemble models, including boosting, tree-based approaches, and bagging,
were analyzed on an FTIR dataset obtained from various types and stages of breast
cancer. Åström and Koker (2011) employed NNs for predicting Parkinson's disease.
Using several distinct NNs decreased the probability of an erroneous decision. To
make the final decision, the output of every NN was evaluated with a rule-based
approach. During training, the data not learned by each NN were collected and used
in the training data of the next NN. The resulting system therefore substantially
improved prediction strength. Mohebian, Marateb, Mansourian, AngelMañanas, and
Mokarian (2017) suggested an approach, HPBCR, for breast cancer recurrence
prediction. Clinicopathologic features of 579 breast cancer patients were
evaluated, and the relevant features were chosen by statistical feature selection
techniques. The features were then filtered using PSO and used as the inputs of
the ensemble learning approach. The performance of HPBCR was analyzed by the
holdout and 4-fold cross-validation approaches.
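
Several of the studies above rely on the KNN classifier, which Lee et al. (2011) found to perform best in their comparison. A minimal sketch of the technique is given below, assuming Euclidean distance and an unweighted majority vote among the k nearest training samples; the function name `knn_predict` and the two-feature toy data are illustrative assumptions, not taken from any cited study.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Pair every training sample's Euclidean distance to the query with its label.
    neighbours = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature samples forming two well-separated groups.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = ["benign"] * 3 + ["malignant"] * 3
```

With these data, `knn_predict(train_X, train_y, [1.1, 1.0])` returns `"benign"` and `knn_predict(train_X, train_y, [5.1, 5.0])` returns `"malignant"`. In practice the features would normally be scaled first, since Euclidean distance is sensitive to the units of each feature.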
In 2012, Zhong, Chow, and He (2012) introduced an MLSVM, which organized the
datasets into clusters in a tree to provide efficient patterns. To create an
effective global decision, a decision fusion method was used that integrated the
local SVM decisions at different levels of the tree. Accordingly, MLSVM handled
complicated and conflicting data distributions in huge databases more efficiently
than single-SVM and multiple-SVM models. Anooj (2012) developed a weighted fuzzy
rule-based CDSS to diagnose heart disease from patients' medical data. It
comprised two steps: an automated technique for generating weighted fuzzy rules
and a fuzzy rule-based decision support model. In the first step, attribute
selection and weighting were used to obtain the weighted fuzzy rules. The fuzzy
model was then built from the weighted fuzzy rules and the selected attributes.
Subasi (2012) adopted several kinds of machine learning techniques for classifying
EMG signals and compared them in terms of classification accuracy. The suggested
method automatically classified EMG signals as normal, neurogenic, or myopathic.
In 2013, Kaya and Uyar (2013) recommended a novel medical decision support model
based on RS and an ELM for diagnosing hepatitis. The proposed RS-ELM comprised two
phases: in the first, redundant features were eliminated from the dataset by the
RS method; in the second, classification was done by ELM using the remaining
features. In 2013, Babu and Suresh (2013) presented a gene expression-based
technique to predict Parkinson's disease using PBL-McRBFN. It consists of two
components, a cognitive component and a meta-cognitive component. The
meta-cognitive component controls the learning process of the cognitive component
by selecting the best learning strategy for the current sample.
In 2014, Lee, Ku, Nam, Pham, and Kim (2014) aimed to predict FPG status, which is
used in detecting type 2 diabetes, from combinations of several measures in Korean
adults. The AU-ROC values of the predictions were obtained using LR and NB
classifiers. Valdés-Mas et al. (2014) developed a new machine learning method for
predicting the vision gain of keratoconus patients after ring implantation, which
was evaluated through corneal curvature and astigmatism. An ANN with MLP attained
the best results. Majid, Ali, Iqbal, and Kausar (2014) proposed a new technique
for predicting breast and colon cancer using various feature spaces. In the
pre-processing phase, the MTD approach was used to augment the minority class
samples and thereby balance the dataset. Machine learning techniques such as KNN
and SVM were used in the predictor phase, giving the hybrid MTD-SVM and MTD-KNN
prediction approaches.
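
The AU-ROC measure reported above by Lee, Ku, Nam, Pham, and Kim (2014) can be computed directly from predicted scores. The sketch below uses the standard pairwise interpretation of the area under the ROC curve: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `auc` and the example values are assumptions made for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (positive) or 0 (negative).
    scores: predicted scores, higher meaning more likely positive.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical example: three of the four positive/negative pairs are ordered correctly.
example = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 0.75
```

This quadratic-time form is adequate for small samples; for large datasets a rank-based computation gives the same value more efficiently.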
In 2015, Munsell et al. (2015) analyzed machine learning techniques for predicting
surgical outcomes in TLE using only the structural brain connectome. A two-stage
connectome-based prediction framework was proposed that selected a small number of
abnormal network connections for predicting the surgical treatment outcome, and a
linear kernel function was employed in every stage to enhance accuracy. Memarian,
Kim, Dewar, Engel, and Staba (2015) implemented machine learning techniques,
specifically a combination of mutual information-based feature selection and
supervised learning on multimodal data, for predicting surgical outcomes in
patients who were diagnosed with MTLE and subsequently underwent anteromedial
temporal lobectomy. Barbieri et al. (2015) presented a new model based on various
machine learning methods in which the variability among CKD patients was
explicitly considered, providing a general and reliable approach to predict the
response to Erythropoiesis Stimulating Agents or iron therapy. Dai et al. (2015)
aimed to predict heart-related hospitalizations accurately and effectively on the
basis of patients' existing medical history. Five machine learning techniques were
developed: SVM, AdaBoost, logistic regression, NB, and a variation of the
Likelihood Ratio Test. Every method was trained and evaluated on the training and
testing datasets.
In 2016, Lee and Kim (2016) evaluated the association between the HW phenotype and
type 2 diabetes in Korean adults and assessed the predictive power of different
phenotypes, including combinations of individual anthropometric measures and TG
levels. LR and NB classifiers were validated using 10-fold cross-validation.
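
The 10-fold cross-validation used above by Lee and Kim (2016) splits the data into ten parts and uses each part once as the test set while training on the other nine. The sketch below only generates the index splits; it assumes contiguous, unshuffled folds, and the function name `k_fold_indices` is hypothetical. Real studies usually shuffle or stratify the samples first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Fold sizes differ by at most one: the first n_samples % k folds
    receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Usage: every sample appears in exactly one test fold.
folds = list(k_fold_indices(25, k=10))
```

A classifier would be trained on each `train` list and scored on the matching `test` list, and the ten scores would then be averaged.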
In 2017, Chen, Hao, Hwang, Wang, and Wang (2017) streamlined machine learning
techniques to predict chronic disease outbreaks effectively in disease-frequent
communities. Luo, Ding, Liang, Cao, and Chen (2017) offered CPTL for prioritizing
miRNAs associated with disease. By merging disease and miRNA similarities, a
miRNA-disease network was constructed for predicting miRNA-disease associations,
and the relevance score of each node consisting of a miRNA and a disease was
computed using transduction learning. Zhang et al. (2017) introduced a fast
Fourier transformation-coupled machine learning ensemble technique to predict
short-term disease risk, providing chronic heart disease patients with suitable
recommendations on the need for clinical tests. The ensemble model combined ANN,
LS-SVM, and NB. Nilashi, binIbrahim, Ahmadi, and Shahmoradi (2017) suggested an
analytical approach for predicting disease using clustering, noise removal, and
prediction methods; CART was used to generate fuzzy rules. Kotsavasiloglou,
Kostikis, Hristu-Varsakelis, and Arnaoutoglou (2017) implemented a machine
learning-based system that was able to classify unknown subjects on the basis of
their line-drawing performance.
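
Ensembles such as the ANN, LS-SVM, and NB combination reported above for Zhang et al. (2017) need a rule for merging the outputs of their base models. The review does not state which rule was used, so the sketch below assumes simple majority voting as one common option; the function name `majority_vote` and the example predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model predictions by majority vote.

    predictions: list with one entry per model, each a list of predicted
    labels for the same samples in the same order.
    On a tie, the label seen first (from the earliest model) is returned.
    """
    combined = []
    for votes in zip(*predictions):          # votes for one sample across models
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical binary predictions from three base models for four patients.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 0, 1]
ensemble = majority_vote([model_a, model_b, model_c])  # [1, 0, 0, 1]
```

With an odd number of binary classifiers no ties can occur; weighted voting, in which stronger models receive larger weights, is a natural extension of the same idea.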
In 2018, Khalid and Sezerman (2018) combined structural and sequence features to
predict HIV resistance using SVM and RF methods. Jordanski, Radovic, Milosevic,
Filipovic, and Obradovic (2018) suggested a machine learning-based technique for
computing the WSS distribution, which is relevant to the initiation and
development of atherosclerosis. To capture the associations among blood density,
dynamic viscosity, velocity, the WSS distribution, and the geometric parameters of
the AAA and carotid bifurcation models, MLR, Gaussian Conditional Random Fields,
and MLP were presented. Raweh, Nassef, and Badr (2018) aimed to predict cancer
with a hybrid model based on feature selection and extraction mechanisms. The
model used a filter feature selection technique named F-score to overcome the
high-dimensionality issue, and introduced an extraction approach that used the
mean methylation density, the symmetry between the methylation density and the
mean methylation density, and an FFT model of normal and cancer subjects to
classify cancer accurately and reduce training time. Sedaghat, Fathy, Modarressi,
and Shojaie (2018) implemented a two-step process for improving the outcomes of
sequence-based prediction models. The first step was based on consensus learning,
and the second used SVM in both unary and binary modes to identify the predicted
interactions, relying on the binding and network features of genes in the gene
regulatory network. Wang et al. (2018) exploited the ability of CNNs to learn
features automatically from time series of vital signs, together with categorical
feature embedding to encode feature vectors efficiently from heterogeneous
clinical features. The features learned by the CNN and the statistical features
from the feature embedding were fed to an MLP for prediction. Çarklı Yavuz,
Yurtay, and Ozkan (2018) predicted protein secondary structure with
nature-inspired algorithms. In the first stage, the data were trained using CSA,
which was designed with respect to the biological immune system. Classification
was then done using MLP, which was inspired by the biological nervous system.
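
The F-score filter mentioned above for Raweh, Nassef, and Badr (2018) ranks each feature by how well it separates two classes. The review gives no formula, so the sketch below assumes the commonly used definition: the squared deviations of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The function names `f_score` and `rank_features` and the toy data are illustrative assumptions.

```python
from statistics import mean, variance

def f_score(values, labels):
    """F-score of one feature for a two-class problem (labels 1 and 0).

    Larger values indicate better separation between the classes. Each
    class needs at least two samples, and the feature must not be
    constant within both classes (otherwise the denominator is zero).
    """
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == 0]
    overall = mean(values)
    numerator = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    denominator = variance(pos) + variance(neg)   # sample variances (n - 1)
    return numerator / denominator

def rank_features(X, labels):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(X[0])
    scores = [f_score([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Hypothetical data: feature 0 separates the classes, feature 1 barely does.
X = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 6.0]]
labels = [0, 0, 1, 1]
ranking = rank_features(X, labels)  # [0, 1]
```

For these data the F-score of feature 0 is 24.5 and that of feature 1 is 0.125, so a filter keeping only the top-ranked feature would retain feature 0.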
In 2019, Mohan, Thirumalai, and Srivastava (2019) introduced a framework that
finds the important features using machine learning approaches to enhance the
prediction accuracy for cardiovascular disease. By combining several features of
existing classification methods, a new prediction model, HRFLM, was developed.
Fitriyani, Syafrudin, Alfian, and Rhee (2019) developed a DPM for the early
prediction of type 2 diabetes and hypertension on the basis of an individual's
risk factor data. It used iForest-based outlier detection to remove outliers,
SMOTETomek to balance the data distribution, and an ensemble approach for disease
prediction. Prince, Andreotti, and De Vos (2019) suggested a multisource ensemble
learning approach that combined dataset deconstruction and allowed participants
with partial data to be included during the training of machine learning
techniques, achieving a high participant retention rate. Li et al. (2019)
recommended a non-invasive and accurate detection process for DS that minimizes
the cost of prenatal diagnosis. A cascaded machine learning algorithm was
introduced for predicting DS in three steps: pre-judgment with the iForest
approach, an ensemble approach with a voting mechanism, and the final judgment by
logistic regression.
Vásquez-Morales, Martínez-Monterrubio, Moreno-Ger, and Recio-García (2019)
developed a neural network-based classifier for predicting whether a person is at
risk of developing CKD. Haq et al. (2019) presented an SVM for predicting
Parkinson's disease. To classify healthy people and people with Parkinson's
disease, L1-norm SVM feature selection was employed, which provided a new feature
subset from the dataset on the basis of feature weight values. Davi et al. (2019)
offered a machine learning technique to predict the severity of dengue fever. SVM
was used to determine the subset of loci for classification, whereas ANN was
employed for classifying patients into dengue fever and severe dengue. Lai, Zhang,
Zhang, Su, and Bin Heyat (2019) suggested an automated mechanism to predict SCD
with a high level of accuracy using measurable arrhythmic markers. The arrhythmic
parameters consisted of two conduction-repolarization and three repolarization
interval ratios. The computed markers were used to classify SCD and normal cases
with machine learning classifiers such as KNN, NB, DT, RF, and SVM. Yoon and Li
(2019) presented a TL approach (PTL) that leveraged other patients' data while
constructing a predictive model for a target patient. A distinctive characteristic
of the proposed model was its ability to choose which patients to transfer from,
thereby averting negative transfer. Ali et al. (2019) suggested a new prediction
model that used a χ2 statistical model and prevented over-fitting and
under-fitting. The χ2 statistical method was used to remove redundant features,
while the optimal configuration of the DNN was found with an exhaustive search.
Wang et al. (2019) developed a new prediction model, DMP_MI. Missing values were
filled in using an NB classifier for data normalization. ADASYN was then used to
reduce the effect of class imbalance on prediction performance. Finally, RF was
adopted to generate predictions, and the model was validated with evaluation
indicators. Zhang, Ren, Cheng, Wang, and Wei (2019) implemented GBDT to predict
blood pressure rates from data gathered using the EIMO tool, which recorded both
ECG and PPG signals. The optimal parameters were chosen by cross-validation to
prevent over-fitting. Perveen, Shahbaz, Keshavjee, and Guergachi (2019) explored
the association between diabetes mellitus and the individual risk factors of MetS.
In a non-conservative setting, the future onset of diabetes was predicted from the
related MetS risk factors, and data sampling methods were used to create balanced
training datasets for probing the relative performance of machine learning
algorithms. Dinh, Miertschin, Young, and Mohanty (2019) investigated data-driven
models that used supervised machine learning algorithms to recognize which kind of
disease patients were suffering from. Machine learning models such as SVM,
gradient boosting, logistic regression, and RF were combined into a weighted
ensemble, which leverages the performance of the different techniques to enhance
diagnostic accuracy. Ed-daoudy and Maalmi (2019)
introduced a new model for predicting the status of real-time health and analytical
model by big data methodology, which concentrated on implemented distributed
machine learning approach. At first, the DT model was transformed into distributed,
scalable, fast, and parallel DT with the help of Spark in the place of Hadoop Map
Reduce that turned out to be restriction to the distributed sources of different diseases
for predicting the status.
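
The weighted ensemble idea described above for Dinh et al. (2019) can be illustrated
with a minimal sketch. It is not the authors' implementation: the model names,
probabilities, and weights are hypothetical values chosen for illustration, and the
combination rule (a weighted average of positive-class probabilities followed by a 0.5
threshold) is one common form of weighted soft voting, assumed here.

```python
# Weighted soft voting: combine per-model probabilities of the positive class.
# All probabilities and weights below are hypothetical illustration values.

def weighted_soft_vote(probabilities, weights, threshold=0.5):
    """Return (combined probability, predicted label) for one patient.

    probabilities: dict mapping model name -> predicted P(disease)
    weights:       dict mapping model name -> non-negative weight
    """
    total_weight = sum(weights[name] for name in probabilities)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = sum(probabilities[name] * weights[name]
                   for name in probabilities) / total_weight
    return combined, int(combined >= threshold)


model_probabilities = {"svm": 0.40, "gradient_boosting": 0.70,
                       "logistic_regression": 0.55, "random_forest": 0.65}
model_weights = {"svm": 1.0, "gradient_boosting": 2.0,
                 "logistic_regression": 1.0, "random_forest": 2.0}

combined, label = weighted_soft_vote(model_probabilities, model_weights)
print(f"combined probability = {combined:.3f}, predicted label = {label}")
```

In practice the weights would typically be derived from validation performance rather
than fixed by hand, so that stronger models contribute more to the final decision.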

2.3. Chronological review


The chronological distribution of the reviewed contributions on disease prediction using
machine learning algorithms is shown in Figure 1. Of the reviewed contributions, 9.23%
were published in 2009, 3.07% in each of 2010 and 2013, 16.9% in 2011, 4.6% in each of
2012 and 2016, 6.1% in 2014, 7.6% in 2015, and 9.2% in each of 2017 and 2018. The
largest share, 26.1%, was published in 2019. Thus, the contributions on disease
prediction with machine learning algorithms are most numerous in 2019 and least
numerous in 2010 and 2013, and further contributions can be expected in the upcoming
years.
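
The percentages quoted above are consistent with a total of 65 reviewed papers, the
approximate figure given in the conclusion. The per-year counts in the sketch below are
not stated in the paper; they are inferred here under the assumption of exactly 65
papers and are labelled as such in the code. Shares are truncated, not rounded, to one
decimal place, which reproduces the one-decimal figures quoted in the text.

```python
# Per-year paper counts are an ASSUMPTION inferred from the quoted
# percentages and a total of 65 papers; the paper does not list them.
assumed_counts = {
    2009: 6, 2010: 2, 2011: 11, 2012: 3, 2013: 2, 2014: 4,
    2015: 5, 2016: 3, 2017: 6, 2018: 6, 2019: 17,
}

total = sum(assumed_counts.values())


def truncated_share(count, total):
    """Percentage share truncated to one decimal place (integer arithmetic)."""
    return (count * 1000 // total) / 10


shares = {year: truncated_share(n, total) for year, n in assumed_counts.items()}

for year, share in sorted(shares.items()):
    print(f"{year}: {assumed_counts[year]:2d} papers, {share}%")
```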
Figure 1. Chronological review on machine learning algorithms used for various disease
predictions.

3. Contribution of machine learning algorithms on detecting several diseases


3.1. Detected type of diseases
The reviewed contributions show that researchers have applied various machine learning
algorithms to different diseases. Hence, the types of diseases predicted by those
machine learning algorithms over the past 10 years are reviewed, as shown in Figure 2.
The diseases detected include lung cancer, diabetes, hypertension, heart disease,
heart–lung transplantation, chronic diseases, chronic kidney diseases, mi-RNA,
HIV, general cancer, lymphatic, Parkinson's disease, dengue, breast cancer, prostate
cancer, hepatitis, neuromuscular disorders, keratoconus, and brain connectome.
Detection of heart diseases and lung cancers is addressed in 15% of the research
studies, and diabetes detection in 10% of the recent contributions. The other diseases,
such as dengue, keratoconus, brain connectome, lymphatic, HIV, heart–lung
transplantation, and hypertension, are considered least often. Moreover, further
research is needed to achieve significant improvements in predicting lung cancer with
IoT integration, which is the central challenge addressed by the current survey.

3.2. Topology of machine learning algorithms


The classification of machine learning techniques used to predict various diseases is
shown in Figure 3. Machine learning algorithms such as SVM, NN,
LR, NB, fuzzy logic, transfer learning, RF, DBN, ELM, DT, ensemble learning,
transduction learning, KNN, and AdaBoost are the most widely utilized in the diverse
contributions. Moreover, SVM is further divided into Boosted SVM (Zięba et al., 2014),
LKT-SVM (Tang et al., 2009), RS-SVM (Chen, Yang, et al., 2011), SVST (Chen &
Figure 2. Use of machine learning algorithms for different types of diseases to date.

Lin, 2011), MLSVM (Zhong et al., 2012), L1-SVMR (Choi et al., 2011), LFDA-SVM
(Chen, Liu, et al., 2011), Fuzzy SVM (Subasi, 2012), MTD-SVM (Majid et al., 2014),
and LS-SVM (Munsell et al., 2015) for predicting distinct diseases in the earlier
contributions. Similarly, NN is classified as DNN (Ali et al., 2019), CNN (Chen et al.,
2017), MLP (Raweh et al., 2018), GANN (Tong & Schierz, 2011), and CSDNN
(Wang et al., 2018), which are employed for diagnosing different diseases in different

Figure 3. Categorization of machine learning algorithms for disease prediction.


contributions. Moreover, GBDT (Zhang, Ren, et al., 2019) is a modified form of DT,
and CVIFLR (Li et al., 2019) is a modified form of LR, both used for detecting diseases.
RF and fuzzy logic are extended into HRFLM (Mohan et al., 2019) and Fuzzy
SVM (Subasi, 2012), respectively, in order to predict distinct diseases in various
contributions. Therefore, further research with improved machine learning techniques is
needed to predict lung cancer in an efficient manner.

3.3. Features and challenges


The advantages and disadvantages of the frequently used machine learning
algorithms mentioned above for the prediction of diverse diseases are listed in Table 1.
These benefits and challenges will help upcoming research studies to develop a
well-performing method for predicting lung cancer with IoT integration.

4. Deep learning frameworks and libraries


This section outlines how deep learning emerged, that is, which limitations of earlier
technologies led to the evolution of deep learning.
Artificial Intelligence (AI): When cars exceed the speed limit, it is not feasible for a
human to monitor and note down all the number plates. Instead, a machine can be used to
capture a picture of the number plate and convert it into text. A well-known
example of AI is the self-driving car. AI is the capability of a machine to imitate
intelligent human behavior. AI is implemented by studying how the human brain thinks and
how human beings learn, decide, and work while trying to solve a problem. The
applications of AI include speech recognition, natural language understanding, and
image processing.
In 2006, deep learning came into the picture to overcome the limitations of machine
learning (ML). The first drawback of ML that it addresses is the high dimensionality of
data with large numbers of inputs and outputs. The second drawback that it handles is
feature extraction, one of the biggest challenges with traditional ML models. For
complex problems, such as object recognition or handwriting recognition, this is
a huge challenge. ML is a type of AI that gives computers the ability to learn
without being explicitly programmed. Its main types are as follows.

(1) Supervised Learning: Given an input variable (X) and an output variable (Y),
an algorithm is used to learn the mapping function from the input to the
output.
(2) Unsupervised Learning: A model is trained using information that is
neither classified nor labelled. Such a model can be used to cluster the input data
on the basis of their statistical properties.
(3) Reinforcement Learning: Learning takes place by interacting with a space or an
environment. A reinforcement learning agent learns from the consequences of its
actions rather than from being taught explicitly. It selects its actions on the basis
of its past experiences (exploitation) and also by new choices (exploration).
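
As a concrete illustration of item (1), the sketch below learns a mapping from X to Y
by ordinary least squares on a single input variable. The data values and the function
name `fit_line` are hypothetical; they are not taken from any reviewed study, where
models use many features and the algorithms surveyed earlier.

```python
# Supervised learning in miniature: learn y = w * x + b from labelled pairs.
# The training data are made-up illustration values.

def fit_line(xs, ys):
    """Closed-form ordinary least squares for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


X = [1.0, 2.0, 3.0, 4.0]          # input variable
Y = [3.0, 5.0, 7.0, 9.0]          # output variable (generated as y = 2x + 1)

w, b = fit_line(X, Y)
prediction = w * 5.0 + b          # apply the learned mapping to an unseen input
print(f"w = {w:.2f}, b = {b:.2f}, prediction for x = 5: {prediction:.2f}")
```

The labelled pairs play the role of the training set; the learned parameters are then
applied to an input that was not seen during fitting.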

A deep learning model is capable of focusing on the right features by itself, requiring
little guidance from the programmer. These models also partially solve the
dimensionality problem. The idea of DL is to build learning algorithms that mimic the
brain.
Table 1. Merits and demerits of frequently used machine learning algorithms for predicting
diseases.

SVM
  Features:
    - It is efficient in high-dimensional spaces.
    - It has more storage space.
  Challenges:
    - The performance declines when the target classes overlap.
    - It is not suitable for very large datasets.

NN
  Features:
    - It has high fault tolerance and is capable of parallel processing.
    - It is able to execute without any prior knowledge of the task.
  Challenges:
    - It is very hard to determine the number of neurons and layers.
    - It is dependent on hardware.

DT
  Features:
    - Very little effort is required for data preparation during pre-processing.
    - There is no need for normalization and scaling of data.
  Challenges:
    - It requires more time to train a model.
    - Since training is difficult and requires more time, it is computationally expensive.

ELM
  Features:
    - The training time is very short.
    - The non-linear activation function is still in working condition.
  Challenges:
    - It has the problem of over-fitting.
    - The model is not satisfied and it impacts the accuracy of the outcomes.

RF
  Features:
    - It is capable of solving regression and classification problems.
    - It can handle missing values automatically.
  Challenges:
    - They are very complex and take more time to build than a DT.
    - It is highly expensive because training many deep trees requires more storage space.

Adaboost
  Features:
    - It is quite fast, simple, and easy to implement.
    - It is flexible to combine with any machine learning technique.
  Challenges:
    - Weak classifiers might cause low margins and over-fitting.
    - It is very sensitive to outliers and noisy data.

NB
  Features:
    - It is very fast compared with sophisticated approaches.
    - It requires some amount of data to train.
  Challenges:
    - The performance is sensitive to skewed information.
    - Feature interactions cannot be integrated.

DBN
  Features:
    - The response is very fast and it is suitable for small and incomplete datasets.
    - It is able to detect non-linear relationships for acquiring efficient connectivity.
  Challenges:
    - The performance is very poor for high-dimensional data.
    - As these networks are acyclic, they do not support feedback loops.

KNN
  Features:
    - Training is very quick.
    - It is easy and simple to implement.
  Challenges:
    - It requires more memory space.
    - The testing procedure is quite slow and it is very sensitive to noise.

Transduction Learning
  Features:
    - It is a well-known approach for sparse data.
    - It is able to consider all the points, including the unlabelled points.
  Challenges:
    - It does not construct a predictive model.
    - The device size is too large.

Transfer Learning
  Features:
    - With the help of a pre-trained system, training the model on the new job becomes fast.
    - It requires less training time, improves the performance of NN, and does not
      require a large amount of data.
  Challenges:
    - Negative transfer is a critical problem.
    - The data transfer happens only when it is suitable.

Ensemble Learning
  Features:
    - They do not have the problem of over-fitting.
    - The subsets of data are also trained well.
  Challenges:
    - These are computationally expensive.
    - These models suffer from a lack of interpretability.

Fuzzy
  Features:
    - It is capable of coping with non-linearity and uncertainty.
    - It is able to handle problems with incomplete and inaccurate information.
  Challenges:
    - Has to improve the strength in nature.
    - It treats all the factors with equal significance, which needs to be unified.

LR
  Features:
    - It is very strong and handles non-linear effects.
    - It is able to provide the final classification when compared with other models.
  Challenges:
    - Interpolation is very complex.
    - These are not viable for capturing complex relationships.

A collection of statistical machine learning techniques, often based on artificial
neural networks, is used to learn feature hierarchies. Some applications of DL are
self-driving cars, voice-controlled assistants, automatic machine translation, and game
playing. Deep learning skips the manual step of extracting features: images can be fed
directly to the deep learning algorithm, which then predicts the objects.
Table 2 compares various deep learning frameworks with
respect to licence, programming language, software support, release
year, and supported algorithms such as CNN, RNN, and DBN. In Table 2, it is
observed that C++ and Python are the programming languages most often used to develop
software with deep learning. Gluon (Guo et al., 2020) uses Python as its programming
language, with software support for Python 3.3 or Jupyter Notebook. It was released in
2017 with support for CNN, RNN, and DBN. The programming language C++ has
been used in frameworks such as PyTorch (Ketkar, 2017), Keras (Jakhar & Hooda,
2018), Caffe (Jia et al., 2014), MXNet (Chen et al., 2015), and TensorFlow (Abadi
et al., 2016) to increase speed. Likewise, distributed computation has become common in
some recently released frameworks, for example, TensorFlow, MXNet, Keras, and
Chainer (Tokui et al., 2019). The objective is to further improve the computing
efficiency of deep learning. MXNet supports several interfaces, including C++, Python,
R, Scala, Perl, MATLAB, JavaScript, and Go (Skoymind, 2017). It supports both
computation graph declaration and imperative computation declaration for architecture
design. MXNet supports data and model parallelism and follows the parameter
server scheme to support distributed computation as well. MXNet offers the most
functionality, yet its performance is not optimized as much as that of other
state-of-the-art frameworks.

5. Analysis on environments, datasets, and performance metrics of each contribution

5.1. Utilized environments
The tools used for disease prediction with machine learning techniques are shown in
Figure 4. MATLAB is the most frequently employed tool, used in 18.4% of the earlier
contributions for predicting various kinds of diseases. Moreover, the tools LIBSVM and
SPSS are adopted in 6.1% of the past research works. The other technologies, such as
Python, Java, Java with Spark, Spring Web MVC, KEEL, and RStudio, are utilized rarely.
Hence, MATLAB is the tool most extensively adopted for predicting different kinds of
diseases in the past contributions, and there is still scope for executing more advanced
algorithms with it.

5.2. Suggested datasets


The datasets employed for predicting various types of diseases by
diverse machine learning algorithms are listed in Table 3. The UCI repository
was the source most often employed in the earlier contributions for predicting multiple
kinds of diseases. For predicting lung cancer with machine learning algorithms,
the most frequently utilized dataset is NLST. Many other benchmark and real-world
datasets are used in the different contributions for predicting various diseases. From
the table, it is found that NLST and UCI are the most frequently used datasets for
disease prediction, with NLST dominant for lung cancer.

5.3. Performance metrics


The performance measures used to evaluate disease prediction with machine
learning algorithms are listed in Table 4. The performance metrics accuracy,
sensitivity, specificity, precision, F1-score, AUC, recall, and ROC were
considered frequently in the earlier contributions, whereas the performance measures
listed as miscellaneous are utilized rarely. According to the tabulation, accuracy is
reported in 80% of the contributions. Moreover, the sensitivity and specificity are
Table 2. Comparison of distinct deep learning frameworks.

Framework                        Licence      Programming language  Software support                Released year  CNN & RNN support  DBN support
Gluon (Guo et al., 2020)         Apache 2.0   Python                Python 3.3 or Jupyter Notebook  2017           Yes                Yes
PyTorch (Ketkar, 2017)           BSD          Python, C++, CUDA     Python                          2016           Yes                Yes
Keras (Jakhar & Hooda, 2018)     MIT Licence  Python, Java          Python                          2015           Yes                Yes
Caffe (Jia et al., 2014)         BSD          C++                   Python and MATLAB               2015           Yes                No
MXNet (Chen et al., 2015)        Apache 2.0   C++                   C++, R, Scala, Perl, Python     2015           Yes                Yes
TensorFlow (Abadi et al., 2016)  Apache 2.0   C++ and Python        Python, Java, C and C++         2015           Yes                Yes
Chainer (Tokui et al., 2019)     MIT          Python                Python                          2015           Yes                Yes
Deeplearning4j (Skoymind, 2017)  Apache 2.0   Java                  Java, Python, Scala             2014           Yes                Yes
Theano (Team et al., 2016)       BSD          Python                Python                          2008           Yes                Yes

Figure 4. Bar chart representation of tools utilized for disease prediction.

considered in 49.2% and 44.6% of the contributions, respectively. Precision was
considered in 23% of the past contributions. In addition, the measures
F1-score and AUC are reported in 16.1% and 19.2% of the earlier works, respectively.
Both recall and ROC are considered in 12.3% of the contributions. In summary,
accuracy is reported in most of the papers to validate the efficiency of each model,
whereas recall and ROC are rarely taken into account (Table 4).
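
For reference, the most frequently reported metrics can all be derived from the four
cells of a binary confusion matrix. The sketch below uses the standard textbook
definitions; the counts are hypothetical and do not come from any reviewed study. Note
that sensitivity and recall are the same quantity under these definitions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)        # also called recall or TPR
    specificity = tn / (tn + fp)        # also called TNR
    precision = tp / (tp + fp)          # also called PPV
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 45 diseased patients (40 detected), 55 healthy (45 cleared).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy alone can be misleading on imbalanced medical datasets, reporting
sensitivity and specificity alongside it gives a fuller picture of a model's behaviour.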

5.4. Best performance of detection accuracy


The best accuracy achieved in predicting diseases in the various contributions
using different machine learning algorithms is shown in Figure 5. The maximum
accuracy, 96.77%, was attained for detecting hepatitis using the
LFDA-SVM algorithm (Chen, Liu, et al., 2011) in 2011. In 2009, the accuracy
obtained by SVM is 91.40%, which is the minimum accuracy, for predicting breast
cancer (Kim et al., 2010). The accuracy acquired in 2014 for detecting breast and
colon cancers with MTD-SVM (Majid et al., 2014) is 96.71%. Moreover, an accuracy
of 95% was obtained for predicting chronic kidney diseases by employing NN
(Vásquez-Morales et al., 2019) in 2019. With the help of SVM (Capriotti &
Altman, 2011), an accuracy of 93% was obtained for predicting cancer-causing missense
variants. In 2015, chronic disease prediction with LS-SVM (ALzubi et al., 2019)
attained an accuracy of 95%. Many neurology- and physiology-related diseases
were predicted using SVM (Mohabatkar et al., 2011), which attained an accuracy of
94.12% in 2011. Thus, the maximum accuracy, 96.77%, was obtained in 2011
for predicting hepatitis. Improved machine learning
algorithms therefore still have to be implemented to predict lung cancer in medical IoT
with high accuracy.
Table 3. Various datasets utilized for predicting diseases using machine learning algorithms.

Author and Citation              Datasets

Lung Cancer
Azzawi et al. (2016)             Real Microarray lung cancer datasets
Petousis et al. (2019)           NLST
Tan et al. (2009)                NLST
Kim et al. (2010)                Occupational Safety and Health Research Institute
Zięba et al. (2014)              Wroclaw Thoracic Surgery Centre
ALzubi et al. (2019)             Wroclaw and Lower-Silesian Centre
Engchuan and Chan (2015)         Gene Expression Omnibus
Petousis et al. (2016)           NLST
Lynch, Abdollahi, et al. (2017)  Surveillance, epidemiology, and end results database

Other Diseases
Barakat et al. (2010)            Real-life diabetes dataset
Lee et al. (2014)                National Institute of Health Ethics and the Institutional Review Board of the Korean Health and Genomic Study
Lee and Kim (2016)               November 2006 and August 2013 from hospitals in Ansan, Anseong, and other cities in the Republic of Korea
Chen et al. (2017)               Real-life hospital data
Luo et al. (2017)                Human miRNA-Disease Database
Zhang et al. (2017)              Real-life Tunstall dataset
Khalid and Sezerman (2018)       RCSB PDB
Raweh et al. (2018)              The Cancer Genome Atlas
Sedaghat et al. (2018)           TGCT and KIRC datasets
Wang et al. (2018)               Barnes-Jewish Hospital
Çarklı Yavuz et al. (2018)       Protein Data Bank
Mohan et al. (2019)              UCI repository
Li et al. (2019)                 MSS dataset
Vásquez-Morales et al. (2019)    Global dataset
Haq et al. (2019)                Repository of the University of Oxford
Davi et al. (2019)               Genome data
Lai et al. (2019)                AHA database
Yoon and Li (2019)               10 medical centers in the USA and Intel Corporation
Ali et al. (2019)                Cleveland heart disease dataset
Wang et al. (2019)               Pima Indians diabetes dataset
Perveen et al. (2019)            Canadian Primary Care Sentinel Surveillance Network
Dinh et al. (2019)               National Health and Nutrition Examination Survey dataset
Ed-daoudy and Maalmi (2019)      Kaggle and Cleveland data
Akay (2009)                      Wisconsin breast cancer dataset
Oztekin et al. (2009)            Heart–lung organ transplant data provided by UNOS
Choi et al. (2011)               Real datasets
Chen, Yang, et al. (2011)        Wisconsin Breast Cancer Dataset
Chen and Lin (2011)              www.genome.wi.mit.edu/MPR
Tong and Schierz (2011)          Broad Institute website
Lee et al. (2011)                Korean Genome and Epidemiology Study
Capriotti and Altman (2011)      Cancer and Neutral missense variants only dataset
Chen, Liu, et al. (2011)         UCI repository
Mohabatkar et al. (2011)         NCBI database
Sattlecker et al. (2011)         FTIR dataset
Åström and Koker (2011)          Voice recording originally done at the University of Oxford by Max Little
Mohebian et al. (2017)           Cohort database
Zhong et al. (2012)              HCUP-3 databases
Anooj (2012)                     Data Mining Repository of the University of California
Kaya and Uyar (2013)             UCI Repository
Babu and Suresh (2013)           Microarray gene expression data set from ParkDB database
Majid et al. (2014)              Sjoblom and Dob-son groups
Munsell et al. (2015)            Connectome dataset
Barbieri et al. (2015)           Patients undergoing hemodialysis in Fresenius Medical Care clinics in Portugal, Spain and Italy
Dai et al. (2015)                Boston Medical Center
Nilashi et al. (2017)            UCI repository

6. Research gaps and challenges


Lung cancer is the second major disease occurring in humans and is a leading cause of
cancer mortality worldwide. The overall 5-year survival rate of patients with
lung cancer does not exceed 14%, which is drastically lower than that of patients
suffering from cancer in other organs such as the breast, cervix, bladder, prostate,
and colon (Huang et al., 2003; Jemal et al., 2011). Thus, early prediction of lung
cancer is very important for appropriate treatment and for decreasing deaths.
Healthcare is one of the significant sources of big data, and accurate examination of
healthcare information is in high demand for detecting lung cancer at an early stage.
Multiple research studies are being designed to recognize lung cancer more reliably
using big data. Still, a classification approach is needed that
improves the detection accuracy with respect to time. In addition, machine learning
techniques are modeled to enhance the detection accuracy on big data. For lung cancer
specifically, it is not yet well known which kinds of approaches give the best
detection results and which data attributes must be employed for detection.
Delen, Walker, and Kadam (2005) used large datasets to develop prediction
methods for breast cancer survivability with two
popular data mining techniques, ANN and DT, and also utilized a common
Table 4. Analysis on performance metrics concerned for disease prediction using machine learning algorithms.

Citations Accuracy Sensitivity Specificity Precision F1-score AUC Recall ROC Miscellaneous
Barakat et al. (2010) ✓ ✓ ✓ - - ✓ - - TPR, FPR
Lee et al. (2014) - ✓ ✓ ✓ ✓ ✓ - - -
Azzawi et al. (2016) ✓ ✓ ✓ - - ✓ - - -
Lee and Kim (2016) - - - - - ✓ - - -
Chen et al. (2017) ✓ - - ✓ ✓ - ✓ - -
Luo et al. (2017) - - - ✓ - - ✓ - FPR and TPR
Zhang et al. (2017) ✓ - - - - - - - Work load saving and risk
Khalid and Sezerman (2018) ✓ ✓ ✓ - - - - - -
Jordanski et al. (2018) ✓ - - - - - - - -
Raweh et al. (2018) ✓ - - - ✓ - - - RMSE and MAE
Sedaghat et al. (2018) - - - ✓ - - - - -
Wang et al. (2018) ✓ ✓ ✓ - ✓ ✓ - ✓ PPV, and NPV
Çarklı Yavuz et al. (2018) - - - - - - - - Success change
Mohan et al. (2019) ✓ ✓ ✓ ✓ ✓ - - - -
Petousis et al. (2019) - ✓ ✓ ✓ - - - - -
Fitriyani et al. (2019) ✓ - - ✓ ✓ - ✓ - -
Prince et al. (2019) ✓ - - - ✓ - - - -
Li et al. (2019) - - - - - - - - PRC, AUPRC, and AUROC
Vásquez-Morales et al. (2019) ✓ ✓ ✓ ✓ ✓ ✓ ✓ - -
Haq et al. (2019) ✓ ✓ ✓ ✓ - - ✓ - L1-Norm and execution time
Davi et al. (2019) ✓ ✓ ✓ ✓ ✓ ✓ - - -
Lai et al. (2019) ✓ ✓ ✓ - - - - - -
Yoon and Li (2019) ✓ - - - - - - - -
Ali et al. (2019) ✓ ✓ ✓ - - - - - MCC
Wang et al. (2019) ✓ - - ✓ ✓ ✓ ✓ - -
Zhang, Ren, et al. (2019) ✓ - - - - - - - MAE
Perveen et al. (2019) - - - ✓ ✓ - ✓ - AUROC
✓ ✓ ✓ ✓
Dinh et al. (2019) - - - - -
Ed-daoudy and Maalmi (2019) ✓ ✓ ✓ - - - - ✓ -
Tan et al. (2009) ✓ ✓ ✓ - - - - - -
Akay (2009) ✓ ✓ ✓ - - - - ✓ PPV, and NPV
Çınar et al. (2009) ✓ ✓ ✓ - - - - - -
Anand and Suganthan (2009) ✓ - - - - - - - -
Oztekin et al. (2009) ✓ ✓ ✓ - - - - - -
Tang et al. (2009) ✓ ✓ - ✓ - - - - -
Kim et al. (2010) ✓ - - - - - - - -
Choi et al. (2011) ✓ - - - - - - - -
Chen, Yang et al. (2011) ✓ ✓ ✓ - - - - ✓ Confusion matrix
Chen and Lin (2011) ✓ ✓ ✓ - - - - - -
Tong and Schierz (2011) ✓ - - - - - - - -
Lee et al. (2011) - - - - - - - - Misclassification rates
Capriotti and Altman (2011) ✓ ✓ - - - ✓ - - PPV, and correlation coefficient
Chen, Liu et al. (2011) ✓ ✓ ✓ - - - - - Confusion matrix
Mohabatkar et al. (2011) ✓ ✓ ✓ - - ✓ - - -
Sattlecker et al. (2011) ✓ - - - - - - - -
Åström and Koker (2011) ✓ - - - - ✓ - - TPR, TNR, and MSE
Mohebian et al. (2017) ✓ ✓ ✓ ✓ ✓ ✓ - - MCC, alpha, beta, DOR, DP, and Kappa
Zhong et al. (2012) ✓ - - - - ✓ - - MCC
Anooj (2012) ✓ ✓ ✓ - - - - - -
Subasi (2012) ✓ ✓ ✓ - - ✓ - - -
Kaya and Uyar (2013) ✓ ✓ ✓ - - - - - -
Babu and Suresh (2013) ✓ - - - ✓ - - - -
Valdés-Mas et al. (2014) - - - - - - - - ME, MAE, RMSE, and correlation coefficient
Majid et al. (2014) ✓ ✓ ✓ - ✓ ✓ - ✓ GMean, and MCC
Zięba et al. (2014) ✓ - - - - ✓ - - Gmean
ALzubi et al. (2019) ✓ - - - ✓ - - - FPR, classification time, space complexity, and feature selection rate
Munsell et al. (2015) ✓ ✓ ✓ - - - - - PPV, and NPV
Memarian et al. (2015) ✓ - - - - - - - -
Barbieri et al. (2015) - - - - - - - - ME, RMSE, and MAE
Dai et al. (2015) ✓ ✓ ✓ - - - - ✓ False alarm rate, and detection rate
Engchuan and Chan (2015) ✓ - - - - - - - AUROC
Petousis et al. (2016) ✓ - - ✓ ✓ ✓ ✓ ✓ -
Lynch, Abdollahi et al. (2017) - - - - - - - - Mean, standard deviation, and RMSE
Nilashi et al. (2017) ✓ - - - - - - - -
Kotsavasiloglou et al. (2017) ✓ ✓ ✓ - - ✓ - - -

Figure 5. Line representation of the best achieved accuracy during different contributions of
disease prediction.

statistical approach, LR. To obtain an unbiased assessment of the three detection
models, ten-fold cross-validation was used for the performance comparison.
The outcomes showed that DT was the best-performing classifier for predicting
the disease, with an accuracy of 93.6% on the holdout sample; ANN was
second best with an accuracy of 91.2%, and logistic regression
attained an accuracy of 89.2%. Delen (2009) developed
detection techniques for the survivability of prostate cancer using SVM along
with the three methods mentioned earlier. Here, the outputs revealed
that SVM acquired higher accuracy than ANN and DT. Moreover,
prostate cancer survivability was examined with ANN, DT, and LR methods by
Delen and Patil (2006). Hoogendoorn, Moons, Numans, and Sips (2014) contrasted
multiple methods on the SEER colon cancer patient dataset for predicting
survival rate and found that NNs were best for predicting the survival rate.
Ensemble voting of the three best-performing classifiers, presented by Al-Bahrani,
Agrawal, and Choudhary (2013), resulted in the best prediction and AU-ROC for the
colon cancer survival rate. In some research studies, the survival of lung cancer
patients was examined by evaluating the SEER database using machine learning
algorithms, including SVM and LR (Fradkin, Muchnik, & Schneider, 2005), unsupervised
approaches (Lynch, Berkel, & Frieboes, 2017), and clustering-based techniques (Chen
et al., 2009). In Arshadi and Jurisica (2005), data classification approaches were
assessed for estimating the chances that patients with definite indications develop
lung cancer. Dimitoglou, Adams, and Jim (2012) compared the performance of DT and NB
classifiers on lung cancer data
acquired from the SEER database, attaining approximately 90% precision in predicting
the survival of patients. Ensemble voting of five DTs and meta-classifiers, presented
by Agrawal, Misra, Narayanan, Polepeddi, and Choudhary (2011, 2012), was found
to give the best prediction of the lung cancer survival rate in terms of precision and
AU-ROC. Many challenges related to the machine learning algorithms are
associated with manual training. A significant difficulty is accurately recognizing
the nature of the data so that they can be pre-processed accordingly before being
subjected to machine learning algorithms. The time and expertise required for this job
are considerable. The research also shows a lack of consistency in the
detection accuracy of machine learning techniques relative to classical prediction
techniques, and the present literature corroborates this: many investigations that
compared machine learning models with classical statistical models have
reported differing outcomes.
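
The ten-fold cross-validation mentioned above for Delen, Walker, and Kadam (2005) can be
sketched as follows. This is a generic illustration of how k-fold index splits are
formed, not the procedure of any cited study; the function name `k_fold_indices` and the
sample count are hypothetical, and the samples are split in order without shuffling or
stratification.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=25, k=10))
for number, (train, test) in enumerate(folds, start=1):
    print(f"fold {number}: {len(train)} training samples, test indices {test}")
```

Each sample appears in exactly one test fold, so averaging the per-fold scores uses
every sample for both training and evaluation without ever testing on training data.
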
Even though multiple strategies have been used to predict different types of diseases,
few predictive models based on machine learning algorithms have been reported in the
literature for lung cancer detection with IoT integration. Hence, there is considerable
scope for implementing better-performing deep learning models that might produce
improved prediction outcomes. Moreover, the growing availability of adequate historical
patient data has paved the way for the development of novel deep learning algorithms
for lung cancer prediction. In addition, optimization algorithms can improve deep
learning models. GA (Tong & Schierz, 2011) is simple to implement and can find
suitable solutions within a short time. Its disadvantages are that it is not guaranteed
to find the optimal solution to the given problem and that its parameters are difficult
to select. The benefit of PSO (Mohebian et al., 2017) is its ability to solve complex
optimization problems; however, its convergence is not guaranteed. SMO (Zięba et al.,
2014) is useful for solving the quadratic programming problems that arise in SVM
training, and it also reduces memory requirements, yet it still needs to be improved
through new variants. The ability of machine learning to solve composite tasks involving
dynamic environments and knowledge has contributed to its success in prediction
research, especially for lung cancer, when combined with novel meta-heuristic
algorithms.
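
As a hedged illustration of how a meta-heuristic such as PSO operates, the sketch below implements a basic global-best particle swarm in plain Python and applies it to a simple test function. It is not the method of Mohebian et al. (2017) or of any other cited work. The function names (`pso_minimize`, `sphere`), the parameter values (inertia 0.7, acceleration coefficients 1.5), and the search bounds are assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimise `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print("best position:", best_x)
print("best value:", best_f)
```

In a disease-prediction setting, the objective would typically be a validation error as a function of model hyperparameters or feature weights. The sketch only tracks the best value found so far and offers no guarantee of reaching the global optimum, which matches the limitation noted above.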

7. Conclusion
This paper studied multiple machine learning methodologies suitable for detecting lung
cancer with IoT devices. The review covered approximately 65 papers that detect various
kinds of diseases using machine learning techniques and noted the main shortcomings of
the existing methodologies. It concentrated on the different machine learning approaches
used to detect many diseases in order to identify gaps for future work on lung cancer
prediction in clinical IoT. Each method was examined and its challenges were noted. For
numerous contributions, the performance metrics and simulation platforms were specified.
The datasets used to predict the related diseases were also examined, including whether
each dataset was a standard one or manually gathered. Finally, the research gaps were
presented in terms of the progression of intelligent approaches, which should help
future studies detect lung cancer patients precisely at an early stage.

Disclosure statement
No potential conflict of interest was reported by the author(s).

ORCID
Priyanka Chawla http://orcid.org/0000-0002-6029-4122

References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., … Ghemawat, S. (2016).
TensorFlow: Large-scale machine learning on heterogeneous distributed systems.
arXiv:1603.04467.
Aceto, G., Persico, V., & Pescapé, A. (2020). Industry 4.0 and health: internet of things, big data,
and cloud computing for healthcare 4.0. Journal of Industrial Information Integration, 18,
100129.
Agrawal, A., Misra, S., Narayanan, R., Polepeddi, L., & Choudhary, A. (2011). A lung cancer
outcome calculator using ensemble data mining on SEER data. Proceedings of the Tenth
International Workshop on Data mining in Bioinformatics, ACM.
Agrawal, A., Misra, S., Narayanan, R., Polepeddi, L., & Choudhary, A. (2012). Lung cancer
survival prediction using ensemble data mining on SEER data. Scientific Programming,
20(1), 29–42.
Akay, M. F. (2009). Support vector machines combined with feature selection for breast cancer
diagnosis. Expert Systems with Applications, 36(2), 3240–3247.
Akman, E., Karaman, A. S., & Kuzey, C. (2020). Visa trial of international trade: Evidence
from support vector machines and neural networks. Journal of Management Analytics, 7
(2), 231–252.
Al-Anni, R., Hou, J., Abdu-aljabar, R. D., & Xiang, Y. (2017). Prediction of NSCLC recurrence
from microarray data with GEP. IET Systems Biology, 11(3), 77–85.
Al-Bahrani, R., Agrawal, A., & Choudhary, A. (2013). Colon cancer survival prediction using
ensemble data mining on SEER data. 2013 IEEE International Conference on Big Data,
Silicon Valley, CA, pp. 9–16.
Al-Kadi, O. S., & Watson, D. (2008). Texture analysis of aggressive and nonaggressive lung
tumor CE CT images. IEEE Transactions on Biomedical Engineering, 55(7), 1822–1830.
Alahmari, S. S., Cherezov, D., Goldgof, D. B., Hall, L. O., Gillies, R. J., & Schabath, M. B.
(2018). Delta radiomics improves pulmonary nodule malignancy prediction in lung cancer
screening. IEEE Access, 6, 77796–77806.
Alanni, R., Hou, J., Azzawi, H., & Xiang, Y. (2019). Cancer adjuvant chemotherapy prediction
model for non-small cell lung cancer. IET Systems Biology, 13(3), 129–135.
Ali, L., Rahman, A., Khan, A., Zhou, M. I., Javeed, A., & Khan, J. A. (2019). An automated
diagnostic system for heart disease prediction based on χ² statistical model and optimally
configured deep neural network. IEEE Access, 7, 34938–34945.
ALzubi, J. A., Bharathikannan, B., Tanwar, S., Manikandan, R., Khanna, A., & Thaventhiran,
C. (2019). Boosted neural network ensemble classification for lung cancer disease diagnosis.
Applied Soft Computing, 80, 579–591.
Anand, A., & Suganthan, P. N. (2009). Multiclass cancer classification by support vector
machines with class-wise optimized genes and probability estimates. Journal of Theoretical
Biology, 259(3), 533–540.
Anooj, P. K. (2012). Clinical decision support system: Risk level prediction of heart disease
using weighted fuzzy rules. Journal of King Saud University – Computer and Information
Sciences, 24(1), 27–40.
Arshadi, N., & Jurisica, I. (2005). Data mining for case-based reasoning in high-dimensional
biological domains. IEEE Transactions on Knowledge and Data Engineering, 17(8), 1127–
1137.
Arunkumar, C., & Ramakrishnan, S. (2019). Prediction of cancer using customised fuzzy rough
machine learning approaches. Healthcare Technology Letters, 6(1), 13–18.
Åström, F., & Koker, R. (2011). A parallel neural network approach to prediction of Parkinson’s
disease. Expert Systems with Applications, 38(10), 12470–12474.
Azzawi, H., Hou, J., Xiang, Y., & Alanni, R. (2016). Lung cancer prediction from microarray
data by gene expression programming. IET Systems Biology, 10(5), 168–178.
Babu, G. S., & Suresh, S. (2013). Parkinson’s disease prediction using gene expression – a pro-
jection based learning meta-cognitive neural classifier approach. Expert Systems with
Applications, 40(5), 1519–1529.
Barakat, N., Bradley, A. P., & Barakat, M. N. H. (2010). Intelligible support vector machines for
diagnosis of diabetes mellitus. IEEE Transactions on Information Technology in Biomedicine,
14(4), 1114–1120.
Barbieri, C., Mari, F., Stopper, A., Gatti, E., Escandell-Montero, P., Martínez-Martínez, J. M.,
& Martín-Guerrero, J. D. (2015). A new machine learning approach for predicting the
response to anemia treatment in a large cohort of end stage renal disease patients undergoing
dialysis. Computers in Biology and Medicine, 61, 56–61.
Capriotti, E., & Altman, R. B. (2011). A new disease-specific machine learning approach for the
prediction of cancer-causing missense variants. Genomics, 98(4), 310–317.
Çarklı Yavuz, B., Yurtay, N., & Ozkan, O. (2018). Prediction of protein secondary structure
with clonal selection algorithm and multilayer perceptron. IEEE Access, 6, 45256–45261.
Chen, A. H., & Lin, C.-H. (2011). A novel support vector sampling technique to improve classi-
fication accuracy and to identify key genes of leukaemia and prostate cancers. Expert Systems
with Applications, 38(4), 3209–3219.
Chen, D., Xing, K., Henson, D., Sheng, L., Schwartz, A. M., & Cheng, X. (2009). Developing
prognostic systems of cancer patients by ensemble clustering. Journal of Biomedicine and
Biotechnology, 2009, 1–7.
Chen, H.-L., Liu, D.-Y., Yang, B., Liu, J., & Wang, G. (2011). A new hybrid method based on
local fisher discriminant analysis and support vector machines for hepatitis disease diagnosis.
Expert Systems with Applications, 38(9), 11796–11803.
Chen, H.-L., Yang, B., Liu, J., & Liu, D.-Y. (2011). A support vector machine classifier with
rough set-based feature selection for breast cancer diagnosis. Expert Systems with
Applications, 38(7), 9014–9022.
Chen, M., Hao, Y., Hwang, K., Wang, L., & Wang, L. (2017). Disease prediction by machine
learning over big data from healthcare communities. IEEE Access, 5, 8869–8879.
Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., … Zhang, Z. (2015). MXNet: A flexible
and efficient machine learning library for heterogeneous distributed systems. CoRR,
abs/1512.01274. Retrieved from http://arxiv
Chi-Hsien, K., & Nagasawa, S. (2019). Applying machine learning to market analysis: Knowing
your luxury consumer. Journal of Management Analytics, 6(4), 404–419.
Choi, H., Yeo, D., Kwon, S., & Kim, Y. (2011). Gene selection and prediction for cancer classi-
fication using support vector machines with a reject option. Computational Statistics & Data
Analysis, 55(5), 1897–1908.
Çınar, M., Engin, M., Engin, E. Z., & Ziya Ateşçi, Y. (2009). Early prostate cancer diagnosis by
using artificial neural networks and support vector machines. Expert Systems with
Applications, 36(3), 6357–6361.
Cirujeda, P., Cid, Y. D., Müller, H., Rubin, D., Aguilera, T. A., Loo, B. W., … Depeursinge, A.
(2016). A 3-D Riesz-covariance texture model for prediction of nodule recurrence in lung CT.
IEEE Transactions on Medical Imaging, 35(12), 2620–2630.
Dai, W., Brisimi, T. S., Adams, W. G., Mela, T., Saligrama, V., & Paschalidis, I. (2015).
Prediction of hospitalization due to heart diseases by supervised learning methods.
International Journal of Medical Informatics, 84(3), 189–197.
Das, A., Rad, P., Choo, K. K. R., Nouhi, B., Lish, J., & Martel, J. (2019). Distributed machine
learning cloud teleophthalmology IoT for predicting AMD disease progression. Future
Generation Computer Systems, 93, 486–498.
Davi, C., Pastor, A., Oliveira, T., Neto, F. B. L., Braga-Neto, U., Bigham, A. W., … Acioli-
Santos, B. (2019). Severe dengue prognosis using human genome data and machine learning.
IEEE Transactions on Biomedical Engineering, 66(10), 2861–2868.
Delen, D. (2009). Analysis of cancer data: A data mining approach. Expert Systems, 26(1), 100–
112.
Delen, D., & Patil, N. (2006). Knowledge extraction from prostate cancer data. Proceedings of
the 39th Annual Hawaii International Conference on, vol.5.
Delen, D., Walker, G., & Kadam, A. (2005). Predicting breast cancer survivability: A compari-
son of three data mining methods. Artificial Intelligence in Medicine, 34(2), 113–127.
Dimitoglou, G., Adams, J. A., & Jim, C. M. (2012). Comparison of the C4.5 and a naive Bayes
classifier for the prediction of lung cancer survivability. Journal of Computing, 4(8), 1–9.
Dinh, A., Miertschin, S., Young, A., & Mohanty, S. D. (2019). A data-driven approach to pre-
dicting diabetes and cardiovascular disease with machine learning. BMC Medical Informatics
and Decision Making, 19(211), 1–15.
Ed-daoudy, A., & Maalmi, K. (2019). A new internet of things architecture for real-time predic-
tion of various diseases using machine learning on big data environment. Journal of Big Data,
6, 104.
Emaminejad, N., Qian, W., Guan, Y., Tan, M., Qiu, Y., Liu, H., & Zheng, B. (2016). Fusion of
quantitative image and genomic biomarkers to improve prognosis assessment of early stage
lung cancer patients. IEEE Transactions on Biomedical Engineering, 63(5), 1034–1043.
Engchuan, W., & Chan, J. H. (2015). Pathway activity transformation for multi-class classifi-
cation of lung cancer datasets. Neurocomputing, 165, 81–89.
Fan, Y. J., Yin, Y. H., Xu, L., Zeng, Y., & Wu, F. (2014). IoT-based smart rehabilitation system.
IEEE Transactions on Industrial Informatics, 10(2), 1568–1577.
Fitriyani, N. L., Syafrudin, M., Alfian, G., & Rhee, J. (2019). Development of disease prediction
model based on ensemble learning approach for diabetes and hypertension. IEEE Access, 7,
144777–144789.
Fradkin, D., Muchnik, I., & Schneider, D. (2005). Machine learning methods in the analysis of
lung cancer survival data. DIMACS Technical Report.
Guo, J., He, H., He, T., Lausen, L., Li, M., & Lin, H. (2020). GluonCV and GluonNLP: Deep
learning in computer vision and natural language processing. Journal of Machine Learning
Research, 21, 1–7.
Haq, A. U., Li, J. P., Memon, M. H., Khan, J., Malik, A., Ahmad, T., … Shahid, M. (2019).
Feature selection based on L1-norm support vector machine and effective recognition
system for Parkinson’s disease using voice recordings. IEEE Access, 7, 37718–37734.
Hawkins, S. H., Korecki, J. N., Balagurunathan, Y., Gu, Y., Kumar, V., Basu, S., … Gillies, R. J.
(2014). Predicting outcomes of nonsmall cell lung cancer using CT image features. IEEE
Access, 2, 1418–1426.
Hoogendoorn, M., Moons, L. M. G., Numans, M. E., & Sips, R.-J. (2014). Utilizing data
mining for predictive modeling of colorectal cancer using electronic medical records.
International Conference on Brain Informatics and Health BIH 2014: Brain Informatics and
Health (pp. 132–141).
Huang, Z. W., Mcwilliams, A., Lui, H., Mclean, D., Lan, S., & Zeng, H. S. (2003). Near-infra-
red Raman spectroscopy for optical diagnosis of lung cancer. International Journal of Cancer,
107(6), 1047–1052.
Jakhar, K., & Hooda, N. (2018, December). Big data deep learning framework using Keras: A
case study of pneumonia prediction. 2018 4th International Conference on Computing
Communication and Automation (ICCCA) (pp. 1–5). IEEE.
Jemal, A., Bray, F., Center, M. M., Ferlay, J. J., Ward, E., & Forman, D. (2011). Global cancer
statistics. CA: A Cancer Journal for Clinicians, 61(2), 69–90.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., & Girshick, R. (2014). Caffe:
Convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM
International Conference on Multimedia (pp. 675–678).
Jordanski, M., Radovic, M., Milosevic, Z., Filipovic, N., & Obradovic, Z. (2018). Machine
learning approach for predicting wall shear distribution for abdominal aortic aneurysm
and carotid bifurcation models. IEEE Journal of Biomedical and Health Informatics, 22(2),
537–544.
Kaya, Y., & Uyar, M. (2013). A hybrid decision support system based on rough set and extreme
learning machine for diagnosis of hepatitis disease. Applied Soft Computing, 13(8), 3429–
3438.
Ketkar, N. (2017). Deep learning with python: A hands-on introduction. Berkeley, CA: Apress.
Khalid, Z., & Sezerman, O. U. (2018). Prediction of HIV drug resistance by combining sequence
and structural properties. IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 15(3), 966–973.
Kim, T.-W., Koh, D.-H., & Park, C.-Y. (2010). Decision tree of occupational lung cancer using
classification and regression analysis. Safety and Health at Work, 1(2), 140–148.
Kotsavasiloglou, C., Kostikis, N., Hristu-Varsakelis, D., & Arnaoutoglou, M. (2017). Machine
learning-based classification of simple drawing movements in Parkinson’s disease. Biomedical
Signal Processing and Control, 31, 174–180.
Kumar, D., Sankar, V., Clausi, D., Taylor, G. W., & Wong, A. (2019). SISC: End-to-end inter-
pretable discovery radiomics-driven lung cancer prediction via stacked interpretable sequen-
cing cells. IEEE Access, 7, 145444–145454.
Lai, D., Zhang, Y., Zhang, X., Su, Y., & Bin Heyat, M. B. (2019). An automated strategy for
early risk identification of sudden cardiac death by using machine learning approach on mea-
surable arrhythmic risk markers. IEEE Access, 7, 94701–94716.
Lee, B. J., & Kim, J. Y. (2016). Identification of type 2 diabetes risk factors using phenotypes
consisting of anthropometry and triglycerides based on machine learning. IEEE Journal of
Biomedical and Health Informatics, 20(1), 39–46.
Lee, B. J., Ku, B., Nam, J., Pham, D. D., & Kim, J. Y. (2014). Prediction of fasting plasma
glucose status using anthropometric measures for diagnosing type 2 diabetes. IEEE Journal
of Biomedical and Health Informatics, 18(2), 555–561.
Lee, J., Keam, B., Jang, E. J., Park, M. S., Lee, J. Y., Kim, D. B., … Kim, H.-L. (2011).
Development of a predictive model for type 2 diabetes mellitus using genetic and clinical
data. Osong Public Health and Research Perspectives, 2(2), 75–82.
Li, L., Liu, W., Zhang, H., Jiang, Y., Hu, X., & Liu, R. (2019). Down syndrome prediction using
a cascaded machine learning framework designed for imbalanced and feature-correlated
data. IEEE Access, 7, 97582–97593.
Li, M., Xiang, Z., Lian, Z., Xiao, L., Zhang, J., & Wei, Z. (2018). Prediction of lung motion
from four-dimensional computer tomography (4DCT) images using Bayesian registration
and trajectory modelling. IEEE Access, 6, 2803–2811.
Li, S., Xu, L., & Zhao, S. (2018). 5G internet of things: A survey. Journal of Industrial
Information Integration, 10, 1–9.
Lu, Y. (2019). Artificial intelligence: A survey on evolution, models, applications and future
trends. Journal of Management Analytics, 6(4), 404–419.
Luo, J., Ding, P., Liang, C., Cao, B., & Chen, X. (2017). Collective prediction of disease-associ-
ated miRNAs based on transduction learning. IEEE/ACM Transactions on Computational
Biology and Bioinformatics, 14(6), 1468–1475.
Luo, Y., Shan, D. M., Ray, D., Matuszak, M., Jolly, S., Lawrence, T., … Naqa, I. E. (2019).
Development of a fully cross-validated Bayesian network approach for local control predic-
tion in lung cancer. IEEE Transactions on Radiation and Plasma Medical Sciences, 3(2), 232–
241.
Lynch, C. M., Abdollahi, B., Fuqua, J. D., de Carlo, A. R., Bartholomai, J. A., Balgemann, R.
N., … Frieboes, H. B. (2017). Prediction of lung cancer patient survival via supervised
machine learning classification techniques. International Journal of Medical Informatics,
108, 1–8.
Lynch, C. M., Berkel, V. H. V., & Frieboes, H. B. (2017). Application of unsupervised analysis
techniques to lung cancer patient data. PLoS One, 12(9), 1–18.
Ma, L., Wang, D. D., Zou, B., & Yan, H. (2017). An eigen-binding site based method for the
analysis of anti-EGFR drug resistance in lung cancer treatment. IEEE/ACM Transactions
on Computational Biology and Bioinformatics, 14(5), 1187–1194.
Majid, A., Ali, S., Iqbal, M., & Kausar, N. (2014). Prediction of human breast and colon cancers
from imbalanced data using nearest neighbor and support vector machines. Computer
Methods and Programs in Biomedicine, 113(3), 792–808.
Memarian, N., Kim, S., Dewar, S., Engel, J., Jr., & Staba, R. J. (2015). Multimodal data and
machine learning for surgery outcome prediction in complicated cases of mesial temporal
lobe epilepsy. Computers in Biology and Medicine, 64, 67–78.
Mohabatkar, H., Beigi, M. M., & Esmaeili, A. (2011). Prediction of GABAA receptor proteins
using the concept of Chou’s pseudo-amino acid composition and support vector machine.
Journal of Theoretical Biology, 281(1), 18–23.
Mohan, S., Thirumalai, C., & Srivastava, G. (2019). Effective heart disease prediction using
hybrid machine learning techniques. IEEE Access, 7, 81542–81554.
Mohebian, M. R., Marateb, H. R., Mansourian, M., Mañanas, M. A., & Mokarian, F.
(2017). A hybrid computer-aided-diagnosis system for prediction of breast cancer recurrence
(HPBCR) using optimized ensemble learning. Computational and Structural Biotechnology
Journal, 15, 75–85.
Munsell, B. C., Wee, C. Y., Keller, S. S., Weber, B., Elger, C., da Silva, L. A. T., … Bonilha, L.
(2015). Evaluation of machine learning algorithms for treatment outcome prediction in
patients with epilepsy based on structural connectome data. NeuroImage, 118, 219–230.
Nilashi, M., bin Ibrahim, O., Ahmadi, H., & Shahmoradi, L. (2017). An analytical method for
diseases prediction using machine learning techniques. Computers & Chemical Engineering,
106, 212–223.
Okada, H., Hontsu, S., Miura, S., Asakawa, I., Tamamoto, T., Katayama, E., … Hasegawa, M.
(2012). Changes of tumor size and tumor contrast enhancement during radiotherapy for
non-small-cell lung cancer may be suggestive of treatment response. Journal of Radiation
Research, 53(2), 326–332.
Oztekin, A., Delen, D., & Kong, Z. (James) (2009). Predicting the graft survival for heart–lung
transplantation patients: An integrated data mining methodology. International Journal of
Medical Informatics, 78(12), e84–e96.
Park, S., Lee, S. J., Weiss, E., & Motai, Y. (2016). Intra- and inter-fractional variation prediction
of lung tumors using fuzzy deep learning. IEEE Journal of Translational Engineering in
Health and Medicine, 4, 1–12.
Pati, J. (2019). Gene expression analysis for early lung cancer prediction using machine learning
techniques: An eco-genomics approach. IEEE Access, 7, 4232–4238.
Perveen, S., Shahbaz, M., Keshavjee, K., & Guergachi, A. (2019). Metabolic syndrome and
development of diabetes mellitus: Predictive modeling based on machine learning techniques.
IEEE Access, 7, 1365–1375.
Petousis, P., Han, S. X., Aberle, D., & Bui, A. A. T. (2016). Prediction of lung cancer incidence
on the low-dose computed tomography arm of the national lung screening trial: A dynamic
Bayesian network. Artificial Intelligence in Medicine, 72, 42–55.
Petousis, P., Winter, A., Speier, W., Aberle, D. R., Hsu, W., & Bui, A. A. T. (2019). Using sequen-
tial decision making to improve lung cancer screening performance. IEEE Access, 7, 119403–
119419.
Prince, J., Andreotti, F., & De Vos, M. (2019). Multi-source ensemble learning for the remote
prediction of Parkinson’s disease in the presence of source-wise missing data. IEEE
Transactions on Biomedical Engineering, 66(5), 1402–1411.
Qi, J., Yang, P., Min, G., Amft, O., Dong, F., & Xu, L. (2017). Advanced internet of things for
personalised healthcare systems: A survey. Pervasive and Mobile Computing, 41, 132–149.
Raweh, A. A., Nassef, M., & Badr, A. (2018). A hybridized feature selection and extraction
approach for enhancing cancer prediction based on DNA methylation. IEEE Access, 6,
15212–15223.
Sattlecker, M., Baker, R., Stone, N., & Bessant, C. (2011). Support vector machine ensembles for
breast cancer type prediction from mid-FTIR micro-calcification spectra. Chemometrics and
Intelligent Laboratory Systems, 107(2), 363–370.
Sedaghat, N., Fathy, M., Modarressi, M. H., & Shojaie, A. (2018). Combining supervised and
unsupervised learning for improved miRNA target prediction. IEEE/ACM Transactions on
Computational Biology and Bioinformatics, 15(5), 1594–1604.
Skymind. (2017, April 18). Deeplearning4j deep learning framework. Retrieved from
https://deeplearning4j.org
Subasi, A. (2012). Medical decision support system for diagnosis of neuromuscular disorders
using DWT and fuzzy support vector machines. Computers in Biology and Medicine, 42(8),
806–815.
Tan, C., Chen, H., & Xia, C. (2009). Early prediction of lung cancer based on the combination
of trace element analysis in urine and an adaboost algorithm. Journal of Pharmaceutical and
Biomedical Analysis, 49(3), 746–752.
Tang, L.-J., Jiang, J.-H., Wu, H.-L., Shen, G.-L., & Yu, R.-Q. (2009). Variable selection using
probability density function similarity for support vector machine classification of high-
dimensional microarray data. Talanta, 79(2), 260–267.
Theano Development Team, Al-Rfou, R., Alain, G., Almahairi, A., Angermueller, C., Bahdanau, D., …
Belikov, A. (2016). Theano: A Python framework for fast computation of mathematical
expressions. arXiv:1605.02688.
Tokui, S., Okuta, R., Akiba, T., Niitani, Y., Ogawa, T., Saito, S., … Vincent, H. Y. (2019).
Chainer: A deep learning framework for accelerating the research cycle. KDD ’19,
August 4–8, 2019, Anchorage, AK, USA.
Tong, D. L., & Schierz, A. C. (2011). Hybrid genetic algorithm-neural network: Feature extrac-
tion for unpreprocessed microarray data. Artificial Intelligence in Medicine, 53(1), 47–56.
Valdés-Mas, M. A., Martín-Guerrero, J. D., Rupérez, M. J., Pastor, F., Dualde, C., Monserrat,
C., & Peris-Martínez, C. (2014). A new approach based on machine learning for predicting
corneal curvature (K1) and astigmatism in patients with keratoconus after intracorneal
ring implantation. Computer Methods and Programs in Biomedicine, 116(1), 39–47.
Vásquez-Morales, G. R., Martínez-Monterrubio, S. M., Moreno-Ger, P., & Recio-García, J. A.
(2019). Explainable prediction of chronic renal disease in the Colombian population using
neural networks and case-based reasoning. IEEE Access, 7, 152900–152910.
Wang, H., Cui, Z., Chen, Y., Avidan, M., Abdallah, A. B., & Kronzer, A. (2018). Predicting
hospital readmission via cost-sensitive deep learning. IEEE/ACM Transactions on
Computational Biology and Bioinformatics, 15(6), 1968–1978.
Wang, Q., Cao, W., Guo, J., Ren, J., Cheng, Y., & Davis, D. N. (2019). DMP_MI: An effective
diabetes mellitus classification algorithm on imbalanced data with missing values. IEEE
Access, 7, 102232–102238.
Wu, J., Lian, C., Ruan, S., Mazur, T. R., Mutic, S., Anastasio, M. A., … Li, H. (2019).
Treatment outcome prediction for cancer patients based on radiomics and belief function
theory. IEEE Transactions on Radiation and Plasma Medical Sciences, 3(2), 216–224.
Xu, L., He, W., & Li, S. (2014). Internet of things in industries: A survey. IEEE Transactions on
Industrial Informatics, 10(4), 2233–2243.
Xu, B., Xu, L., Cai, H., Xie, C., Hu, J., & Bu, F. (2014). Ubiquitous data accessing method in
IoT-based information system for emergency medical services. IEEE Transactions on
Industrial Informatics, 10(2), 1578–1586.
Yang, P., & Xu, L. (2018). The Internet of Things (IoT): Informatics methods for IoT-enabled
health care. Journal of Biomedical Informatics, 87, 154–156.
Yin, Y., Zeng, Y., Chen, X., & Fan, Y. (2016). The internet of things in healthcare: An overview.
Journal of Industrial Information Integration, 1, 3–13.
Yoon, H., & Li, J. (2019). A novel positive transfer learning approach for telemonitoring of
Parkinson’s disease. IEEE Transactions on Automation Science and Engineering, 16(1),
180–191.
Yu, H., Ni, J., Dan, Y., & Xu, S. (2012). Mining and integrating reliable decision rules for imbal-
anced cancer gene expression data sets. Tsinghua Science and Technology, 17(6), 666–673.
Yuan, R., Li, Z., Guan, X., & Xu, L. (2010). An SVM-based machine learning method for accu-
rate internet traffic classification. Information Systems Frontiers, 12(2), 149–156.
Zamani, A., Rezaeieh, S. A., & Abbosh, A. M. (2015). Lung cancer detection using frequency-
domain microwave imaging. Electronics Letters, 51(10), 740–741.
Zhang, B., Qi, S., Monkam, P., Li, C., Yang, F., Yao, Y.-D., & Qian, W. (2019). Ensemble lear-
ners of multiple deep CNNs for pulmonary nodules classification using CT images. IEEE
Access, 7, 110358–110371.
Zhang, B., Ren, J., Cheng, Y., Wang, B., & Wei, Z. (2019). Health data driven on continuous
blood pressure prediction based on gradient boosting decision tree algorithm. IEEE
Access, 7, 32423–32433.
Zhang, J., Lafta, R. L., Tao, X., Li, Y., Chen, F., Luo, Y., & Zhu, X. (2017). Coupling a fast
Fourier transformation with a machine learning ensemble model to support recommendations
for heart disease patients in a telehealth environment. IEEE Access, 5, 10674–10685.
Zhong, H., & Song, M. (2019). A fast exact functional test for directional association and cancer
biology applications. IEEE/ACM Transactions on Computational Biology and Bioinformatics,
16(3), 818–826.
Zhong, W., Chow, R., & He, J. (2012). Clinical charge profiles prediction for patients diagnosed
with chronic diseases using multi-level support vector machine. Expert Systems with
Applications, 39(1), 1474–1483.
Zięba, M., Tomczak, J. M., Lubicz, M., & Świątek, J. (2014). Boosted SVM for extracting rules
from imbalanced data in application to prediction of the post-operative life expectancy in the
lung cancer patients. Applied Soft Computing, 14, 99–108.

Appendix

Abbreviations Descriptions
SVM Support Vector Machine
LR Logistic Regression
NB Naïve Bayes
NN Neural Network
LKT-SVM SVM with Local Kernel Transform
L1-SVMR L1-SVM with Reject option
RF Random Forest
DT Decision Tree
KNN K-Nearest Neighbour
DBN Dynamic Bayesian Network
ELM Extreme Learning Machine
CNN Convolutional Neural Network
MLP Multi Layer Perceptron
CSDNN Cost-Sensitive Deep Neural Network
GANN Genetic Algorithm-Neural Network
CVIFLR Cascaded framework of Voting Isolation Forests and Logistic Regression
GBDT Gradient Boosting Decision Tree
HRFLM Hybrid Random Forest with Linear Model
SVST Support Vector Sampling Technique
RS-SVM Rough Set SVM
LFDA-SVM Local Fisher Discriminant Analysis-SVM
MLSVM Multi-Level SVM
MTD-SVM Mega-Trend Diffusion SVM
LS-SVM Least Squares-SVM
IoT Internet of Things
AUC Area Under Curve
ROC Receiver Operating Characteristic
NLST National Lung Screening Trial
AFS Analysis-of-Variance-based Feature Set
GEP Gene Expression Programming
GBM Gradient Boosting Machines
POMDP Partially-Observable Markov Decision Process
WONN-MLB Weight Optimized Neural Network with Maximum Likelihood Boosting
MLMR Maximum Likelihood and Minimum Redundancy
FPR False Positive Rate
BMI Body Mass Index
LM Levenberg–Marquardt
SCG Scaled Conjugate Gradient
BFGS Broyden-Fletcher-Goldfarb-Shanno
OVA-SVM One-Versus-All SVM
RS Rough Set
RS-SVM RS-based SVM
BPNN Back-Propagation Neural Network
FTIR Fourier Transform Infrared
PSO Particle Swarm Optimization
HPBCR Hybrid Predictor of Breast Cancer Recurrence
CDSS Clinical Decision Support System
PBL-McRBFN Projection Based Learning for Meta-Cognitive Radial Basis Function Network
FPG Fasting Plasma Glucose
AU-ROC Area Under Receiver Operating Characteristic
MTD Mega-Trend Diffusion
TLE Temporal Lobe Epilepsy
MTLE Mesial Temporal Lobe Epilepsy
CKD Chronic Kidney Disease
HW Hypertriglyceridemic Waist
TG Triglyceride
CPTL Collective Prediction based on Transduction Learning
CART Classification and Regression Trees
WSS Wall Shear Stress
AAA Abdominal Aortic Aneurysm
MLR Multi-variate Linear Regression
FFT Fast Fourier Transform
CSA Clonal Selection Algorithm
DPM Disease Prediction Model
IForest Isolation Forest
SMOTETomek Synthetic Minority Oversampling Technique Tomek link
DS Down Syndrome
SCD Sudden Cardiac Death
TL Transfer Learning
PTL Positive TL
DMP_MI Diabetes Mellitus Classification on Imbalanced Data with Missing Values
ADASYN Adaptive Synthetic Sampling approach
MetS Metabolic Syndrome
TPR True Positive Rate
RMSE Root Mean Square Error
MAE Mean Absolute Error
PPV Positive Predictive Value
NPV Negative Predictive Value
PRC Precision-Recall Curve
MCC Matthews Correlation Coefficient
TNR True Negative Rate
MSE Mean Square Error
GA Genetic Algorithm
SMO Sequential Minimal Optimization
