Machine Learning in Healthcare Data Analysis: A Survey
Copyright ©2019 Arwinder Dhillon et al. This is an open access paper distributed under the Creative Commons Attribution License.
Journal of Biology and Today’s World is published by Lexis Publisher; Journal p-ISSN 2476-5376; Journal e-ISSN 2322-3308.
Abstract
In recent years, healthcare data analysis has become one of the most promising research areas. Healthcare involves data of various types,
such as clinical data, Omics data, and sensor data. Clinical data includes electronic health records, which store patient records collected during
ongoing treatment. Omics data is high-dimensional data comprising genome, transcriptome and proteome data types. Sensor data
is collected from various wearable and wireless sensor devices. Handling this raw data manually is very difficult. For analysis of such data, machine
learning has emerged as a significant tool. Machine learning uses various statistical techniques and advanced algorithms to predict outcomes
from healthcare data more precisely. Different types of machine learning algorithms, such as supervised, unsupervised and reinforcement learning, are used
for analysis. In this paper, different types of machine learning algorithms are described. Then the use of machine learning algorithms for analyzing
various healthcare data is surveyed.
KEYWORDS: Healthcare, Machine Learning, Clinical Data, Sensor Data, Omics Data.
J. Biol. Today's World. 2019 Jan; 8 (2): 1-10
play strategic games like chess. Machine learning is the mechanism of making machines learn automatically without being explicitly programmed. Its main focus is to develop computer programs that can access data and use it to learn. It is the ability of a machine to use statistical techniques and advanced algorithms to make more powerful predictions, and to make data-driven systems more powerful by replacing rule-based systems. The main component of machine learning is data, which is the backbone of any model: the more relevant the data, the more accurate the predictions. After the data, the algorithm must be selected based on the problem to obtain more accurate predictions. Machine learning can be used in many fields such as finance, retail, health care and social data [3].
Types of machine learning algorithms
Machine learning can be used for different purposes. Machine learning algorithms are basically classified into three categories based on their objectives: supervised learning, unsupervised learning and reinforcement learning.
Supervised learning: Supervised learning involves training a model on labeled data and using the trained model to make predictions on new data. It involves splitting the data into two sets, a training set and a testing set. First the model is trained on the training set, and afterwards its performance is tested on the testing set. The performance of the model can be evaluated using performance metrics [4]. A supervised learning problem can be a classification problem or a regression problem. In supervised classification, the label is a discrete value, and the algorithms are used to determine the class or category to which a sample belongs. In supervised regression, on the other hand, models predict a continuous (numeric) outcome [4]. For the classification of raw data, the data is first selected and then preprocessed by removing all NA values. The data is then normalized using z-score or min-max normalization. Once normalization is performed, a feature selection procedure is applied to select the best features. After the features are selected, supervised ML algorithms such as K-Nearest Neighbor, Decision Trees, Support Vector Machines, Naïve Bayes Classifier, Neural Networks and Ensemble methods [3] are used for classification of the raw data, as shown in Figure 2.
Unsupervised learning: Unsupervised learning also involves training on data, except that the label or target value is not known. Here, the machine tries to cluster similar data by finding hidden patterns. Rather than making predictions, the main aim of unsupervised learning is to discover patterns. The performance of a model in unsupervised learning cannot be evaluated against a target, as the label value is absent or unknown. The algorithms involved in unsupervised learning are K-means clustering, Association Rule Mining, Topic Modeling and Dimensionality Reduction Techniques [3].
Semi-supervised learning: Supervised learning works on labeled data and unsupervised learning on unlabeled data, so a lot of information that could be obtained from unlabeled data is lost when only labeled data is used. In this case, semi-supervised learning comes into play. It is a mixture of supervised and unsupervised learning that takes both unlabeled and labeled data. The labeled data is typically much smaller than the unlabeled data. The idea behind semi-supervised learning is that there is a considerable change in performance when labeled and unlabeled data are used in conjunction. The labeled training set used is small. It is normally used to detect outliers.
Reinforcement learning: Reinforcement learning works by developing a system that improves its performance by taking feedback from the environment and taking possible steps to improve it. It is the act of learning from the environment by interacting with it, without any help from humans. It is an iterative process.
The different types of machine learning algorithms and their applications are shown in Figure 2.
Figure 2: Types of Machine Learning algorithm.
Related surveys
As healthcare is emerging nowadays, researchers are focusing on the types of data used for prediction. For example, Ajay et al. focus on clinical and genomic data and used machine learning algorithms to analyze them. But other data types, including sensor and Omics data, are also available to work on. The prime motive of our survey is to include all types of data and analyze them using machine learning. This is summarized in Table 1.
Paper organization
Section 2 presents the different types of data used by authors
for diagnosis and prevention of certain kinds of disease, and the work done to achieve it. Section 3 presents the conclusions drawn from the related surveys.
RESULTS AND DISCUSSION
Healthcare analysis using ML
The healthcare sector holds an enormous amount of information about patient health, so it is impossible for humans to process it. Consequently, ML provides a technique to recognize patterns in this massive data and uses algorithms to predict future outcomes for patients. ML in healthcare helps users to understand the effectiveness of existing programs and to identify the treatment that provides the best result for patients according to their condition.
Types of healthcare data
Different types of data have come into view in healthcare nowadays, including clinical data, sensor data, Omics data and so on. Such data requires different mining methods to extract the most relevant features, and then different algorithms need to be trained for better future prediction.
Clinical data: Clinical data is the data collected during the ongoing treatment of a patient, including Electronic Health Record (EHR) data, which comprises laboratory tests, radiology images, allergies and so on (Figure 3). Work on clinical data has been carried out by the following authors.
Wengert et al. [5] proposed ML algorithms for early prediction of pathological complete response (pCR) to neoadjuvant chemotherapy and survival outcome of breast cancer patients using Multiparametric Magnetic Resonance Imaging (mpMRI) data. Samples of 38 women with breast cancer were taken, and eight classifiers, including linear support vector machine, linear discriminant analysis, logistic regression, random forests, stochastic gradient descent, adaptive boosting and Extreme Gradient Boosting (XGBoost), were applied to rank the features for pCR, including residual cancer burden (RCB), recurrence-free survival (RFS) and disease-specific survival (DSS). The area under the curve (AUC) value was extracted for each feature of pCR. In the experimental results, XGBoost produced the best result, with higher accuracy for RCB and DSS, and logistic regression for RFS, as compared to the other classifiers.
Dagli et al. [6] defined a multilayer perceptron model for two-year survival prediction of non-small cell lung cancer patients. Samples of 559 patients were taken and attributes were ranked with the ReliefF feature selection method. The Multilayer Neural Network was found to be the best prediction model, with an area under the curve value of 0.75.
Kayal et al. [7] proposed a new, improved classification approach for survival prediction of Hepatocellular Carcinoma (HCC) patients. Samples of 165 patients were taken, from which the authors determined that, out of 49 risk factors, 15 were responsible for HCC. The outcome of the experiment showed that the accuracy obtained by the Deep Neural Network is significantly higher than that of Cox models (SVM) and the unsupervised model (KNN).
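Several of the studies above are compared by their area under the ROC curve (AUC). As a rough illustration of what that number measures, the sketch below computes AUC as the fraction of positive/negative pairs that a model ranks correctly. The function name, the four-patient labels and both score lists are hypothetical and are not taken from any cited study.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 0/1 ground truth; scores: model outputs (higher = more likely positive).
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two models on the same four patients.
y_true = [0, 0, 1, 1]
model_a = [0.10, 0.40, 0.35, 0.80]
model_b = [0.20, 0.30, 0.60, 0.90]
print(auc_score(y_true, model_a))  # 0.75: one positive is ranked below one negative
print(auc_score(y_true, model_b))  # 1.0: every positive outranks every negative
```

An AUC of 0.5 corresponds to random ranking, which is why values such as 0.75 and 0.924 reported by different studies are only comparable when they are measured on the same task and data.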
Authors        Clinical   Omics: Genomic   Omics: Transcriptomic   Omics: Proteomic   Sensor
This Survey    ✓          ✓                ✓                       ✓                  ✓
Zheng et al. [8] proposed a framework to identify Type-2 Diabetes Mellitus (T2DM) patients using Electronic Health Record (EHR) data. A total of 300 patient samples were taken and 114 features were extracted, on which different machine learning algorithms including k-Nearest Neighbor (kNN), Random Forest (RF), Decision Tree (DT), naïve Bayes, Support Vector Machine (SVM) and logistic regression were applied. SVM produced the best result, with an accuracy of 96%.
Sumei et al. [9] developed a computer-assisted classification method combining conventional MRI and perfusion MRI data for diagnosis of different types of brain tumor and for grading of gliomas. Samples of 102 brain tumor patients were taken, and support vector machine recursive feature elimination (SVM-RFE), k-nearest neighbor and linear discriminant analysis were applied to them. SVM-RFE produced the best result, with an accuracy of 85% for classification of tumors and 88% for grading of gliomas.
Kristin et al. [10] defined different machine learning algorithms, including penalized logistic regression, random forest models and extreme gradient boosted decision trees, for identification of high-risk surgical patients. The authors trained the algorithms on Pythia data containing electronic health records with 194 clinical features, including patient demographics, smoking status, medications, comorbidities, procedure information and proxies for surgical patients. The best result was produced by the penalized logistic regression model, with an AUC value of 0.924.
Andrew et al. [11] investigated five machine learning algorithms, comprising penalized logistic regression, gradient boosting machine, artificial neural network with a single hidden layer, linear support vector machine and random forest, for delirium risk prediction based on electronic health record data. A total of 18223 patient samples were taken. The gradient boosting algorithm produced the best result, with an AUC value of 0.855.
Fatemeh et al. [12] proposed machine learning models for first emergency admission prediction based on EHR data. The authors applied a Cox model to a sample of 4.6 million patients for prediction of the risk of first emergency admission, and then random forest and gradient boosting algorithms were used. The authors identified that the gradient boosting (gbm) model performed best, with an AUC value of 0.779.
Maryam et al. [13] investigated the Seattle Heart Failure Model for the prediction of heart failure using EHR data. Samples of 5044 patients were taken and features were extracted to calculate the survival score. The authors first calculated the survival score of heart patients who survived for one, two or five years with a Cox proportional hazards regression model; patients who died after five years were then excluded, and different machine learning models comprising random forest, logistic regression, support vector regression, decision tree and AdaBoost were applied to the remaining patients. Logistic regression performed best, with an 11% improvement in AUC value.
Stephen H. Weng et al. [14] defined machine learning algorithms including random forest, logistic regression, gradient boosting machines and neural networks on samples of 378,256 patients for the prediction of cardiovascular risk. After the data was prepared and features were extracted, the authors applied the different machine learning algorithms and identified that the neural network performed best, with an AUC value of 0.72, as shown in Table 2.
Sensor data: Sensor data consists of data elements produced by sensors, including time series signals, which are ordered sequences of pairs. These data elements are processed by computing devices and can be simple numerical or categorical values or more complex data. Work on sensor data has been carried out by the following authors.
Luca et al. [15] proposed machine learning algorithms to detect Parkinson’s disease (PD) using data streams collected from wearable sensors. The experiment was performed on 20 individuals, whose movement was recorded by 6 wearable sensors. A total of 13 tasks were performed by the individuals; the experiment was conducted on one day and repeated 2 weeks later. From this, a total of 41,802 data clips were used. The data was then used to train convolutional neural networks and a random forest classifier for the detection of bradykinesia and tremor. The random forest classifier performed better, with AUROC values of 0.73 for the detection of bradykinesia and 0.79 for the detection of tremor.
David et al. [16] defined machine learning classification algorithms for detection of the risk of atypical development (AD) versus Typical Development (TD) in infants. Day-long inertial movement of infants was recorded using Opal sensors fixed on the ankle of the infant, and the data was divided into two sets, 0 to 6 months and 6 to 12 months. A total of 19 movement features, including movement count, duration, average acceleration and peak acceleration, were extracted from the two sets using univariate feature selection methods, namely Recursive Feature Elimination (RFE) and stepwise feature selection. The authors used three machine learning algorithms, support vector
machine, logistic regression and AdaBoost, for prediction. The results showed that SVM performed best for 0-6 month infants with an accuracy of 90%, and AdaBoost for 6-12 month infants with an accuracy of 83%.
Sota et al. [17] proposed a shoe-type pressure sensor and a single inertial measurement unit attached to the trunk for detection of assistance motions with different foot positions. A total of 8 FlexiForce sensors were attached to the sole. 5 people were asked to perform the experiment in two variations, short step and long step. Features were extracted from the data
obtained and were trained using a classification method. Experimental results show that the proposed system performed best with an accuracy of 90%.
Prabhjot et al. [18] investigated a hybrid approach comprising a Bayesian network and a heuristic technique in a neural network for stress detection using a mobile phone sensing mechanism, by measuring the Blood Pressure Management (BPM) and Heart Rate (HR) values. Data was collected using sensors embedded in mobile phones, and the hybrid approach was applied to detect stress using BPM values and HR values, as shown in Table 3. The hybrid approach performed well, with an accuracy of 92.86% for BPM and 85.71% for HR.
Shamsul et al. [19] proposed a deep belief network for recognition of human activity using data from body sensors. Sensor data was collected and important features were extracted using Kernel Principal Component Analysis (KPCA) and Linear Discriminant Analysis (LDA). Then, the model was trained using a deep belief network with 40 hidden layers. From the results, it is clear that the deep belief network performed best for activity detection, with an accuracy of 97.5%.
Elimen et al. [20] defined a hybrid approach comprising a Convolutional Neural Network and a Long Short-term Memory Recurrent Neural Network (CNN-LSTM) for emotion detection using data from smartphones and wearable sensor devices. A sample of 40 female patients was taken, from which 550,432 sensor data values were collected, comprising on-body data, environmental data and self-reported emotion level data captured using a mobile phone app. The data was then preprocessed and used to train the hybrid CNN-LSTM model for emotion detection. The proposed hybrid approach performed best, with an accuracy of 95%.
Diana et al. [21] investigated four machine learning classifiers, including decision trees, ensemble, logistic regression and Deepnets, for the detection of falls in elderly people using a 3D-axis accelerometer fitted in a 6LoWPAN wearable device. The accelerometer readings were collected and features were extracted with a sliding window technique. Falls were detected using the machine learning classifiers, and the ensemble algorithm performed best, with an accuracy of 94%.
Jessica et al. [22] proposed a 90-second fear induction task to measure the motion of participants using a wearable sensor for the detection of anxiety and depression
among young children. Samples of 64 children were taken and they were subjected to a 20-second potential threat phase. Data was collected after the 20-second threat phase and features were extracted from the sensor data. The authors then subjected the data to a k-nearest neighbor model and showed that the proposed model produced the best result, with an accuracy of 75%.
Omics data: Omics data is a collection of a huge amount of complex, high-dimensional data consisting of genomic, transcriptomic and proteomic data. Handling this type of data requires various techniques, including machine learning algorithms.
Genomic data: Genomic data is a collection of gene expression, copy number variation, sequence number and DNA data, and is used in bioinformatics. Work on genomic data has been carried out by the following authors.
Patrick et al. [23] proposed machine learning algorithms for improving hazard characterization in microbial risk assessment. Because of the high dimensionality of genomics data, the authors defined ML-based predictive risk modelling for risk assessment. Datasets related to DNA isolation and sequencing were collected, and feature extraction was performed to extract the relevant features. Machine learning classifiers including random forest, support vector machine and LogitBoost were applied and the results were evaluated. LogitBoost performed best, with an accuracy of 75%.
Yaron et al. [24] proposed DeepGestalt, a deep learning framework for identification of facial phenotypes of genetic disorders. Samples of 17000 patients with 200 syndromes were taken. Features were extracted and DeepGestalt was applied, in which face detection was done using a deep convolutional neural network (DCNN); the image was then normalized and cropped into different segments, which were converted to grey scale. The Gestalt model was then trained and predicted the syndrome with 91% accuracy.
Marcus et al. [25] investigated the machine learning algorithm XGBoost for prediction of antimicrobial minimum inhibitory concentration among patients. Samples of 5278 non-typhoidal Salmonella genomes were collected. Short-read sequence data was collected for each strain with a genome assembly service, and XGBoost was applied, which uses a gradient boosting ensemble method to reduce the error. XGBoost produced the best result, with an accuracy of 95%.
Kumardeep et al. [26] defined a deep learning model and six machine learning algorithms, comprising random forest, support vector machine, linear discriminant analysis, prediction analysis for microarrays, recursive partitioning and regression trees, and generalized boosting model, for prediction of estrogen receptor status in breast cancer patients based on metabolomics data. Samples of 271 patients were taken, of which 204 patients were estrogen receptor positive and 67 were estrogen receptor negative. K-nearest neighbor was used for normalization of the data. The normalized data was used to train the machine learning and deep learning algorithms. The deep learning algorithm performed best, with an AUC value of 0.93.
Transcriptomic data: Transcriptomic data is a collection of data on multiple mRNA transcripts within a biological sample. These samples are analyzed and extracted to generate different datasets. Work on transcriptomic data has been carried out by the following authors.
Carly et al. [27] proposed a framework to integrate multiple gene expression datasets to identify gene signatures for the diagnosis of tuberculosis. Samples of 1164 patients were taken by integrating 4 datasets. Features were extracted, machine learning algorithms including random forest, support vector machine with polynomial kernel and partial least squares discriminant analysis were applied, and the results were evaluated. Random forest performed best, with an accuracy of 95%.
Suhas et al. [28] proposed a hybrid approach comprising deep unsupervised single-cell clustering, which integrates the features generated by a deep learning model, for profiling of single-cell RNA sequencing data. Samples were taken and features were extracted. The model was trained, and the proposed model achieved the best result, with an accuracy of 96%.
Marin et al. [29] investigated machine learning algorithms for tracking age-related changes of human skeletal muscle on transcriptomic data. Gene-expression profiles of donors were analyzed to compare signatures of old and young donors. A machine learning algorithm comprising a neural network was applied to the signature data, which built a biomarker for aging. The proposed technique produced the best result, with an accuracy of 80%.
Proteomic data: Proteomic data is a collection of the proteins expressed in a cell, tissue or organism. It is the representation of the actual functional molecules in the cell. Work on proteomic data has been carried out by the following authors.
Christine et al. [30] proposed deep learning algorithms for the analysis of FLT3-ITD in acute leukemia patients. Samples of 191 patients with protein data were taken which have serum level of 231 patients. Deep learning with stacked autoencoders was used, and dimensionality reduction reduced the proteins from 291 to 20. The proposed model performed best, with an accuracy of 97%, as shown in Table 4.
Babita et al. [31] proposed a hybrid space for the prediction of protein structural class. A hybrid approach including SkipGram-based word2vec and Atchley's space II, III, IV for electron-ion interaction was applied for amino acid sequence representation [32]. For feature extraction in the time and frequency domains, the Stockwell transformation was applied. It was applied to six datasets, including small-sized samples comprising 498, 277 and 204 and large-sized samples comprising PDB25, 640 and FC699. A deep recurrent neural network was used for classification. The proposed approach performed best, with accuracies of 95.9%, 94.9%, 85.36%, 84.2%, 94.3% and 93.1% for the small-sized and large-sized datasets.
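Most of the studies surveyed in this section follow the supervised workflow outlined earlier: remove samples with missing values, normalize (z-score here), select features, train a classifier and evaluate it on held-out data. The following is a minimal plain-Python sketch of that workflow, not the method of any cited study. The synthetic data, the function names, the mean-difference feature filter and the choice of k-nearest neighbor are all illustrative assumptions.

```python
import math
import random
from statistics import mean, pstdev


def drop_missing(rows, labels):
    """Remove samples that contain a missing value (None)."""
    kept = [(r, y) for r, y in zip(rows, labels) if None not in r]
    return [r for r, _ in kept], [y for _, y in kept]


def fit_zscore(rows):
    """Column means and standard deviations, estimated on the training set only."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def apply_zscore(rows, mus, sds):
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def select_features(rows, labels, k):
    """Simple filter (an assumption): keep the k features whose class means differ most."""
    scores = []
    for j in range(len(rows[0])):
        m0 = mean(r[j] for r, y in zip(rows, labels) if y == 0)
        m1 = mean(r[j] for r, y in zip(rows, labels) if y == 1)
        scores.append((abs(m1 - m0), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return 1 if 2 * sum(y for _, y in nearest) > k else 0


# Synthetic data: features 0 and 1 carry signal, features 2 and 3 are pure noise.
rng = random.Random(0)
rows, labels = [], []
for i in range(120):
    y = i % 2
    rows.append([rng.gauss(5 * y, 1), rng.gauss(5 * y, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
    labels.append(y)
rows[3][2] = None   # simulate two missing measurements
rows[10][0] = None

rows, labels = drop_missing(rows, labels)
split = int(0.75 * len(rows))                      # 75% training, 25% testing
train_x, test_x = rows[:split], rows[split:]
train_y, test_y = labels[:split], labels[split:]

mus, sds = fit_zscore(train_x)                     # statistics come from training data only
train_z, test_z = apply_zscore(train_x, mus, sds), apply_zscore(test_x, mus, sds)
keep = select_features(train_z, train_y, k=2)
train_s = [[r[j] for j in keep] for r in train_z]
test_s = [[r[j] for j in keep] for r in test_z]

pred = [knn_predict(train_s, train_y, x) for x in test_s]
accuracy = sum(p == t for p, t in zip(pred, test_y)) / len(test_y)
print("selected features:", keep, "test accuracy:", round(accuracy, 2))
```

The normalization statistics and the selected features are derived from the training split alone, so the reported accuracy is not inflated by information leaking from the test set.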
discriminant analysis,
learning model
Authors: Suhas Srinivasan et al. [28]
Data: Single-cell RNA sequencing data
Technique: Deep unsupervised single cell clustering
Result: Proposed technique produced best result with accuracy of 95.
Metric: Accuracy

Authors: Marin Volosnikova et al. [29]
Data: Transcriptomic data
Techniques: Random Forest, Support vector machine, Elastic net, Deep feature selection
Result: Deep feature selection produced best result with accuracy of 95, 0.96, 0.92 and 5.6.
Metrics: Accuracy, Pearson correlation, coefficient of determination, mean average error
Remark: Proposed technique can be applied to further disease prognosis
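The last table entry above reports a Pearson correlation, a coefficient of determination and a mean error alongside accuracy. The sketch below shows how these three regression metrics are commonly computed. It reads "mean average error" as mean absolute error, which is an assumption, and the ages and predictions are made-up values, not data from [29].

```python
import math


def pearson_r(y_true, y_pred):
    """Linear correlation between true and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example: every predicted age is exactly 5 years too high.
ages = [30.0, 40.0, 50.0, 60.0]
predicted = [35.0, 45.0, 55.0, 65.0]
print(pearson_r(ages, predicted))            # perfectly correlated despite the offset
print(r_squared(ages, predicted))            # penalized by the constant offset
print(mean_absolute_error(ages, predicted))  # error in years
```

With a constant offset the correlation stays perfect while the coefficient of determination drops, which is one reason the three metrics are usually reported together.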
survey, it is concluded that for analyzing different types of data in healthcare, various machine learning algorithms and feature extraction techniques are proposed by various authors for survival prediction of cancer patients.
ACKNOWLEDGEMENT
I am thankful to Dr. Ashima Singh for her assistance in preparing the article.
AUTHORS CONTRIBUTION
Arwinder Dhillon designed the study, contributed to the study, contributed to analyzing the data and also wrote the paper. Dr. Ashima Singh read the manuscript, helped with the required changes and approved the final manuscript.
CONFLICT OF INTEREST
The authors declare no potential conflicts of interest with respect to the authorship and/or publication of this paper.
REFERENCES
1. Raheja K, Dubey A, Chawda R. Data analysis and its importance in health care. Int Computer Trends and Technology J. 2018;48:176-180.
2. Bisaso KR, Anguzu GT, Karungi SA, Kiragga A, Castelnuovo B. A survey of machine learning applications in HIV clinical research and care. Comput Biol Med. 2017;91:366-371.
3. Alpaydin E. Introduction to Machine Learning. MIT Press; 2009.
4. Linthicum KP, Schafer KM, Ribeiro JD. Machine learning in suicide science: Applications and ethics. Behav Sci Law. 2019;37(3):214-222.
5. Tahmassebi A, Wengert GJ, Helbich TH, Bago-Horvath Z, Alaei S, Bartsch R, et al. Impact of machine learning with multiparametric magnetic resonance imaging of the breast for early prediction of response to neoadjuvant chemotherapy and survival outcomes in breast cancer patients. Invest
10. Corey KM, Kashyap S, Lorenzi E, Lagoo-Deenadayalan SA, Heller K, Whalen K, et al. Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study. PLoS Med. 2018;15(11):e1002701.
11. Wong A, Young AT, Liang AS, Gonzales R, Douglas VC, Hadley D. Development and validation of an electronic health record–based machine learning model to estimate delirium risk in newly hospitalized patients without known cognitive impairment. JAMA Netw Open. 2018;1(4):e181018.
12. Rahimian F, Salimi-Khorshidi G, Payberah AH, Tran J, Solares RA, Raimondi F, et al. Predicting the risk of emergency admission with machine learning: Development and validation using linked electronic health records. PLoS Med. 2018;15(11):e1002695.
13. Panahiazar M, Taslimitehrani V, Pereira N, Pathak J. Using EHRs and machine learning for heart failure survival analysis. Stud Health Technol Inform. 2015;216:240.
14. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS One. 2017;12(4):e0174944.
15. Lonini L, Dai A, Shawen N, Simuni T, Poon C, Shimanovich L, et al. Wearable sensors for Parkinson’s disease: which data are worth collecting for training symptom detection models. Npj Digit Med. 2018;1:64.
16. Goodfellow D, Zhi R, Funke R, Pulido JC, Mataric M, Smith BA. Predicting Infant Motor Development Status using Day Long Movement Data from Wearable Sensors. arXiv preprint. 2018; arXiv:1807.02617.
17. Kitagawa K, Uezono T, Nagasaki T, Nakano S, Wada C. Classification Method of Assistance Motions for Standing-up with Different Foot Anteroposterior Positions using Wearable Sensors. 2018 International Conference on Information and Communication Technology Robotics (ICT-ROBOT). 2018;1-3.
18. Kaur P, Malhotra S. Improved SLReduct Framework for Stress Detection Using Mobile Phone-Sensing Mechanism in Wireless Sensor Network. Advanced Computing and Intelligent Engineering. 2019;499-507.
19. Hassan MM, Huda S, Uddin MZ, Almogren A, Alrubaian M. Human activity recognition from body sensor data using deep learning. J Med Syst. 2018;42(6):99.
20. Kanjo E, Younis EM, Ang CS. Deep learning analysis of mobile physiological,
environmental and location sensor data for emotion detection. Information Fusion. 2019;49:46-56.
21. Yacchirema D, de Puga JS, Palau C, Esteve M. Fall detection system for elderly people using IoT and Big Data. Procedia Computer Science. 2018;130:603-610.
22. McGinnis RS, McGinnis EW, Hruschak J, Lopez-Duran NL, Fitzgerald K, Rosenblum KL, et al. Wearable sensors and machine learning diagnose anxiety and depression in young children. 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI). 2018;410-413.
23. Njage PM, Leekitcharoenphon P, Hald T. Improving hazard characterization in microbial risk assessment using next generation sequencing data and machine learning: Predicting clinical outcomes in shigatoxigenic Escherichia coli. Int J Food Microbiol. 2019;292:72-82.
24. Gurovich Y, Hanani Y, Bar O, Nadav G, Fleischer N, Gelbman D, et al. Identifying facial phenotypes of genetic disorders using deep learning. Nat Med. 2019;25(1):60.
25. Nguyen M, Long SW, McDermott PF, Olsen RJ, Olson R, Stevens RL, et al. Using machine learning to predict antimicrobial minimum inhibitory concentrations and associated genomic features for nontyphoidal Salmonella. J Clin Microbiol. 2018;57(2).
26. Alakwaa FM, Chaudhary K, Garmire LX. Deep learning accurately predicts estrogen receptor status in breast cancer metabolomics data. J Proteome Res. 2017;17(1):337-347.
27. Bobak CA, Titus AJ, Hill JE. Comparison of common machine learning models for classification of tuberculosis using transcriptional biomarkers from integrated datasets. Applied Soft Computing. 2019;74:264-273.
28. Srinivasan S, Johnson NT, Korkin D. A hybrid deep clustering approach for robust cell type profiling using single-cell RNA-seq data. BioRxiv. 2019:511626.
29. Mamoshina P, Volosnikova M, Ozerov IV, Putin E, Skibina E, Cortese F, et al. Machine learning on human muscle transcriptomic data for biomarker discovery and tissue-specific drug target identification. Front Genet. 2018;9:242.
30. Liang CA, Chen L, Wahed A, Nguyen AN. Proteomics analysis of FLT3-ITD mutation in acute myeloid leukemia using deep learning neural network. Ann Clin Lab Sci. 2019;49(1):119-126.
31. Panda B, Majhi B. A novel improved prediction of protein structural class using deep recurrent neural network. Evolutionary Intelligence. 2018;1-8.
32. Ajay K, Sushil R, Tiwari A. Cancer survival analysis using Machine Learning. 2019;26-28.