A Systematic Review of Applications of Machine Learning in Cancer Prediction and Diagnosis
https://doi.org/10.1007/s11831-021-09556-z
SURVEY ARTICLE
Abstract
Advances in genome sequencing technology have opened research directions that were previously out of reach, and researchers are working hard to fight genetic diseases such as cancer. Artificial intelligence has empowered research in the healthcare sector, and the availability of open-source healthcare datasets has motivated researchers to develop applications that help in the early diagnosis and prognosis of diseases. Further, next-generation sequencing has made it possible to examine the detailed intricacies of biological systems, providing an efficient and cost-effective approach with higher accuracy. The advent of microRNAs, also known as small non-coding RNAs, has begun a paradigm shift in oncological research. We are now able to profile the expression of RNAs using RNA-seq data, and microRNA profiling has helped to uncover their role in various genetic and biological processes. In this paper, we present a review of the machine learning perspective in cancer research. The best way to develop effective cancer treatments and drugs is to better understand the intricacies and complexities of the cancer microenvironment. Although a plethora of methods and techniques has been proposed in the literature, the deadliness of cancer has not yet been reduced. In this situation, artificial intelligence (AI), and machine learning in particular, provides a reliable, fast, and efficient way to deal with such severe diseases.
Bioinformatics is playing a critical role in the fight against severe diseases such as cancer, diabetes, and Alzheimer's disease. Cancer is caused by mutations and variations in the genetic microenvironment of an individual. The cancer microenvironment is highly complex, which makes treatment difficult. Even patients with the same type of cancer respond differently to the same type of therapy. Clinical trials and the traditional drug discovery process are time-consuming and tedious. Hence, researchers are working hard to design optimal treatment options for such severe diseases.

The availability of a huge amount of online oncological and pharmacogenomic data sources has boosted research in this field. Unlike traditional statistical and computational approaches, bioinformaticians are using machine learning techniques to improve the treatment options for genetic diseases. Cells are the basic building blocks of all living organisms. The human body contains a variety of cells, such as blood cells, muscle cells, and fat cells. Genes are responsible for the variation among these cells. Genes carry hereditary information and are responsible for various physical and functional processes in the body. They are also responsible for heterogeneity in genotypic and phenotypic traits among species, and all information regarding the inheritance of phenotypic traits is carried by genes. Overall, to fight a genetic disease, its root cause, i.e., the genes, needs to be studied. Advances in computational biology and high-throughput sequencing are helping to find biomarkers (genes) that are responsible for various diseases.

Further, chip technology, which has also produced lab-on-a-chip devices, is considered the future of the healthcare industry. These chips help in the proper diagnosis and prognosis of patients based on their genetic profiles.

* Aman Sharma
[email protected]; [email protected]
1 CS/IT, Jaypee University of Information Technology, Solan, H.P, India
2 Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala 147001, India
A. Sharma, R. Rani
Researchers are working hard to find the genes or gene sets that cause genetic diseases. Microarray technology measures the gene expression levels of a particular microenvironment. Along with gene expression data, we can collect genome, transcriptome, and proteome data such as copy number variations and gene mutations. Gene expression and drug response data are used extensively in identifying anti-cancer drugs, drug targets, and biomarkers. Some researchers are working to explore the biological pathways corresponding to genetic diseases. The ratio of the expression levels of an individual gene under two different conditions, obtained by DNA microarray hybridization, is called the gene expression value. The quantity of mRNA produced by a gene determines its gene expression value, and this quantity may vary in response to external stimuli. mRNA carries the information about protein synthesis from the genes. Gene expression data has enormous potential in biological research. It can help to identify the genomic reason behind the occurrence of a physical process, and disease biomarkers can be identified with the help of differentiating genomic traits. Genomic assays such as MNase-seq, mRNA, and DNase-seq can be fed to machine learning models to predict a variety of disease-related information. Figure 1 shows the central dogma of molecular biology, which explains the flow of genetic information. Many researchers are exploiting gene expression data related to genetic diseases such as cancer to better understand the microenvironment.

Figure 2 shows the omic data used for machine learning modeling. Cancer is a complex genetic disease involving various subtypes, and there is a need to develop computational approaches that could aid in the treatment of tumor subtypes. Over the past decade, oncological research has gained serious attention, and researchers are trying to personalize treatment therapies for cancer patients [1]. Apart from biomarker identification, researchers are also developing computational (in silico) models and algorithms that can predict disease-specific drug responses, drug synergy, and drug-target interactions.

Many researchers are using machine learning algorithms to solve biological research problems. The supervised machine learning method is divided into three stages: learning, training, and testing. In the learning phase, the machine learning algorithm is developed. In the training phase, a large amount of data is fed to the machine learning model to help it derive generalized rules. In the testing phase, new data is fed to the model to test the accuracy of its predictions. In unsupervised learning, by contrast, data points are given but no labels are provided. The problem is to partition the data points so that relevance is maximized and redundancy is minimized.

The best way to develop effective cancer drugs is to better understand the intricacies and complexities of the cancer microenvironment. Although a plethora of methods and techniques has been proposed in the literature, the deadliness of cancer has not yet been reduced. In this situation, artificial intelligence (AI), and machine learning in particular, provides a reliable, fast, and efficient way to deal with such severe diseases. For example, PathAI is a powerful AI-based tool that is helping in the field of pathology. AI-enabled diagnosis can be faster, more reliable, and more accurate. AI can help to reduce the duration of clinical trials, and the success rate of clinical trials can be predicted well in advance.
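The training and testing stages of supervised learning described earlier in this section can be illustrated with a minimal sketch. It is not taken from any surveyed work: the nearest-centroid classifier, the synthetic "expression" values, the class names, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random

def make_toy_data(n_per_class=50, n_genes=5, seed=0):
    """Generate hypothetical expression vectors for two labelled classes."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label, mean in (("normal", 0.0), ("tumor", 3.0)):
        for _ in range(n_per_class):
            samples.append([rng.gauss(mean, 1.0) for _ in range(n_genes)])
            labels.append(label)
    return samples, labels

def train_test_split(samples, labels, test_fraction=0.3, seed=0):
    """Shuffle the sample indices and hold out a fraction for testing."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    pick = lambda ids, seq: [seq[i] for i in ids]
    return pick(train, samples), pick(train, labels), pick(test, samples), pick(test, labels)

def nearest_centroid_fit(samples, labels):
    """Training phase: compute one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

# Training phase: fit on the training split only.
X, y = make_toy_data()
X_tr, y_tr, X_te, y_te = train_test_split(X, y)
model = nearest_centroid_fit(X_tr, y_tr)

# Testing phase: measure accuracy on samples the model has not seen.
predictions = [nearest_centroid_predict(model, x) for x in X_te]
accuracy = sum(p == t for p, t in zip(predictions, y_te)) / len(y_te)
print(f"test accuracy: {accuracy:.2f}")
```

Keeping the test samples out of the fitting step is what makes the reported accuracy an estimate of how the model generalizes rather than how well it memorized the training data.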
The task of the pathologist is to take a mass or lesion from a patient and place it on a glass slide for further observation. One slide can contain thousands of different cells. Even one or two tumor cells in the sample are important for the patient's treatment. Pathologists therefore have to examine a large number of slides manually each day before making any decision about a patient. There are other challenges as well: there may be only a few cells to examine because the tissue sample is small, or there may be so many cells that it is nearly impossible to identify the cancerous ones. It is therefore very challenging for a pathologist to pick out cancerous cells from normal cells.

In such a situation, pathology slides can be digitized into digital pathology images. These images can be fed to a computer that learns to distinguish cancer cells from normal cells. Once the computer has finished learning, the algorithms can be applied across all the images in the dataset. AI can help to find all the different cells that pathologists manually classify on an image and do so automatically. AI can also help to match patients with the therapy that will maximize their chances of long-term survival.

Fig. 1 Central Dogma of Biology

1.2 Our contribution and organization of paper

Here in this paper, a review of the machine learning perspective in cancer research is presented. We have discussed various
applications of machine learning in cancer, together with their possible limitations and research issues, in detail. We have addressed various research questions and challenges in cancer research using machine learning. Further, we have also focused on machine learning techniques that use microarray and NGS data.

The rest of the paper is organized as follows: Section 2 discusses the research methodology and describes the methods for selecting the literature. Section 3 presents a comparative summary of this survey and existing related surveys. Section 4 discusses challenges in using machine learning for cancer research. Section 5 provides a detailed discussion of the applications of machine learning in cancer research. Section 6 covers the future of cancer research using machine learning. Section 7 concludes the paper and discusses future directions. Figure 3 represents the complete layout of the manuscript.

[Figure 3: layout of the manuscript. Recoverable labels: Section 1: Introduction; Section 1.1: Introduction to Bioinformatics; Section 1.1: Cancer research as a machine learning ...; Section 1.1: Genome Data Representation for Machine Learning Approaches; Section 1.2: Our contribution & organization of Paper; Section 5.5.5: Recent research related to microRNA in cancer; Section 5.5.6: microRNA gene Prediction using machine learning; Section 5.5.7: Role of deep learning in microRNA analysis in NGS]

2 Research Methodology

To conduct any kind of research or survey, a research methodology has to be adopted. In this section, we discuss the research methodology that helped us to conduct an extensive survey.

2.1 Research Questions asked by Researchers

The main motive of this review is to help young researchers in this field. It addresses many research questions and will help them to understand the basic terminology of cancer research using machine learning and to identify the key research problems in this area. These research questions are listed in Table 1.

2.2 Keywords for Searching Relevant Research Papers

The search for papers was based on keywords related to cancer research using machine learning. Table 2 lists the keywords used for searching relevant papers. Initially, 4000 papers were shortlisted based on the search and their relevance to this review. After that, further filtering was done to draw insights from the most relevant papers. We have included papers from reputed journals and conferences. Different keywords were used to find relevant research papers and articles.

3 Comparison with Existing Survey Papers

Various authors have attempted to review the literature on cancer research using machine learning. However, most of the surveys either focus on a single type of cancer or do not cover all the review questions listed in Table 1. Based on these review questions, we have compared the most relevant surveys with this survey. Table 3 summarizes the comparison of existing surveys on cancer research with our survey and highlights the prime differences of focus between them.

4 Challenges in Cancer Research using Machine Learning

(a) High dimensionality and imbalance class problem. Cancer data classification suffers from several issues, such as high dimensionality and class imbalance. High dimensionality refers to the presence of an exceptionally large number of features compared with the number of samples. Feature selection algorithms are designed to deal with high dimensionality, and various methods and techniques [51–54] have been proposed in the literature. However, no generic approach has yet been developed that can handle all types of datasets and domains.
(b) Model Biasedness. In the class imbalance problem, there is a mismatch between the numbers of samples available for each class. This biases predictive models towards the majority class. Various researchers have contributed solutions to this problem [55, 56]. However, most of the existing work on cancer data classification uses binary imbalanced classes; there is a need to address the imbalance problem in the multi-class setting.
(c) Heterogeneity in drug responses. Cancer patients with the same cancer type show heterogeneous responses, which poses a major challenge for precision medication [57]. There is a need to develop a drug prediction model that could strengthen the present status of precision medication. There is currently no effective method to predict the drug responses of individual patients precisely and reliably. Genetic instability and variations among individuals are responsible for varied drug responses.
(d) Efficient feature selection technique. Further, there is a need for a computationally efficient feature selection technique that could eliminate the need for data cleaning procedures while yielding high cancer prediction accuracy with an optimal set of protein properties for drug design.
(e) Model Scalability. A scalable feature selection technique is required that could consider the maximum number of genetic aberrations simultaneously and efficiently [58]. There is a need to predict sensitive drugs for individual patients. Because cancer is a complex disease whose complexity varies from patient to patient, one cannot rely on generalized medication; hence, a scalable drug sensitivity criterion needs to be taken into consideration.
(f) Drug Synergy Prediction Issue. The potential of machine learning for optimal drug synergy prediction is unexplored; hence, relevant machine learning models need to be developed for the proper diagnosis and treatment of severe diseases like cancer. Drug synergism helps in designing novel drug combinations whose components complement each other to suppress the progression of the disease. There is a need to extract potential drug combination features to understand drug-disease interaction in a holistic manner.
(g) Next Generation Sequencing (NGS). Analyzing NGS datasets using machine learning is also one of the biggest challenges that researchers are facing. Advances in genome sequencing technology have opened research directions that were previously out of reach. Next-generation sequencing has made it possible to examine the detailed intricacies of biological systems, providing an efficient and cost-effective approach with higher accuracy.
The advent of microRNAs, also known as small non-coding RNAs, has begun a paradigm shift in oncological research. We are now able to profile the expression of RNAs using RNA-seq data, and microRNA profiling is helping to uncover their role in various genetic and biological processes.

5 Applications of Machine Learning in Cancer

Microarray data analysis deals with gene classification and clustering using statistical approaches. Apart from statistical approaches, machine learning algorithms such as Decision Tree, Neural Networks, Support Vector Machine (SVM), and Random Forest are also used for microarray data analysis. Moreover, the literature provides evidence of various machine learning based computational approaches for drug synergy prediction, drug response prediction, drug-target interaction prediction, and biomarker identification. All these computational approaches help in identifying potential drug molecules for various diseases. Cancer is one of the most researched diseases and has gained huge attention from academia and the pharmaceutical industry.

5.1 Cancer Classification

As already discussed, gene expression data has enormous potential for interpreting the significance of genes and their correlation with disease. To better understand the disease, the patient's gene expression data is collected in different biological environments. A comparison-based data analysis is performed to understand the disease state. The amount of mRNA produced by a gene indicates whether the gene is active or inactive. With the rapid advancement in computational biology research, there is a huge demand for microarray data. It is helping in developing predictive machine learning models for cancer classification. Moreover, microarray data helps in the precise prediction of cancer types. Figure 4 describes the cancer classification steps using machine learning.

Many researchers have contributed different methods and techniques for tumor classification using microarray data [2]. These techniques vary from statistical methods to machine learning techniques. Microarray data suffers from high dimensionality, and feature selection algorithms are used to deal with this issue [3]. With feature selection, model training time is reduced because irrelevant features are removed. Scalability and generalization are two constraints that restrict the functioning of traditional feature selection algorithms. Deep Neural Networks (DNN) can be used for automatic feature extraction and to develop generalized and scalable models.

Technological advancement in DNA microarrays has greatly advanced research in bioinformatics. Further, with the introduction of NGS (Next Generation Sequencing), we can sequence the whole genome of any individual. Scientists are performing parallel screening of genomic data to find hidden patterns that could help in drug discovery. Such parallel screening helps to identify gene–gene relationships, potential biomarkers for different genetic diseases, and genetic mutations/alterations. It also helps in the early detection of many severe diseases such as cancer. Over the last two decades, various bioinformaticians have collaborated to contribute open-source tumor datasets [4–6] to boost cancer research. These datasets are generally microarray data of thousands of genes for different tissues (patients). These are used as benchmark datasets to carry out data analysis/prediction for
personalized medication and cancer classification. Machine learning is also used to exploit the potential of these datasets. Table 4 lists the datasets available for tumour classification. Various researchers have developed tumour classification techniques using machine learning [7–10]. Machine learning mainly focuses on identifying hidden patterns in data that could help to generalize the biological process or system. The key idea in cancer classification is to improve the prediction accuracy of the classification model and to find a minimum set of potential gene biomarkers.

Fig. 4 Cancer classification using machine learning

Although all this seems interesting and easy, in reality there are many key issues involved in designing biological predictive models. Gene identification for tumor sub-type analysis is a tedious task, as it depends on feature selection algorithms. These algorithms in turn depend on optimization algorithms or statistical approaches that need to be defined very carefully for proper results. Broadly, feature selection algorithms are classified as wrapper, hybrid, and filter methods. Filter methods rely on the statistical properties of the data to identify the key genes that could serve as biomarkers [11]. Wrapper methods use a suitable learning algorithm to filter out the most relevant genes [11] and have the benefit of delivering higher accuracy [12].

The high dimensionality of microarray data makes tumour classification an NP-hard problem. Meta-heuristic algorithms are treated as a suitable choice for solving such problems [11]. Multi-objective functions are the main strength of these algorithms, as they help to find the global best solution. Conflicts between different objective functions are resolved to obtain the optimal results. Many of these algorithms are bio-inspired optimization algorithms [13, 14, 17, 18]. Broadly, they are classified as posterior-based [16] and prior-based [15] approaches. Prior approaches use the concept of weighted multi-objective functions. Posterior approaches focus on the performance of the problem of finding an optimal solution. Table 5 contains a summary of selected cancer classification techniques using machine learning.

Table 5 Summary of selected cancer classification techniques using machine learning
References | Proposed technique | Contribution | Data sets | Performance parameters
Guyon et al. [7] | SVM technique based on Recursive Feature Elimination (RFE) | Gene Selection for Cancer Classification | Leukemia [4], Colon cancer [5] | Leave-one-out success rate
Shen et al. [112] | Penalized Logistic Regression | Tumour Classification Using Microarray Data | Breast, Colon, Acute Leukemia, Lung, Ovarian, Prostate cancer, Central Nervous system | Classification Accuracy, Computational Time, Penalty Parameter
Wang et al. [113] | Correlation-based feature selector, decision trees, naïve Bayes and SVM | Gene selection from microarray data | Leukemia [4] | CPU time (in seconds), Accuracy
Feng et al. [114] | Fuzzy Neural Network | Gene Selection and Cancer Classification | Lymphoma Data [115], SRBCT Data [116], Liver Cancer Data [117] | Number of genes, Accuracy
Wang et al. [118] | Gene Importance Ranking, Support Vector Machines (SVMs) | Finding the smallest set of genes | Lymphoma Data [115], SRBCT Data [116], Liver Cancer Data [117], GCM [119] | Number of genes, Accuracy
Cho & Won [120] | Ensemble of neural networks | Cancer classification | Leukemia, Colon, and Lymphoma data | Number of genes, Accuracy, Principal component analysis
Tan et al. [121] | Fuzzy neural network | Ovarian cancer diagnosis | Micro-array gene expression [122], Blood assays, Proteomic spectra [123] | Sensitivity, Specificity, Accuracy, Training time (s)
Glaab et al. [124] | Rule-Based Machine Learning | Gene Prioritization and Sample Classification | Prostate cancer [125], Lymphoma [126], Breast cancer [127] | Average accuracy, Friedman test
Liu et al. [128] | Recursive Feature Addition, Supervised learning | Gene selection and classification | Six benchmark microarray gene expression data sets | Accuracy, Minimize the redundancy of the genes
Chen et al. [129] | Particle swarm optimization, Decision tree classifier | Cancer classification | 10 datasets from GEMS, Taiwan Cancer Registry [130] | Accuracy, ANOVA, p-value
Margoosian & Abouei [131] | Ensemble-based Classifiers | Cancer Classification | BENCHMARK FOURTEEN CAN- | Classification accuracy, F-measure, H-mean

5.2 Drug Synergy Prediction

Targeted drug therapy is the most commonly used treatment given to cancer patients. These drugs are specially designed based on their targets, which help to suppress cancer. These targets are known as anti-oncogenes, which are responsible for tumor suppression by suppressing mitosis (cell division) [19]. Any alteration or change in these genes leads to uncontrollable cell growth. In contrast to these genes, oncogenes promote tumor growth. Most targeted drug therapies are designed around oncogenes, as anti-oncogenes are hard to target. Various studies have revealed resistance to targeted drug therapies, which results in nonresponsive drug behavior [20, 21]. This resistance may occur for many reasons, such as cell death inhibition or changes in drug targets. A heterogeneous tumor microenvironment can also result in drug resistance [22]. Combination drug therapy can help to avoid drug resistance: it helps in overcoming drug resistance by delaying tumor growth. It involves the use of two or more drugs in fixed dose proportions and as a single dose formulation. Table 6 lists the datasets available for anti-cancer drug synergy prediction.

Combination therapy is showing excellent results in tumor suppression by reducing the chances of multiple mutations [23] and of a single mutation [24] that can escape all the drugs.

Additionally, combination therapy helps in lowering drug dosage and side effects [23]. A combination of two or more drugs is considered effective if the tumor suppression rate of the combination is higher than that of the individual drugs. Such a combination of drugs is known as synergistic; otherwise it is antagonistic. The proportion of the doses also matters in drug synergy: we cannot mix them in any random
proportions. Quantifying drug synergy is a very complex task, but a few researchers have proposed metrics to measure it. Some of the quantitative methods for drug synergy are the Bliss independence model [27], dose equivalence, isobolographic analysis [25], and Chou-Talalay [26]. Table 7 contains a summary of the anti-cancer drug synergy prediction approaches with respect to ML.

Table 6 Datasets available for anti-cancer drug synergy prediction
S. No. | Dataset | Link
1 | DrugComb | https://bit.ly/319f2Js
2 | NCI-ALMANAC | https://bit.ly/3fmZp6i
3 | DREAM Challenge Dataset | https://bit.ly/2XghYCH
4 | Combination therapies dataset for melanoma | https://pubmed.ncbi.nlm.nih.gov/23239741/

5.3 Drug Response Prediction

Abnormal mutations and changes in genes lead to cancer and also disrupt the normal functioning of cellular activities. Exposure of cells to an unfavourable environment promotes tumor growth. Understanding the complexity of the tumor microenvironment is a challenging task. Even patients with the same type of cancer respond differently to the same type of therapy. Genetic differences among patients are the main reason for the difference in drug responses. Cancer patients cannot be given medications based only on the anatomical origin of their cancer. An individual patient's genomic profile needs to be considered when making suitable prescriptions [29]. Treating cancer patients with better drugs and diagnosis is still a challenging task. Table 8 lists the datasets available for drug sensitivity prediction.

Large-scale drug screening data is helping to identify the relationship between genes and drug responses. Pharmacogenomic datasets are produced as a result of such large-scale screenings. GDSC [30] and CCLE [31] are two such large databases that help to promote oncological research.

Machine learning techniques are used in modelling cancer research problems such as predicting drug responses and genomic biomarkers. Machine learning models such as random forest and elastic net regularization are the most frequently used in drug response prediction. Matrix factorization is another popular technique in drug response prediction [32]. A trust propagation based technique is used by Jamali et al. [33] for predicting drug responses. Regularized factorization methods are also used in bioinformatics and in brain activity prediction [34]. Table 9 contains a summary of selected drug sensitivity prediction techniques using machine learning.

5.4 Drug-Target Interaction Prediction

Drug discovery involves finding novel drugs and their novel potential targets. Identifying such drug-target interactions out of a pool of drugs and targets is a tedious task. Current research on drugs aims to repurpose existing drugs for new diseases and targets. Drug repurposing helps in saving time and money, as the repurposed drugs are already approved. Drug-target interactions involve two sets of agents: chemicals form the drug set and amino acids form the target set. This research problem plays a vital role in discovering new drugs and recognizing their new potential targets. Such interactions contribute significantly to understanding the operation of drugs and their side effects. However, there are some key issues related to drug discovery, such as toxicity towards patients, drug resistance, and time-consuming clinical trials. Differences in drug effects on patients [35, 36] and the mapping of drug effects to the drug interaction pathway [28] are the key issues discussed in the literature. Table 10 lists the datasets available for drug-target interaction prediction.

We can predict drug-target interactions using either of two kinds of methods: clinical/experimental (in vivo) methods or computational (in silico) methods. These methods are classified as docking [38, 39], ligand-based [40], literature text mining [41], and pharmacogenomics [42, 43] methods. Clinical methods are inefficient, tiring, and difficult to reproduce [44].

Clinical docking techniques are the most widely used, but their time-consuming simulations and the unavailability of 3-D protein structures are major drawbacks. Using simulation techniques, these methods predict the target site for a given drug. There are also similarity-based techniques that use the similarity between targets (ligands), but the lack of proper information about the majority of target ligands has made these methods less popular. Another method, literature text mining, explores the literature to find the relationship between a given drug and target, but it too is not very popular because of a lack of information. Apart from these methods, computational methods such as machine learning and kernel-based techniques are also used to find potential drug-target interactions [45, 46]. Various online databases provide access to data on compounds and target proteins [47–50]. These databases help to boost research on DTI. Various researchers have used these databases in their studies to identify novel drug-target interactions [42]. Table 11 summarizes drug-target interaction prediction techniques with respect to ML.
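The similarity-based idea mentioned above for drug-target interaction prediction can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not name a similarity measure, so Tanimoto (Jaccard) similarity over binary fingerprints is assumed here, and the drug names, fingerprint bits, targets, and the `score_targets` helper are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score_targets(query_fp, known_drugs, k=2):
    """Rank candidate targets for a query drug using its k most similar known drugs.

    known_drugs maps a drug name to (fingerprint_bits, set_of_known_targets).
    Each neighbour votes for its targets with a weight equal to its similarity.
    """
    neighbours = sorted(
        ((tanimoto(query_fp, fp), targets) for fp, targets in known_drugs.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = {}
    for similarity, targets in neighbours:
        for target in targets:
            scores[target] = scores.get(target, 0.0) + similarity
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: fingerprints are sets of "on" bit positions.
known_drugs = {
    "drug_A": ({1, 2, 3, 4}, {"target_X"}),
    "drug_B": ({1, 2, 3, 9}, {"target_X", "target_Y"}),
    "drug_C": ({7, 8, 9}, {"target_Z"}),
}
query = {1, 2, 3, 5}
ranking = score_targets(query, known_drugs, k=2)
print(ranking)
```

The sketch also shows the limitation noted in the text: a query drug can only be linked to targets that are already annotated for some similar compound, so sparse annotation directly limits what can be predicted.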
Table 7 Summary of Anti-cancer Drug Synergy Prediction approaches with respect to ML
References | Proposed technique | Contribution | Data sets | Performance parameters
Kim et al. [146] | Deep neural networks | Anti-cancer Drug Synergy Prediction | Genetic data from multiple databases | Sensitivity, AUC, Accuracy
Jiang et al. [147] | Graph Convolutional Network (GCN) model | Prioritizing synergistic anticancer drug combinations | O'Neil et al.'s dataset [149] | AUC, AUPRC, Accuracy, Kappa
Ekşioğlu & Tan [148] | Ensemble Learning | Prediction of Drug Synergy | Large compound oncology dataset [149] | Mean Squared Error (MSE), Pearson correlation coefficient
Zhang et al. [150] | Deep Learning Model | Predicting Tumor Cell Response to Synergistic Drug Combinations | NCI ALMANAC database, Cancer cell line encyclopedia (CCLE) database, KEGG (Kyoto Encyclopedia of Genes and Genomes) | Pearson correlation coefficient
Kuru et al. [151] | Deep learning framework | Drug Synergy Prediction | DrugComb | Correlation, Mean squared error (MSE)
Preuer et al. [152] | Deep Learning | Predicting anti-cancer drug synergy | Large-scale oncology screen [149] | MSE, P-value, RMSE, Pearson's r
Wildenhan et al. [153] | Random forest and Naive Bayesian learner | Prediction of Synergism from Chemical-Genetic Interactions | CGM dataset | AUC, ROC, Gini Index
Janizek et al. [154] | Extreme gradient boosted tree-based approach | Prediction of synergistic drug combinations | O'Neil et al.'s dataset [149] | Mean squared error (MSE), Five fold cross-validation
Mason et al. [155] | Machine Learning | Predict Synergistic Antimalarial Compound Combinations | 1,540 antimalarial drug combinations | CV (cross-validation)
Chen et al. [156] | Deep belief network, ontology fingerprints | Predict effective drug combination | DREAM Challenge dataset | Precision, Recall, F1
Sharma and Rani [157] | Machine learning algorithms | Identification of effective and synergistic anti-cancer drug combinations | DREAM Challenge Dataset, Held et al. [158] | Accuracy, Specificity, Sensitivity
A. Sharma, R. Rani
Table 8 Datasets available for Drug sensitivity prediction
S. No. Dataset Link

microRNA is a seamlessly regulated process where multiple sub-processes are involved.
Table 9 Summary of selected Drug sensitivity prediction techniques using machine learning

| References | Proposed technique | Contribution | Data sets | Performance parameters |
|---|---|---|---|---|
| Jang et al. [159] | 110,000 different models, multifactorial experimental design testing | Drug sensitivity prediction | Cancer cell lines (CCLE), Sanger | IC50, AUC, ANOVA |
| Menden et al. [160] | Neural networks and Random forests | Prediction of Cancer Cell Sensitivity to Drugs using Genomic and Chemical Properties | GDSC | Root mean square error (RMSE), Coefficient of determination (R2), Pearson correlation coefficient (Rp) |
| Turki et al. [161] | Transfer Learning | Drug Sensitivity Prediction in Multiple Myeloma Patients | (GEO) repository (http://www.ncbi.nlm.nih.gov/geo/) | P-values of t-test, AUC, Mean AUC (MAUC) |
| Wan & Pal [162] | Ensemble Learning | Drug Sensitivity Prediction | NCI-DREAM Challenge dataset, Cancer Cell Line Encyclopedia | Accuracy, Leave-one-out errors, Statistical significance |
| Dong et al. [163] | Support Vector Machine (SVM) and a recursive feature selection | Anticancer drug sensitivity prediction | CCLE, CGP | Accuracy, AUC |
| Rahman et al. [164] | Ensemble mode, Random Forests | Drug Sensitivity Prediction | CCLE, GDSC databases | MSE, AUC |
| Yuan et al. [165] | Multitask learning | Prediction of cancer drug sensitivity | CCLE, CTD2, NCI60 | MSE, fivefold cross-validation |
| Ali and Aittokallio [166] | Machine learning, feature selection | Drug response prediction | NCI-DREAM Challenge | MSE, Accuracy |
| Haider et al. [167] | Multivariate Random Forests | Drug Sensitivity Prediction | GDSC and CCLE | Accuracy, AUC |
| He et al. [168] | Kernelized Rank Learning (KRL) | Personalized drug recommendation | Cancer cell lines, Clinical trials | Precision, Standard deviations |
| Matlock et al. [169] | Random Forests | Drug sensitivity prediction | Synthetic data, CCLE data | Prediction accuracy and AUC, MSE |
| Riddick et al. [170] | Random Forests, Ensemble approach, classification and regression trees | Predicting in vitro drug sensitivity | NCI-60, 19 Breast Cancer and 7 Glioma cell lines | R2, correlation coefficients |
| Sharma and Rani et al. [171] | Ensemble and multi-task learning | Drug sensitivity prediction | NCI-Dream dataset, CCLE dataset | Wilcoxon ranksum test, CV std. error, MSD (Mean square deviation) |
| Sharma and Rani et al. [172] | Ensembled machine learning | Drug sensitivity prediction | GDSC, CCLE | MSE, paired t-test, Wilcoxon signed-rank test |
Table 10 Datasets available for drug-target interaction prediction
S. No. Dataset Link of the dataset

uncovering their relationship in many diseases and predicting their role in precision medication.
| References | Proposed technique | Contribution | Data sets | Performance parameters |
|---|---|---|---|---|
| Ezzat et al. [173] | Ensemble learning, dimensionality reduction | Drug-target interaction prediction | drug-target interaction data [174], Second dataset [175] | Sensitivity Analysis, AUC |
| Chen et al. [176] | Machine Learning | Drug-Target Interaction Prediction | Review on databases such as DrugBank, KEGG, and STITCH | |
| Wen et al. [177] | Deep learning | Drug-target interaction prediction | ‘golden standard’ dataset [41] | TPR, TNR, Accuracy, AUC |
| Yuan et al. [178] | Ensemble learning, k-nearest neighbor, Bipartite Local Model with support vector classification | Improving drug–target interaction prediction | DrugBank [179] | AUPR, precision, recall |
| Ezzat et al. [174] | Class imbalance-aware ensemble learning | Drug-target interaction | DrugBank database [48] | AUC |
| Zhang et al. [180] | A random projection ensemble approach | Drug-target interaction prediction | Dataset [181] | Precision, recall, Accuracy, F1-measure |
| Xie et al. [182] | Deep learning | Transcriptome data classification for drug-target interaction prediction | LINCS project, DTI database [183] | Accuracy, Predictive errors |
| Tian et al. [183] | Deep neural network (DNN) | Compound-protein interaction prediction | STITCH database [185], PubChem database [185], Pfam database [186] | Accuracy, Sensitivity, specificity, F1-measure |
| Feng et al. [187] | Deep Learning | Drug-Target Interaction Prediction | He et al. [188] Davis dataset [189], Metz dataset [190] and KIBA dataset [191] | R2, RMSE |
| Xie et al. [192] | Deep-learning-based model | Drug–target interaction prediction | L1000 dataset | Accuracy, F-score, proportion of positive cases and predictive error |
| Sharma and Rani [193] | Dimensionality Reduction and Active Learning | Drug Target Interaction Prediction | Drug Bank [45], Ezzat et al. [174] | AUPR, AUC, sensitivity, specificity, accuracy |
A Systematic Review of Applications of Machine Learning in Cancer Prediction and Diagnosis
technologies has raised the possibility of easy identification of microRNAs and their targets. Assays/sequences generated through these technologies are platform-independent and provide better analytical and statistical inference. Table 12 summarizes the various microRNAs as potential cancer diagnostic biomarkers in blood.

5.5.5 Recent research related to microRNA in cancer

Although we have already discussed the significant role of microRNAs in various diseases and drug therapies, identification of novel microRNAs and their targets is still a challenging issue. Various tools and pipelines have been developed, but the inconsistency between their results and the lack of a standardized approach have raised a serious issue among researchers in this field.

Still, researchers are trying hard to relate these newly identified disease predictors to targeted drug therapies. Recently, microRNA 374b was identified as a resistive agent in pancreatic cancer drug therapy [90]. The major goal of new-generation drug prediction is to predict novel drugs that could be useful in a wide range of diseases. The Scripps Research Institute (TSRI) researchers have designed a drug that has shown tumor suppressor capabilities in breast cancer animal models [91]. Breast cancer is one of the most researched cancer types because of its intricacy and commonality among women; therefore, deeper knowledge about its subtypes and drug therapies can help to fight against it. Circulating microRNAs have been identified as potential biomarkers for early detection of breast cancer [92]. Recently, a study revealed the deregulation of microRNAs in the tumor environment and their role in cancer cell lines [93]. Laura Cantini et al. have proposed a pipeline for subtype identification and analysis of colorectal cancer using an mRNA-microRNA interaction network [94].

5.5.6 microRNA Gene Prediction using machine learning

microRNAs are essential to present-day biomedical research. We have already discussed the importance and role of microRNAs in various biological systems. Many microRNAs have been identified, but many more are still to be discovered. Due to the limitations of biological experimental approaches, microRNA identification suffers from a serious bottleneck, and hence efficient computational approaches are needed for the identification and prediction of novel
Fig. 5 Biogenesis of microRNA (nucleus: miRNA gene, pri-miRNA, Drosha/Pasha, pre-miRNA; Exportin 5; cytoplasm: pre-miRNA, Dicer, AGO, miRNA, RISC)
microRNAs. Most real-world problems are complex in nature, which makes them difficult to model.

ML approaches can help in modeling such complex problems and in incorporating data-driven decision-making capabilities into the resulting models. We can apply ML approaches to microRNA data to identify microRNAs and their target genes, and then to further analyze microRNA expression data. NGS has provided a powerful platform for discovering new microRNAs and their targets. NGS platforms like Illumina/Solexa GA are popular and give more accurate expression values than hybridization-based technologies. As a result, significant improvement has been seen in the identification of microRNAs and their targets. ML approaches can classify candidate target genes corresponding to an identified microRNA. Classifiers such as Random Forest, SVM, and Decision Trees are popularly used. Figure 6 describes the generalized workflow for microRNA gene prediction using ML. The basic idea here is that ML will generalize the prediction rules based on the positive and negative data sets. A positive data set contains microRNA sequences that have already been identified, and a negative data set contains microRNA look-alike sequences that are not microRNAs. Most of the available methods for microRNA gene prediction rely on structural similarity of the hairpin structure of the pre-microRNA. They are based on the principle of homologous structure identification: if we can find a microRNA in one genome, then there is a possibility of identifying it in another genome too. Homology modeling and ab-initio methods are the presently available methods for microRNA prediction in bioinformatics. The homology technique is a simple method that predicts microRNAs based on existing information from already identified microRNAs. Because it is a sequence alignment technique, nothing new can be predicted regarding microRNAs.

There are various tools available based on the homology technique and ML, such as ProMir [97] and MirFinder [98]. In contrast to homology-based methods, ab-initio methods are not similarity-based; they do not require any additional reference sequence for predicting microRNAs. Proper parameter selection can lead to the prediction of new microRNAs, but poorly selected parameters can result in many false-positive predictions. Ab-initio methods also use ML capabilities, and software such as MiPred [99], MiRenSVM [100], and Triplet-SVM [101] is based on them. The availability of a huge amount of biological data has raised the need for new data handling, prediction, and classification algorithms. Traditional methods are no longer reliable enough to handle such an enormous growth of data. In such a scenario, ML approaches are considered an optimal choice for better results. ML approaches are used in various fields of bioinformatics such as genomics, proteomics, transcriptomics, and systems biology. ML algorithms for the prediction of microRNAs start with a training step to build an expert model. The model is built from what it learns from sequence data, microRNA structure, and intensity data of microRNAs. Based on learning from these features, it can classify unknown sequences as microRNA or not. But these ML algorithms suffer from a serious class imbalance problem
and most of the algorithms consider fixed loop size stems, reducing the overall prediction accuracy. Learning in ML models for microRNA prediction is based on positive and negative data. Most of the time these datasets are derived from miRBase [102], although a little pre-processing is needed before actually using them.

5.5.7 Role of deep learning in microRNA analysis in NGS

"Big Data" has been a buzz topic in recent years; it has gained huge interest from academia as well as industry. The rate at which data is being produced has increased manyfold, and so has the research in this field. Data related to bioinformatics has also evolved over many years. An increase in computational capabilities and the emergence of HTS technology have led to a sudden outburst of biomedical data. This data has great potential for identifying disease biomarkers and discovering new drugs, but unfortunately it is not effectively utilized. NGS technologies have created a serious need for new technologies and algorithms. In such a scenario, deep learning using neural networks is considered an effective choice. Although ML approaches have been used for many years, they are limited in their ability to process raw data.

Deep learning is a class of ML algorithms based on multilayer neural networks. In contrast to traditional ML approaches, deep learning can extract features from the data itself. Researchers have proposed various deep learning algorithms for microRNA prediction. Seunghyun Park et al. proposed deepMiRGene [103], an algorithm to predict microRNA precursors. It uses an RNN, so there is no need to input features manually; the algorithm automatically identifies features from the input data. This approach also leads to the discovery of various new features which can be used in future research. Similarly, Cheng S. et al. developed MiRTDL [104], an algorithm for microRNA target prediction using a CNN. It automatically extracts the desired information from the data itself rather than relying on information fed manually. These algorithms have shown efficient results and have improved prediction results. The use of deep learning techniques in microRNA and target prediction can help in predicting novel microRNAs and in gaining better knowledge about the underlying mechanism.

6 Conclusion and Future Directions

Although various researchers are working in the field of cancer, there are still various possible future directions which need to be addressed. Heterogeneous omic data can be considered to further improve the performance of cancer classification. Drug synergy data need to be extracted so as to foster the research in this field. The heterogeneous drug response of individuals needs to be understood and considered while developing predictive models. Copy number variation, somatic mutation, and pathways can be further considered in predicting drug responses. Genomic data integration can be performed to further improve prediction results.

Further, apart from microarray data, we can use microRNAs, which are small non-coding RNAs that bind to the 3′ UTR regions of their target mRNA. They play an important role in controlling the post-transcriptional regulation of coding genes, either by degrading them or inhibiting their translation. Various microRNAs have been identified and many more are yet to be discovered from a pool of genomic data.
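To make the binding of microRNAs to the 3′ UTR of their target mRNAs concrete, here is a minimal sketch of a seed-match scan, a common first-pass heuristic in microRNA target prediction. It assumes the widely used convention that the seed is nucleotides 2–8 of the mature microRNA and that a candidate site is a perfect reverse-complement match in the 3′ UTR. The function names and the UTR string are hypothetical, and the ML-based target predictors discussed in this review rely on many more features than this.

```python
from typing import List

# RNA complement table (A<->U, C<->G)
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Return the UTR motif that pairs with the miRNA seed (nt 2-8)."""
    seed = mirna.upper().replace("T", "U")[1:8]   # positions 2..8, read 5'->3'
    return seed.translate(COMPLEMENT)[::-1]       # reverse complement


def find_seed_matches(mirna: str, utr: str) -> List[int]:
    """Return 0-based start positions of perfect seed matches in a 3' UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Mature miR-21-5p sequence; the UTR fragment below is invented for illustration.
MIR21 = "UAGCUUAUCAGACUGAUGUUGA"
toy_utr = "GGCAUAAGCUAACC"
print(seed_site(MIR21), find_seed_matches(MIR21, toy_utr))
```

A scan like this typically yields many false positives on real transcripts, which is one reason learned models that combine sequence, structure, and expression features are preferred.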
Fig. 6 Generalized Workflow for machine learning microRNA gene prediction

Various computational and statistical approaches have been proposed to get the best results out of sequencing data. NGS technology is popular these days due to its reduced cost and higher accuracy; as a result, we need efficient algorithms and pipelines that can cater to the present need. Machine learning and deep learning algorithms can prove useful in handling NGS data and developing biomedical applications. Using these technologies, we can predict promising microRNA biomarkers which could later be used as drug targets for a variety of diseases. Hence, microRNAs have paved the path for precision medication in fighting against cancer. Identifying novel and tissue-specific microRNAs can help to differentiate significantly between healthy and diseased cell states. This paper attempts to highlight the possible application areas of anti-cancer drug prediction using machine learning, NGS data analysis using machine learning, and how microRNAs can help in better diagnosis and prognosis of cancer. This review
paper is an attempt to summarize the various research directions for cancer using machine learning.

Funding None.

Compliance with ethical standards

Conflict of interest None.

References

1. Błaszczyński J, Stefanowski J (2015) Neighbourhood sampling in bagging for imbalanced data. Neurocomputing 150:529–542
2. Ying Lu, Han J (2003) Cancer classification using gene expression data. Inf Syst 28(4):243–268
3. Oleg O (2013) Survey of novel feature selection methods for cancer classification. Biological knowledge discovery handbook preprocessing mining, and postprocessing of biological data, pp 379–398
4. Golub Todd R, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H (1999) Molecular classification of cancer class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
5. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12):6745–6750
6. Van’t Veer LJ, Dai H, Van De Vijver MJ, He YD, Hart AA, Mao M, Friend SH (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530–536
7. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1–3):389–422
8. Shevade SK, Sathiya Keerthi S (2003) A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinf 19(17):2246–2253
9. Furlanello C, Serafini M, Merler S, Jurman G (2003) Gene selection and classification by entropy-based recursive feature elimination. In: Proceedings of the international joint conference on neural networks, 4:3077–3082. IEEE
10. Chu W, Ghahramani Z, Falciani F, Wild DL (2005) Biomarker discovery in microarray gene expression data with Gaussian processes. Bioinf 21(16):3385–3393
11. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinf 23(19):2507–2517
12. Inza I, Larrañaga P, Blanco R, Cerrolaza AJ (2004) Filter versus wrapper gene selection approaches in DNA microarray domains. Artif Intell Med 31(2):91–103
13. Shen Qi, Shi W-M, Kong W (2008) Hybrid particle swarm optimization and tabu search approach for selecting genes for tumor classification using gene expression data. Comput Biol Chem 32(1):53–60
14. Li S, Xixian Wu, Tan M (2008) Gene selection using hybrid particle swarm optimization and genetic algorithm. Soft Comput 12(11):1039–1048
15. Branke J, Deb K, Dierolf H, Osswald M (2004) Finding knees in multi-objective optimization. International conference on parallel problem solving from nature. Springer, Berlin, Heidelberg, pp 722–731
16. Marler RT, Arora JS (2004) Survey of multi-objective optimization methods for engineering. Struct Multidiscip Optim 26(6):369–395
17. BoussaïD I, Lepagnot J, Siarry P (2013) A survey on optimization metaheuristics. Inf Sci 237:82–117
18. Chakraborty A, Kar AK (2017) Swarm intelligence: a review of algorithms. In: Nature-inspired computing and optimization. Springer, pp 475–494
19. Weinberg RA (1991) Tumor suppressor genes. Science 254(5035):1138–1146
20. Knoechel B, Roderick JE, Williamson KE, Zhu J, Lohr JG, Cotton MJ, Gillespie SM (2014) An epigenetic mechanism of resistance to targeted therapy in T cell acute lymphoblastic leukemia. Nat Genet 46(4):364–370
21. Rini BI, Atkins MB (2009) Resistance to targeted therapy in renal-cell carcinoma. Lancet Oncol 10(10):992–1000
22. Housman G, Byler S, Heerboth S, Lapinska K, Longacre M, Snyder N, Sarkar S (2014) Drug resistance in cancer an overview. Cancers 6(3):1769–1792
23. Fitzgerald JB, Schoeberl B, Nielsen UB, Sorger PK (2006) Systems biology and combination therapy in the quest for clinical efficacy. Nat Chem Biol 2(9):458–466
24. Cokol M, Chua HN, Tasan M, Mutlu B, Weinstein ZB, Suzuki Yo, Nergiz ME (2011) Systematic exploration of synergistic drug pairs. Mol Syst Biol 7(1):544–553
25. Tallarida RJ (2011) Quantitative methods for assessing drug synergism. Genes Cancer 2(11):1003–1008
26. Ashton JC (2015) Drug combination studies and their synergy quantification using the Chou-Talalay method. Cancer Res 75(11):2400–2400
27. Foucquier J, Guedj M (2015) Analysis of drug combinations current methodological landscape. Pharmacol Res Perspect 3(3):00149
28. Kotelnikova E, Yuryev A, Mazo I, Daraselia N (2010) Computational approaches for drug repositioning and combination therapy design. J Bioinf Comput Biol 8(3):593–606
29. Xiao G, Ma S, Minna J, Xie Y (2014) Adaptive prediction model in prospective molecular signature-based clinical studies. Clin Cancer Res 20(3):531–539
30. Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW, Greninger P (2012) Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483(7391):570–575
31. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483(7391):603–607
32. Yamada M, Lian W, Goyal A, Chen J, Wimalawarne K, Khan SA, Chang Y (2017) Convex factorization machine for toxicogenomics prediction. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1215–1224
33. Jamali M, Ester M (2010) A matrix factorization technique with trust propagation for recommendation in social networks. In: Proceedings of the fourth ACM conference on Recommender systems, pp 135–142
34. Wang L, Li X, Zhang L, Gao Q (2017) Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization. BMC Cancer 17(1):513–524
35. Evans WE, McLeod HL (2003) Pharmacogenomics drug disposition, drug targets, and side effects. N Engl J Med 348(6):538–549
36. Wei D-Q, Wang J-F, Chen C, Li Y, Chou K-C (2008) Molecular modeling of two CYP2C19 SNPs and its implications for personalized drug design. Protein Pept Lett 15(1):27–32
37. Mizutani S, Pauwels E, Stoven V, Goto S, Yamanishi Y (2012) Relating drug–protein interaction network with drug side effects. Bioinformatics 28(18):i522–i528
38. Rarey M, Kramer B, Lengauer T, Klebe G (1996) A fast flexible docking method using an incremental construction algorithm. J Mol Biol 261(3):470–489
39. Xie Li, Evangelidis T, Xie L, Bourne PE (2011) Drug discovery using chemical systems biology weak inhibition of multiple kinases may contribute to the anti-cancer effect of nelfinavir. PLoS Comput Biol 7(4):e1002037
40. Jacob L, Vert J-P (2008) Protein-ligand interaction prediction an improved chemogenomics approach. Bioinformatics 24(19):2149–2156
41. Zhu S, Okuno Y, Tsujimoto G, Mamitsuka H (2005) A probabilistic model for mining implicit chemical compound–gene relations from literature. Bioinformatics 21(2):ii245–ii251
42. Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M (2008) Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 24(13):i232–i240
43. Wang Y-C, Zhang C-H, Deng N-Y, Wang Y (2011) Kernel-based data fusion improves the drug–protein interaction prediction. Comput Biol Chem 35(6):353–362
44. Fakhraei S, Huang B, Raschid L, Getoor L (2014) Network-based drug-target interaction prediction with probabilistic soft logic. IEEE/ACM Trans Comput Biol Bioinf (TCBB) 11(5):775–787
45. van Laarhoven T, Nabuurs SB, Marchiori E (2011) Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics 27(21):3036–3043
46. Zheng X, Ding H, Mamitsuka H, Zhu S (2013) Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1025–1033
47. Bolton EE, Wang Y, Thiessen PA, Bryant SH (2008) PubChem integrated platform of small molecules and biological activities. Ann Rep Comput Chem 4:217–241
48. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A (2010) DrugBank 30 a comprehensive resource for ‘omics’ research on drugs. Nucleic Acids Res 39(1):D1035–D1041
49. Gaulton A, Bellis LJ, Patricia Bento A, Chambers J, Davies M, Hersey A, Light Y (2011) CHEMBL a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40(D1):D1100–D1107
50. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2011) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40(D1):D109–D114
51. Hira ZM, Gillies DF (2015) A review of feature selection and feature extraction methods applied on microarray data. Adv Bioinf 2015(198363):1–13
52. Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2016) Feature selection for high-dimensional data. Prog Artif Intell 5(2):65–75
53. Haixiang G, Yijing Li, Jennifer Shang Gu, Mingyun HY, Bing G (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239
54. Krawczyk B, Galar M, Jeleń Ł, Herrera F (2016) Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy. Appl Soft Comput 38:714–726
55. Dagogo-Jack I, Shaw AT (2018) Tumour heterogeneity and resistance to cancer therapies. Nat Rev Clin Oncol 15(2):81–94
56. Tadist K, Najah S, Nikolov NS, Mrabti F, Zahi A (2019) Feature selection methods and genomic big data: a systematic review. J Big Data 6(79):1–24
57. Bartel DP (2009) Micrornas: target recognition and regulatory functions. Cell 136:215–233
58. Lee Y, Kim M, Han J, Yeom K-H, Lee S, Baek SH, Narry Kim V (2004) Microrna genes are transcribed by rna polymerase ii. EMBO J 23(20):4051–4060
59. Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA, Carrington JC (2005) Expression of arabidopsis mirna genes. Plant Physiol 138(4):2145–2154
60. Richard Lu, Barca O (2012) Fine-tuning oligodendrocyte development by micrornas. Front Neurosci 6:13
61. Hayes DF, Bast RC, Desch CE, Fritsche H, Kemeny NE, Jessup JM, Locker GY, Macdonald JS, Mennel RG, Norton L et al (1996) Tumor marker utility grading system: a framework to evaluate clinical utility of tumor markers. J Natl Cancer Inst 88(20):1456–1466
62. Garzon R, Marcucci G, Croce CM (2010) Targeting micrornas in cancer: rationale, strategies and challenges. Nat Rev Drug Discovery 9(10):775–789
63. Ambros V (2003) Microrna pathways in flies and worms: growth, death, fat, stress, and timing. Cell 113(6):673–676
64. Doench JG, Sharp PA (2004) Specificity of microrna target selection in translational repression. Genes Dev 18(5):504–511
65. Zhang H, Kolb FA, Brondani V, Billy E, Filipowicz W (2002) Human dicer preferentially cleaves dsrnas at their termini without a requirement for atp. EMBO J 21(21):5875–5885
66. Kosaka N, Iguchi H, Ochiya T (2010) Circulating microrna in body fluid: a new potential biomarker for cancer diagnosis and prognosis. Cancer Sci 101(10):2087–2092
67. Ploussard G, de la Taille A (2010) Urine biomarkers in prostate cancer. Nat Rev Urol 7(2):101–109
68. Li A, Omura N, Hong S-M, Vincent A, Walter K, Griffith M, Borges M, Goggins M (2010) Pancreatic cancers epigenetically silence sip1 and hypomethylate and overexpress mir-200a/200b in association with elevated circulating mir-200a and mir-200b levels. Cancer Res 70(13):5226–5237
69. Ho AS, Huang X, Cao H, Christman-Skieller C, Bennewith K, Le Q-T, Koong AC (2010) Circulating mir-210 as a novel hypoxia marker in pancreatic cancer. Transl Oncol 3(2):109–113
70. Wang J, Chen J, Chang P, LeBlanc A, Li D, Abbruzzesse JL, Frazier ML, Killary AM, Sen S (2009) Micrornas in plasma of pancreatic ductal adenocarcinoma patients as novel blood-based biomarkers of disease. Cancer Prev Res 2(9):807–813
71. Morimura R, Komatsu S, Ichikawa D, Takeshita H, Tsujiura M, Nagata H, Konishi H, Shiozaki A, Ikoma H, Okamoto K et al (2011) Novel diagnostic value of circulating mir-18a in plasma of patients with pancreatic cancer. Br J Cancer 105(11):1733–1740
72. Mitchell PS, Parkin RK, Kroh EM, Fritz BR, Wyman SK, Pogosova-Agadjanyan EL, Noteboom J, O’Briant KC, Allen A et al (2008) Circulating micrornas as stable blood-based markers for cancer detection. Proc Natl Acad Sci 105(30):10513–10518
73. Zhu W, Qin W, Atasoy U, Sauter ER (2009) Circulating micrornas in breast cancer and healthy subjects. BMC Res Notes 2(1):89
74. Heneghan HM, Miller N, Kelly R, Newell J, Kerin MJ (2010) Systemic mirna-195 differentiates breast cancer from other malignancies and is a potential biomarker for detecting noninvasive and early stage disease. Oncologist 15(7):673–682
75. Asaga S, Kuo C, Nguyen T, Terpenning M, Giuliano AE, Hoon DSB (2011) Direct serum assay for microrna-21 concentrations in early and advanced breast cancer. Clin Chem 57(1):84–91
76. Zhao H, Shen J, Medico L, Wang D, Ambrosone CB, Liu S (2010) A pilot study of circulating mirnas as potential biomarkers of early stage breast cancer. PLoS ONE 5(10):e13735
77. Chen Xi, Ba Yi, Ma L, Cai X, Yin Y, Wang K, Guo J, Zhang Y, Chen J, Guo X et al (2008) Characterization of micrornas in serum: a novel class of biomarkers for diagnosis of cancer and other diseases. Cell Res 18(10):997–1006
78. Shen J, Liu Z, Todd NW, Zhang H, Liao J, Lei Yu, Guarnera MA, Li R, Cai L, Zhan M et al (2011) Diagnosis of lung cancer in individuals with solitary pulmonary nodules by plasma microrna biomarkers. BMC Cancer 11(1):1
79. Zheng D, Haddadin S, Wang Y, Li-Qun Gu, Perry MC, Freter CE, Wang MX (2011) Plasma micrornas as novel biomarkers for early detection of lung cancer. Int J Clin Exp Pathol 4(6):575–586
80. Taylor DD, Gercel-Taylor C (2008) Microrna signatures of tumor-derived exosomes as diagnostic biomarkers of ovarian cancer. Gynecol Oncol 110(1):13–21
81. Resnick KE, Alder H, Hagan JP, Richardson DL, Croce CM, Cohn DE (2009) The detection of differentially expressed micrornas from the serum of ovarian cancer patients using a novel real-time pcr platform. Gynecol Oncol 112(1):55–59
82. Tsujiura M, Ichikawa D, Komatsu S, Shiozaki A, Takeshita H, Kosuga T, Konishi H, Morimura R, Deguchi K, Fujiwara H et al (2010) Circulating micrornas in plasma of patients with gastric cancers. Br J Cancer 102(7):1174–1179
83. Li X, Luo F, Li Q, Meihua Xu, Feng D, Zhang G, Wei Wu (2011) Identification of new aberrantly expressed mirnas in intestinal-type gastric cancer and its clinical significance. Oncol Rep 26(6):1431–1439
84. Yamamoto Y, Kosaka N, Tanaka M, Koizumi F, Kanai Y, Mizutani T, Murakami Y, Kuroda M, Miyajima A, Kato T et al (2009) Microrna-500 as a potential diagnostic marker for hepatocellular carcinoma. Biomarkers 14(7):529–538
85. Qu KZ, Zhang Ke, Li HaiRong, Afdhal NH, Albitar M (2011) Circulating micrornas as biomarkers for hepatocellular carcinoma. J Clin Gastroenterol 45(4):355–360
86. Zhang C, Wang C, Chen Xi, Yang C, Li Ke, Wang J, Dai J, Zhibin Hu, Zhou X, Chen L et al (2010) Expression profile of micrornas in serum: a fingerprint for esophageal squamous cell carcinoma. Clin Chem 56(12):1871–1879
87. Wong T-S, Liu X-B, Wong B-H, Ng R-M, Yuen A-W, Wei WI (2008) Mature mir-184 as potential oncogenic microrna of squamous cell carcinoma of tongue. Clin Cancer Res 14(9):2588–2592
88. Sung JJ, Chong WS, Jin H, Lam EK, Shin VY, Yu J, Poon TC, Ng SS, Ng EK (2009) 1070 Differential Expression of MicroRNAs in Plasma of Colorectal Cancer Patients: A Potential Marker for Colorectal Cancer Screening. Gastroenterol 136(5):A-165
96. Murakami Y, Tanahashi T, Okada R, Toyoda H, Kumada T, Enomoto M, Tamori A, Kawada N, Taguchi YH, Azuma T (2014) Comparison of hepatocellular carcinoma miRNA expression profiling as evaluated by next generation sequencing and microarray. PLoS ONE 9(9):e106314
97. Nam J-W, Shin K-R, Han J, Yoontae Lee V, Kim N, Zhang B-T (2005) Human microrna prediction through a probabilistic co-learning model of sequence and structure. Nucleic Acids Res 33(11):3570–3581
98. Huang T-H, Fan B, Rothschild MF, Zhi-Liang Hu, Li K, Zhao S-H (2007) Mirfinder: an improved approach and software implementation for genome-wide fast microrna precursor scans. BMC Bioinf 8(1):1
99. Ng KLS, Mishra SK (2007) De novo svm classification of precursor micrornas from genomic pseudo hairpins using global and intrinsic folding measures. Bioinformatics 23(11):1321–1330
100. Ding J, Zhou S, Guan J (2010) Mirensvm: towards better prediction of microrna precursors using an ensemble svm classifier with multi-loop features. BMC Bioinf 11(11):1
101. Xue C, Li F, He T, Liu G-P, Li Y, Zhang X (2005) Classification of real and pseudo microrna precursors using local structure-sequence features and support vector machine. BMC Bioinf 6(1):310
102. Kozomara A, Griffiths-Jones S (2014) mirbase: annotating high confidence micrornas using deep sequencing data. Nucleic Acids Res 42(D1):D68–D73
103. Park S, Min S, Choi H, Yoon S (2016) deepmirgene: Deep neural network based precursor microrna prediction. arXiv preprint arXiv:1605.00017
104. Cheng S, Guo M, Wang C, Liu X, Liu Y, Xuejian Wu (2015) MiRTDL: a deep learning approach for miRNA target prediction. IEEE ACM Trans Comput Biol Bioinf 13(6):1161–1169
105. Nadeem MW, Ghamdi MA, Hussain M, Khan MA, Khan KM, Almotiri SH, Butt SA (2020) Brain tumor analysis empowered with deep learning: A review, taxonomy, and future challenges. Brain Sci 10(2):118
106. Thakur SK, Singh DP, Choudhary J (2020) Lung cancer identification: a review on detection and classification. Cancer Metastasis Rev
107. Sharif MI, Li JP, Naz J, Rashid I (2020) A comprehensive review on multi-organs tumor detection based on machine learning. Pattern Recognit Lett 131:30–37
89. Huang Z, Huang D, Ni S, Peng Z, Sheng W, Xiang Du (2010)
Plasma micrornas are promising novel biomarkers for early 108. Yassin NI, Omran S, El Houby EM, Allam H (2018) Machine
detection of colorectal cancer. Int J Cancer 127(1):118–126 learning techniques for breast cancer computer aided diagnosis
90. Schreiber R, Mezencev R, Matyunina LV, McDonald JF (2016) using different image modalities: A systematic review. Comput
Evidence for the role of microRNA 374b in acquired cispl- Methods Progr Biomed 156:25–45
atin resistance in pancreatic cancer cells. Cancer Gene Ther 109. Chato L, Latifi S. (2017) Machine learning and deep learning
23(8):241–245 techniques to predict overall survival of brain tumor patients
91. Velagapudi SP, Cameron MD, Haga CL, Rosenberg LH, Lafitte using MRI images. In: 2017 IEEE 17th international confer-
M, Duckett DR, Phinney DG, Disney MD (2016) Design of a ence on bioinformatics and bioengineering (BIBE), pp 9–14
small molecule against an oncogenic noncoding RNA. Proc Natl 110. Montazeri M, Montazeri M, Montazeri M, Beigzadeh A (2016)
Acad Sci 113(21):5898–5903 Machine learning models in breast cancer survival prediction.
92. Hamam R, Ali AM, Alsaleh KA, Kassem M, Alfayez M, Aldah- Technol Health Care 24(1):31–42
mash A, Alajez NM (2016) microRNA expression profiling on 111. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Foti-
individual breast cancer patients identifies novel panel of circu- adis DI (2015) Machine learning applications in cancer prog-
lating microRNA for early detection. Sci Rep 6(1):1–8 nosis and prediction. Comput Struct Biotechnol J 13:8–17
93. Rupaimoole R, Calin GA, Lopez-Berestein G, Sood AK (2016) 112. Shen L, Tan EC (2005) Dimension reduction-based penalized
mirna deregulation in cancer cells and the tumor microenviron- logistic regression for cancer classification using microarray
ment. Cancer Discov 6(3):235–246 data. IEEE/ACM Trans Comput Biol Bioinf 2(2):166–175
94. Cantini L, Isella C, Petti C, Picco G, Chiola S, Ficarra E, Caselle 113. Wang Y, Tetko IV, Hall MA, Frank E, Facius A, Mayer KF,
M, Medico E (2015) MicroRNA–mRNA interactions underlying Mewes HW (2005) Gene selection from microarray data for
colorectal cancer molecular subtypes. Nat Commun 6(1):1–2 cancer classification—a machine learning approach. Comput
95. Mortazavi A, Williams BA, McCue K, Schaefier L, Wold B Biol Chem 29(1):37–46
(2008) Mapping and quantifying mammalian transcriptomes by
rna-seq. Nat Methods 5(7):621–628
114. Chu F, Xie W, Wang L (2004) Gene selection and cancer classification using a fuzzy neural network. IEEE Ann Meet Fuzzy Inf Process NAFIPS 2:555–559
115. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503–511
116. Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6):673–679
117. Chen X, Cheung ST, So S, Fan ST, Barry C, Higgins J, Lai KM, Ji J, Dudoit S, Ng IO, Van De Rijn M (2002) Gene expression patterns in human liver cancers. Mol Biol Cell 13(6):1929–1939
118. Wang L, Chu F, Xie W (2007) Accurate cancer classification using expressions of very few genes. IEEE/ACM Trans Comput Biol Bioinf 4(1):40–53
119. Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, Poggio T (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci 98(26):15149–15154
120. Cho SB, Won HH (2007) Cancer classification using ensemble of neural networks with multiple significant gene subsets. Appl Intell 26(3):243–250
121. Tan TZ, Quek C, Ng GS, Razvi K (2008) Ovarian cancer diagnosis with complementary learning fuzzy neural network. Artif Intell Med 43(3):207–222
122. Schummer M, Ng W, Bumgarner R, Nelson P, Schummer B, Bednarski D et al (1999) Comparative hybridization of an array of 21,500 ovarian cDNAs for the discovery of genes overexpressed in ovarian carcinomas. Gene 238:375–385
123. Petricoin EF III, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM et al (2002) Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359(9306):572–577
124. Glaab E, Bacardit J, Garibaldi JM, Krasnogor N (2012) Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data. PLoS ONE 7(7):e39932
125. Singh D, Febbo P, Ross K, Jackson D, Manola J et al (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1:203–209
126. Shipp M, Ross K, Tamayo P, Weng A, Kutok J et al (2002) Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 8:68–74
127. Chin S, Teschendorff A, Marioni J, Wang Y, Barbosa-Morais N et al (2007) High-resolution aCGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer. Genome Biol 8:R215
128. Liu Q, Sung AH, Chen Z, Liu J, Chen L, Qiao M, Wang Z, Huang X, Deng Y (2011) Gene selection and classification for cancer microarray data based on machine learning and similarity measures. BMC Genom 12(S5):S1
129. Chen KH, Wang KJ, Wang KM, Angelia MA (2014) Applying particle swarm optimization-based decision tree classifier for cancer classification on gene expression data. Appl Soft Comput 24:773–780
130. Taiwan Cancer Registry (2012) http://tcr.cph.ntu.edu.tw
131. Margoosian A, Abouei J (2013) Ensemble-based classifiers for cancer classification using human tumor microarray data. In: 2013 21st Iranian conference on electrical engineering (ICEE), IEEE, pp 1–6
132. Ramaswamy S et al (2002) Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci 98(26):15149–15154
133. Abdel-Zaher AM, Eldeib AM (2016) Breast cancer classification using deep belief networks. Expert Syst Appl 46:139–144
134. Dwivedi AK (2018) Artificial neural network model for effective cancer classification using microarray gene expression data. Neural Comput Appl 29(12):1545–1554
135. Sevakula RK, Singh V, Verma NK, Kumar C, Cui Y (2018) Transfer learning for molecular cancer classification using deep neural networks. IEEE/ACM Trans Comput Biol Bioinf 16(6):2089–2100
136. Stiglic G, Kokol P (2010) Stability of ranked gene lists in large microarray analysis studies. J Biomed Biotechnol 2010:1–9
137. Ting FF, Tan YJ, Sim KS (2019) Convolutional neural network improvement for breast cancer classification. Expert Syst Appl 120:103–115
138. Mammographic Image Analysis Society (MIAS) (2018) http://www.mammoimage.org/databases/ Accessed 25 January 2018
139. Ghoneim A, Muhammad G, Hossain MS (2020) Cervical cancer classification using convolutional neural networks and extreme learning machines. Future Gener Comput Syst 102:643–649
140. Yu L, Chen H, Dou Q, Qin J, Heng PA (2016) Automated melanoma recognition in dermoscopy images via very deep residual networks. IEEE Trans Med Imaging 36(4):994–1004
141. Gutman D, Codella NCF, Celebi E, Helba B, Marchetti M, Mishra N, Halpern A (2016) Skin lesion analysis toward melanoma detection: a challenge at the international symposium on biomedical imaging (ISBI) hosted by the International Skin Imaging Collaboration (ISIC). arXiv preprint arXiv:1605.01397
142. Albarqouni S, Baur C, Achilles F, Belagiannis V, Demirci S, Navab N (2016) AggNet: deep learning from crowds for mitosis detection in breast cancer histology images. IEEE Trans Med Imaging 35(5):1313–1321
143. Von Ahn L (2006) Games with a purpose. Comput 39(6):92–94
144. Wang P, Wang L, Li Y, Song Q, Lv S, Hu X (2019) Automatic cell nuclei segmentation and classification of cervical Pap smear images. Biomed Signal Process Control 48:93–103
145. Zhang L, Lu L, Nogues I, Summers RM, Liu S, Yao J (2017) DeepPap: deep convolutional networks for cervical cell classification. IEEE J Biomed Health Inf 21(6):1633–1643
146. Kim Y, Zheng S, Tang J, Zheng WJ, Li Z, Jiang X (2020) Anti-cancer drug synergy prediction in understudied tissues using transfer learning. bioRxiv
147. Jiang P, Huang S, Fu Z, Sun Z, Lakowski TM, Hu P (2020) Deep graph embedding for prioritizing synergistic anticancer drug combinations. Comput Struct Biotechnol J 18:427–438
148. Ekşioğlu I, Tan M (2020) Prediction of drug synergy by ensemble learning. arXiv preprint arXiv:2001.01997
149. O’Neil J, Benita Y, Feldman I, Chenard M, Roberts B, Liu Y, Li J, Kral A, Lejnine S, Loboda A, Arthur W (2016) An unbiased oncology compound screen to identify novel combination strategies. Mol Cancer Ther 15(6):1155–1162
150. Zhang H, Feng J, Zeng A, Payne PR, Li F (2020) Predicting tumor cell response to synergistic drug combinations using a novel simplified deep learning model. bioRxiv
151. Kuru HI, Tastan O, Cicek AE (2020) MatchMaker: a deep learning framework for drug synergy prediction. bioRxiv
152. Preuer K, Lewis RP, Hochreiter S, Bender A, Bulusu KC, Klambauer G (2018) DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics 34(9):1538–1546
153. Wildenhain J, Spitzer M, Dolma S, Jarvik N, White R, Roy M, Griffiths E, Bellows DS, Wright GD, Tyers M (2015) Prediction of synergism from chemical-genetic interactions by machine learning. Cell Syst 1(6):383–395
154. Janizek JD, Celik S, Lee SI (2018) Explainable machine learning prediction of synergistic drug combinations for precision cancer medicine. bioRxiv 1:331769
155. Mason DJ, Eastman RT, Lewis RP, Stott IP, Guha R, Bender A (2018) Using machine learning to predict synergistic antimalarial compound combinations with novel structures. Front Pharmacol 9:1096
156. Chen G, Tsoi A, Xu H, Zheng WJ (2018) Predict effective drug combination by deep belief network and ontology fingerprints. J Biomed Inf 85:149–154
157. Sharma A, Rani R (2018) An integrated framework for identification of effective and synergistic anti-cancer drug combinations. J Bioinf Comput Biol 16(5):1850017
158. Held MA, Langdon CG, Platt JT, Graham-Steed T, Liu Z, Chakraborty A, Bacchiocchi A, Koo A, Haskins JW, Bosenberg MW, Stern DF (2013) Genotype-selective combination therapies for melanoma identified by high throughput drug screening. Cancer Discov 3(1):52–67
159. Jang IS, Neto EC, Guinney J, Friend SH, Margolin AA (2014) Systematic assessment of analytical methods for drug sensitivity prediction from cancer cell line data. Biocomput 2014:63–74
160. Menden MP, Iorio F, Garnett M, McDermott U, Benes CH, Ballester PJ, Saez-Rodriguez J (2013) Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS ONE 8(4):e61318
161. Turki T, Wei Z, Wang JT (2017) Transfer learning approaches to improve drug sensitivity prediction in multiple myeloma patients. IEEE Access 5:7381–7393
162. Wan Q, Pal R (2014) An ensemble based top performing approach for NCI-DREAM drug sensitivity prediction challenge. PLoS ONE 9(6):e101183
163. Dong Z, Zhang N, Li C, Wang H, Fang Y, Wang J, Zheng X (2015) Anticancer drug sensitivity prediction in cell lines from baseline gene expression through recursive feature selection. BMC Cancer 15(1):1–2
164. Rahman R, Matlock K, Ghosh S, Pal R (2017) Heterogeneity aware random forest for drug sensitivity prediction. Sci Rep 7(1):1–1
165. Yuan H, Paskov I, Paskov H, González AJ, Leslie CS (2016) Multitask learning improves prediction of cancer drug sensitivity. Sci Rep 6:31619
166. Ali M, Aittokallio T (2019) Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys Rev 11(1):31–39
167. Haider S, Rahman R, Ghosh S, Pal R (2015) A copula based approach for design of multivariate random forests for drug sensitivity prediction. PLoS ONE 10(12):e0144490
168. He X, Folkman L, Borgwardt K (2018) Kernelized rank learning for personalized drug recommendation. Bioinformatics 34(16):2808–2816
169. Matlock K, De Niz C, Rahman R, Ghosh S, Pal R (2018) Investigation of model stacking for drug sensitivity prediction. BMC Bioinf 19(3):21–33
170. Riddick G, Song H, Ahn S, Walling J, Borges-Rivera D, Zhang W, Fine HA (2011) Predicting in vitro drug sensitivity using random forests. Bioinformatics 27(2):220–224
171. Sharma A, Rani R (2019) Drug sensitivity prediction framework using ensemble and multi-task learning. Int J Mach Learn Cybern 11:1231–1240
172. Sharma A, Rani R (2019) Ensembled machine learning framework for drug sensitivity prediction. IET Syst Biol 14(1):39–46
173. Ezzat A, Wu M, Li XL, Kwoh CK (2017) Drug-target interaction prediction using ensemble learning and dimensionality reduction. Methods 129:81–88
174. Ezzat A, Wu M, Li XL, Kwoh CK (2016) Drug-target interaction prediction via class imbalance-aware ensemble learning. BMC Bioinf 17(19):267–276
175. Tabei Y, Pauwels E, Stoven V, Takemoto K, Yamanishi Y (2012) Identification of chemogenomic features from drug–target interaction networks using interpretable classifiers. Bioinformatics 28(18):i487–i494
176. Chen R, Liu X, Jin S, Lin J, Liu J (2018) Machine learning for drug-target interaction prediction. Molecules 23(9):2208
177. Wen M, Zhang Z, Niu S, Sha H, Yang R, Yun Y, Lu H (2017) Deep-learning-based drug–target interaction prediction. J Proteome Res 16(4):1401–1409
178. Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S (2016) DrugE-Rank: improving drug–target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics 32(12):i18–i27
179. Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V et al (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res 42(D1)
180. Zhang J, Zhu M, Chen P, Wang B (2017) DrugRPE: random projection ensemble approach to drug-target interaction prediction. Neurocomp 228:256–262
181. He Z, Zhang J, Shi XH, Hu LL, Kong X, Cai YD, Chou KC (2010) Predicting drug-target interaction networks based on functional groups and biological features. PLoS ONE 5(3):e9603
182. Xie L, He S, Song X, Bo X, Zhang Z (2018) Deep learning-based transcriptome data classification for drug-target interaction prediction. BMC Genom 19(7):667
183. Tian K, Shao M, Wang Y, Guan J, Zhou S (2016) Boosting compound-protein interaction prediction by deep learning. Methods 110:64–72
184. Kuhn M, Szklarczyk D, Pletscher-Frankild S, Blicher TH, Von Mering C, Jensen LJ, Bork P (2014) STITCH 4: integration of protein–chemical interactions with user data. Nucleic Acids Res 42(D1):D401–D407
185. Wang J, Archambault B, Xu Y, Taleyarkhan RP (2010) Numerical simulation and experimental study on Resonant Acoustic Chambers—For novel, high-efficiency nuclear particle detectors. Nucl Eng Des 240(11):3716–3726
186. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer EL (2014) Pfam: the protein families database. Nucleic Acids Res 42(D1):D222–D230
187. Feng Q, Dueva E, Cherkasov A, Ester M (2018) PADME: a deep learning-based framework for drug-target interaction prediction. arXiv preprint arXiv:1807.09741
188. He T, Heidemeyer M, Ban F, Cherkasov A, Ester M (2017) SimBoost: a read-across approach for predicting drug–target binding affinities using gradient boosting machines. J Cheminf 9(1):24
189. Davis MI, Hunt JP, Herrgard S, Ciceri P, Wodicka LM, Pallares G, Hocker M, Treiber DK, Zarrinkar PP (2011) Comprehensive analysis of kinase inhibitor selectivity. Nat Biotechnol 29(11):1046–1051
190. Metz JT, Johnson EF, Soni NB, Merta PJ, Kifle L, Hajduk PJ (2011) Navigating the kinome. Nat Chem Biol 7(4):200
191. Tang J, Szwajda A, Shakyawar S, Xu T, Hintsanen P, Wennerberg K, Aittokallio T (2014) Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J Chem Inf Model 54(3):735–743
192. Xie L, Zhang Z, He S, Bo X, Song X (2017) Drug–target interaction prediction with a deep-learning-based model. In: 2017 IEEE international conference on bioinformatics and biomedicine (BIBM), pp 469–476
193. Sharma A, Rani R (2018) BE-DTI’: ensemble framework for drug target interaction prediction using dimensionality reduction and active learning. Comput Methods Programs Biomed 165:151–162

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.