Colorectal Cancer
https://doi.org/10.1007/s11517-018-1930-0
ORIGINAL ARTICLE
Abstract
Colorectal cancer (CRC) is a common cancer responsible for approximately 600,000 deaths per year worldwide. It is therefore important to identify the related factors and to detect the cancer accurately; however, timely and accurate prediction of the disease is challenging. In this study, we build an integrated model based on logistic regression (LR) and support vector machine (SVM) to classify samples as CRC or normal. From various factors of the samples (location, age, gender, BMI, tumor type, tumor grade, and DNA), we select the most significant factors (p < 0.05) using logistic regression as the main features, and with these features a grid-search SVM model is designed using different kernel types (Linear, radial basis function (RBF), Sigmoid, and Polynomial). The logistic regression results indicate that Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), body mass index (BMI) (AUC 0.777), and age (AUC 0.710), as well as their combination (AUC 0.942), are effective for CRC detection. The best kernel type is RBF, which achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10. This study provides a new method for colorectal cancer prediction based on independent risk factors.
Due to the challenges illustrated above, our main work is to reduce the redundant features and choose the best predictors of colorectal cancer, thus improving the prediction performance for the disease. In this study, we choose the significant factors of CRC by building a logistic regression (LR) model, select the best factors affecting the disease by calculating the AUC of the ROC curve, and then classify the healthy and CRC samples with a support vector machine (SVM) [19] that predicts the disease from the features selected by LR and ROC; we name the integrated model LR + SVM. Logistic regression (LR) aims to find the most significant features (p < 0.05) of the disease by screening each single factor to reduce redundant features according to the calculated p value. When p < 0.05, the factor evaluated by the model is significant for diagnosing the disease, so the most important factors affecting the disease are selected. The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied [20, 21]. The area under the ROC curve (AUC) is the ROC score, which is widely used as a measure of a feature's discriminatory power. As the ROC curve approaches the upper left, namely, as the area gets closer to 1, the classification ability of the selected features improves. In other words, the ROC curve can be used to evaluate the classification performance of the selected features. SVM is widely used in classification problems, particularly binary classification. The principle of SVM for binary classification is to find a hyperplane that separates the samples into positive and negative examples, which is similar to the principle of dividing the normal and abnormal samples of a disease.

In this paper, LR and ROC select the CRC-related factors Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), body mass index (BMI) (AUC 0.777), and age (AUC 0.710) and their combined factors (AUC 0.942) from the sample information, and we take the combined factors, whose AUC is the best, as the input training features for SVM and for the other comparable machine learning techniques, namely random forest (RF) [22], naive Bayes (NB) [22, 23], k-Nearest Neighbor (KNN) [24], and artificial neural networks (ANNs) [25]. All these techniques are common and widely used in classification problems. The result shows that the LR + SVM integrated model achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10, higher than the comparable models.

In conclusion, motivated by the existing problems of cancer prediction, we propose an effective method based on LR and a grid-optimized SVM for colorectal cancer prediction. The main contributions of this paper are as follows:

(1) We select the most significant features from the various factors of the samples using a logistic regression model and validate them by the ROC curve, avoiding redundant features. The best combined factor we selected achieves a sensitivity of 0.933 and a specificity of 0.908.
(2) The proper SVM kernel type is selected by comparing the Linear, radial basis function (RBF), Sigmoid, and Polynomial functions on the same dataset, and we optimize the parameters (C, g) of SVM using grid.py in LIBSVM. The best kernel function is RBF, and its accuracy is 90.1% when k = 5 and 91.2% when k = 10.
(3) The LR + SVM integrated model is established to predict colorectal cancer; its accuracy is clearly higher than that of other models, largely solving the problems of feature redundancy and low accuracy in cancer prediction.

The paper is organized as follows: Section 2 describes the related work on prediction methods for colorectal cancer. Section 3 describes the methods and materials used in the paper. Section 4 presents the experimental results to show the effect of the LR + SVM integrated model proposed in this paper. The discussion and conclusion are given in Section 5 and Section 6, respectively.

2 Related works

As a complex, volatile, easily metastasizing common cancer, colorectal cancer is closely related to various factors. To further understand and find the causes of the disease, many researchers have devoted themselves to exploring the relevant risk factors. Traditional factors, such as diet [3], age [26], and body fat index (BMI) [27], were found early on to affect the incidence of CRC. Recently, there has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer. Brennan et al. studied gut bacteria differences between CRC patients and healthy people [5]. Chu et al. analyzed the differences between CRC and healthy groups, identifying MMP7, K1AA1199, CA1, and CLCA4 as marker genes [28]. In addition to genes that are altered in CRC, microRNAs show potential as biomarkers of this disease [29]. Moreover, virtual colonoscopy [30], tests for DNA methylation markers in stool [31], and the fecal occult blood test [32] are found to be potentially useful diagnostic strategies.

Having identified the risk factors of the disease, many models have gradually been established to detect and predict the cancer. Jung et al. proposed a traditional and genetic risk scores [8] method to predict the cancer according to numerous single nucleotide polymorphisms (SNPs) [33], finding that genetic variants were useful for predicting CRC risk. Cubiella et al. used the COLONPREDICT score to detect colorectal cancer and achieved an AUC of 0.82 and 83.1% sensitivity [9]. Coppedè et al. applied neural networks to link genetic factors to DNA methylation [10], providing a new approach to colorectal cancer prediction.
Peng et al., from the perspective of statistics, demonstrated that the carcinoembryonic antigen in draining venous blood (d-CEA) may be a prognostic factor for colorectal cancer patients, with a sensitivity and specificity of 90% and 40% in the prediction of colorectal cancer [11]. Zhang et al. used logistic regression, a back-propagation neural network, and SVM to build models according to serum tumor markers, achieving accuracies of 75% and 82.5%, respectively [34]. These methods have contributed a lot to the detection of the cancer; however, some of them focus on traditional factors, and some models use redundant features as model inputs, leading to low prediction accuracy and an imbalance between sensitivity and specificity. Research on the prediction of colorectal cancer is still limited and needs further study.

Our study provides evidence that considering genetic factors as well as traditional risk factors in risk prediction models can improve their utility. We build an integrated model based on logistic regression to select the significant and independent features, including traditional factors (age, BMI) and genetic-based factors (phylum: Firmicutes and Bacteroidetes), and a grid-optimized SVM to classify the samples, achieving an accuracy of 90.1% when k = 5 and 91.2% when k = 10.

3 Materials and methods

The methodology used in the paper is presented as a flowchart in Fig. 1, and the details are presented in the following sections.

3.1 Data set

The data set used in this study comes from reference [35] and can be downloaded from the NCBI website. The study population includes participants without previous colon or rectal surgery, CRC, or inflammatory or infectious injuries of the intestine. Patients needing emergency colonoscopy are excluded. In analyzing the related CRC factors, we select 89 healthy individuals and 92 CRC patients. All participants are local residents of France and Germany. Information, including location, age, gender, BMI, tumor type, tumor grade, and DNA, is obtained for all participants.

3.2 Metagenomic sequencing and quality control

The raw sequences of the healthy and CRC individuals are obtained on an Illumina HiSeq 2000/2500 (Illumina, San Diego, USA) platform. All samples are paired-end sequenced with a 100 bp read length. After downloading these sequences, we use Trimmomatic [36] to filter out metagenomic reads containing Illumina adapters and low-quality reads and to trim low-quality read edges. After using MetaPhlAn2 [37] to obtain the microbiome of the sequences with default parameters, we generate taxonomic relative abundance profiles at the phylum, class, family, genus, and species levels.

3.3 Microbiome diversity of CRC patients

Using MetaPhlAn2, we obtain the gut microbiome at the phylum level and compare healthy individuals and CRC patients. Our analysis shows that Firmicutes accounts for 57.03% in the healthy individuals and 62.91% in the CRC patients. Meanwhile, Bacteroidetes contributes approximately 30.04% and 22.53% in the healthy individuals and CRC patients, respectively.

Notably, the abundance of bacteria at the different levels has an important influence on CRC prevalence. Firmicutes in the gut microbiome of CRC patients consists of the following genera: Faecalibacterium, Blautia, Enterococcus, Subdoligranulum, Dorea, Roseburia, Mycoplasma, and Streptococcus [38]. Each sub-bacterium constitutes more than 1% of the total bacteria in this population.

Relatively abundant genera (abundance of more than 1%) in other phyla include Bacteroides, Prevotella, Parabacteroides, and Porphyromonas of Bacteroidetes; Bifidobacterium and Collinsella of Actinobacteria; and Escherichia/Shigella of Proteobacteria [39]. According to the statistics, the relative
Fig. 1 Flowchart depicting the method adopted in the study. LR and ROC curve are used to select features. Linear, RBF, Sigmoid, and Polynomial are kernel types of SVM, and the best kernel is RBF. LR + RF, LR + NB, LR + KNN, and LR + ANNs are compared with LR + SVM. After these steps, the CRC and healthy classification can be obtained, and the best model is selected.
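To make the flow in Fig. 1 concrete, the following minimal sketch mirrors the two stages of the integrated model, assuming Python with statsmodels and scikit-learn as stand-ins for the SPSS and LIBSVM tooling actually used in the paper; the file name, column names, and candidate factor list are hypothetical.

```python
# Minimal sketch of the LR + SVM pipeline in Fig. 1. statsmodels/scikit-learn are
# assumptions (the paper used SPSS 17.0 and LIBSVM); names and file are hypothetical.
import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# df: one row per participant; "label" is 1 for CRC, 0 for healthy (hypothetical layout)
df = pd.read_csv("crc_samples.csv")
candidates = ["firmicutes", "bacteroidetes", "bmi", "age"]   # hypothetical column names

# Stage 1: screen each single factor with univariate logistic regression (keep p < 0.05)
selected = []
for col in candidates:
    X1 = sm.add_constant(df[[col]].astype(float))
    p = sm.Logit(df["label"], X1).fit(disp=0).pvalues[col]
    if p < 0.05:
        selected.append(col)

# Stage 2: RBF-kernel SVM on the selected (normalized) features, scored by k-fold CV
X = StandardScaler().fit_transform(df[selected])
svm = SVC(kernel="rbf")
for k in (5, 10):
    acc = cross_val_score(svm, X, df["label"], cv=k, scoring="accuracy").mean()
    print(f"k={k}: mean accuracy {acc:.3f}")
```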
Fig. 2 Estimated numbers of new colorectal cancer cases and deaths by age, USA, 2014.

3.6 Logistic regression model

The logistic regression (LR) model aims to find the most significant features (p < 0.05) of the disease. In this paper, we use it to screen each single index to reduce redundant features, which is carried out in SPSS 17.0. The predicted probabilities of LR are analyzed using the Receiver Operating Characteristic (ROC) curve [20, 21], a strong indicator of the model's ability to distinguish two groups, and the area under the curve (AUC) is used to measure the strength of the equation. If the value of AUC is less than 0.5, it indicates that any observation is purely a matter of chance, and a value close to 1 indicates that the equation strongly discriminates the two groups. After being measured by these methods, the most significant features are selected as the input of SVM.
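A minimal sketch of this screening step, assuming statsmodels and scikit-learn in place of SPSS 17.0, fits a univariate logistic regression for one factor, reads off its p value, and scores its predicted probabilities by the ROC AUC; the variable names and synthetic data are illustrative only.

```python
# Hedged sketch of Section 3.6: p-value screening plus ROC/AUC for a single factor.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def screen_factor(x, y, alpha=0.05):
    """x: 1-D factor values, y: 0/1 labels. Returns (keep?, p value, AUC)."""
    X = sm.add_constant(np.asarray(x, dtype=float))
    fit = sm.Logit(y, X).fit(disp=0)
    p_value = fit.pvalues[1]                 # p value of the factor (index 0 is the intercept)
    auc = roc_auc_score(y, fit.predict(X))   # AUC of the predicted probabilities
    return p_value < alpha, p_value, auc

# Example with synthetic data standing in for one factor (e.g. BMI):
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
x = y * 1.5 + rng.normal(size=200)           # factor loosely associated with the label
print(screen_factor(x, y))
```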
Fig. 4 Classification types of SVM: a the linear classification of SVM, b the non-linear classification of SVM.
3.7 Creation of the SVM prediction model

y_i (w · x_i + b) ≥ 1 − ξ_i  (4)

A kernel function K is also used for problems where a linear function may not be suitable:

K(x_i, x_j) = φ(x_i) · φ(x_j)  (5)

A Lagrange multiplier α_i is introduced to optimize the parameters:

max L(α) = Σ_{i=1}^{n} α_i − (1/2) Σ_{i,j=1}^{n} α_i α_j y_i y_j K(x_i, x_j)  (6)

subject to

Σ_{i=1}^{n} α_i y_i = 0,  0 ≤ α_i ≤ C  (7)

Table 1 Confusion matrix

                Predicted
                1      0
Actual   1      TP     FN
         0      FP     TN

TP true positive, FN false negative, FP false positive, TN true negative
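As a small numerical illustration of the kernel in Eqs. (5) and (6), the sketch below computes the RBF kernel entries K(x_i, x_j) = exp(−g ‖x_i − x_j‖²) by hand and checks them against scikit-learn; this is only an assumed stand-in, not the LIBSVM code used in the paper.

```python
# Sketch: the RBF kernel entries consumed by the dual problem in Eq. (6).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))      # 6 samples, 4 selected features (placeholder data)
g = 0.0078125                    # the best g reported later for k = 5

# Explicit computation of the kernel matrix
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-g * sq_dists)

# Library computation for comparison
K_lib = rbf_kernel(X, X, gamma=g)
print(np.allclose(K_manual, K_lib))   # True
```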
language. The principle of the calculation can be divided into the four situations:

Case 1: The actual classification is 1, and the predicted result is also 1. In this case, the number of TP increases by 1;
Case 2: The actual classification is 1, and the predicted result is 0. In this case, the number of FN increases by 1;
Case 3: The actual classification is 0, and the predicted result is 1. In this case, the number of FP increases by 1;
Case 4: The actual classification is 0, and the predicted result is also 0. In this case, the number of TN increases by 1.

TP, FN, FP, and TN can be counted according to the above situations and are used to calculate the measures.

Sensitivity is defined as follows:

SN = TP / (TP + FN) × 100%  (8)

Specificity is defined as follows:

SP = TN / (TN + FP) × 100%  (9)

Accuracy is defined as follows:

ACC = (TP + TN) / (TP + TN + FP + FN) × 100%  (10)

The Matthews correlation coefficient (MCC) is defined as:

MCC = (TP × TN − FN × FP) / √((TP + FN)(TN + FP)(TP + FP)(TN + FN)) × 100%  (11)

3.9 Validation

The "k-fold" and "leave-one-out" methods are used for cross validation. The k-fold cross-validation method splits the data randomly into k equal (or almost equal) parts. The algorithm is then run k times, using k − 1 parts as the training set and the remaining part as the test set. In this study, we use the "k-fold" validation method to validate the models. We set k to 5 and 10, which are frequently used in cross validation. Finally, we compute the average values of sensitivity (SN), specificity (SP), accuracy (ACC), and MCC when k = 5 and k = 10.
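The counting of the four cases in Table 1 and the measures in Eqs. (8)-(11) under k-fold cross validation can be sketched as follows, assuming scikit-learn and synthetic stand-in data rather than the actual CRC samples.

```python
# Hedged sketch of Sections 3.8-3.9: confusion-matrix counts and SN/SP/ACC/MCC under k-fold CV.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

def metrics_from_counts(tp, fn, fp, tn):
    sn = tp / (tp + fn)                                   # Eq. (8)
    sp = tn / (tn + fp)                                   # Eq. (9)
    acc = (tp + tn) / (tp + fn + fp + tn)                 # Eq. (10)
    mcc = (tp * tn - fn * fp) / np.sqrt(
        (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))    # Eq. (11)
    return sn, sp, acc, mcc

X, y = make_classification(n_samples=181, n_features=4, random_state=0)  # stand-in data
for k in (5, 10):
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    pred = cross_val_predict(SVC(kernel="rbf"), X, y, cv=cv)
    tp = np.sum((y == 1) & (pred == 1))   # Case 1
    fn = np.sum((y == 1) & (pred == 0))   # Case 2
    fp = np.sum((y == 0) & (pred == 1))   # Case 3
    tn = np.sum((y == 0) & (pred == 0))   # Case 4
    print(k, [round(float(v), 3) for v in metrics_from_counts(tp, fn, fp, tn)])
```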
4 Results

4.1 Logistic regression analysis result

4.2 ROC curve analysis of single relevant factor and combined factor

The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. It generally lies above the line y = x. As the ROC curve approaches the upper left, namely, as the area gets closer to 1, classification improves. Figure 5 shows the ROC curve of the binary logistic regression model. The sensitivity, specificity, and AUC of each single factor and of the combined factor are presented in Table 3. We can see that the AUC of the combined factor is 0.942, which is closer to 1 than that of any single factor and shows the best classification performance of the model.

4.3 SVM model establishment

It is acknowledged that kernel selection is an important task in building the SVM model; it decides the classification capability of the model. In this paper, we divide the samples into a training group and a testing group to choose the best kernel type. Fivefold and tenfold cross validation are carried out to validate these models. The final result suggests that RBF is the best kernel type for building the SVM prediction model. Figure 6 shows the comparison results of the different kernel types. After choosing the proper kernel type, we select the optimal model parameters (C, g), where C is the cost parameter and g is the coefficient parameter of the RBF model. We set the range of C as [2^−10, 2^10] and the range of g as [2^−5, 2^5]. Then, we use grid.py in LIBSVM to find the best (C, g). The best (C, g) is (0.5, 0.0078125) when k = 5, and the best (C, g) is (0.35, 1.414) when k = 10. Table 4 and Table 5 show the SN, SP, ACC, and MCC, and the corresponding TP, TN, FP, and FN, of the different kernels in fivefold and tenfold cross validation.

Fig. 6 Cross validation results of different kernel types: a the fivefold cross validation result of different kernel types, b the tenfold cross validation result of different kernel types.
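The (C, g) search described above can be approximated with a base-2 grid like the one grid.py scans; the GridSearchCV call below is an assumed scikit-learn stand-in for LIBSVM's grid.py, and the data are synthetic placeholders.

```python
# Sketch of the Section 4.3 parameter search: C in [2^-10, 2^10], g in [2^-5, 2^5],
# evaluated by k-fold CV. GridSearchCV is an assumed stand-in for LIBSVM's grid.py.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": 2.0 ** np.arange(-10, 11),       # 2^-10 ... 2^10
    "gamma": 2.0 ** np.arange(-5, 6),     # 2^-5  ... 2^5  (g)
    "kernel": ["rbf"],                    # add "linear", "sigmoid", "poly" to compare kernels
}

X, y = make_classification(n_samples=181, n_features=4, random_state=0)  # placeholder data
for k in (5, 10):
    search = GridSearchCV(SVC(), param_grid, cv=k, scoring="accuracy")
    search.fit(X, y)
    print(k, search.best_params_, round(search.best_score_, 3))
```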
4.4 Comparison of LR + SVM with other models

To illustrate the performance of LR + SVM, we compare it with the following comparable methods: random forest (RF) [22], Naive Bayes (NB) [22, 23], k-Nearest Neighbor (KNN) [24], and artificial neural networks (ANNs) [25]. These methods are common and widely used in the area of disease prediction. In this paper, the variables selected by the logistic regression model are used as input features for these models, so we name the integrated models LR + RF, LR + NB, LR + KNN, and LR + ANNs. The RF classifier builds multiple decision trees, and the final outcome is decided by the voting of these trees, which eliminates the over-fitting problem of the single decision tree approach; the experiment is conducted using 3 and 5 single classifiers and achieves the best result when the number is 3. For the KNN classifier, whose classification performance relies on the number of nearest neighbors, the experiment is carried out by
using K = 3, K = 5, K = 7, and K = 9, obtaining the best accuracy when K = 3. For the ANNs classifier, the parameters affect its performance; the experiment is conducted by using iteration = 50, 100, 200; learning rate = 0.01, 0.02, 0.03; number of layers = 2, 3, 4, 5; and number of units = 2, 3, 4, 5, 6 to train the data, and the best accuracy is finally obtained when the learning rate is set to 0.01, the iteration is 100, the number of units in the hidden layer is 4, and the number of layers is 3. Besides, fivefold and tenfold cross validations are carried out to validate these models. These comparison results are shown in Fig. 7. Table 6 and Table 7 show the SN, SP, ACC, and MCC, and the corresponding TP, TN, FP, and FN, of the different prediction models with their best parameters in fivefold and tenfold cross validation. The best prediction model is LR + SVM, achieving an accuracy of 90.1% when k = 5 and 91.2% when k = 10, followed by LR + ANNs with an accuracy of 88.4% when k = 5 and 90.1% when k = 10.

Fig. 7 Cross validation results of prediction models: a the fivefold validation result of the classification models, b the tenfold validation result of the classification models.
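A minimal sketch of this comparison, assuming scikit-learn estimators as stand-ins for the authors' implementations and using the best hyperparameters reported above (3 trees for RF, K = 3 for KNN, a single hidden layer of 4 units and a 0.01 learning rate for the ANNs), is given below; the data are placeholders.

```python
# Hedged sketch of Section 4.4: the LR-selected features fed to SVM and the baseline models.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

models = {
    "LR+SVM":  SVC(kernel="rbf"),
    "LR+RF":   RandomForestClassifier(n_estimators=3, random_state=0),
    "LR+NB":   GaussianNB(),
    "LR+KNN":  KNeighborsClassifier(n_neighbors=3),
    # one hidden layer of 4 units is our reading of "3 layers, 4 hidden units"
    "LR+ANNs": MLPClassifier(hidden_layer_sizes=(4,), learning_rate_init=0.01,
                             max_iter=100, random_state=0),
}

X, y = make_classification(n_samples=181, n_features=4, random_state=0)  # placeholder data
for k in (5, 10):
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=k, scoring="accuracy").mean()
        print(f"{name:8s} k={k}: {acc:.3f}")
```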
5 Discussions

CRC is a common malignant tumor of the alimentary system; it is the fourth most prevalent malignant tumor in males and the third most prevalent in females, so early diagnosis of CRC is important. Herberman first proposed the concept of tumor markers in 1978, and other relevant genes have been reported since then. Risk factors in CRC patients and healthy individuals are evaluated in this study. The results show that the average age and BMI and the abundance of the phylum Firmicutes are all significantly higher in the CRC group than in the control group. Meanwhile, this study provides a prediction model to differentiate between normal and CRC samples using our method, which combines logistic regression feature selection with SVM using an RBF kernel, yielding an accuracy of 90.1% when k = 5 and 91.2% when k = 10, both higher than the comparable models.

For the other comparable prediction models, LR + ANNs is the best model apart from LR + SVM based on the fivefold and tenfold cross validations. ANNs have unique properties, including robust performance in dealing with noisy or incomplete input patterns, high fault tolerance, and the ability to generalize from the training data. The classification result of this method is related to the number of layers of the network, the number of units, the iteration, and the learning rate; the best accuracy is obtained when the learning rate is set to 0.01, the iteration is 100, the number of units in the hidden layer is 4, and the number of layers is 3. The RF classifier builds multiple decision trees, and the final outcome is decided by the voting of these trees, which eliminates the over-fitting problem of the single decision tree approach; the best result is obtained when the number of single classifiers is 3. For the KNN classifier, whose classification performance relies on the number of nearest neighbors, the best accuracy is obtained when K = 3. The NB model has the advantage of training rapidly but suffers from assuming independence of the features, and it achieves the worst result in our experiment. For fivefold cross validation, SVM achieves 87.0% SN, 93.3% SP, 90.1% ACC, and 80.30% MCC and is the best model; ANNs achieves 87.0% SN, 89.9% SP, 88.4% ACC, and 76.84% MCC and is next only to SVM; both the RF and KNN models achieve better results in terms of SN, SP, ACC, and MCC than NB, whose SN, SP, ACC, and MCC are 76.1%, 85.4%, 80.7%, and 63.55%, respectively. For tenfold cross validation, SVM achieves 91.3% SN, 91.0% SP, 91.2% ACC, and 82.32% MCC and is the best model; ANNs achieves 87.0% SN, 93.3% SP, 90.1% ACC,
and 80.30% MCC and is next only to SVM; both the RF and KNN models achieve better results in terms of SN, SP, ACC, and MCC than NB, whose SN, SP, ACC, and MCC are 83.7%, 87.6%, 85.6%, and 71.35%, respectively. From the above discussion, SVM achieves the best performance whether fivefold or tenfold cross validation is used and keeps a balance between sensitivity and specificity.

This study mainly uses intelligent algorithms based on multiple CRC factors to build prediction models of CRC. In fact, classification methods have gradually been applied in clinical practice. Computer-assisted diagnosis models such as the SVM and random forest (RF) models are widely employed in clinical fields. Chen et al. applied RF to clinical metabolomics for phenotypic discrimination and biomarker selection of colorectal cancer [50]. Saccá et al. used SVM to classify electroencephalography (EEG) signals, which is fundamental to monitoring brain functions in clinical practice [51]. Esteva et al. proposed deep neural networks (CNNs) to carry out dermatologist-level classification of skin cancer [52]. Zeevi et al. solved the problem of personalized nutrition by predicting glycemic responses with many decision trees [41]. Classification algorithms have gradually been applied to clinical practice, and in the era of big data, these methods should make it convenient for clinical practice to mine big data.

Based on our existing study, we will predict the cancer with consensus molecular subtypes. Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. For the diagnosis of CRC, there are some problems that influence the diagnostic result. One of the main problems is the inconsistency among the reported gene expression-based CRC classifications, which hinders clinical translation. To solve this problem, we plan to explore the further relationships between CRC classification and clinical translation. It is acknowledged that CRC can be divided into four consensus molecular subtypes with distinguishing features: CMS1 (MSI Immune), CMS2 (Canonical), CMS3 (Metabolic), and CMS4 (Mesenchymal). Our study will be carried out based on gene expression data, which are widely accepted as a relevant source for disease stratification. The data sets that can be used include GSE42284, GSE33113, GSE39582, GSE35896, GSE13067, GSE13294, GSE14333, GSE17536, GSE20916, GSE2109, and GSE2109, and all of these data sets can be found in the Gene Expression Omnibus (GEO). Then, we normalize the data using the robust multi-array average (RMA) method to improve the classifier's performance. For classifying the samples, we will choose one of the deep learning methods, CNNs, as the classifier, based on TensorFlow. Finally, we will adjust the related parameters of the CNNs to choose the best classification result.
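As a purely illustrative sketch of that planned classifier, a small tf.keras 1-D CNN for four-class (CMS1-CMS4) classification of expression profiles might look like the following; the input width, architecture, and hyperparameters are assumptions and not results of this paper.

```python
# Illustrative-only sketch of a CNN classifier for the four CMS subtypes, built with
# tf.keras (TensorFlow). Input width and all hyperparameters are assumptions.
import tensorflow as tf

n_genes = 5000  # hypothetical number of (RMA-normalized) expression features per sample

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_genes, 1)),          # expression profile as a 1-D signal
    tf.keras.layers.Conv1D(16, kernel_size=9, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=4),
    tf.keras.layers.Conv1D(32, kernel_size=9, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),      # CMS1-CMS4
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, ...) once the GEO expression data are prepared
```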
13. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422
14. Ahmad F, Mat Isa NA, Hussain Z, Osman MK, Sulaiman SN (2015) GA-based feature selection and parameter optimization of an ANN in diagnosing breast cancer. Pattern Anal Appl 18:861–870
15. Peng S, Xu Q, Ling XB, Peng X, du W, Chen L (2003) Molecular classification of cancer types from microarray data using the combination of genetic algorithms and support vector machines. FEBS Lett 555:358–362
16. Liu W, Zheng WL, Lu BL (2016) Emotion recognition using multimodal deep learning
17. Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A, Benítez JM, Herrera F (2014) A review of microarray datasets and applied feature selection methods. Inf Sci 282:111–135
18. Li T, Zhang C, Ogihara M (2004) A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20:2429–2437
19. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:1–27
20. Park SI, Tae-Ho O (2016) Application of receiver operating characteristic (ROC) curve for evaluation of diagnostic test performance. J Vet Clin 33:97–108
21. Kim KA, Choi JY, Yoo TK, Kim SK, Chung KS, Kim DW (2013) Mortality prediction of rats in acute hemorrhagic shock using machine learning techniques. Med Biol Eng Comput 51:1059–1067
22. Chowdhury AR, Chatterjee T, Banerjee S (2018) A random forest classifier-based approach in the detection of abnormalities in the retina. Med Biol Eng Comput. https://doi.org/10.1007/s11517-018-1878-0
23. Zhang H, Yu P, Xiang ML, Li XB, Kong WB, Ma JY, Wang JL, Zhang JP, Zhang J (2016) Prediction of drug-induced eosinophilia adverse effect by using SVM and naïve Bayesian approaches. Med Biol Eng Comput 54(2–3):361–369
24. Zhang S, Li X, Zong M et al (2018) Efficient KNN classification with different numbers of nearest neighbors. IEEE Trans Neural Netw Learn Syst (99):1–12
25. Bertolaccini L, Solli P, Pardolesi A, Pasini A (2017) An overview of the use of artificial neural networks in lung cancer research. J Thorac Dis 9(4):924–931
26. Siegel R, DeSantis C, Jemal A (2014) Colorectal cancer statistics, 2014. CA Cancer J Clin 64:104–117
27. Lee J, Meyerhardt JA, Giovannucci E, Jeon JY (2015) Association between body mass index and prognosis of colorectal cancer: a meta-analysis of prospective cohort studies. PLoS One 10:e0120706
28. Chu CM, Yao CT, Chang YT et al (2014) Gene expression profiling of colorectal tumors and normal mucosa by microarrays meta-analysis using prediction analysis of microarray, artificial neural network, classification, and regression trees. Dis Markers 2014:459–462
29. Orang AV, Barzegari A (2014) MicroRNAs in colorectal cancer: from diagnosis to targeted therapy. Asian Pac J Cancer Prev 15:6989–6999
30. Philip AK, Lubner MG, Harms B (2011) Computed tomographic colonography. Surg Clin North Am 91:127–139
31. Zhang H, Qi J, Wu YQ, Zhang P, Jiang J, Wang QX, Zhu YQ (2014) Accuracy of early detection of colorectal tumors by stool methylation markers: a meta-analysis. World J Gastroenterol 20:14040–14050
32. Ip S, Sokoro AA, Kaita L, Ruiz C, McIntyre E, Singh H (2014) Use of fecal occult blood testing in hospitalized patients: results of an audit. Can J Gastroenterol Hepatol 28:489–494
33. Li H, Jin Z, Li X et al (2017) Associations between single-nucleotide polymorphisms and inflammatory bowel disease-associated colorectal cancers in inflammatory bowel disease patients: a meta-analysis. Clin Transl Oncol 19:1–10
34. Zhang B, Liang XL, Gao HY et al (2016) Models of logistic regression analysis, support vector machine, and back-propagation neural network based on serum tumor markers in colorectal cancer diagnosis. Genet Mol Res 15:1–10
35. Zeller G, Tap J, Voigt AY, Sunagawa S, Kultima JR, Costea PI, Amiot A, Bohm J, Brunetti F, Habermann N, Hercog R, Koch M, Luciani A, Mende DR, Schneider MA, Schrotz-King P, Tournigand C, Tran van Nhieu J, Yamada T, Zimmermann J, Benes V, Kloor M, Ulrich CM, von Knebel Doeberitz M, Sobhani I, Bork P (2014) Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol Syst Biol 10:766–783
36. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120
37. Truong DT, Franzosa EA, Tickle EL et al (2015) MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Methods 12:902–903
38. Vincent C, Manges AR (2015) Antimicrobial use, human gut microbiota and Clostridium difficile colonization and infection. Antibiotics 4:230–253
39. Endesfelder D, zu-Castell W, Ardissone A et al (2014) Compromised gut microbiota networks in children with anti-islet cell autoimmunity. Diabetes 63:2006–2014
40. Gao R, Gao Z, Huang L, Qin H (2017) Gut microbiota and colorectal cancer. Eur J Clin Microbiol Infect Dis 36:1–13
41. Zeevi D, Korem T, Zmora N, Israeli D, Rothschild D, Weinberger A, Ben-Yacov O, Lador D, Avnit-Sagi T, Lotan-Pompan M, Suez J, Mahdi JA, Matot E, Malka G, Kosower N, Rein M, Zilberman-Schapira G, Dohnalová L, Pevsner-Fischer M, Bikovsky R, Halpern Z, Elinav E, Segal E (2015) Personalized nutrition by prediction of glycemic responses. Cell 163:1079–1094
42. Schmid D, Leitzmann MF (2014) Television viewing and time spent sedentary in relation to cancer risk: a meta-analysis. J Natl Cancer Inst
43. Emmerzaal TL, Kiliaan AJ, Gustafson DR (2015) 2003-2013: a decade of body mass index, Alzheimer's disease, and dementia. J Alzheimers Dis 43:739–755
44. Alfa-Wali M, Boniface S, Sharma A et al (2015) Metabolic syndrome (Mets) and risk of colorectal cancer (CRC): a systematic review and meta-analysis. World J Surg Med Radiat Oncol 4:41–52
45. Sears CL, Garrett WS (2014) Microbes, microbiota, and colon cancer. Cell Host Microbe 15:317–328
46. Zhu Q, Jin Z, Wu W, Gao R et al (2014) Analysis of the intestinal lumen microbiota in an animal model of colorectal cancer. PLoS One e90849
47. Zhao M, Fu C, Ji L, Tang K, Zhou M (2011) Feature selection and parameter optimization for support vector machines: a new approach based on genetic algorithm with feature chromosomes. Expert Syst Appl 38:5197–5204
48. Hu X, Wong KK, Young GS, Guo L, Wong ST (2011) Support vector machine multiparametric MRI identification of pseudoprogression from tumor recurrence in patients with resected glioblastoma. J Magn Reson Imaging 33:296–305
49. Zhang H, Yu P, Xiang ML, Li XB, Kong WB, Ma JY, Wang JL, Zhang JP, Zhang J (2016) Prediction of drug-induced eosinophilia adverse effect by using SVM and naive Bayesian approaches. Med Biol Eng Comput 54:361–370
50. Chen T, Cao Y, Zhang Y et al (2013) Random forest in clinical metabolomics for phenotypic discrimination and biomarker selection. Evid Based Complement Alternat Med 2013:298183–298193
51. Saccá V, Campolo M, Mirarchi D et al (2018) On the classification of EEG signal by using an SVM based algorithm
52. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118
Hong Liu is a Professor and Ph.D. supervisor at the School of Information Science and Engineering, Shandong Normal University. She received her Ph.D. degree in engineering from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 1998. She is an academic leader in computer science and technology. Her research is the cross study of distributed artificial intelligence, software engineering, and computer-aided design, including research on multi-agent systems and co-evolutionary computing technology.

Dianjie Lu received the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, in 2012. Currently, he is an associate professor at Shandong Normal University. His research interests include crowd cooperative computing and the cognitive internet of things.