
Medical & Biological Engineering & Computing
https://doi.org/10.1007/s11517-018-1930-0

ORIGINAL ARTICLE

A reliable method for colorectal cancer prediction based on feature selection and support vector machine

Dandan Zhao 1,2 · Hong Liu 1,2 · Yuanjie Zheng 1,2 · Yanlin He 1,2 · Dianjie Lu 1,2 · Chen Lyu 1,2

Received: 16 December 2017 / Accepted: 17 November 2018

© International Federation for Medical and Biological Engineering 2018

Abstract
Colorectal cancer (CRC) is a common cancer responsible for approximately 600,000 deaths per year worldwide. It is therefore very important to find the related factors and detect the cancer accurately, yet timely and accurate prediction of the disease remains challenging. In this study, we build an integrated model based on logistic regression (LR) and support vector machine (SVM) to classify samples into CRC and normal groups. From various factors of the samples (location, age, gender, BMI, tumor type, tumor grade, and DNA), we select the most significant factors (p < 0.05) using logistic regression as the main features, and with these features a grid-search SVM model is designed using different kernel types (Linear, radial basis function (RBF), Sigmoid, and Polynomial). The result of the logistic regression indicates that Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), body mass index (BMI) (AUC 0.777), and age (AUC 0.710), as well as their combination (AUC 0.942), are effective for CRC detection. The best kernel type is RBF, which achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10. This study provides a new method for colorectal cancer prediction based on independent risk factors.

Keywords Colorectal cancer · Logistic regression · Support vector machine · Microbiome

* Hong Liu
[email protected]

1 Shandong Normal University, School of Information Science and Engineering, No. 88, Wenhua East Road, Jinan, People's Republic of China
2 Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China

1 Introduction

Cancer is one of the main causes of morbidity and mortality worldwide; in 2012, 14 million new cancer patients were diagnosed and 8.2 million patients died of cancer, including colorectal cancer [1]. Colorectal cancer (CRC) is a common form of cancer. Early detection of this cancer and its treatment before metastasis can increase the survival rate and survival time of cancer patients [2]. A study found that rates of CRC were attributable to human diet: people at high risk of CRC were those who mostly consumed animal fat, red meat, and processed meat and rarely ate grains, dietary fiber, and vegetables [3]. Moreover, recent surveys have suggested that the gut microbiome, through its interactions with host metabolism, plays a significant role in influencing CRC risk [4]. The predominant bacterial phyla associated with CRC are Firmicutes and Bacteroidetes [5].

As the related factors have gradually been found, some studies have employed various methods for detection and prediction of the disease, including traditional methods and machine learning methods [6–11]. These methods have made a great contribution to CRC detection; however, they achieve only low prediction accuracy and cannot achieve a balance between sensitivity and specificity, owing to improper predictors and redundant features. This research is still limited and needs further study. Meanwhile, some prevalent methods, for example, logistic regression (LR) [12], recursive feature elimination (RFE) [13], minimum-redundancy-maximum-relevance (mRMR) [14], genetic algorithm (GA) [15], and deep learning (DL) [16], have rarely been used for predicting CRC despite their wide use in classification problems, so it is valuable to explore their performance on CRC prediction. For colorectal cancer prediction, there are still tremendous challenges: firstly, feature selection may retain redundant features, leading to poor performance of the classification algorithm [17]; secondly, the prediction accuracy is affected by the choice of predictor [18]. So, to achieve higher prediction accuracy, the proper number of significant features and the function of the predictor must be determined.

Due to the challenges illustrated above, our main work is to reduce redundant features and choose the best predictors of colorectal cancer, thus improving the prediction performance for the disease. In this study, we choose the significant factors of CRC by building a logistic regression (LR) model, choose the best factors that affect the disease by calculating the AUC of the ROC curve, and then classify the healthy and CRC samples by building a support vector machine (SVM) [19] that predicts the disease based on the features selected by LR and ROC; we name the integrated model LR + SVM. Logistic regression aims to find the most significant features (p < 0.05) of the disease by screening each single factor according to the calculated p value, so as to reduce redundant features. When p < 0.05, the factor evaluated by the model is significant for diagnosing the disease, and the most important factors that affect the disease are thus selected. The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied [20, 21]. The area under the ROC curve (AUC) is the ROC score, which is widely used as a measure of a feature's discriminatory power. As the ROC curve approaches the upper left, namely, as the area approaches 1, the selected features' classification ability improves; in other words, the ROC curve can be used to evaluate the classification performance of the selected features. SVM is widely used in classification problems, particularly binary classification. The principle of SVM for binary classification is to find a hyperplane that separates samples into positive and negative examples, which is similar to the principle of dividing normal and abnormal samples of certain diseases.

In the paper, LR and ROC select the CRC-related factors Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), body mass index (BMI) (AUC 0.777), and age (AUC 0.710) and their combination (AUC 0.942) from the sample information, and we use the combined factor, whose AUC is the best, as the input training features for SVM and for the other comparable machine learning techniques, namely, random forest (RF) [22], naive Bayes (NB) [22, 23], k-Nearest Neighbor (KNN) [24], and artificial neural networks (ANNs) [25]. All these techniques are common and widely used in classification problems. The result shows that the LR + SVM integrated model achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10, higher than the comparable models.

In conclusion, motivated by the existing problems of cancer prediction, we propose an effective method based on LR and grid-optimized SVM for colorectal cancer prediction. The main contributions of this paper are as follows:

(1) We select the most significant features from the various factors of the samples using a logistic regression model and validate them by the ROC curve, avoiding redundant features. The best combined factor we selected achieves a sensitivity of 0.933 and specificity of 0.908.
(2) The proper SVM kernel type is selected by comparing the Linear, radial basis function (RBF), Sigmoid, and Polynomial functions on the same dataset, and we optimize the parameters (C, g) of SVM using grid.py in LIBSVM. The best kernel function is RBF, and its accuracy is 90.1% when k = 5 and 91.2% when k = 10.
(3) The LR + SVM integrated model is established to predict colorectal cancer, and its accuracy is obviously higher than that of other models, solving the problems of feature redundancy and low accuracy of cancer prediction to a large extent.

The paper is organized as follows: Section 2 describes the related work on prediction methods for colorectal cancer. Section 3 describes the methods and materials used in the paper. Section 4 presents experimental results to show the effect of the LR + SVM integrated model proposed in this paper. Discussion and conclusion are given in Section 5 and Section 6, respectively.

2 Related works

As a complex, volatile, easily metastasizing common cancer, colorectal cancer is closely related to various factors. In order to further understand and find the cause of the disease, many researchers have devoted themselves to exploring the relevant risk factors. Traditional factors, such as diet [3], age [26], and body mass index (BMI) [27], were found early on to affect the incidence of CRC. Recently, there has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer. Brennan et al. researched differences in gut bacteria between CRC patients and healthy people [5]. Chu et al. analyzed the differences between CRC and healthy groups, finding the MMP7, KIAA1199, CA1, and CLCA4 marker genes [28]. In addition to genes that are altered in CRC, microRNAs show potential as biomarkers of this disease [29]. Moreover, virtual colonoscopy [30], tests for DNA methylation markers in stool [31], and the fecal occult blood test [32] are found to be potentially useful diagnostic strategies.

Having identified the risk factors of the disease, researchers have gradually established various models to detect and predict the cancer. Jung et al. proposed a method using traditional and genetic risk scores [8] to predict the cancer from numerous single nucleotide polymorphisms (SNPs) [33], finding that genetic variants were useful for predicting CRC risk. Cubiella et al. used the COLONPREDICT score to detect colorectal cancer and achieved an AUC of 0.82 and 83.1% sensitivity [9]. Coppedè et al. applied neural networks to link genetic factors to DNA methylation [10], providing a new approach to colorectal cancer prediction.

Peng et al., from the perspective of statistics, demonstrated that the carcinoembryonic antigen in draining venous blood (d-CEA) may be a prognostic factor for colorectal cancer patients, with a sensitivity and specificity of 90% and 40%, respectively, in the prediction of colorectal cancer [11]. Zhang et al. used logistic regression, a back-propagation neural network, and SVM to build models according to serum tumor markers, achieving accuracies of 75% and 82.5%, respectively [34]. These methods have contributed a lot to the detection of the cancer; however, some of them focused on traditional factors, and some models used redundant features as model inputs, thus leading to low prediction accuracy and an imbalance between sensitivity and specificity. Research on the prediction of colorectal cancer is therefore still limited and needs further study.

Our study provides evidence that considering genetic factors as well as traditional risk factors in risk prediction models can improve their utility. We build an integrated model based on logistic regression to select the significant and independent features, including traditional factors (age, BMI) and genetic-based factors (the phyla Firmicutes and Bacteroidetes), and a grid-optimized SVM to classify samples, achieving an accuracy of 90.1% when k = 5 and 91.2% when k = 10.

3 Materials and methods

The methodology used in the paper is presented as a flowchart in Fig. 1, and the details are presented in the following sections.

3.1 Data set

The data set used in this study comes from the study of Zeller et al. [35] and can be downloaded from the NCBI website. The study population includes participants without previous colon or rectal surgery, CRC, or inflammatory or infectious injuries of the intestine. Patients needing emergency colonoscopy are excluded. In analyzing the related CRC factors, we select 89 healthy individuals and 92 CRC patients. All participants are local residents of France and Germany. Information, including location, age, gender, BMI, tumor type, tumor grade, and DNA, of all participants is obtained.

3.2 Metagenomic sequencing and quality control

The raw sequences of the healthy and CRC individuals are obtained on an Illumina HiSeq 2000/2500 (Illumina, San Diego, USA) platform. All samples are paired-end sequenced with a 100 bp read length. After downloading these sequences, we use Trimmomatic [36] to filter out metagenomic reads containing Illumina adapters, remove low-quality reads, and trim low-quality read edges. After using MetaPhlAn2 [37] to obtain the microbiome composition of the sequences with default parameters, we generate taxonomic relative abundance profiles at the phylum, class, family, genus, and species levels.
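As a practical illustration of this preprocessing step, the sketch below chains the two tools from Python. It is only a minimal sketch under assumptions: the trimmomatic and metaphlan2.py executables are assumed to be on the PATH, the FASTQ file names are hypothetical, and the trimming thresholds are the tool's documented example values rather than the exact settings used in this study.

```python
import subprocess

def preprocess_sample(sample, adapters="TruSeq3-PE.fa", threads=4):
    """Adapter/quality trimming with Trimmomatic, then taxonomic profiling with MetaPhlAn2.
    All file names and threshold values below are illustrative placeholders."""
    trimmed = [f"{sample}_1.paired.fq", f"{sample}_1.unpaired.fq",
               f"{sample}_2.paired.fq", f"{sample}_2.unpaired.fq"]
    subprocess.run(
        ["trimmomatic", "PE", "-threads", str(threads),
         f"{sample}_1.fastq", f"{sample}_2.fastq", *trimmed,
         f"ILLUMINACLIP:{adapters}:2:30:10", "LEADING:3", "TRAILING:3",
         "SLIDINGWINDOW:4:15", "MINLEN:36"],
        check=True)
    # MetaPhlAn2 accepts comma-separated FASTQ inputs and writes a relative abundance profile
    with open(f"{sample}_profile.tsv", "w") as out:
        subprocess.run(
            ["metaphlan2.py", f"{sample}_1.paired.fq,{sample}_2.paired.fq",
             "--input_type", "fastq", "--nproc", str(threads)],
            stdout=out, check=True)

preprocess_sample("CRC_sample_01")
```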
lum level and make comparison between healthy individuals
and CRC patients. Our analysis shows that Firmicutes ac-
3 Materials and methods counts for 57.03% in the healthy individuals and 62.91% in
the CRC patients. Meanwhile, Bacteroidetes contributes ap-
The methodology used in the paper is presented as a flowchart proximately 30.04% and 22.53% in the healthy individuals
in Fig. 1, and the details will be presented in the following and CRC patients, respectively.
sections. Notably, the abundance of bacteria at the different levels
has important influence on CRC prevalence. Firmicutes in the
3.1 Data set gut microbiome of CRC patients consists of the following
genera: Faecalibacterium, Blautia, Enterococcus,
The data set used in this study comes from the passage [35], Subdoligranulum, Dorea, Roseburia, Mycoplasma, and
which can be downloaded from the NCBI website. The study Streptococcus [38]. Each sub-bacterium constitutes more than
population includes participants without previous colon or 1% of the total bacteria in this population.
rectal surgery, CRC, and inflammatory or infectious injuries Relatively abundant genera (the abundance is more than 1%)
of the intestine. Patients needing emergency colonoscopy are in other phyla include Bacteroides, Prevotella, Parabacteroides,
excluded. In analyzing the related CRC factors, we select 89 and Porphyromonas of Bacteroidetes; Bifidobacterium and
healthy individuals and 92 CRC patients. All participants are Collinsella of Actinobacteria; and Escherichia/Shigella of
local residents of France and Germany. Information, including Proteobacteria [39]. According to the statistics, the relative

Fig. 1 Flowchart depicting the method adopted in the study. LR and ROC curve are used to select features. Linear, RBF, Sigmoid, and Polynomial are kernel types of SVM, and the best kernel is RBF. LR + RF, LR + NB, LR + KNN, and LR + ANNs are compared with LR + SVM. After these steps, the CRC and healthy classification can be obtained, and the best model is selected.

Relatively abundant genera (abundance greater than 1%) in other phyla include Bacteroides, Prevotella, Parabacteroides, and Porphyromonas of Bacteroidetes; Bifidobacterium and Collinsella of Actinobacteria; and Escherichia/Shigella of Proteobacteria [39]. According to the statistics, the relative abundance of five genera, namely, Bacteroides, Roseburia, Alistipes, Eubacterium, and Parasutterella, was significantly higher in healthy individuals than in CRC patients; meanwhile, another five genera, namely, Porphyromonas, Escherichia/Shigella, Enterococcus, Streptococcus, and Peptostreptococcus, are significantly lower in healthy people than in CRC patients [40].

3.4 Risk factor analysis

Factors that may affect interpersonal differences in CRC include gender, age, BMI [41], and the gut microbiome. For each risk factor, estimates of relative risk and 95% confidence intervals for CRC are extracted [42].

Age exhibits great influence as a risk factor for cancer (Fig. 2). The number of CRC patients increases with age, and males have a higher rate of the disease compared with women of the same age.

Fig. 2 Estimated numbers of new colorectal cancer cases and deaths by age, USA, 2014

BMI, a primary risk factor for CRC, can be divided into the following levels: normal (18.5–22.9 kg/m2), overweight (23.0–24.9 kg/m2), obese (25.0–29.9 kg/m2), and severely obese (> 30 kg/m2) [43]. A recent meta-analysis reported that participants with a BMI greater than 25 kg/m2 have a 24% increased prevalence of colorectal cancer compared with those with a lower BMI [44]. In the present study, the BMI (kg/m2) distribution of the participants is shown in Fig. 3. The average BMI of CRC patients is approximately 25.35 kg/m2, which is higher than that of healthy individuals (24.64 kg/m2).

Fig. 3 Percentage of BMI distribution in the study

The type of bacteria in the human gut is also an important aspect that may affect CRC progression [45]. Researchers have also made efforts to search for gut bacterial types possibly related to CRC by comparing their abundance between patients and healthy individuals. The amounts of different phyla, especially Firmicutes and Bacteroidetes, can be used to determine the CRC risk of an individual [46]. Specifically, Firmicutes is more abundant in CRC patients than in healthy individuals, whereas Bacteroidetes is less abundant in CRC patients than in healthy individuals.

Given their important implications in CRC, only the above aspects are included in the study. These aspects are added as input of the prediction model, and the proper algorithm is selected to carry out the study.

3.5 Data normalization

The data normalization step is used to scale the feature values for several reasons: (1) to avoid features with greater numeric ranges dominating those with smaller numeric ranges, (2) to avoid numerical difficulties during the calculation, and (3) to help achieve higher classification accuracy [47]. Each feature can be scaled to the range [0, 1] as follows:

V' = (V − Min) / (Max − Min)    (1)

where V is the original value, Min and Max are the lower and upper bounds of the feature value, and V' is the scaled value.
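As a concrete illustration of Eq. (1), the scaling can be applied column-wise to a feature matrix before training. The following is a minimal sketch in Python/NumPy; the example feature values and column meanings are hypothetical and are not taken from the study's data.

```python
import numpy as np

def min_max_scale(X):
    """Scale each feature (column) of X to [0, 1] following Eq. (1):
    V' = (V - Min) / (Max - Min), with Min and Max taken per feature."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard against constant columns
    return (X - col_min) / span

# Hypothetical rows: [Firmicutes %, Bacteroidetes %, BMI, age]
X = np.array([[57.0, 30.0, 24.6, 55.0],
              [62.9, 22.5, 25.4, 68.0],
              [60.1, 26.3, 23.8, 47.0]])
print(min_max_scale(X))
```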

3.6 Logistic regression model

The logistic regression (LR) model aims to find the most significant features (p < 0.05) of the disease. In the paper, we use it to screen out each single index in order to reduce redundant features, which is carried out in SPSS 17.0. The predicted probabilities of LR are analyzed using the Receiver Operating Characteristic (ROC) curve [20, 21], a strong indicator of the model's ability to distinguish two groups, and the area under the curve (AUC) is used to measure the strength of the equation. An AUC value near 0.5 indicates that the classification is purely a matter of chance, and a value close to 1 indicates that the equation strongly discriminates the two groups. After being measured by these methods, the most significant features are selected as the input of SVM.
3.7 Creation of the SVM prediction model

As an optimal theory for small-sample learning, the main idea of SVM is to separate different classes using hyperplanes. SVM achieves high accuracy when the data are linearly separable (Fig. 4a), and kernel functions can make non-linearly separable data [48, 49] linearly separable (Fig. 4b). A brief description of SVM follows. The SVM input is a training set s = {x_i, y_i} (i = 1, 2, 3, ..., n) of feature vectors x_i ∈ X together with their known classes y_i ∈ {0, 1}. For the separating hyperplane w · x_i + b = 0, x_i is the feature vector, w is the weight vector, and b is the scalar used to define the position of the separating hyperplane. In the linear case, the margin is maximized by minimizing ‖w‖² / 2 subject to the constraint

y_i (w · x_i + b) ≥ 1    (2)

Considering noise with slack variables ξ_i and the penalty for error C, the best hyperplane can be obtained by quadratic optimization, minimizing

‖w‖² / 2 + C Σ_{i=1}^{n} ξ_i    (3)

subject to

y_i (w · x_i + b) ≥ 1 − ξ_i    (4)

A kernel function K is also used when a linear function is not suitable for the specific problem:

K(x_i, x_j) = φ(x_i) · φ(x_j)    (5)

A Lagrange multiplier α_i is introduced to optimize the parameters:

max L(α) = Σ_{i=1}^{n} α_i − (1/2) Σ_{i,j=1}^{n} α_i α_j y_i y_j K(x_i, x_j)    (6)

0 ≤ α_i ≤ C and Σ_{i=1}^{n} α_i y_i = 0, i = 1, 2, 3, ..., n    (7)

Fig. 4 Classification types of SVM: a the linear classification of SVM, b the non-linear classification of SVM

In the study, LIBSVM [19] is used as the SVM classifier. We choose the best kernel type by comparing the Linear, RBF, Sigmoid, and Polynomial kernels, and we use the grid search tool grid.py in LIBSVM to find the best (C, g) for the kernel. Integrated with LR, the model achieves higher accuracy than comparable models.
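The paper performs the (C, g) search with grid.py from LIBSVM; an equivalent sketch using scikit-learn's SVC (which wraps LIBSVM) and GridSearchCV is shown below. The exponent ranges follow Section 4.3; the data here are random placeholders rather than the study's samples.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_rbf_svm(X, y, k=5):
    """Grid-search the RBF-kernel SVM parameters (C, g), analogous to grid.py in LIBSVM."""
    param_grid = {
        "C": 2.0 ** np.arange(-10, 11),    # C in [2^-10, 2^10], as in Section 4.3
        "gamma": 2.0 ** np.arange(-5, 6),  # g in [2^-5, 2^5]
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=k, scoring="accuracy")
    search.fit(X, y)
    return search.best_params_, search.best_score_

# Placeholder data with the same shape as the study (181 samples, 4 selected features)
rng = np.random.default_rng(1)
X = rng.random((181, 4))
y = rng.integers(0, 2, size=181)
print(tune_rbf_svm(X, y, k=5))
```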

3.8 Classifier performance measures

In this part, measures such as sensitivity (SN), specificity (SP), accuracy (ACC), and Matthews' correlation coefficient (MCC) are used to evaluate the performance of the classifiers. All of these measures denote the mean value over the different k-fold validation results, and their values can be calculated from the confusion matrix, which contains information about the actual and predicted classifications made by a classification system (see Table 1). In Table 1, we assume that the labels of the data are 1 (CRC samples) and 0 (healthy samples).

Table 1 Confusion matrix

                 Predicted
                 1     0
Actual   1       TP    FN
         0       FP    TN

TP true positive, FN false negative, FP false positive, TN true negative

The calculation of TP, FN, FP, and TN is carried out in the Python language. The principle of the calculation can be divided into four situations:

Case 1: The actual classification is 1, and the predicted result is also 1. In this case, the count of TP increases by 1.
Case 2: The actual classification is 1, and the predicted result is 0. In this case, the count of FN increases by 1.
Case 3: The actual classification is 0, and the predicted result is 1. In this case, the count of FP increases by 1.
Case 4: The actual classification is 0, and the predicted result is also 0. In this case, the count of TN increases by 1.

TP, FN, FP, and TN are obtained from the above situations and are used to calculate the measures.

Sensitivity is defined as follows:

SN = TP / (TP + FN) × 100%    (8)

Specificity is defined as follows:

SP = TN / (TN + FP) × 100%    (9)

Accuracy is defined as follows:

ACC = (TP + TN) / (TP + TN + FP + FN) × 100%    (10)

The Matthews correlation coefficient (MCC) is defined as follows:

MCC = (TP × TN − FN × FP) / sqrt((TP + FN)(TN + FP)(TP + FP)(TN + FN)) × 100%    (11)

3.9 Validation

The k-fold and leave-one-out methods are commonly used for cross validation. The k-fold cross-validation method splits the data randomly into k equal (or almost equal) parts. The algorithm is then run k times, using k − 1 parts as the training set and the remaining part as the test set. In this study, we use the k-fold validation method to validate the models. We set k to 5 and 10, which are frequently used values in cross validation. Finally, we compute the average values of sensitivity (SN), specificity (SP), accuracy (ACC), and MCC when k = 5 and k = 10.
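Putting Sections 3.8 and 3.9 together, the sketch below counts TP, FN, FP, and TN per fold, computes Eqs. (8)–(11), and averages them over a k-fold split. It is an illustrative Python sketch (the fold splitter and classifier are scikit-learn stand-ins, and the data are random placeholders), not the authors' exact script.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def fold_metrics(y_true, y_pred):
    """SN, SP, ACC, and MCC from the confusion-matrix counts, following Eqs. (8)-(11)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fn * fp) / np.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return sn, sp, acc, mcc

def cross_validate(model, X, y, k=5):
    """Average the four measures over a stratified k-fold split, as in Section 3.9."""
    scores = []
    for train, test in StratifiedKFold(n_splits=k, shuffle=True, random_state=0).split(X, y):
        model.fit(X[train], y[train])
        scores.append(fold_metrics(y[test], model.predict(X[test])))
    return np.mean(scores, axis=0)

# Placeholder data: 181 samples, 4 selected features
rng = np.random.default_rng(2)
X = rng.random((181, 4))
y = rng.integers(0, 2, size=181)
print(cross_validate(SVC(kernel="rbf"), X, y, k=5))
```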
4 Results

4.1 Logistic regression analysis result

We use LR to choose factors as independent variables. The final logistic analysis is conducted, and the result is shown in Table 2. As shown in the table, the four independent variables Firmicutes, Bacteroidetes, BMI, and age are entered into the logistic regression equation, and their corresponding p values are 0.006, 0.001, 0.040, and 0.017, respectively, all of which are less than 0.05, suggesting statistical significance.

Table 2 Logistic regression equation variables

                         B         S.E.    Wald     d.f.   Sig.    Exp(B)
Step 1   Firmicutes      1.653     0.374   19.571   1      0.000   5.222
         Constant        −9.822    2.146   20.953   1      0.000   0.000
Step 2   Bacteroidetes   0.996     0.345   8.317    1      0.004   2.707
         Firmicutes      1.332     0.385   11.937   1      0.001   3.787
         Constant        −2.015    1.310   2.366    1      0.124   0.133
Step 3   BMI             0.375     0.159   5.585    1      0.018   1.456
         Bacteroidetes   0.777     0.548   2.014    1      0.156   2.176
         Firmicutes      1.372     0.397   11.966   1      0.001   3.944
         Constant        −15.781   3.418   21.320   1      0.000   0.000
Step 4   Firmicutes      1.086     0.394   7.586    1      0.006   2.961
         Bacteroidetes   1.702     0.531   10.028   1      0.001   5.256
         BMI             1.220     0.502   6.187    1      0.040   3.388
         Age              0.942    0.395   5.692    1      0.017   2.564
         Constant        −17.756   3.802   21.814   1      0.000   0.000

Fig. 5 ROC curve of logistic regression model of single factor and combined factor

4.2 ROC curve analysis of single relevant factor and combined factor

The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. It generally lies above the line y = x. As the ROC curve approaches the upper left, namely, as the area approaches 1, classification improves. Figure 5 shows the ROC curve of the binary logistic regression model. The sensitivity, specificity, and AUC of each single factor and of the combined factor are presented in Table 3. We can see that the AUC of the combined factor is 0.942, which is closer to 1 than that of any single factor and shows the best classification performance of the model.

Table 3 Comparison of the sensitivity, specificity, AUC, and standard error of different variables

Variable          Firmicutes   Bacteroidetes   BMI     Age     Combined factor
Sensitivity       0.928        0.902           0.823   0.807   0.933
Specificity       0.804        0.776           0.681   0.698   0.908
AUC               0.918        0.856           0.777   0.710   0.942
Standard error    0.029        0.046           0.051   0.055   0.026

4.3 SVM model establishment

It is acknowledged that kernel selection is an important task in building the SVM model, as it decides the classification capability of the model. In the paper, we divide the samples into a training group and a testing group to choose the best kernel type. Fivefold and tenfold cross validation are carried out to validate these models. The final result suggests that RBF is the best kernel type for building the SVM prediction model. Figure 6 shows the comparison results of the different kernel types. After choosing the proper kernel type, we select the optimal model parameters (C, g), where C is the cost parameter and g is the kernel coefficient parameter of the RBF model. We set the range of C as [2^−10, 2^10] and the range of g as [2^−5, 2^5]. Then, we use grid.py in LIBSVM to find the best (C, g). The best (C, g) is (0.5, 0.0078125) when k = 5 and (0.35, 1.414) when k = 10. Table 4 and Table 5 show the SN, SP, ACC, and MCC and the corresponding TP, TN, FP, and FN of the different kernels in fivefold and tenfold cross validation.

Fig. 6 Cross validation results of different kernel types: a the fivefold cross validation result of different kernel types, b the tenfold cross validation result of different kernel types

4.4 Comparison of LR + SVM with other models

To illustrate the performance of LR + SVM, we compare it with the following comparable methods: random forest (RF) [22], Naive Bayes (NB) [22, 23], k-Nearest Neighbor (KNN) [24], and artificial neural networks (ANNs) [25]. These methods are common and widely used in the area of disease prediction. In this paper, the variables selected by the logistic regression model are used as input features for these models, so we name the integrated models LR + RF, LR + NB, LR + KNN, and LR + ANNs.

Table 4 Fivefold cross validation results of different kernel types

Prediction models   TP   FN   SN (%)   TN   FP   SP (%)   ACC (%)   MCC (%)
Linear              75   17   81.5     80   9    89.9     85.6      65.63
RBF                 80   12   87.0     83   6    93.3     90.1      80.30
Sigmoid             80   12   87.0     81   8    91.0     89.0      77.99
Polynomial          75   17   81.5     79   10   88.8     85.1      70.41

Table 5 Tenfold cross validation results of different kernel types

Prediction models   TP   FN   SN (%)   TN   FP   SP (%)   ACC (%)   MCC (%)
Linear              79   13   85.9     74   15   83.1     84.5      69.06
RBF                 84   8    91.3     81   8    91.0     91.2      82.32
Sigmoid             80   12   87.0     81   8    91.0     89.0      77.99
Polynomial          79   13   85.9     74   15   83.1     84.5      69.06

The RF classifier builds multiple decision trees, and the final outcome is determined by voting among these trees, which eliminates the over-fitting that occurs in the single decision tree approach; experiments are conducted with 3 and 5 single classifiers, and the best result is achieved when the number is 3. For the KNN classifier, the classification performance relies on the number of nearest neighbors; experiments are carried out with K = 3, K = 5, K = 7, and K = 9, obtaining the best accuracy when K = 3. For the ANNs classifier, the parameters affect its performance; experiments are conducted with iteration = 50, 100, 200; learning rate = 0.01, 0.02, 0.03; number of layers = 2, 3, 4, 5; and number of units = 2, 3, 4, 5, 6 to train the data, finally obtaining the best accuracy when the learning rate is set to 0.01, the iteration is 100, the number of units in the hidden layer is 4, and the number of layers is 3. Besides, fivefold and tenfold cross validations are carried out to validate these models. These comparison results are shown in Fig. 7. Table 6 and Table 7 show the SN, SP, ACC, and MCC and the corresponding TP, TN, FP, and FN of the different prediction models with the best parameters in fivefold and tenfold cross validation. The best prediction model is LR + SVM, achieving an accuracy of 90.1% when k = 5 and 91.2% when k = 10, followed by LR + ANNs with an accuracy of 88.4% when k = 5 and 90.1% when k = 10.

5 Discussions

CRC is a common malignant tumor of the alimentary system; it is the fourth most prevalent malignant tumor in males and the third most prevalent in females, so early diagnosis of CRC is important. Herberman first proposed the concept of tumor markers in 1978, and other relevant genes have since been reported. Risk factors in CRC patients and healthy individuals are evaluated in this study. Results show that the average age, average BMI, and abundance of the phylum Firmicutes are all significantly higher in the CRC group than in the control group. Meanwhile, this study provides a prediction model to differentiate between normal and CRC samples using our method, which combines logistic regression feature selection with SVM using an RBF kernel, yielding an accuracy of 90.1% when k = 5 and 91.2% when k = 10, which are both higher than the comparable models.

Among the other comparable prediction models, LR + ANNs is the best model based on fivefold and tenfold cross validations, second only to LR + SVM. ANNs have unique properties, including robust performance in dealing with noisy or incomplete input patterns, high fault tolerance, and the ability to generalize from the training data. The classification result of this method is related to the number of layers of the network, the number of units, the iteration count, and the learning rate; the best accuracy is obtained when the learning rate is set to 0.01, the iteration is 100, the number of units in the hidden layer is 4, and the number of layers is 3. The RF classifier builds multiple decision trees, and the final outcome is determined by voting among these trees, which eliminates the over-fitting that occurs in the single decision tree approach; the best result is obtained when the number of single classifiers is 3. For the KNN classifier, the classification performance relies on the number of nearest neighbors, obtaining the best accuracy when K = 3. The NB model has the advantage of training rapidly but suffers from assuming independence of the features, and it achieves the worst result in our experiment. For fivefold cross validation, SVM achieves 87.0% SN, 93.3% SP, 90.1% ACC, and 80.30% MCC and is the best model; ANNs achieve 87.0% SN, 89.9% SP, 88.4% ACC, and 76.84% MCC and are second only to SVM; both the RF and KNN models achieve better results in terms of SN, SP, ACC, and MCC than NB, whose SN, SP, ACC, and MCC are 76.1%, 85.4%, 80.7%, and 63.55%, respectively. For tenfold cross validation, SVM achieves 91.3% SN, 91.0% SP, 91.2% ACC, and 82.32% MCC and is the best model; ANNs achieve 87.0% SN, 93.3% SP, 90.1% ACC, and 80.30% MCC and are second only to SVM; both the RF and KNN models achieve better results in terms of SN, SP, ACC, and MCC than NB, whose SN, SP, ACC, and MCC are 83.7%, 87.6%, 85.6%, and 71.35%, respectively. From the above discussion, SVM achieves the best performance whether fivefold or tenfold cross validation is used and keeps a balance between sensitivity and specificity.
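For reference, the comparison described above can be reproduced in outline with scikit-learn estimators, mapping the reported settings (3 trees for RF, K = 3 for KNN, learning rate 0.01 and a 4-unit hidden layer for the ANN) onto off-the-shelf classes. This is a hedged sketch with placeholder data; it is not the authors' implementation, and these scikit-learn models differ in detail from the ones used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder stand-in for the LR-selected feature matrix and CRC labels
rng = np.random.default_rng(3)
X = rng.random((181, 4))
y = rng.integers(0, 2, size=181)

models = {
    "LR + RF":   RandomForestClassifier(n_estimators=3, random_state=0),
    "LR + NB":   GaussianNB(),
    "LR + KNN":  KNeighborsClassifier(n_neighbors=3),
    "LR + SVM":  SVC(kernel="rbf"),
    "LR + ANNs": MLPClassifier(hidden_layer_sizes=(4,), learning_rate_init=0.01,
                               max_iter=100, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean fivefold accuracy = {acc:.3f}")
```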

This study mainly uses intelligent algorithms based on multiple CRC factors to build prediction models of CRC. In fact, classification methods have gradually been applied in clinical practice. Computer-assisted diagnosis models such as the SVM and random forest (RF) models have been widely employed in clinical fields. Chen et al. applied RF to clinical metabolomics for phenotypic discrimination and biomarker selection in colorectal cancer [50]. Saccá et al. used SVM to classify electroencephalography (EEG) signals, which is fundamental for monitoring brain functions in clinical practice [51]. Esteva et al. proposed deep convolutional neural networks (CNNs) to carry out dermatologist-level classification of skin cancer [52]. Zeevi et al. addressed the problem of personalized nutrition by predicting glycemic responses using many decision trees [41]. Classification algorithms have thus gradually been applied to clinical practice, and in the era of big data these methods must make it convenient for clinical practice to mine big data.

Based on our existing study, we will next predict the cancer with consensus molecular subtypes. Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. For the diagnosis of CRC, there have been some problems that influence the diagnosis result; one of the main problems is the inconsistency among the reported gene expression-based CRC classifications, which hinders clinical translation. To address this problem, we plan to explore the further relationships between CRC classification and clinical translation. It is acknowledged that CRC can be divided into four consensus molecular subtypes with distinguishing features: CMS1 (MSI Immune), CMS2 (Canonical), CMS3 (Metabolic), and CMS4 (Mesenchymal). Our study will be carried out based on gene expression data, which are widely accepted as a relevant source for disease stratification. The data sets that can be used include GSE42284, GSE33113, GSE39582, GSE35896, GSE13067, GSE13294, GSE14333, GSE17536, GSE20916, and GSE2109, all of which can be found in the Gene Expression Omnibus (GEO). Then, we will normalize the data using the robust multi-array average (RMA) method to improve the classifier's performance. For classifying the samples, we will choose one of the deep learning methods, CNNs, as the classifier, based on TensorFlow. Finally, we will adjust the related parameters of the CNNs to choose the best classification result.

Fig. 7 Cross validation results of prediction models: a the fivefold validation result of classification models, b the tenfold validation result of classification models

Table 6 Fivefold cross validation results of prediction models

Prediction models   TP   FN   SN (%)   TN   FP   SP (%)   ACC (%)   MCC (%)
LR + RF             78   14   84.8     80   9    89.9     87.3      74.72
LR + NB             70   22   76.1     76   13   85.4     80.7      63.55
LR + KNN            75   17   81.5     78   11   87.6     84.5      69.24
LR + SVM            80   12   87.0     83   6    93.3     90.1      80.30
LR + ANNs           80   12   87.0     80   9    89.9     88.4      76.84

Table 7 Tenfold cross validation results of prediction models

Prediction models   TP   FN   SN (%)   TN   FP   SP (%)   ACC (%)   MCC (%)
LR + RF             82   10   89.1     80   9    89.9     89.5      79.01
LR + NB             77   15   83.7     78   11   87.6     85.6      71.35
LR + KNN            80   12   87.0     78   11   87.6     87.3      74.59
LR + SVM            84   8    91.3     81   8    91.0     91.2      82.32
LR + ANNs           80   12   87.0     83   6    93.3     90.1      80.30

6 Conclusions

Colorectal cancer (CRC) is a common form of cancer. Early detection of the cancer and its treatment before metastasis can increase the survival rate and survival time of cancer patients. There are still tremendous challenges in predicting the cancer: firstly, feature selection may retain redundant features, leading to poor performance of the classification algorithm; secondly, the prediction accuracy is affected by the choice of predictor. This study provides an effective model to distinguish normal and CRC samples in view of these problems of disease prediction. In summary, the main contributions are as follows. Firstly, we select the most significant factors from the various factors of the samples using a logistic regression model and validate them by the ROC curve. The selected features Firmicutes, Bacteroidetes, BMI, and age are entered into the logistic regression equation, and their corresponding p values are 0.006, 0.001, 0.040, and 0.017. The AUCs of these features are 0.918, 0.856, 0.777, and 0.710, all of which are lower than that of the combined feature, which has an AUC of 0.942, sensitivity of 0.933, and specificity of 0.908. Secondly, the proper SVM kernel type is selected by comparing the Linear, RBF, Sigmoid, and Polynomial functions on the same dataset. Generally, the common way to select a kernel function is based on the prior knowledge of experts, but this often cannot solve the problem well. We select the final kernel type as RBF and optimize the parameters (C, g) of SVM using grid.py in LIBSVM; the accuracy achieves 90.1% when k = 5 and 91.2% when k = 10, which is obviously higher than the other models (LR + RF, LR + NB, LR + KNN, LR + ANNs), solving the problem of low accuracy of cancer prediction to a large extent.

This work may provide a new method for building classification models for cancer based on independent factors and shows that considering genetic factors as well as traditional risk factors in risk prediction models can improve their utility. Most importantly, the result of this work indicates that the integrated model provides a new approach to solving the problems of low accuracy and the imbalance between sensitivity and specificity.

Funding information This research is supported by the National Natural Science Foundation of China (61876102, 61472232, 61572300, 61402270, 61602286), the Taishan Scholar Program of Shandong Province in China (TSHW201502038), and the Natural Science Foundation of Shandong Province in China (ZR2016FB13).

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

References

1. Zadeh SA, Sj SMC, Mohammadi Z (2017) A novel and reliable computational intelligence system for breast cancer detection. Med Biol Eng Comput 9:1–12
2. Pal JK, Ray SS, Pal SK (2015) Identifying relevant group of miRNAs in cancer using fuzzy mutual information. Med Biol Eng Comput 54:701–710
3. Chan AT, Giovannucci EL (2010) Primary prevention of colorectal cancer. Gastroenterology 138:2029–2043
4. Saleh M, Trinchieri G (2010) Innate immune mechanisms of colitis and colitis-associated colorectal cancer. Nat Rev Immunol 11:9–20
5. Brennan CA, Garrett WS (2016) Gut microbiota, inflammation, and colorectal cancer. Annu Rev Microbiol 70:395–411
6. Chatterjee S, Dey N, Shi F, Ashour AS et al (2017) Clinical application of modified bag-of-features coupled with hybrid neural-based classifier in dengue fever classification using gene expression data. Med Biol Eng Comput:1–12
7. Ay A, Gong D, Kahveci T (2014) Network-based prediction of cancer under genetic storm. Cancer Inform 13:15–31
8. Jung KJ, Won D, Jeon C et al (2015) A colorectal cancer prediction model using traditional and genetic risk scores in Koreans. BMC Genet 16:1–7
9. Cubiella J, Vega P, Salve M et al (2016) Development and external validation of a fecal immunochemical test-based prediction model for colorectal cancer detection in symptomatic patients. BMC Med 14:128–140
10. Coppedè F, Grossi E, Lopomo A et al (2015) Application of artificial neural networks to link genetic and environmental factors to DNA methylation in colorectal cancer. Epigenomics 7:175–186
11. Peng Y, Zhai Z, Li Z et al (2015) Role of blood tumor markers in predicting metastasis and local recurrence after curative resection of colon cancer. Int J Clin Exp Med 8:982–990
12. Juan M, Philippe W, Nermin G et al (2016) An original stepwise multilevel logistic regression analysis of discriminatory accuracy: the case of neighborhoods and health. PLoS One 11:e0153778

13. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422
14. Ahmad F, Mat Isa NA, Hussain Z, Osman MK, Sulaiman SN (2015) GA-based feature selection and parameter optimization of an ANN in diagnosing breast cancer. Pattern Anal Appl 18:861–870
15. Peng S, Xu Q, Ling XB, Peng X, du W, Chen L (2003) Molecular classification of cancer types from microarray data using the combination of genetic algorithms and support vector machines. FEBS Lett 555:358–362
16. Liu W, Zheng WL, Lu BL (2016) Emotion recognition using multimodal deep learning
17. Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A, Benítez JM, Herrera F (2014) A review of microarray datasets and applied feature selection methods. Inform Sci 282:111–135
18. Li T, Zhang C, Ogihara M (2004) A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20:2429–2437
19. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:1–27
20. Park SI, Tae-Ho O (2016) Application of receiver operating characteristic (ROC) curve for evaluation of diagnostic test performance. J Vet Clin 33:97–108
21. Kim KA, Choi JY, Yoo TK, Kim SK, Chung KS, Kim DW (2013) Mortality prediction of rats in acute hemorrhagic shock using machine learning techniques. Med Biol Eng Comput 51:1059–1067
22. Chowdhury AR, Chatterjee T, Banerjee S (2018) A random forest classifier-based approach in the detection of abnormalities in the retina. Med Biol Eng Comput. https://doi.org/10.1007/s11517-018-1878-0
23. Zhang H, Yu P, Xiang ML, Li XB, Kong WB, Ma JY, Wang JL, Zhang JP, Zhang J (2016) Prediction of drug-induced eosinophilia adverse effect by using SVM and naïve Bayesian approaches. Med Biol Eng Comput 54(2–3):361–369
24. Zhang S, Li X, Zong M et al (2018) Efficient KNN classification with different numbers of nearest neighbors. IEEE Trans Neural Netw Learn Syst (99):1–12
25. Bertolaccini L, Solli P, Pardolesi A, Pasini A (2017) An overview of the use of artificial neural networks in lung cancer research. J Thorac Dis 9(4):924–931
26. Siegel R, DeSantis C, Jemal A (2014) Colorectal cancer statistics, 2014. CA Cancer J Clin 64:104–117
27. Lee J, Meyerhardt JA, Giovannucci E, Jeon JY (2015) Association between body mass index and prognosis of colorectal cancer: a meta-analysis of prospective cohort studies. PLoS One 10:e0120706
28. Chu CM, Yao CT, Chang YT et al (2014) Gene expression profiling of colorectal tumors and normal mucosa by microarrays meta-analysis using prediction analysis of microarray, artificial neural network, classification, and regression trees. Dis Markers 2014:459–462
29. Orang AV, Barzegari A (2014) MicroRNAs in colorectal cancer: from diagnosis to targeted therapy. Asian Pac J Cancer Prev 15:6989–6999
30. Philip AK, Lubner MG, Harms B (2011) Computed tomographic colonography. Surg Clin North Am 91:127–139
31. Zhang H, Qi J, Wu YQ, Zhang P, Jiang J, Wang QX, Zhu YQ (2014) Accuracy of early detection of colorectal tumors by stool methylation markers: a meta-analysis. World J Gastroenterol 20:14040–14050
32. Ip S, Sokoro AA, Kaita L, Ruiz C, McIntyre E, Singh H (2014) Use of fecal occult blood testing in hospitalized patients: results of an audit. Can J Gastroenterol Hepatol 28:489–494
33. Li H, Jin Z, Li X et al (2017) Associations between single-nucleotide polymorphisms and inflammatory bowel disease-associated colorectal cancers in inflammatory bowel disease patients: a meta-analysis. Clin Transl Oncol 19:1–10
34. Zhang B, Liang XL, Gao HY et al (2016) Models of logistic regression analysis, support vector machine, and back-propagation neural network based on serum tumor markers in colorectal cancer diagnosis. Genet Mol Res 15:1–10
35. Zeller G, Tap J, Voigt AY, Sunagawa S, Kultima JR, Costea PI, Amiot A, Bohm J, Brunetti F, Habermann N, Hercog R, Koch M, Luciani A, Mende DR, Schneider MA, Schrotz-King P, Tournigand C, Tran van Nhieu J, Yamada T, Zimmermann J, Benes V, Kloor M, Ulrich CM, von Knebel Doeberitz M, Sobhani I, Bork P (2014) Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol Syst Biol 10:766–783
36. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120
37. Truong DT, Franzosa EA, Tickle EL et al (2015) MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Methods 12:902–903
38. Vincent C, Manges AR (2015) Antimicrobial use, human gut microbiota and Clostridium difficile colonization and infection. Antibiotics 4:230–253
39. Endesfelder D, zu-Castell W, Ardissone A et al (2014) Compromised gut microbiota networks in children with anti-islet cell autoimmunity. Diabetes 63:2006–2014
40. Gao R, Gao Z, Huang L, Qin H (2017) Gut microbiota and colorectal cancer. Eur J Clin Microbiol Infect Dis 36:1–13
41. Zeevi D, Korem T, Zmora N, Israeli D, Rothschild D, Weinberger A, Ben-Yacov O, Lador D, Avnit-Sagi T, Lotan-Pompan M, Suez J, Mahdi JA, Matot E, Malka G, Kosower N, Rein M, Zilberman-Schapira G, Dohnalová L, Pevsner-Fischer M, Bikovsky R, Halpern Z, Elinav E, Segal E (2015) Personalized nutrition by prediction of glycemic responses. Cell 163:1079–1094
42. Schmid D, Leitzmann MF (2014) Television viewing and time spent sedentary in relation to cancer risk: a meta-analysis. J Natl Cancer Inst
43. Emmerzaal TL, Kiliaan AJ, Gustafson DR (2015) 2003-2013: a decade of body mass index, Alzheimer's disease, and dementia. J Alzheimers Dis 43:739–755
44. Alfa-Wali M, Boniface S, Sharma A et al (2015) Metabolic syndrome (Mets) and risk of colorectal cancer (CRC): a systematic review and meta-analysis. World J Surg Med Radiat Oncol 4:41–52
45. Sears CL, Garrett WS (2014) Microbes, microbiota, and colon cancer. Cell Host Microbe 15:317–328
46. Zhu Q, Jin Z, Wu W, Gao R et al (2014) Analysis of the intestinal lumen microbiota in an animal model of colorectal cancer. PLoS One e90849
47. Zhao M, Fu C, Ji L, Tang K, Zhou M (2011) Feature selection and parameter optimization for support vector machines: a new approach based on genetic algorithm with feature chromosomes. Expert Syst Appl 38:5197–5204
48. Hu X, Wong KK, Young GS, Guo L, Wong ST (2011) Support vector machine multiparametric MRI identification of pseudoprogression from tumor recurrence in patients with resected glioblastoma. J Magn Reson Imaging 33:296–305
49. Zhang H, Yu P, Xiang ML, Li XB, Kong WB, Ma JY, Wang JL, Zhang JP, Zhang J (2016) Prediction of drug-induced eosinophilia adverse effect by using SVM and naive Bayesian approaches. Med Biol Eng Comput 54:361–370
50. Chen T, Cao Y, Zhang Y et al (2013) Random forest in clinical metabolomics for phenotypic discrimination and biomarker selection. Evid Based Complement Alternat Med 2013:298183–298193
51. Saccá V, Campolo M, Mirarchi D et al (2018) On the classification of EEG signal by using an SVM based algorithm
52. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118

Dandan Zhao, born in 1991, is a master degree candidate of the School of Information Science and Engineering at Shandong Normal University. Her major is computer software and theory, and her research interest is the application of machine learning algorithms in disease prediction in bioinformatics.

Yanlin He, born in 1992, is a master degree candidate of the School of Information Science and Engineering at Shandong Normal University. His major is computer software and theory, and his research interest is the application of intelligent algorithms in the field of biological information.

Hong Liu, Professor and Ph.D. supervisor of the School of Information Science and Engineering at Shandong Normal University. She received her Ph.D. degree in engineering from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 1998. She is an academic leader in computer science and technology. Her research is the cross study of distributed artificial intelligence, software engineering, and computer-aided design, including research on multi-agent systems and co-evolutionary computing technology.

Dianjie Lu received the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, in 2012. Currently, he is an associate professor at Shandong Normal University. His research interests include crowd cooperative computing and the cognitive internet of things.

Yuanjie Zheng is currently a professor in the School of Information Science and Engineering at Shandong Normal University and a Taishan Scholar of the People's Government of Shandong Province of China. His research interests include medical image analysis, translational medicine, computer vision, and computational photography. His ultimate research goal is to enhance patient care by creating algorithms for automatically quantifying and generalizing the information latent in various medical images for tasks such as disease analysis and surgical planning.

Chen Lyu received the Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2015. His research interests include bioinformatics, data mining, and software engineering. Currently, he is with the School of Information Science and Engineering, Shandong Normal University, China.
