Applying A Random Projection Algorithm To Optimize Machine Learning Model For Predicting Peritoneal Metastasis in Gastric Cancer Patients Using CT Images

1 School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
2 School of Computer Sciences, University of Oklahoma, Norman, OK 73019, USA
Abstract
Background and Objective: Non-invasively predicting the risk of cancer metastasis before
surgery plays an essential role in determining optimal treatment methods for cancer patients
(including who can benefit from neoadjuvant chemotherapy). Although developing radiomics
based machine learning (ML) models has attracted broad research interest for this purpose, it often
faces the challenge of how to build a high-performing and robust ML model using small and imbalanced image datasets.
Methods: In this study, we explore a new approach to build an optimal ML model. A retrospective
dataset involving abdominal computed tomography (CT) images acquired from 159 patients
diagnosed with gastric cancer is assembled. Among them, 121 cases have peritoneal metastasis
(PM), while 38 cases do not have PM. A computer-aided detection (CAD) scheme is first applied
to segment primary gastric tumor volumes and initially compute 315 image features. Then, two Gradient Boosting Machine (GBM) models embedded with two different feature dimensionality reduction methods, namely, principal component analysis (PCA) and a random projection algorithm (RPA), combined with a synthetic minority oversampling technique, are built to predict the risk of
the patients having PM. All GBM models are trained and tested using a leave-one-case-out cross-
validation method.
Results: Results show that the GBM model embedded with RPA yielded a significantly higher prediction performance than the GBM model embedded with PCA (AUC of 0.69±0.019 versus 0.58±0.021; overall accuracy of 71.2% versus 65.2%).
Conclusions: The study demonstrated that CT images of the primary gastric tumors contain
discriminatory information to predict the risk of PM, and RPA is a promising method to generate optimal feature vectors for building robust ML models.
1. Introduction
Although the occurrence of gastric cancer has declined recently, it remains the third leading
cause of cancer-related death worldwide [1]. While surgery remains the only curative treatment
option, preoperative neoadjuvant chemotherapy (NAC) has demonstrated favorable results with
increased therapeutic resection rates and improved survival [2]. To prevent the adverse effects of NAC, patients with different disease stages must be distinguished from each other [3] because, at each stage of the disease, the treatment differs [4]. Recent studies demonstrated that
applying preoperative NAC for advanced gastric cancer patients with peritoneal metastasis (PM)
yielded a much better clinical outcome and enhanced the overall survival rate [5-8]. Thus, accurate
assessment of the presence of the PM is essential for the selection of appropriate patients for NAC.
Since the overall accuracies of subjectively reading endoscopic ultrasound and computed
tomography (CT) images are not completely reliable [3, 4], an alternative technique is needed to assess the presence of PM more accurately and objectively.
Many recent studies have revealed that the radiomics technique can extract quantitative information from medical images in the form of a large pool of image features, and data mining of this image feature pool offers an exciting approach to build machine learning (ML)
models and predict clinical outcomes [9, 10]. Although several radiomics based ML models have
been reported to differentiate and stage gastric cancer patients [11, 12], these studies computed
radiomics features from the tumor region manually segmented from one CT slice selected by the
radiologists. Meanwhile, a correlation-analysis-based method was used to select a small set of image features, which cannot eliminate the redundancy among the selected features. Thus, the discriminatory power and prediction accuracy of these ML models were limited. To overcome
such limitations, we in this study propose to develop and evaluate a new computer-aided detection
(CAD) scheme aiming to predict the risk of PM among gastric cancer patients. First, our scheme
segments the primary gastric tumor volume in 3D CT image data, which enables better computation of image features related to the heterogeneity of the tumors. Second, to reduce the dimensionality of the feature
space and better identify orthogonal or non-redundant image features from a large pool of initially
computed radiomics features, we investigate and apply a random projection algorithm (RPA).
Third, to avoid bias in generating the feature vector, the RPA is embedded in a multi-feature fusion-based
machine learning (ML) model to predict the risk of PM, which is trained and tested using (1) a
synthetic minority oversampling technique (SMOTE) to balance numbers of cases in two classes
and (2) a leave-one-case-out (LOCO) cross-validation method. The details of the study design,
experimental procedures, data analysis results, and discussions are presented in the following sections.
2. Materials and Methods
In this study, we use a retrospective dataset of abdominal computed tomography (CT) images
acquired from 159 patients with confirmed diagnoses of gastric cancer. Among these patients, 121 cases have PM, and 38 cases do not have PM. Each patient had an abdominal CT
imaging examination during the original cancer diagnosis, which involves approximately 300-400
image slices. The primary gastric tumors are typically depicted in around 20-22 slices. Table 1
summarizes the distribution of general demographic information of these 159 patients involved in
this study.
Table 1. Distribution of study cases in the selected dataset
Recognizing the heterogeneity of tumors in clinical images and the difficulty of tumor segmentation, we modified and implemented a hybrid tumor segmentation scheme that uses a dynamic programming method [13, 14] to adaptively identify the growing thresholds of a multi-layer topographic region growing algorithm and the initial contour for an active contour algorithm. Specifically, the tumor segmentation scheme involves the following steps. First, a Wiener filter is applied to
reduce image noise. Second, an initial seed is placed at the center of the tumor region of one CT
slice in which the tumor has its largest area. To reduce the inter-operator variability in
choosing the initial seed and increase the robustness of the system as demonstrated in the previous
study [15], a predefined window with the size of (5,5) around the initial seed is automatically
created. A pixel with the minimum value inside the window is detected and selected as the first
seed point. Third, to automatically determine the first threshold value for the region growing
algorithm, a new predefined window size of (5,5) is created around the new seed point. Then, the
scheme computes the pixel value differences between the center pixel and boundary pixels and
identifies the maximum difference. Subsequently, the region growing threshold is determined as
𝑇1 = 𝑉𝑐 + 0.25 × 𝐷𝑚𝑎𝑥 , where 𝑉𝑐 is the pixel value of the center pixel and 𝐷𝑚𝑎𝑥 is the computed
maximum pixel value difference inside the bounding window. Then, this threshold value is applied
to define the first layer of region growing to segment tumor region depicting on one CT image
slice.
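For illustration, a minimal Python sketch of the seed refinement and first-threshold computation described above is given below. It assumes the CT slice is a 2D numpy array; the window handling, use of absolute differences, and function names are our own assumptions, not the study's actual implementation (which was done in MATLAB).

```python
import numpy as np

def refine_seed(ct_slice, seed_row, seed_col, half=2):
    """Select the minimum-valued pixel inside a 5x5 window around the operator-placed seed."""
    window = ct_slice[seed_row - half:seed_row + half + 1,
                      seed_col - half:seed_col + half + 1]
    r, c = np.unravel_index(np.argmin(window), window.shape)
    return seed_row - half + r, seed_col - half + c

def first_threshold(ct_slice, seed_row, seed_col, half=2, alpha=0.25):
    """T1 = Vc + 0.25 * Dmax, where Dmax is the largest difference between the center
    pixel and the boundary pixels of a 5x5 window around the refined seed."""
    window = ct_slice[seed_row - half:seed_row + half + 1,
                      seed_col - half:seed_col + half + 1]
    center = float(ct_slice[seed_row, seed_col])
    boundary = np.concatenate([window[0, :], window[-1, :],
                               window[1:-1, 0], window[1:-1, -1]])
    d_max = float(np.max(np.abs(boundary - center)))
    return center + alpha * d_max
```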
Fourth, after determining the first layer of tumor region growth, the growing threshold of
the second layer is 𝑇2 = 𝑇1 + 𝛽𝐶1 where 𝐶1 is the computed contrast of the first layer, and 𝛽 is a
coefficient (i.e., 0.5). This multi-layer region growing continues until the growth between two adjacent layers exceeds two times the size of the last growing layer. Last, after the region growing algorithm stops, the scheme selects the boundary contour of the last region growing layer as the initial contour. The active contour algorithm is then applied to expand or shrink the contour curve to best fit the tumor boundary. As a result, the scheme completes the process of segmenting the tumor region on one CT slice. Figure 1 illustrates these steps.
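The multi-layer threshold update and stopping rule can be sketched as follows. This is an illustrative re-implementation under our own simplifying assumptions (connected-component growing by thresholding, and a simple min-max contrast proxy for the layer contrast C_n); it is not the study's MATLAB code.

```python
import numpy as np
from scipy import ndimage

def grow_layer(ct_slice, seed, threshold):
    """Connected region containing the seed whose pixel values do not exceed the threshold
    (a simplified stand-in for the topographic region growing used in the study)."""
    mask = ct_slice <= threshold
    labels, _ = ndimage.label(mask)
    return labels == labels[seed]

def multilayer_growing(ct_slice, seed, t1, beta=0.5, max_layers=20):
    """Raise the growing threshold layer by layer (T_{n+1} = T_n + beta * C_n) and stop when
    the growth between two adjacent layers exceeds twice the size of the last layer."""
    threshold = t1
    region = grow_layer(ct_slice, seed, threshold)
    for _ in range(max_layers):
        contrast = float(ct_slice[region].max() - ct_slice[region].min())  # contrast proxy C_n
        new_region = grow_layer(ct_slice, seed, threshold + beta * contrast)
        if new_region.sum() - region.sum() > 2 * region.sum():
            break  # growth between adjacent layers is too large; keep the previous layer
        threshold += beta * contrast
        region = new_region
    return region  # its boundary contour initializes the active contour step
```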
Subsequently, after segmenting the tumor region on one CT slice, the CAD scheme continues to perform tumor region segmentation by scanning in both the up and down directions until no tumor region is detected in the next adjacent CT slice. A binary map from the previously segmented slice is used to guide this continuing tumor region segmentation task. Next, three seeds, including the center point of the ROI and two other points randomly selected within a predefined window around the center point, are mapped to the next slice. Then, the region growing algorithm is automatically performed from these seeds in the targeted slice. Additionally, the tumor growing boundary is constrained by the adjacent slice to facilitate the multi-layer region growing across the CT image slices of one case. In this way, the 3D tumor volume can be segmented and computed.
Once the 3D tumor volume is segmented, the CAD scheme is applied to compute a large set of radiomics-based image features, which include 315 features extracted and computed from each segmented 2D tumor region (ROI) depicted on one CT image slice. These features are categorized into four main groups: (a) the gray-level run-length matrix (GLRLM) features: from each ROI, 44 two-dimensional features are extracted. (b) The Gray Level Difference Method
(GLDM) probability density function features: From each probability density function
representing statistical texture features of ROI, four features of mean, median, standard deviation,
and variance are computed. (c) Wavelet domain features: for extracting these features, first, the
image is decomposed into four components comprising low and high scale decomposition in either
X or Y direction by wavelet transforms. Then, the GLCM features, as well as 21 tumor density
[16] and GLDM features, are extracted from those components. (d) the Laplacian of Gaussian
(LoG) features: To extract these features, first a Gaussian smoothing filter is applied to
reduce the sensitivity to the noise, and then the Laplacian filter sharpens the image's edge and
highlights rapid intensity changes inside the region. Next, from the extracted points after applying
the LoG filters, the mean, median, and the standard deviation are computed. Figure 3 shows the overall workflow of the feature extraction process.
Figure 3. Feature extraction workflow: the input ROI image is decomposed by the wavelet transform into LL, LH, HL, and HH sub-bands; GLCM, tumor density, and GLDM features are computed from each sub-band, and additional features are computed after applying the LoG filter.
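As an illustration of how the wavelet sub-bands and the LoG response statistics described above could be computed, a minimal Python sketch is given below. It assumes the PyWavelets and SciPy packages, a Haar wavelet, and a fixed Gaussian sigma; the actual wavelet basis, filter parameters, and sub-band naming used in the study are not specified in the text.

```python
import numpy as np
import pywt
from scipy.ndimage import gaussian_laplace

def wavelet_subbands(roi):
    """Single-level 2D wavelet decomposition of an ROI into LL, LH, HL, and HH sub-bands
    (sub-band naming follows one common convention)."""
    ll, (lh, hl, hh) = pywt.dwt2(roi.astype(float), 'haar')
    return {'LL': ll, 'LH': lh, 'HL': hl, 'HH': hh}

def log_statistics(roi, sigma=1.0):
    """Mean, median, and standard deviation of the Laplacian-of-Gaussian response."""
    response = gaussian_laplace(roi.astype(float), sigma=sigma)
    return np.mean(response), np.median(response), np.std(response)
```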
The CAD scheme computes each 3D feature ($F_{3D}^{k}$) as
$$F_{3D}^{k} = \sum_{i=1}^{N} w_i \times F_{2D}^{k}(i) \qquad (1)$$
where $w_i$ is the ratio of the segmented tumor volume on the $i$th slice to the whole tumor volume segmented on all $N$ involved CT slices, and $F_{2D}^{k}(i)$ is the $k$th 2D feature computed on the $i$th slice. The segmented tumor volume on the $i$th slice is computed by multiplying the segmented region size (2D) by the CT slice thickness. Finally, all 315 computed 3D feature values are normalized between 0 and 1 to reduce case-based reliance and weight all features evenly.
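A minimal sketch of Equation (1) and the subsequent min-max normalization is shown below, assuming the per-slice 2D features and segmented region areas are already available as arrays; the variable and function names are illustrative only.

```python
import numpy as np

def fuse_2d_features(features_2d, region_areas, slice_thickness):
    """Volume-weighted fusion of per-slice 2D features into one 3D feature vector (Eq. 1).
    features_2d: array of shape (N_slices, 315); region_areas: segmented area per slice."""
    slice_volumes = np.asarray(region_areas, dtype=float) * slice_thickness
    weights = slice_volumes / slice_volumes.sum()          # w_i in Eq. (1)
    return weights @ np.asarray(features_2d, dtype=float)  # 315 fused 3D features

def normalize_features(feature_matrix):
    """Min-max normalize each 3D feature to [0, 1] across all cases."""
    fmin = feature_matrix.min(axis=0)
    fmax = feature_matrix.max(axis=0)
    return (feature_matrix - fmin) / (fmax - fmin + 1e-12)
```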
Since the initial feature pool contains 315 image features, many of them can be redundant
(highly correlated) or irrelevant (with lower performance). Hence, selecting a small set of optimal
features to reduce the feature dimension and enhance learning accuracy is vital. In this study, in
order to perform feature dimensionality reduction, we investigate and apply a novel image feature regeneration method, the Random Projection Algorithm (RPA). Theoretical analysis has indicated
that the RPA has advantages for its simplicity, high performance, and robustness compared to
other feature reduction methods; however, empirical results are sparse [17]. Meanwhile, the RPA
method has been investigated and tested in many engineering applications such as text and face
recognition and yielded comparable results to conventional feature regeneration methods such as PCA, while generating more robust results and being computationally inexpensive [17, 18]. To the best of our
knowledge, the RPA has not been well investigated in the medical imaging informatics field to
reduce the dimensionality of radiomics feature space. Thus, the RPA method is tested in this study.
To introduce the RPA method, let us first consider each case as a point: if the feature vector size is k, the case is a point in k-dimensional space. Thus, the Euclidean distance between two points M and N is computed using Formula (2):
$$|M - N| = \sqrt{\sum_{i=1}^{k} (m_i - n_i)^{2}} \qquad (2)$$
In Formula (2), M = (m1, …, mk) and N = (n1, …, nk) are two points in the k-dimensional space. Likewise, the volume V of a sphere with radius r in k-dimensional space is computed using Formula (3):
$$V(k) = \frac{r^{k}\,\pi^{k/2}}{\frac{k}{2}\,\Gamma\!\left(\frac{k}{2}\right)} \qquad (3)$$
The normalization of the feature matrix between [0, 1] suggests that all data can be included in a sphere with a radius of 1. The important property of a sphere with unit radius is that the more the dimension increases, the more the volume decreases; in the limit, the volume of the unit sphere approaches zero (Formula 4):
$$\lim_{k \to \infty} \frac{\pi^{k/2}}{\frac{k}{2}\,\Gamma\!\left(\frac{k}{2}\right)} \cong 0 \qquad (4)$$
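As a quick numerical illustration of Formulas 3 and 4, the unit-radius sphere volume can be evaluated for a few dimensions. The snippet below is a small self-contained check written for this purpose, not part of the original study.

```python
from math import gamma, pi

def unit_sphere_volume(k):
    """Volume of the k-dimensional sphere with unit radius (Formula 3 with r = 1)."""
    return pi ** (k / 2) / ((k / 2) * gamma(k / 2))

for k in (2, 5, 20, 100):
    # The volume shrinks rapidly toward zero as the dimension grows.
    print(k, unit_sphere_volume(k))
```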
Additionally, according to the theory of heavy-tailed distributions, for a case M = (m1, …, mk), if ∑_{i=1}^{k} p_i = μ and E|(m_i − p_i)^d| ≤ p_i for d = 2, 3, …, ⌊t²/6μ⌋, then a probability bound can be computed using Formula 5:
$$\mathrm{prob}\left(\left|\sum_{i=1}^{k} m_i - \mu\right| \ge t\right) \le \max\left(3e^{-t^{2}/12\mu},\; 4 \times 2e^{-t}\right) \qquad (5)$$
The more the value of t increases, the smaller the chance that a point lies outside that distance. Thus, M should be concentrated around the mean value. In particular, according to Formulas 4 and 5, to a good approximation, all data are contained in a sphere of unit size and are concentrated around their mean value. As a result, as the dimension increases, the volume of the sphere approaches zero. Therefore, the differences between the cases are not large enough for accurate classification.
According to the above analysis, the larger the initial feature vector size, the higher the space dimension. Hence, most of the data are concentrated around the center, which leads to smaller differences between the features. Consequently, to reduce the feature dimension, the best and most powerful technique is one that reduces the dimensionality of the features while preserving the distances between the points, which indicates rough preservation of the vast amount of information. If we implement a conventional feature selection method and choose a d-dimensional subspace of the initial feature vector randomly, it is expected that all the projected distances in the new space are within a determined scale factor of those in the initial k-dimensional space [20]. Thus, it is probable that after removing the redundant features, the accuracy would not increase, because the divergence between the points is not significant enough to build a robust model.
To address the concern discussed above and to optimize the feature space, the Johnson-Lindenstrauss lemma can be applied in RPA. This lemma states that for any 0 < ε < 1 and for any number of cases t, which are treated as points in k-dimensional space (R^k), a positive integer d can be computed using Formula 6:
$$d \ge \frac{4 \ln t}{\left(\frac{\epsilon^{2}}{2} - \frac{\epsilon^{3}}{3}\right)} \qquad (6)$$
Then, for any set W of t points in R^k and for all z, w ∈ W, there exists a map, or random projection function, f: R^k → R^d that preserves the distances as determined by Formula 7 [21]:
$$(1 - \epsilon)\,|z - w|^{2} \le |f(z) - f(w)|^{2} \le (1 + \epsilon)\,|z - w|^{2} \qquad (7)$$
As demonstrated in Formula 7, the distance between the points in the lower-dimensional space is roughly close to the distance in the high-dimensional space. The lemma states that it is feasible to project a set of points from a high-dimensional space into a lower-dimensional space while the distances between the points are approximately preserved.
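For concreteness, Formula 6 can be evaluated directly. The small helper below is our own illustration (not from the paper) of the minimum target dimension guaranteed for t points at a hypothetical distortion ε.

```python
from math import ceil, log

def jl_min_dim(t, eps):
    """Minimum target dimension d satisfying Formula (6) for t points and distortion eps."""
    return ceil(4 * log(t) / (eps ** 2 / 2 - eps ** 3 / 3))

# Example: 159 cases at a hypothetical distortion of eps = 0.9 -> 126 dimensions.
print(jl_min_dim(159, 0.9))
```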
As a result, the above analysis suggests that if the initial set of features is projected into a lower-dimensional subspace using the random projection method, the distances between points are preserved with better contrast. Hence, it may improve the classification accuracy between the features of the two classes, representing cases either with or without PM, with a low risk of overfitting the ML models. In this study, we investigate whether using RPA can yield better results than one of the popular feature dimensionality reduction approaches, namely, principal component analysis (PCA). All features extracted in the above section are fed into both methods, RPA and PCA. After applying these two methods, each of them generates 20 new features that are used as the input feature vector of the ML model.
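The following sketch shows how both 20-component feature regenerations could be produced with scikit-learn. The choice of a Gaussian random matrix is our assumption, since the paper does not state which random projection variant was used, and the input array here is random data for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection

# X: rows are cases, columns are the 315 normalized 3D features (random data for illustration).
X = np.random.RandomState(0).rand(159, 315)

rpa = GaussianRandomProjection(n_components=20, random_state=0)  # random projection to 20 features
X_rpa = rpa.fit_transform(X)

pca = PCA(n_components=20)                                       # PCA baseline, also 20 components
X_pca = pca.fit_transform(X)
```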
To classify between the study cases with or without PM, we build a multi-feature fusion-
based machine learning model. However, our dataset includes 121 PM cases and 38 non-PM cases,
which makes the two classes imbalanced. Thus, to address this issue, we apply the Synthetic Minority Oversampling Technique (SMOTE) algorithm [22] to rebalance the original image dataset. The key point of applying SMOTE is that it introduces synthetic data by interpolation between some minority class instances that lie within a specified neighborhood. If we consider u as a minority class instance, it is selected as a base to generate new synthetic data points. According to a distance matrix, some nearest neighbors of the same class are chosen from the training set (points u1 to u4 in Figure 4).
The procedure of SMOTE is as follows. Initially, the total amount of oversampling N is set. Then, a minority class instance is randomly chosen from the training set. Following that, its K nearest neighbors are obtained. Among these K instances, N instances are selected randomly to compute the new instances by interpolation. Figure 4 illustrates the process of creating synthetic data in the SMOTE algorithm. As a result, we add 83 synthetic non-PM cases, and the dataset is expanded to 242 cases, including 121 PM cases and 121 non-PM cases.
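The core interpolation step can be sketched as follows. This is a conceptual illustration only; in practice an existing implementation such as imbalanced-learn's SMOTE would typically be used, which is an assumption since the paper does not name a library.

```python
import numpy as np

def smote_sample(u, same_class_neighbors, rng):
    """Create one synthetic minority sample by interpolating between the base instance u
    and a randomly chosen same-class nearest neighbor."""
    nn = same_class_neighbors[rng.integers(len(same_class_neighbors))]
    return u + rng.random() * (nn - u)

rng = np.random.default_rng(0)
u = np.array([0.2, 0.5])                                   # a minority (non-PM) case in feature space
neighbors = np.array([[0.25, 0.45], [0.1, 0.6], [0.3, 0.55]])
print(smote_sample(u, neighbors, rng))
```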
Figure 4. Illustration of generating a synthetic data point by interpolating between a minority class instance u and its same-class nearest neighbors (u1-u4).
After addressing the imbalanced dataset, we select and implement the Gradient Boosting
Machine (GBM) to train the optimal feature-based machine learning model and predict the risk of
the advanced gastric cancer patients having PM. The GBM model is a popular machine learning
algorithm that has proven effective at classifying complex datasets and is often best in class in predictive accuracy [23]. With hyperparameter tuning, the GBM model is implemented to achieve a low computational cost and high robustness in detection results as well. Additionally, to
decrease the case partition bias, we use a leave-one-case-out (LOCO) based cross-validation
method to train and test the GBM classifier. Using the LOCO method, each case is independently
tested once using the GBM model trained using all other cases in the balanced dataset. The model
produces a prediction score for each testing case ranging from 0 to 1. The higher score indicates
the higher risk of the test case having PM. The prediction performance is evaluated using a receiver
operating characteristic (ROC) method after discarding all SMOTE generated non-PM training
samples. The areas under ROC curves (AUC) and overall prediction accuracy after applying an
operating threshold (T = 0.5) to the GBM-generated prediction scores are used as two performance evaluation indices.
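A compact sketch of this training and testing protocol is given below, using scikit-learn and imbalanced-learn as assumed libraries. Whether SMOTE and the random projection were refit exactly this way inside each fold follows our reading of Figure 5, and hyperparameters are left at their defaults for brevity; y is assumed to be encoded with 1 = PM.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.random_projection import GaussianRandomProjection

def loco_evaluate(X, y, n_components=20, seed=0):
    """Leave-one-case-out evaluation: for each held-out case, SMOTE-balance the remaining
    cases, project them to the 20-feature vector, train a GBM, and score the test case."""
    scores = np.zeros(len(y), dtype=float)
    for i in range(len(y)):
        train = np.delete(np.arange(len(y)), i)
        X_bal, y_bal = SMOTE(random_state=seed).fit_resample(X[train], y[train])
        rpa = GaussianRandomProjection(n_components=n_components, random_state=seed)
        clf = GradientBoostingClassifier(random_state=seed)
        clf.fit(rpa.fit_transform(X_bal), y_bal)
        scores[i] = clf.predict_proba(rpa.transform(X[i:i + 1]))[0, 1]
    auc = roc_auc_score(y, scores)                          # evaluated on the original (non-synthetic) cases
    acc = accuracy_score(y, (scores >= 0.5).astype(int))    # operating threshold T = 0.5
    return auc, acc
```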
Figure 5 shows the flowchart of using our CAD scheme to process images, compute features, and train ML models, in which the RPA is also embedded inside the LOCO training process to reduce the potential bias by generating the optimal feature vector independently of the
test case. In this study, the segmentation and feature extraction steps were performed using
MATLAB R2019a package, and the feature reduction and classifications were done using Python
3.7.
Figure 5. Flowchart of the proposed scheme: ROI segmentation, feature extraction, random projection, and GBM model training within the LOCO process, which generates a prediction score for each test case.
3. Results
Figure 6 presents two ROC curves generated by the GBM models embedded with two
feature vector regeneration methods (RPA and PCA). The AUC value and the overall prediction
accuracy of the GBM model trained using RPA with 3D image features as input are 0.69±0.019
and 71.2%, respectively (Figure 6.a). Further, Figure 6.b displays the AUC and accuracy of the
GBM model trained using PCA, which are 0.58±0.021, and 65.2%, respectively. The results
indicate that using the RPA-generated optimal image feature vector provides significantly higher prediction performance than using the PCA-generated feature vector.
Figure 6. The ROC plots of the proposed model after applying (a) random projection and (b) PCA for feature reduction.
Figure 7 illustrates the ROC curve and prediction performance of the GBM model trained
using 2D features computed from the largest tumor region segmented from one CT image slice.
As shown in the figure, the AUC and accuracy of the GBM model are 0.66 ±0.017 and 68.4%,
respectively. Table 2 shows the data to compare the performance of two GBM models built using
3D and 2D image feature vectors generated using the RPA method. The results demonstrate that
using 3D image features yields significantly higher performance than using 2D features (p < 0.05).
Table 2. The comparison of the proposed model's performance after applying 2D and 3D features.

Features        AUC             Accuracy
2D features     0.66 ± 0.017    68.4%
3D features     0.69 ± 0.019    71.2%
Figure 7. The ROC plot of applying the extracted 2D features to the proposed model.
4. Discussion
CT is the most popular imaging modality to detect and diagnose gastric cancer, and it may
also provide a non-invasive alternative method to predict the risk of PM in advanced gastric cancer
patients. Despite the potential advantages of using CT to detect or predict the risk of PM, the
efficacy of radiologists in reading and interpreting CT images for PM detection is insufficient [24].
Thus, many studies suggested that developing and applying CAD schemes integrated with the
radiomics concept and ML method can be beneficial and provide a second opinion to radiologists
in detecting and diagnosing different abnormalities, including PMs of gastric cancer patients [25].
However, developing ML models using a large number of radiomics features and a small training dataset remains a difficult task. In this study, we explore a new approach to develop a new CAD scheme or ML model with several unique characteristics and novel ideas in feature extraction and model optimization, as discussed below.
First, in a previous study conducted in this area, the authors performed manual
segmentation of gastric cancer tumor regions from single CT image slices [26]. However, manual segmentation of tumor regions is often inconsistent, with large inter-observer variability due to the fuzzy boundaries of the tumor regions, which makes the computed image features also
inconsistent or not reproducible. Thus, the prediction accuracy can be affected or not robust. To
solve this issue, we in our study developed an interactive CAD scheme with a graphical user
interface (GUI) to segment tumor regions from CT images. A user only needs to place an initial
seed around the center of the tumor region that has the largest size in one CT slice. Our CAD
scheme then segments tumor regions on all involved CT image slices automatically. The
segmentation results can also be visually inspected in the GUI windows. Although we have designed and installed a correction function icon in the GUI, which the user can activate to instruct the CAD scheme to correct segmentation errors (if any), the results in this study show that the CAD scheme can achieve satisfactory results in automatically segmenting all 3,305 tumor regions depicted on the CT image slices in our dataset.
Second, although several previous studies (including reference [27]) have been reported
to develop radiomics based ML models to detect and diagnose gastric cancer using CT images,
they all used image features computed just from one manually selected CT image slice. To the best
of our knowledge, this is the first study that develops and tests a new ML model using 3D image
features. Our study results support our hypothesis that using 2D image features extracted from only one CT slice might not be sufficient to represent the heterogeneous characteristics of the tumors, while using 3D image features can yield significantly higher performance. Specifically,
in this study, we have performed 3D tumor segmentation and extracted 3D image features to detect
or predict the risk of advanced gastric cancer patients having PM. As shown in Table 2, the prediction performance of the GBM model trained using 3D features yields an AUC of 0.69±0.019 and an accuracy of 71.15%, which are significantly higher than those of the GBM model trained using 2D features, with an AUC of 0.66±0.017 and an accuracy of 68.4%.
Third, in developing CAD schemes to train machine learning classifiers, identifying a small
and efficient set of image features plays a critical role; therefore, in previous studies, different
feature dimensionality reduction methods have been investigated. Although these studies made
many improvements in optimizing the feature vectors, there is a significant challenge of achieving
small feature vectors representing the complex and non-linear image feature space. For the first
time, in this study, we investigate the feasibility of applying the RPA to the medical imaging
informatics field in optimizing the CAD scheme or ML model. Our study results show that RPA
is a promising technique to reduce the dimensionality of a set of points lying in Euclidean space for the very heterogeneous feature data commonly occurring in medical images, and it has the advantages of high robustness in classification and a low risk of overfitting. Figure 6 illustrates that the
classification performance of the GBM model embedded with RPA yields significantly higher
performance than the GBM model embedded with a PCA, which is well-known as a popular
feature dimensionality reduction method. As presented in Figure 6, the AUC value increased
from 0.58 to 0.69 after applying the RPA as compared to the PCA. Additionally, the overall
prediction accuracy of the GBM model improves from 65.2% to 71.2%, after using RPA instead
of PCA. Thus, the study results demonstrate that, due to the very complicated distribution of radiomics
features computed from medical images, RPA is a promising and more powerful technique
applicable to generate optimal feature vectors for better training ML models used in CAD schemes
of medical images.
Last but not least, despite the encouraging results, we also notice some limitations in this
study. First, the dataset used in this study is relatively small; hence to validate the results of this
study, larger datasets are required before being tested in future prospective clinical studies. Second,
although in this study we have used synthetic data to balance the dataset and reduce the impact of class imbalance, the SMOTE technique is only effective for low-dimensional data and may not be appropriate or optimal for high-dimensional data [28]. Third, in the initial pool
of features, we only extracted a limited number (315) of statistical and textural features, which is much smaller than the number of features computed based on recently developed radiomics concepts
and technology in other studies [29]. Thus, more texture features can be explored in future studies
to increase the diversity of the initial feature pool, which may also increase the chance of selecting
or generating more optimal features to significantly improve accuracy of ML model to predict risk
of PM. In summary, regardless of the limitations mentioned above, this study reveals a new and
promising approach to identify and generate optimal feature vectors for training ML models
implemented in the CAD schemes of medical images. Since optimizing the feature vector is one
of the critical steps of building an optimal ML model, the method presented in this study is not limited to the detection of advanced gastric cancer patients with PM; it can also be beneficial for other medical imaging studies that develop ML models to detect different types of cancers or other diseases.
5. Competing interests
The authors declare that they have no competing interests.
6. Authors’ contributions
SM conceived the presented idea, developed the CAD scheme and computational framework, and analyzed the data. MH assisted with technical details in developing the CAD scheme. GD helped in carrying out the feature computation. BZH and SL supervised the project and were in charge of the overall direction. All authors provided critical feedback and helped shape the research and the manuscript.
Acknowledgment
This study is supported in part by research grant R01 CA197150 from the National Cancer
Institute. The authors also thank the support from the Stephenson Cancer Center, University of
Oklahoma.
References
1. Bray, F., et al., Global cancer statistics 2018: GLOBOCAN estimates of incidence and
mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians,
2018. 68(6): p. 394-424.
2. Biondi, A., et al., Neo-adjuvant chemo (radio) therapy in gastric cancer: current status
and future perspectives. World journal of gastrointestinal oncology, 2015. 7(12): p. 389.
3. Fukagawa, T., et al., A prospective multi-institutional validity study to evaluate the
accuracy of clinical diagnosis of pathological stage III gastric cancer (JCOG1302A).
Gastric Cancer, 2018. 21(1): p. 68-73.
4. Wang, F.-H., et al., The Chinese Society of Clinical Oncology (CSCO): clinical guidelines
for the diagnosis and treatment of gastric cancer. Cancer communications, 2019. 39(1): p.
1-31.
5. Fujiwara, Y., et al., Neoadjuvant intraperitoneal and systemic chemotherapy for gastric
cancer patients with peritoneal dissemination. Annals of surgical oncology, 2011. 18(13):
p. 3726-3731.
6. Lordick, F., et al., Capecitabine and cisplatin with or without cetuximab for patients with
previously untreated advanced gastric cancer (EXPAND): a randomised, open-label phase
3 trial. The lancet oncology, 2013. 14(6): p. 490-499.
7. Coccolini, F., et al., Intraperitoneal chemotherapy in advanced gastric cancer. Meta-
analysis of randomized trials. European Journal of Surgical Oncology (EJSO), 2014. 40(1):
p. 12-26.
8. Ishigami, H., et al., Phase III trial comparing intraperitoneal and intravenous paclitaxel
plus S-1 versus cisplatin plus S-1 in patients with gastric cancer with peritoneal metastasis:
PHOENIX-GC trial. Journal of Clinical Oncology, 2018. 36(19): p. 1922-1929.
9. Kumar, V., et al., Radiomics: the process and the challenges. Magnetic resonance imaging,
2012. 30(9): p. 1234-1248.
10. Lambin, P., et al., Radiomics: extracting more information from medical images using
advanced feature analysis. European journal of cancer, 2012. 48(4): p. 441-446.
11. Sun, Z.-Q., et al., Radiomics study for differentiating gastric cancer from gastric stromal
tumor based on contrast-enhanced CT images. Journal of X-ray Science and Technology,
2019. 27(6): p. 1021-1031.
12. Wang, L., et al. Computer-aided staging of gastric cancer using radiomics signature on
computed tomography imaging. in Medical Imaging 2020: Computer-Aided Diagnosis.
2020. International Society for Optics and Photonics.
13. Zheng, B., et al., Interactive computer-aided diagnosis of breast masses: computerized
selection of visually similar image sets from a reference library. Academic radiology,
2007. 14(8): p. 917-927.
14. Danala, G., et al., classification of breast masses using a computer-aided diagnosis scheme
of contrast enhanced digital mammograms. Annals of biomedical engineering, 2018.
46(9): p. 1419-1431.
15. Gundreddy, R.R., et al., Assessment of performance and reproducibility of applying a
content‐based image retrieval scheme for classification of breast lesions. Medical physics,
2015. 42(7): p. 4241-4249.
16. Mirniaharikandehei, S., et al., Developing a quantitative ultrasound image feature analysis
scheme to assess tumor treatment efficacy using a mouse model. Scientific reports, 2019.
9(1): p. 1-10.
17. Bingham, E. and H. Mannila. Random projection in dimensionality reduction: applications
to image and text data. in Proceedings of the seventh ACM SIGKDD international
conference on Knowledge discovery and data mining. 2001.
18. Xie, H., J. Li, and H. Xue, A survey of dimensionality reduction techniques based on
random projection. arXiv preprint arXiv:1706.04371, 2017.
19. Aggarwal, C.C., A. Hinneburg, and D.A. Keim. On the surprising behavior of distance
metrics in high dimensional space. in International conference on database theory. 2001.
Springer.
20. Saunders, C., et al., Subspace, Latent Structure and Feature Selection: Statistical and
Optimization Perspectives Workshop, SLSFS 2005 Bohinj, Slovenia, February 23-25,
2005, Revised Selected Papers. Vol. 3940. 2006: Springer.
21. Dasgupta, S. and A. Gupta, An elementary proof of a theorem of Johnson and
Lindenstrauss. Random Structures & Algorithms, 2003. 22(1): p. 60-65.
22. Fernández, A., et al., SMOTE for learning from imbalanced data: progress and challenges,
marking the 15-year anniversary. Journal of artificial intelligence research, 2018. 61: p.
863-905.
23. Hu, R., X. Li, and Y. Zhao. Gradient boosting learning of Hidden Markov models. in 2006
IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
2006. IEEE.
24. Seevaratnam, R., et al., How useful is preoperative imaging for tumor, node, metastasis
(TNM) staging of gastric cancer? A meta-analysis. Gastric cancer, 2012. 15(1): p. 3-18.
25. Gonçalves, V.M., M.E. Delamaro, and F.d.L.d.S. Nunes, A systematic review on the
evaluation and characteristics of computer-aided diagnosis systems. Revista Brasileira de
Engenharia Biomédica, 2014. 30(4): p. 355-383.
26. Liu, S., et al., CT textural analysis of gastric cancer: correlations with
immunohistochemical biomarkers. Scientific reports, 2018. 8(1): p. 1-9.
27. Li, R., et al., Detection of gastric cancer and its histological type based on iodine
concentration in spectral CT. Cancer Imaging, 2018. 18(1): p. 1-10.
28. Blagus, R. and L. Lusa, SMOTE for high-dimensional class-imbalanced data. BMC
bioinformatics, 2013. 14: p. 106-106.
29. Wang, T., et al., Correlation between CT based radiomics features and gene expression
data in non-small cell lung cancer. Journal of X-ray Science and Technology, 2019. 27(5):
p. 773-803.