Applying A Random Projection Algorithm To Optimize Machine Learning Model For Predicting Peritoneal Metastasis in Gastric Cancer Patients Using CT Images

1 School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
2 School of Computer Sciences, University of Oklahoma, Norman, OK 73019, USA
Abstract
Background and Objective: Non-invasively predicting the risk of cancer metastasis before
surgery plays an essential role in determining optimal treatment methods for cancer patients
(including who can benefit from neoadjuvant chemotherapy). Although developing radiomics
based machine learning (ML) models has attracted broad research interest for this purpose, it often
faces the challenge of how to build a high-performing and robust ML model using small and imbalanced image datasets.
Methods: In this study, we explore a new approach to build an optimal ML model. A retrospective
dataset involving abdominal computed tomography (CT) images acquired from 159 patients
diagnosed with gastric cancer is assembled. Among them, 121 cases have peritoneal metastasis
(PM), while 38 cases do not have PM. A computer-aided detection (CAD) scheme is first applied
to segment primary gastric tumor volumes and initially compute 315 image features. Then, two Gradient Boosting Machine (GBM) models embedded with two different feature dimensionality reduction methods, namely, principal component analysis (PCA) and a random projection algorithm (RPA), combined with a synthetic minority oversampling technique, are built to predict the risk of
the patients having PM. All GBM models are trained and tested using a leave-one-case-out cross-
validation method.
Results: Results show that the GBM model embedded with RPA yielded a significantly higher prediction performance than the GBM model embedded with PCA (AUC of 0.69±0.019 versus 0.58±0.021; overall accuracy of 71.2% versus 65.2%).
Conclusions: The study demonstrated that CT images of the primary gastric tumors contain
discriminatory information to predict the risk of PM, and RPA is a promising method to generate optimal feature vectors for building robust ML models.
1. Introduction
Although the occurrence of gastric cancer has declined recently, it remains the third leading
cause of cancer-related death worldwide [1]. While surgery remains the only curative treatment
option, preoperative neoadjuvant chemotherapy (NAC) has demonstrated favorable results with
increased therapeutic resection rates and improved survival [2]. To prevent the adverse effects of NAC, patients with different disease stages must be distinguished from each other [3] because, at each stage of the disease, the treatment differs [4]. Recent studies demonstrated that
applying preoperative NAC for advanced gastric cancer patients with peritoneal metastasis (PM)
yielded a much better clinical outcome and enhanced the overall survival rate [5-8]. Thus, accurate
assessment of the presence of the PM is essential for the selection of appropriate patients for NAC.
Since the overall accuracies of subjectively reading endoscopic ultrasound and computed
tomography (CT) images are not completely reliable [3, 4], an alternative technique is needed to assess the presence of PM more accurately and objectively.
Many recent studies have revealed that the radiomics technique can extract quantitative information from medical images in the form of a large pool of image features, and data mining of this image feature pool offers an exciting approach to build machine learning (ML)
models and predict clinical outcomes [9, 10]. Although several radiomics based ML models have
been reported to differentiate and stage gastric cancer patients [11, 12], these studies computed
radiomics features from the tumor region manually segmented from one CT slice selected by the
radiologists. Meanwhile, a correlation-analysis-based method was used to select a small set of image features, which cannot eliminate the redundancy among the selected features. Thus, the discriminatory power and prediction accuracy of these ML models were limited. To overcome
such limitations, we in this study propose to develop and evaluate a new computer-aided detection
(CAD) scheme aiming to predict the risk of PM among gastric cancer patients. First, our scheme
segments the primary gastric tumor volume in 3D CT image data, which enables better computation of image features related to the heterogeneity of the tumors. Second, to reduce the dimensionality of the feature
space and better identify orthogonal or non-redundant image features from a large pool of initially
computed radiomics features, we investigate and apply a random projection algorithm (RPA).
Third, to avoid bias in generating the feature vector, the RPA is embedded in a multi-feature fusion-based
machine learning (ML) model to predict the risk of PM, which is trained and tested using (1) a
synthetic minority oversampling technique (SMOTE) to balance numbers of cases in two classes
and (2) a leave-one-case-out (LOCO) cross-validation method. The details of the study design,
experimental procedures, data analysis results, and discussions are presented in the following sections.
2. Materials and Methods
In this study, we use a retrospective dataset of abdominal computed tomography (CT) images
acquired from 159 patients with confirmed diagnoses of gastric cancer. Among these patients, 121 cases have PM, and 38 cases do not have PM. Each patient had an abdominal CT
imaging examination during the original cancer diagnosis, which involves approximately 300-400
image slices. The primary gastric tumors are typically depicted in around 20-22 slices. Table 1
summarizes the distribution of general demographic information of these 159 patients involved in
this study.
Table 1. Distribution of study cases in the selected dataset
Recognizing the heterogeneity of tumors in clinical images and the difficulty of tumor segmentation, we modified and implemented a hybrid tumor segmentation scheme that uses a dynamic programming method [13, 14] to adaptively identify the growing thresholds of a multi-layer topographic region growing algorithm and the initial contour for an active contour algorithm. Specifically, the tumor segmentation scheme involves the following steps. First, a Wiener filter is applied to
reduce image noise. Second, an initial seed is placed at the center of the tumor region of one CT
slice in which the tumor has its largest area. To reduce the inter-operator variability in
choosing the initial seed and increase the robustness of the system as demonstrated in the previous
study [15], a predefined window with the size of (5,5) around the initial seed is automatically
created. A pixel with the minimum value inside the window is detected and selected as the first
seed point. Third, to automatically determine the first threshold value for the region growing
algorithm, a new predefined window size of (5,5) is created around the new seed point. Then, the
scheme computes the pixel value differences between the center pixel and boundary pixels and
identifies the maximum difference. Subsequently, the region growing threshold is determined as
𝑇1 = 𝑉𝑐 + 0.25 × 𝐷𝑚𝑎𝑥 , where 𝑉𝑐 is the pixel value of the center pixel and 𝐷𝑚𝑎𝑥 is the computed
maximum pixel value difference inside the bounding window. Then, this threshold value is applied
to define the first layer of region growing to segment tumor region depicting on one CT image
slice.
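For illustration, a minimal Python sketch of the seed refinement and first-threshold computation described above is given below. It assumes the CT slice is a 2D numpy array; the window handling, use of absolute differences, and function names are our own assumptions, not the study's actual implementation (which was done in MATLAB).

```python
import numpy as np

def refine_seed(ct_slice, seed_row, seed_col, half=2):
    """Select the minimum-valued pixel inside a 5x5 window around the operator-placed seed."""
    window = ct_slice[seed_row - half:seed_row + half + 1,
                      seed_col - half:seed_col + half + 1]
    r, c = np.unravel_index(np.argmin(window), window.shape)
    return seed_row - half + r, seed_col - half + c

def first_threshold(ct_slice, seed_row, seed_col, half=2, alpha=0.25):
    """T1 = Vc + 0.25 * Dmax, where Dmax is the largest difference between the center
    pixel and the boundary pixels of a 5x5 window around the refined seed."""
    window = ct_slice[seed_row - half:seed_row + half + 1,
                      seed_col - half:seed_col + half + 1]
    center = float(ct_slice[seed_row, seed_col])
    boundary = np.concatenate([window[0, :], window[-1, :],
                               window[1:-1, 0], window[1:-1, -1]])
    d_max = float(np.max(np.abs(boundary - center)))
    return center + alpha * d_max
```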
Fourth, after determining the first layer of tumor region growth, the growing threshold of
the second layer is 𝑇2 = 𝑇1 + 𝛽𝐶1 where 𝐶1 is the computed contrast of the first layer, and 𝛽 is a
coefficient (i.e., 0.5). This multi-layer region growing continues until the growth between two adjacent layers exceeds two times the size of the last growing layer. Last, after the region growing algorithm stops, the scheme selects the boundary contour of the last region growing layer as the initial contour. The active contour algorithm is then applied to expand or shrink the contour curve to best fit the tumor boundary. As a result, the scheme completes the process of segmenting the tumor region on one CT slice. Figure 1 illustrates these steps.
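The multi-layer threshold update and stopping rule can be sketched as follows. This is an illustrative re-implementation under our own simplifying assumptions (connected-component growing by thresholding, and a simple min-max contrast proxy for the layer contrast C_n); it is not the study's MATLAB code.

```python
import numpy as np
from scipy import ndimage

def grow_layer(ct_slice, seed, threshold):
    """Connected region containing the seed whose pixel values do not exceed the threshold
    (a simplified stand-in for the topographic region growing used in the study)."""
    mask = ct_slice <= threshold
    labels, _ = ndimage.label(mask)
    return labels == labels[seed]

def multilayer_growing(ct_slice, seed, t1, beta=0.5, max_layers=20):
    """Raise the growing threshold layer by layer (T_{n+1} = T_n + beta * C_n) and stop when
    the growth between two adjacent layers exceeds twice the size of the last layer."""
    threshold = t1
    region = grow_layer(ct_slice, seed, threshold)
    for _ in range(max_layers):
        contrast = float(ct_slice[region].max() - ct_slice[region].min())  # contrast proxy C_n
        new_region = grow_layer(ct_slice, seed, threshold + beta * contrast)
        if new_region.sum() - region.sum() > 2 * region.sum():
            break  # growth between adjacent layers is too large; keep the previous layer
        threshold += beta * contrast
        region = new_region
    return region  # its boundary contour initializes the active contour step
```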
Subsequently, after segmenting the tumor region on one CT slice, the CAD scheme continues to perform tumor region segmentation by scanning in both the up and down directions until no tumor region is detected in the next adjacent CT slice. A binary map from the previously segmented slice is used to guide this continuing tumor region segmentation task. Next, three seeds, including the center point of the ROI and two other points randomly selected within a predefined window around the center point, are mapped to the next slice. Then, the region growing algorithm is automatically performed from these seeds in the targeted slice. Additionally, the tumor growing boundary is constrained by the adjacent slice to facilitate the multi-layer region growing across the CT image slices of one case. In this way, the 3D tumor volume can be segmented and computed.
Once the 3D tumor volume is segmented, the CAD scheme is applied to compute a large set of radiomics-based image features, which include 315 features extracted and computed from each segmented 2D tumor region (ROI) depicted on one CT image slice. These features are categorized into four main groups: (a) the gray-level run-length matrix (GLRLM) features: from each ROI, 44 two-dimensional features are extracted. (b) The Gray Level Difference Method
(GLDM) probability density function features: From each probability density function
representing statistical texture features of ROI, four features of mean, median, standard deviation,
and variance are computed. (c) Wavelet domain features: for extracting these features, first, the
image is decomposed into four components comprising low and high scale decomposition in either
X or Y direction by wavelet transforms. Then, the GLCM features, as well as 21 tumor density
[16] and GLDM features, are extracted from those components. (d) the Laplacian of Gaussian
(LoG) features: To extract these features, first a Gaussian smoothing filter is applied to
reduce the sensitivity to the noise, and then the Laplacian filter sharpens the image's edge and
highlights rapid intensity changes inside the region. Next, from the extracted points after applying
the LoG filters, the mean, median, and the standard deviation are computed. Figure 3 shows the overall workflow of the feature extraction process.
Figure 3. Feature extraction workflow: the input ROI image is decomposed by the wavelet transform into LL, LH, HL, and HH sub-bands; GLCM, tumor density, and GLDM features are computed from each sub-band, and additional features are computed after applying the LoG filter.
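As an illustration of how the wavelet sub-bands and the LoG response statistics described above could be computed, a minimal Python sketch is given below. It assumes the PyWavelets and SciPy packages, a Haar wavelet, and a fixed Gaussian sigma; the actual wavelet basis, filter parameters, and sub-band naming used in the study are not specified in the text.

```python
import numpy as np
import pywt
from scipy.ndimage import gaussian_laplace

def wavelet_subbands(roi):
    """Single-level 2D wavelet decomposition of an ROI into LL, LH, HL, and HH sub-bands
    (sub-band naming follows one common convention)."""
    ll, (lh, hl, hh) = pywt.dwt2(roi.astype(float), 'haar')
    return {'LL': ll, 'LH': lh, 'HL': hl, 'HH': hh}

def log_statistics(roi, sigma=1.0):
    """Mean, median, and standard deviation of the Laplacian-of-Gaussian response."""
    response = gaussian_laplace(roi.astype(float), sigma=sigma)
    return np.mean(response), np.median(response), np.std(response)
```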
The CAD scheme computes each 3D feature ($F_{3D}^{k}$) as
$$F_{3D}^{k} = \sum_{i=1}^{N} w_i \times F_{2D}^{k}(i) \qquad (1)$$
where $w_i$ is the ratio of the segmented tumor volume on the $i$th slice to the whole tumor volume segmented on all $N$ involved CT slices, and $F_{2D}^{k}(i)$ is the $k$th 2D feature computed on the $i$th slice. The segmented tumor volume on the $i$th slice is computed by multiplying the segmented region size (2D) by the CT slice thickness. Finally, all 315 computed 3D feature values are normalized between 0 and 1 to reduce case-based reliance and weight all features evenly.
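A minimal sketch of Equation (1) and the subsequent min-max normalization is shown below, assuming the per-slice 2D features and segmented region areas are already available as arrays; the variable and function names are illustrative only.

```python
import numpy as np

def fuse_2d_features(features_2d, region_areas, slice_thickness):
    """Volume-weighted fusion of per-slice 2D features into one 3D feature vector (Eq. 1).
    features_2d: array of shape (N_slices, 315); region_areas: segmented area per slice."""
    slice_volumes = np.asarray(region_areas, dtype=float) * slice_thickness
    weights = slice_volumes / slice_volumes.sum()          # w_i in Eq. (1)
    return weights @ np.asarray(features_2d, dtype=float)  # 315 fused 3D features

def normalize_features(feature_matrix):
    """Min-max normalize each 3D feature to [0, 1] across all cases."""
    fmin = feature_matrix.min(axis=0)
    fmax = feature_matrix.max(axis=0)
    return (feature_matrix - fmin) / (fmax - fmin + 1e-12)
```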
Since the initial feature pool contains 315 image features, many of them can be redundant
(highly correlated) or irrelevant (with lower performance). Hence, selecting a small set of optimal
features to reduce the feature dimension and enhance learning accuracy is vital. In this study, in
order to perform feature dimensionality reduction, we investigate and apply a novel image feature regeneration method, the Random Projection Algorithm (RPA). Theoretical analysis has indicated
that the RPA has advantages for its simplicity, high performance, and robustness compared to
other feature reduction methods; however, empirical results are sparse [17]. Meanwhile, the RPA
method has been investigated and tested in many engineering applications such as text and face
recognition and yielded comparable results to conventional feature regeneration methods such as PCA, while generating more robust results and being computationally inexpensive [17, 18]. To the best of our
knowledge, the RPA has not been well investigated in the medical imaging informatics field to
reduce the dimensionality of radiomics feature space. Thus, the RPA method is tested in this study.
To introduce the RPA method, let us first consider each case as a point: if the feature vector size is k, the case is a point in k-dimensional space. Thus, the Euclidean distance between two points M and N is computed using Formula (2):
$$|M - N| = \sqrt{\sum_{i=1}^{k} (m_i - n_i)^{2}} \qquad (2)$$
In Formula (2), M = (m1, …, mk) and N = (n1, …, nk) are two points in the k-dimensional space. Likewise, the volume V of a sphere with radius r in k-dimensional space is computed using Formula (3):
$$V(k) = \frac{r^{k}\,\pi^{k/2}}{\frac{k}{2}\,\Gamma\!\left(\frac{k}{2}\right)} \qquad (3)$$
The normalization of the feature matrix between [0, 1] suggests that all data can be included in a sphere with a radius of 1. The important property of a sphere with unit radius is that the more the dimension increases, the more the volume decreases; in the limit, the volume of the unit sphere approaches zero (Formula 4):
$$\lim_{k \to \infty} \frac{\pi^{k/2}}{\frac{k}{2}\,\Gamma\!\left(\frac{k}{2}\right)} \cong 0 \qquad (4)$$
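As a quick numerical illustration of Formulas 3 and 4, the unit-radius sphere volume can be evaluated for a few dimensions. The snippet below is a small self-contained check written for this purpose, not part of the original study.

```python
from math import gamma, pi

def unit_sphere_volume(k):
    """Volume of the k-dimensional sphere with unit radius (Formula 3 with r = 1)."""
    return pi ** (k / 2) / ((k / 2) * gamma(k / 2))

for k in (2, 5, 20, 100):
    # The volume shrinks rapidly toward zero as the dimension grows.
    print(k, unit_sphere_volume(k))
```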
Additionally, according to the theory of heavy-tailed distributions, for a case M = (m1, …, mk), if ∑_{i=1}^{k} p_i = μ and E|(m_i − p_i)^d| ≤ p_i for d = 2, 3, …, ⌊t²/6μ⌋, then a probability bound can be computed using Formula 5:
$$\mathrm{prob}\left(\left|\sum_{i=1}^{k} m_i - \mu\right| \ge t\right) \le \max\left(3e^{-t^{2}/12\mu},\; 4 \times 2e^{-t}\right) \qquad (5)$$
The more the value of t increases, the smaller the chance that a point lies outside that distance. Thus, M should be concentrated around the mean value. In particular, according to Formulas 4 and 5, to a good approximation, all data are contained in a sphere of unit size and are concentrated around their mean value. As a result, as the dimension increases, the volume of the sphere approaches zero. Therefore, the differences between the cases are not large enough for accurate classification.
According to the above analysis, the larger the initial feature vector size, the higher the space dimension. Hence, most of the data are concentrated around the center, which leads to smaller differences between the features. Consequently, to reduce the feature dimension, the best and most powerful technique is one that reduces the dimensionality of the features while preserving the distances between the points, which indicates rough preservation of the vast amount of information. If we implement a conventional feature selection method and choose a d-dimensional subspace of the initial feature vector randomly, it is expected that all the projected distances in the new space are within a determined scale factor of those in the initial k-dimensional space [20]. Thus, it is probable that after removing the redundant features, the accuracy would not increase, because the divergence between the points is not significant enough to build a robust model.
To address the concern discussed above and to optimize the feature space, the Johnson-Lindenstrauss lemma can be applied in RPA. This lemma states that for any 0 < ε < 1 and for any number of cases t, which are treated as points in k-dimensional space (R^k), a positive integer d can be computed using Formula 6:
$$d \ge \frac{4 \ln t}{\left(\frac{\epsilon^{2}}{2} - \frac{\epsilon^{3}}{3}\right)} \qquad (6)$$
Then, for any set W of t points in R^k and for all z, w ∈ W, there exists a map, or random projection function, f: R^k → R^d that preserves the distances as determined by Formula 7 [21]:
$$(1 - \epsilon)\,|z - w|^{2} \le |f(z) - f(w)|^{2} \le (1 + \epsilon)\,|z - w|^{2} \qquad (7)$$
As demonstrated in Formula 7, the distance between the points in the lower-dimensional space is roughly close to the distance in the high-dimensional space. The lemma states that it is feasible to project a set of points from a high-dimensional space into a lower-dimensional space while the distances between the points are approximately preserved.
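For concreteness, Formula 6 can be evaluated directly. The small helper below is our own illustration (not from the paper) of the minimum target dimension guaranteed for t points at a hypothetical distortion ε.

```python
from math import ceil, log

def jl_min_dim(t, eps):
    """Minimum target dimension d satisfying Formula (6) for t points and distortion eps."""
    return ceil(4 * log(t) / (eps ** 2 / 2 - eps ** 3 / 3))

# Example: 159 cases at a hypothetical distortion of eps = 0.9 -> 126 dimensions.
print(jl_min_dim(159, 0.9))
```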
As a result, the above analysis suggests that if the initial set of features is projected into a lower-dimensional subspace using the random projection method, the distances between points are preserved with better contrast. Hence, it may improve the classification accuracy between the features of the two classes, representing cases either with or without PM, with a low risk of overfitting the ML models. In this study, we investigate whether using RPA can yield better results than one of the popular feature dimensionality reduction approaches, namely, principal component analysis (PCA). All features extracted in the above section are fed into both methods, RPA and PCA. After applying these two methods, each of them generates 20 new features that are used as the input feature vector of the ML model.
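The following sketch shows how both 20-component feature regenerations could be produced with scikit-learn. The choice of a Gaussian random matrix is our assumption, since the paper does not state which random projection variant was used, and the input array here is random data for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection

# X: rows are cases, columns are the 315 normalized 3D features (random data for illustration).
X = np.random.RandomState(0).rand(159, 315)

rpa = GaussianRandomProjection(n_components=20, random_state=0)  # random projection to 20 features
X_rpa = rpa.fit_transform(X)

pca = PCA(n_components=20)                                       # PCA baseline, also 20 components
X_pca = pca.fit_transform(X)
```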
To classify between the study cases with or without PM, we build a multi-feature fusion-
based machine learning model. However, our dataset includes 121 PM cases and 38 non-PM cases,
which makes the two classes imbalanced. Thus, to address this issue, we apply the Synthetic Minority Oversampling Technique (SMOTE) algorithm [22] to rebalance the original image dataset. The key point of applying SMOTE is that it introduces synthetic data by interpolation between some minority class instances that lie within a specified neighborhood. If we consider u as a minority class instance, it is selected as a base to generate new synthetic data points. According to a distance matrix, some nearest neighbors of the same class are chosen from the training set (points u1 to u4 in Figure 4).
The procedure of SMOTE is as follows. Initially, the total amount of oversampling N is set. Then, a minority class instance is randomly chosen from the training set. Following that, its K nearest neighbors are obtained. Among these K instances, N instances are selected randomly to compute the new instances by interpolation. Figure 4 illustrates the process of creating synthetic data in the SMOTE algorithm. As a result, we add 83 synthetic non-PM cases, and the dataset is expanded to 242 cases, including 121 PM cases and 121 non-PM cases.
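The core interpolation step can be sketched as follows. This is a conceptual illustration only; in practice an existing implementation such as imbalanced-learn's SMOTE would typically be used, which is an assumption since the paper does not name a library.

```python
import numpy as np

def smote_sample(u, same_class_neighbors, rng):
    """Create one synthetic minority sample by interpolating between the base instance u
    and a randomly chosen same-class nearest neighbor."""
    nn = same_class_neighbors[rng.integers(len(same_class_neighbors))]
    return u + rng.random() * (nn - u)

rng = np.random.default_rng(0)
u = np.array([0.2, 0.5])                                   # a minority (non-PM) case in feature space
neighbors = np.array([[0.25, 0.45], [0.1, 0.6], [0.3, 0.55]])
print(smote_sample(u, neighbors, rng))
```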
Figure 4. Illustration of generating a synthetic data point by interpolating between a minority class instance u and its same-class nearest neighbors (u1-u4).
After addressing the imbalanced dataset, we select and implement the Gradient Boosting
Machine (GBM) to train the optimal feature-based machine learning model and predict the risk of
the advanced gastric cancer patients having PM. The GBM model is a popular machine learning
algorithm that has proven effective at classifying complex datasets and is often best in class in predictive accuracy [23]. With hyperparameter tuning, the GBM model is implemented to achieve a low computational cost and high robustness in detection results as well. Additionally, to
decrease the case partition bias, we use a leave-one-case-out (LOCO) based cross-validation
method to train and test the GBM classifier. Using the LOCO method, each case is independently
tested once using the GBM model trained using all other cases in the balanced dataset. The model
produces a prediction score for each testing case ranging from 0 to 1. The higher score indicates
the higher risk of the test case having PM. The prediction performance is evaluated using a receiver
operating characteristic (ROC) method after discarding all SMOTE generated non-PM training
samples. The areas under ROC curves (AUC) and overall prediction accuracy after applying an
operating threshold (T = 0.5) to the GBM-generated prediction scores are used as two performance evaluation indices.
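A compact sketch of this training and testing protocol is given below, using scikit-learn and imbalanced-learn as assumed libraries. Whether SMOTE and the random projection were refit exactly this way inside each fold follows our reading of Figure 5, and hyperparameters are left at their defaults for brevity; y is assumed to be encoded with 1 = PM.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.random_projection import GaussianRandomProjection

def loco_evaluate(X, y, n_components=20, seed=0):
    """Leave-one-case-out evaluation: for each held-out case, SMOTE-balance the remaining
    cases, project them to the 20-feature vector, train a GBM, and score the test case."""
    scores = np.zeros(len(y), dtype=float)
    for i in range(len(y)):
        train = np.delete(np.arange(len(y)), i)
        X_bal, y_bal = SMOTE(random_state=seed).fit_resample(X[train], y[train])
        rpa = GaussianRandomProjection(n_components=n_components, random_state=seed)
        clf = GradientBoostingClassifier(random_state=seed)
        clf.fit(rpa.fit_transform(X_bal), y_bal)
        scores[i] = clf.predict_proba(rpa.transform(X[i:i + 1]))[0, 1]
    auc = roc_auc_score(y, scores)                          # evaluated on the original (non-synthetic) cases
    acc = accuracy_score(y, (scores >= 0.5).astype(int))    # operating threshold T = 0.5
    return auc, acc
```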
Figure 5 shows the flowchart of using our CAD scheme to process images, compute features, and train ML models, in which the RPA is also embedded inside the LOCO training process to reduce the potential bias by generating the optimal feature vector independently of the
test case. In this study, the segmentation and feature extraction steps were performed using
MATLAB R2019a package, and the feature reduction and classifications were done using Python
3.7.
Figure 5. Flowchart of the proposed scheme: ROI segmentation, feature extraction, random projection, and GBM model training within the LOCO process, which generates a prediction score for each test case.
3. Results
Figure 6 presents two ROC curves generated by the GBM models embedded with two
feature vector regeneration methods (RPA and PCA). The AUC value and the overall prediction
accuracy of the GBM model trained using RPA with 3D image features as input are 0.69±0.019
and 71.2%, respectively (Figure 6.a). Further, Figure 6.b displays the AUC and accuracy of the
GBM model trained using PCA, which are 0.58±0.021, and 65.2%, respectively. The results
indicate that using the RPA-generated optimal image feature vector provides significantly higher prediction performance than using the PCA-generated feature vector.
Figure 6. The ROC plots of the proposed model after applying (a) random projection and (b) PCA for feature reduction.
Figure 7 illustrates the ROC curve and prediction performance of the GBM model trained
using 2D features computed from the largest tumor region segmented from one CT image slice.
As shown in the figure, the AUC and accuracy of the GBM model are 0.66 ±0.017 and 68.4%,
respectively. Table 2 shows the data to compare the performance of two GBM models built using
3D and 2D image feature vectors generated using the RPA method. The results demonstrate that
using 3D image features yields significantly higher performance than using 2D features (p < 0.05).
Table 2. The comparison of the proposed model's performance after applying 2D and 3D features.

Features        AUC             Accuracy
2D features     0.66 ± 0.017    68.4%
3D features     0.69 ± 0.019    71.2%
Figure 7. The ROC plot of applying the extracted 2D features to the proposed model.
4. Discussion
CT is the most popular imaging modality to detect and diagnose gastric cancer, and it may
also provide a non-invasive alternative method to predict the risk of PM in advanced gastric cancer
patients. Despite the potential advantages of using CT to detect or predict the risk of PM, the
efficacy of radiologists in reading and interpreting CT images for PM detection is insufficient [24].
Thus, many studies suggested that developing and applying CAD schemes integrated with the
radiomics concept and ML method can be beneficial and provide a second opinion to radiologists
in detecting and diagnosing different abnormalities, including PMs of gastric cancer patients [25].
However, developing ML models using a large number of radiomics features and a small training dataset remains a difficult task. In this study, we explore a new approach to develop a new CAD scheme or ML model with several unique characteristics and novel ideas in feature extraction and model optimization, as discussed below.
First, in a previous study conducted in this area, the authors performed manual
segmentation of gastric cancer tumor regions from single CT image slices [26]. However, manual segmentation of tumor regions is often inconsistent, with large inter-observer variability due to the fuzzy boundaries of the tumor regions, which makes the computed image features also
inconsistent or not reproducible. Thus, the prediction accuracy can be affected or not robust. To
solve this issue, we in our study developed an interactive CAD scheme with a graphical user
interface (GUI) to segment tumor regions from CT images. A user only needs to place an initial
seed around the center of the tumor region that has the largest size in one CT slice. Our CAD
scheme then segments tumor regions on all involved CT image slices automatically. The
segmentation results can also be visually inspected in the GUI windows. Although we have designed and installed a correction function icon in the GUI, which the user can activate to instruct the CAD scheme to correct segmentation errors (if any), the results in this study show that the CAD scheme can achieve satisfactory results in automatically segmenting all 3,305 tumor regions depicted on the CT image slices in our dataset.
Second, although several previous studies (including reference [27]) have been reported
to develop radiomics based ML models to detect and diagnose gastric cancer using CT images,
they all used image features computed just from one manually selected CT image slice. To the best
of our knowledge, this is the first study that develops and tests a new ML model using 3D image
features. Our study results support our hypothesis that using 2D image features extracted from only one CT slice might not be sufficient to represent the heterogeneous characteristics of the tumors, while using 3D image features can yield significantly higher performance. Specifically,
in this study, we have performed 3D tumor segmentation and extracted 3D image features to detect
or predict the risk of advanced gastric cancer patients having PM. As shown in Table 2, the prediction performance of the GBM model trained using 3D features yields an AUC of 0.69±0.019 and an accuracy of 71.15%, which are significantly higher than those of the GBM model trained using 2D features, with an AUC of 0.66±0.017 and an accuracy of 68.4%.
Third, in developing CAD schemes to train machine learning classifiers, identifying a small
and efficient set of image features plays a critical role; therefore, in previous studies, different
feature dimensionality reduction methods have been investigated. Although these studies made
many improvements in optimizing the feature vectors, there is a significant challenge of achieving
small feature vectors representing the complex and non-linear image feature space. For the first
time, in this study, we investigate the feasibility of applying the RPA to the medical imaging
informatics field in optimizing the CAD scheme or ML model. Our study results show that RPA
is a promising technique to reduce the dimensionality of a set of points lying in Euclidean space for the very heterogeneous feature data commonly occurring in medical images, and it has the advantages of high robustness in classification and a low risk of overfitting. Figure 6 illustrates that the
classification performance of the GBM model embedded with RPA yields significantly higher
performance than the GBM model embedded with a PCA, which is well-known as a popular
feature dimensionality reduction method. As presented in Figure 6, the AUC value increased
from 0.58 to 0.69 after applying the RPA as compared to the PCA. Additionally, the overall
prediction accuracy of the GBM model improves from 65.2% to 71.2%, after using RPA instead
of PCA. Thus, the study results demonstrate that, due to the very complicated distribution of radiomics
features computed from medical images, RPA is a promising and more powerful technique
applicable to generate optimal feature vectors for better training ML models used in CAD schemes
of medical images.
Last but not least, despite the encouraging results, we also notice some limitations in this
study. First, the dataset used in this study is relatively small; hence to validate the results of this
study, larger datasets are required before being tested in future prospective clinical studies. Second,
although in this study we have used synthetic data to balance the dataset and reduce the impact of class imbalance, the SMOTE technique is only effective for low-dimensional data and may not be appropriate or optimal for high-dimensional data [28]. Third, in the initial pool
of features, we only extracted a limited number (315) of statistical and textural features, which is much smaller than the number of features computed based on recently developed radiomics concepts
and technology in other studies [29]. Thus, more texture features can be explored in future studies
to increase the diversity of the initial feature pool, which may also increase the chance of selecting
or generating more optimal features to significantly improve accuracy of ML model to predict risk
of PM. In summary, regardless of the limitations mentioned above, this study reveals a new and
promising approach to identify and generate optimal feature vectors for training ML models
implemented in the CAD schemes of medical images. Since optimizing the feature vector is one
of the critical steps of building an optimal ML model, the method presented in this study is not limited to the detection of advanced gastric cancer patients with PM; it can also be beneficial for other medical imaging studies that develop ML models to detect different types of cancers or other diseases.
5. Competing interests
The authors declare that they have no competing interests.
6. Authors’ contributions
SM conceived the presented idea, developed the CAD scheme and computational framework, and analyzed the data. MH assisted with technical details in developing the CAD scheme. GD helped in carrying out the feature computation. BZH and SL supervised the project and were in charge of the overall direction. All authors provided critical feedback and helped shape the research and the manuscript.
Acknowledgment
This study is supported in part by research grant R01 CA197150 from the National Cancer
Institute. The authors also thank the support from the Stephenson Cancer Center, University of
Oklahoma.
References
1. Bray, F., et al., Global cancer statistics 2018: GLOBOCAN estimates of incidence and
mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians,
2018. 68(6): p. 394-424.
2. Biondi, A., et al., Neo-adjuvant chemo (radio) therapy in gastric cancer: current status
and future perspectives. World journal of gastrointestinal oncology, 2015. 7(12): p. 389.
3. Fukagawa, T., et al., A prospective multi-institutional validity study to evaluate the
accuracy of clinical diagnosis of pathological stage III gastric cancer (JCOG1302A).
Gastric Cancer, 2018. 21(1): p. 68-73.
4. Wang, F.-H., et al., The Chinese Society of Clinical Oncology (CSCO): clinical guidelines
for the diagnosis and treatment of gastric cancer. Cancer communications, 2019. 39(1): p.
1-31.
5. Fujiwara, Y., et al., Neoadjuvant intraperitoneal and systemic chemotherapy for gastric
cancer patients with peritoneal dissemination. Annals of surgical oncology, 2011. 18(13):
p. 3726-3731.
6. Lordick, F., et al., Capecitabine and cisplatin with or without cetuximab for patients with
previously untreated advanced gastric cancer (EXPAND): a randomised, open-label phase
3 trial. The lancet oncology, 2013. 14(6): p. 490-499.
7. Coccolini, F., et al., Intraperitoneal chemotherapy in advanced gastric cancer. Meta-
analysis of randomized trials. European Journal of Surgical Oncology (EJSO), 2014. 40(1):
p. 12-26.
8. Ishigami, H., et al., Phase III trial comparing intraperitoneal and intravenous paclitaxel
plus S-1 versus cisplatin plus S-1 in patients with gastric cancer with peritoneal metastasis:
PHOENIX-GC trial. Journal of Clinical Oncology, 2018. 36(19): p. 1922-1929.
9. Kumar, V., et al., Radiomics: the process and the challenges. Magnetic resonance imaging,
2012. 30(9): p. 1234-1248.
10. Lambin, P., et al., Radiomics: extracting more information from medical images using
advanced feature analysis. European journal of cancer, 2012. 48(4): p. 441-446.
11. Sun, Z.-Q., et al., Radiomics study for differentiating gastric cancer from gastric stromal
tumor based on contrast-enhanced CT images. Journal of X-ray Science and Technology,
2019. 27(6): p. 1021-1031.
12. Wang, L., et al. Computer-aided staging of gastric cancer using radiomics signature on
computed tomography imaging. in Medical Imaging 2020: Computer-Aided Diagnosis.
2020. International Society for Optics and Photonics.
13. Zheng, B., et al., Interactive computer-aided diagnosis of breast masses: computerized
selection of visually similar image sets from a reference library. Academic radiology,
2007. 14(8): p. 917-927.
14. Danala, G., et al., classification of breast masses using a computer-aided diagnosis scheme
of contrast enhanced digital mammograms. Annals of biomedical engineering, 2018.
46(9): p. 1419-1431.
15. Gundreddy, R.R., et al., Assessment of performance and reproducibility of applying a
content‐based image retrieval scheme for classification of breast lesions. Medical physics,
2015. 42(7): p. 4241-4249.
16. Mirniaharikandehei, S., et al., Developing a quantitative ultrasound image feature analysis
scheme to assess tumor treatment efficacy using a mouse model. Scientific reports, 2019.
9(1): p. 1-10.
17. Bingham, E. and H. Mannila. Random projection in dimensionality reduction: applications
to image and text data. in Proceedings of the seventh ACM SIGKDD international
conference on Knowledge discovery and data mining. 2001.
18. Xie, H., J. Li, and H. Xue, A survey of dimensionality reduction techniques based on
random projection. arXiv preprint arXiv:1706.04371, 2017.
19. Aggarwal, C.C., A. Hinneburg, and D.A. Keim. On the surprising behavior of distance
metrics in high dimensional space. in International conference on database theory. 2001.
Springer.
20. Saunders, C., et al., Subspace, Latent Structure and Feature Selection: Statistical and
Optimization Perspectives Workshop, SLSFS 2005 Bohinj, Slovenia, February 23-25,
2005, Revised Selected Papers. Vol. 3940. 2006: Springer.
21. Dasgupta, S. and A. Gupta, An elementary proof of a theorem of Johnson and
Lindenstrauss. Random Structures & Algorithms, 2003. 22(1): p. 60-65.
22. Fernández, A., et al., SMOTE for learning from imbalanced data: progress and challenges,
marking the 15-year anniversary. Journal of artificial intelligence research, 2018. 61: p.
863-905.
23. Hu, R., X. Li, and Y. Zhao. Gradient boosting learning of Hidden Markov models. in 2006
IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
2006. IEEE.
24. Seevaratnam, R., et al., How useful is preoperative imaging for tumor, node, metastasis
(TNM) staging of gastric cancer? A meta-analysis. Gastric cancer, 2012. 15(1): p. 3-18.
25. Gonçalves, V.M., M.E. Delamaro, and F.d.L.d.S. Nunes, A systematic review on the
evaluation and characteristics of computer-aided diagnosis systems. Revista Brasileira de
Engenharia Biomédica, 2014. 30(4): p. 355-383.
26. Liu, S., et al., CT textural analysis of gastric cancer: correlations with
immunohistochemical biomarkers. Scientific reports, 2018. 8(1): p. 1-9.
27. Li, R., et al., Detection of gastric cancer and its histological type based on iodine
concentration in spectral CT. Cancer Imaging, 2018. 18(1): p. 1-10.
28. Blagus, R. and L. Lusa, SMOTE for high-dimensional class-imbalanced data. BMC
bioinformatics, 2013. 14: p. 106-106.
29. Wang, T., et al., Correlation between CT based radiomics features and gene expression
data in non-small cell lung cancer. Journal of X-ray Science and Technology, 2019. 27(5):
p. 773-803.