
2764 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 68, NO. 9, SEPTEMBER 2021

Applying a Random Projection Algorithm to Optimize Machine Learning Model for Breast Lesion Classification
Morteza Heidari, Sivaramakrishnan Lakshmivarahan, Seyedehnafiseh Mirniaharikandehei, Gopichandh Danala, Sai Kiran R. Maryada, Hong Liu, and Bin Zheng

Abstract. Objective: Computer-aided diagnosis (CAD) schemes of medical images usually compute a large number of image features, which makes it challenging to identify a small and optimal feature vector for building robust machine learning models. The objective of this study is to investigate the feasibility of applying a random projection algorithm (RPA) to build an optimal feature vector from the initially CAD-generated large feature pool and to improve the performance of the machine learning model. Methods: We assemble a retrospective dataset involving 1,487 cases of mammograms, in which 644 cases have confirmed malignant mass lesions and 843 have benign lesions. A CAD scheme is first applied to segment mass regions and initially compute 181 features. Then, support vector machine (SVM) models embedded with several feature dimensionality reduction methods are built to predict the likelihood of lesions being malignant. All SVM models are trained and tested using a leave-one-case-out cross-validation method. The SVM generates a likelihood score for each segmented mass region depicted on a one-view mammogram. By fusing the two scores of the same mass depicted on two-view mammograms, a case-based likelihood score is also evaluated. Results: Compared with principal component analysis, nonnegative matrix factorization, and Chi-squared methods, the SVM embedded with RPA yielded a significantly higher case-based lesion classification performance, with an area under the ROC curve of 0.84 ± 0.01 (p < 0.02). Conclusion: The study demonstrates that RPA is a promising method to generate optimal feature vectors and improve SVM performance. Significance: This study presents a new method to develop CAD schemes with significantly higher and more robust performance.

Index Terms: Breast cancer diagnosis, computer-aided diagnosis (CAD) of mammograms, feature dimensionality reduction, lesion classification, random projection algorithm, support vector machine (SVM).

Manuscript received December 17, 2020; accepted January 17, 2021. Date of publication January 25, 2021; date of current version August 20, 2021. This work was supported by Grant R01-CA197150 from the National Cancer Institute, National Institutes of Health, USA. (Corresponding author: Morteza Heidari.)
Morteza Heidari is with the School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019 USA (e-mail: [email protected]).
Sivaramakrishnan Lakshmivarahan and Sai Kiran R. Maryada are with the School of Computer Science, University of Oklahoma, USA.
Seyedehnafiseh Mirniaharikandehei, Gopichandh Danala, Hong Liu, and Bin Zheng are with the School of Electrical and Computer Engineering, University of Oklahoma, USA.
Digital Object Identifier 10.1109/TBME.2021.3054248

0018-9294 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

Developing computer-aided detection and diagnosis (CAD) schemes of medical images has been attracting broad research interest in order to detect suspicious diseased regions, classify between malignant and benign lesions, quantify disease severity, and predict disease prognosis or monitor treatment efficacy. Some CAD schemes have been used as “a second reader” or as quantitative image marker assessment tools in clinical practice to assist clinicians (i.e., radiologists), aiming to improve image reading accuracy and reduce inter-reader variability [1]. Despite the extensive research effort and progress made in the CAD field, researchers still face many challenges in developing CAD schemes for clinical applications [2]. For example, machine learning plays a critical role in developing CAD schemes: it uses image features to train classification models that predict the likelihood of the analyzed regions depicting, or the patterns representing, diseases. However, due to the great heterogeneity of disease patterns and the limited size of image datasets, how to identify a small and optimal image feature vector to build high-performing and robust machine learning models remains a difficult task.

In current CAD schemes, after image preprocessing to reduce image noise and after detecting and segmenting suspicious regions of interest (ROIs), CAD schemes can compute many image features from the entire image region or the segmented ROIs. Recently, two methods of computing image features have attracted broad research interest. One uses a deep transfer learning model as an automated feature extractor (e.g., extracting 4096 features in a fully connected layer (FC6 or FC7) of an AlexNet). The disadvantage of this approach is that it requires very big training and validation image datasets, which are often not available in medical imaging fields. The other approach uses the radiomics concept and method to compute and generate an initial feature pool. Although radiomics typically computes a smaller number of features than deep learning based feature extractors, it may still compute many features (e.g., >1000 image features, which mostly represent texture patterns of the segmented ROIs in a variety of scanning directions, as reported in previous studies [3], [4]). However, due to the limited size of the training datasets, such a large number of image features can often lead to overfitting of machine learning models and reduce model robustness. Thus, it is important to build an optimal feature vector from the initially large feature
pool in which the generated features should not be redundant or highly correlated [5]. Then, machine learning models can be better trained to achieve enhanced performance and robustness. In general, if the feature dimensionality reduction is achieved by choosing the most effective image features from the initial feature pool, it is known as feature selection (e.g., using sequential forward floating selection (SFFS) [6]). On the other hand, if the dimensionality reduction comes from reanalyzing the initial set of features to produce a new set of orthogonal features, it is known as feature regeneration (e.g., principal component analysis (PCA) and its modified algorithms [7]). Comparing these two methods, feature regeneration has the advantage of more effectively eliminating or reducing redundancy or correlation in the final optimal image feature vector. However, most medical image data or features have very complicated or heterogeneous distribution patterns, which may not meet the precondition, required to optimally apply PCA-type feature regeneration methods, that all feature variables are linear.

In order to better address this challenge and more reliably regenerate image feature vectors for developing CAD schemes of medical images, we investigate and test another feature regeneration method, namely a random projection algorithm (RPA), which is an efficient way to map features into a lower-dimensional subspace while preserving the distances between points under better contrast. This mapping process is done with a random projection matrix. Since the distance is preserved in the lower-dimensional space, it becomes much easier and more reliable to classify between two feature classes. Because of its advantages and high performance, RPA has been tested and implemented in a wide range of engineering applications including handwriting recognition [8], face recognition and detection [9], visual object tracking and recognition [10], [11], and car detection [12].

Thus, motivated by the success of applying RPA to the complex and nonlinear feature data used in many engineering application domains, we hypothesize that RPA also has advantages when applied to medical images with heterogeneous feature distributions. To test our hypothesis, we conduct this study to investigate the feasibility and potential advantages of applying RPA to build an optimal feature vector and train a machine learning model implemented in a new computer-aided diagnosis (CAD) scheme to classify between malignant and benign breast lesions depicted on digital mammograms. The details of the assembled image dataset, the experimental methods of feature regeneration using RPA and support vector machine (SVM) model optimization, data analysis, and performance evaluation results are presented in the following sections.

II. MATERIALS AND METHODS

A. Image Dataset

A fully anonymized dataset of full-field digital mammography (FFDM) images acquired from 1487 patients is retrospectively assembled and used in this study. All cases were randomly selected by an institutional review board (IRB) certified research coordinator from the cancer repository and picture archive and communication system (PACS). All selected cases have suspicious soft-tissue mass type lesions previously detected by the radiologists on the mammograms. Based on lesion biopsy results, 644 cases depict malignant lesions and 843 cases have benign lesions. These patients have an age range from 35 to 80 years old. Table I summarizes and compares the case distribution of patients' age and mammographic density rated by radiologists using breast imaging reporting and data system (BIRADS) guidelines. As shown in the table, patients in the benign group are moderately younger than the patients in the malignant group. However, there is no significant difference in mammographic density between the two groups of patients (p = 0.576).

TABLE I
CASE NUMBER AND PERCENTAGE DISTRIBUTION OF PATIENTS AGE AND MAMMOGRAPHIC DENSITY RATED BY RADIOLOGISTS USING BIRADS GUIDELINES

All FFDM images were acquired using one type of digital mammography machine (Selenia Dimensions made by the Hologic Company), which has a fixed pixel size of 70 μm in order to detect microcalcifications. Since this study only focuses on classification of soft tissue mass type lesions, all images are subsampled using a pixel averaging method with a 5 × 5 pixel frame, so that the pixel size of the subsampled images increases to 0.35 mm. This subsampling method has been used and reported in many of our previous CAD studies (e.g., [13], [14]). Additionally, in this dataset, the majority of cases have two mammograms, the craniocaudal (CC) and mediolateral oblique (MLO) views, of either the left or right breast in which the suspicious lesions are detected by the radiologists, while a small fraction of cases have just one CC or MLO image in which the lesions were detected. Overall, 1197 images depicting malignant lesions and 1302 images depicting benign lesions are available in this image dataset. All lesion centers are visually marked by the radiologists using a custom-designed interactive graphic user interface (GUI) tool. The marked lesion centers are recorded and used as “ground-truth” to evaluate CAD performance [13].

B. Initial Image Feature Pool With a High Dimensionality

In developing CAD schemes to classify between malignant and benign breast lesions, many different approaches have been investigated and applied to compute image features, including those computed from the segmented lesions [15], the fixed regions of interest (ROIs) [16], and the entire breast area [14]. Each approach has advantages and disadvantages. However,


their classification performance may be quite comparable with an appropriate training and optimization process. Thus, since this study focuses on investigating the feasibility and potential advantages of RPA as a new feature dimensionality reduction method, we use a simple approach to compute the initial image features from both the fixed ROI and the segmented lesion regions.

Classification between malignant and benign lesions is a difficult task, which depends on optimal fusion of many image features related to tissue density heterogeneity, spiculation of the lesion boundary, and variation of surrounding tissues. Previous studies have demonstrated that statistics and texture features, including intensity, energy, uniformity, entropy, and statistical moments, can be used to model these valuable image characteristics. Thus, like most CAD schemes using ROIs with a fixed size as classification targets (including the schemes using deep learning approaches [17]), this CAD scheme also focuses on using the statistics and texture-based image features computed from the defined ROIs and the segmented lesion regions. For this purpose, the following methods are used to compute the image features that are included in the initial feature pool.

First, from a ROI of an input image, the gray level difference method (GLDM) is used to compute the occurrence of the absolute difference between pairs of gray levels separated by a defined distance in several directions. It is a practical way of modeling analytical texture features. The output of this function is four different probability distributions. For an image $I(m, n)$, we consider a displacement $\delta = (d_x, d_y)$ in different directions; then $\hat{I}(m, n) = |I(m, n) - I(m + d_x, n + d_y)|$ estimates the absolute difference between gray levels, where $d_x$, $d_y$ are integer values. It is then possible to determine an estimated probability density function $f(\cdot \mid \delta)$ for $\hat{I}(m, n)$, in which $f(i \mid \delta) = P(\hat{I}(m, n) = i)$. This means that for an image with $L$ gray levels, the probability density function is $L$-dimensional. The component at each index of the function gives the probability that $\hat{I}(m, n)$ equals the value of that index. In the method implemented in this CAD study, we use $d_x = d_y = 11$, which is determined heuristically [18]. The probability functions are computed in four directions ($\phi = 0, \pi/4, \pi/2, 3\pi/4$), so that four probability functions provide the absolute differences in the four primary directions, each of which is used for feature extraction.

Second, a gray-level co-occurrence matrix (GLCM) estimates the second-order joint conditional probability density function. The GLCM carries information about the locations of pixels having similar gray level values, as well as the distance and angular spatial correlation over an image sub-region. The occurrence probability of a pair of pixels with gray levels $i$ and $j$ over an image, along a given distance $d$ and a specific orientation $\phi$, is denoted $P(i, j, d, \phi)$. In this way, the output matrix has the dimension of the number of gray levels ($L$) of the image [19]. As with GLDM, we compute four co-occurrence matrices in four cardinal directions ($\phi = 0, \pi/4, \pi/2, 3\pi/4$). To obtain a rotation-invariant descriptor, we combine the results of the different angles in a summation mode to obtain the following probability density function for feature extraction, which is also normalized to reduce image dependence.

$$P(i, j) = \sum_{\phi \in \{0,\ \pi/4,\ \pi/2,\ 3\pi/4\}} P(i, j, d = 2, \phi), \qquad P(i, j) = \frac{P(i, j)}{\sum_{i} \sum_{j} P(i, j)}; \quad i, j = 1, 2, 3, \ldots, L \tag{1}$$

Third, a gray level run length matrix (GLRLM) is another popular way to extract textural features. In each local area depicting a suspicious breast lesion, sets of consecutive pixels whose values fall within a predefined interval of gray levels are searched in several directions. They are defined as gray level runs. The GLRLM records the lengths of the gray-level runs, where the length of a run is the number of pixels within the run. In the ROI, the spatial variation of the pixel values may differ between benign and malignant lesions, and the gray level run is a suitable way to delineate this variation. The output of a GLRLM is a matrix whose elements express the number of runs in a particular gray level interval with a distinct length. Depending on the orientation of the run, different matrices can be formed [20]. In this study we consider four different directions ($\phi = 0, \pi/4, \pi/2, 3\pi/4$) for the GLRLM calculations. As with the GLCM, the output matrices of the different angles are merged in a summation mode to generate one rotation-invariant matrix.

Fourth, in addition to computing texture features from the ROI of the original image in the spatial domain, we also explore and conduct multiresolution analysis, which is a reliable way to zoom through a wide range of sub-bands in more detail [21]. Hence, textural features extracted from the multiresolution sub-bands manifest the difference in texture more clearly. Specifically, a wavelet transform is performed to extract image texture features. The wavelet transform decomposes an image into sub-bands made with high-pass and low-pass filters in the horizontal and vertical directions, followed by a down-sampling process. While down-sampling is suitable for noise cancelation and data compression, high-pass filters are beneficial for focusing on edges, variations, and deviations, which can show and quantify the texture difference between benign and malignant lesions. For this purpose, we apply a 2D Daubechies (Db4) wavelet on each ROI to get approximate and detailed coefficients. From the computed wavelet maps, a wide range of texture features is extracted from the principal components of this domain.

Moreover, analyzing the geometry and boundary of the breast lesions and the neighboring area is another way to distinguish benign and malignant lesions. In general, benign lesions are typically round, smooth, and convex shaped, with a well-circumscribed boundary, while malignant lesions tend to be much more blurry, irregular, and rough, with non-convex shapes [22]. Hence, we also extract and compute a group of features that represent the geometry and shape of the lesion boundary contour. Then, we add all computed features described above to create the initial pool of image features.

C. Applying Random Projection Algorithm (RPA) to Generate Optimal Feature Vector

Before using RPA to generate an optimal feature vector from the initial image feature pool, we first normalize each feature


to make its value distribution lie within [0, 1], in order to reduce case-based dependency and weight all features equally. Thus, for each case, we have a feature vector of size $d$, which represents that case, based on the extracted features, as a point in a $d$-dimensional space. For two points $X = (x_1, \ldots, x_d)$ and $Y = (y_1, \ldots, y_d)$, the distance in the $d$-dimensional space is defined as:

$$|X - Y| = \sqrt{\sum_{j=1}^{d} (x_j - y_j)^2} \tag{2}$$

In addition, it is also possible to define the volume $V$ of a sphere in a $d$-dimensional space as a function of its radius ($r$) and the dimension of the space, as in (3). This equation is proved in [23].

$$V(d) = \frac{r^{d}\, \pi^{d/2}}{\frac{1}{2}\, d\, \Gamma\!\left(\frac{d}{2}\right)} \tag{3}$$

The matrix of features is normalized between [0, 1]. It means a sphere with $r = 1$ can encompass all the data. An interesting fact about a unit-radius sphere is that, as equation (4) shows, the volume goes to zero as the dimension increases, because $\pi^{d/2}$ grows exponentially in $d/2$, while $\Gamma(d/2)$ grows factorially in $d/2$. At the same time, the maximum possible distance between two points stays at 2.

$$\lim_{d \to \infty} \frac{\pi^{d/2}}{\frac{d}{2}\, \Gamma\!\left(\frac{d}{2}\right)} \cong 0 \tag{4}$$

Moreover, based on the heavy-tailed distribution theorem, for a case $X = (x_1, \ldots, x_d)$ in the space of features, suppose that, with an acceptable approximation, the features are independent, or nearly perpendicular variables as mapped to different axes, with $E(x_i) = p_i$, $\sum_{i=1}^{d} p_i = \mu$ and $E\left|(x_i - p_i)^k\right| \le p_i$ for $k = 2, 3, \ldots, \lfloor t^2 / 6\mu \rfloor$. Then, the previous study [24] has proven that:

$$\operatorname{prob}\left(\left|\sum_{i=1}^{d} x_i - \mu\right| \ge t\right) \le \max\left(3 e^{-t^2 / (12\mu)},\ 4 \times 2^{-t/e}\right) \tag{5}$$

We can see that the larger the value of $t$, the smaller the chance of having a point beyond that distance, which means that $X$ is concentrated around the mean value. Overall, based on equations (4) and (5), with an acceptable approximation, all data are encompassed in a sphere of radius one, and they are concentrated around their mean value. As a result, if the dimensionality is high, the volume of the sphere is close to zero. Hence, the contrast between the cases is not enough for a proper classification.

The above analysis also indicates that the more features are included in the initial feature vector, the higher the dimension of the space is and the more the data are concentrated around the center, which makes it more difficult to have enough contrast between the cases. What we are looking for is therefore a powerful technique that reduces the dimensionality while approximately preserving the distance between the points, which implies approximate preservation of the highest amount of information. If we adopt a typical feature selection method and randomly select a $k$-dimensional subspace of the initial feature vector, it is possible to prove that all the projected distances in the new space are within a determined scale factor of those in the initial $d$-dimensional space [25]. Hence, although some redundant features are removed, the final accuracy may not increase, since the contrast between the points may still not be enough to produce a robust model.

To address this issue, we take advantage of the Johnson-Lindenstrauss Lemma to optimize the feature space. Based on the idea of this lemma, for any $0 < \epsilon < 1$ and any number of cases $N$, which are treated as points in the $d$-dimensional space ($\mathbb{R}^d$), a positive integer $k$ can be chosen such that:

$$k \ge 4\, \frac{\ln N}{\frac{\epsilon^2}{2} - \frac{\epsilon^3}{3}} \tag{6}$$

Then, for any set $V$ of $N$ points in $\mathbb{R}^d$ and for all $u, v \in V$, it is possible to prove that there is a map, or random projection function, $f : \mathbb{R}^d \to \mathbb{R}^k$, which preserves the distance in the following approximation [26], which is known as the Restricted Isometry Property (RIP):

$$(1 - \epsilon)\, |u - v|^2 \le |f(u) - f(v)|^2 \le (1 + \epsilon)\, |u - v|^2 \tag{7}$$

Another arrangement of this formula is:

$$\frac{|f(u) - f(v)|^2}{(1 + \epsilon)} \le |u - v|^2 \le \frac{|f(u) - f(v)|^2}{(1 - \epsilon)} \tag{8}$$

As these formulas show, the distance between the points in the lower-dimensional space is approximately equal to the distance in the high-dimensional space. The lemma states that it is possible to project a set of points from a high-dimensional space into a lower-dimensional space while the distances between the points are nearly preserved.

It implies that if we project the initial group of features into a lower-dimensional subspace using the random projection method, the distances between points are preserved under better contrast. This may help better classify between two feature classes representing benign and malignant lesions with a low risk of overfitting.

It should be noted that for an input matrix of features $X \in \mathbb{R}^{n \times d}$, $n$ and $d$ represent the number of training samples and features, respectively. Unlike principal component analysis (PCA), which assumes that the relationships among feature variables are linear and intends to generate new orthogonal features, RPA aims to preserve the distances between the points (training samples) while reducing the space dimensionality. Thus, using RPA will create a subspace $\tilde{X} = XR$ in which $R$ satisfies the RIP condition, with $R \in \mathbb{R}^{d \times k}$ and $\tilde{X} \in \mathbb{R}^{n \times k}$. Since the subspace's geometry is preserved, previous studies [27], [28] proved that an SVM based machine learning classifier could better preserve the characteristics of the image dataset to build the optimal hyperplane and thus reduce the generalization error. In other words, an SVM classifier yields the margin $\gamma^* = 1 / \|w^*\|_2$ for its optimal hyperplane ($w^*$) after solving the optimization problem on the initial feature space $X$, and on the subspace $\tilde{X}$ it yields the margin $\tilde{\gamma}^* = 1 / \|\tilde{w}^*\|_2$ for the respective optimized hyperplane ($\tilde{w}^*$). Another study [29] proved that the hinge loss (for margin $\tilde{\gamma}^*$) of the classifier trained on the subspace data ($\tilde{X}$) is


less than that (for margin $\gamma^*$) of the classifier trained on the original data ($X$). Strictly speaking, the trained classifier's error rate on the optimized subspace generated using RPA is lower than that of the classifier trained on the original space. It indicates that training a machine learning classifier using an optimal subspace under the RIP condition can build a more accurate and robust model for the classification purpose.

In this study, we investigate and demonstrate whether using RPA can yield better results than other popular feature dimensionality reduction approaches (e.g., PCA).

Fig. 1. Example of 4 extracted ROIs with the detected suspicious soft-tissue masses (lesions) in the ROI center. (a), (b) 2 ROIs involving malignant lesions and (c), (d) 2 ROIs involving benign lesions.

Fig. 2. Example to illustrate lesion segmentation, which includes (a) the original ROI, (b) the absolute difference of the ROI from its low-pass filtered version, (c) the combination of (a) and (b), which gives the suspicious regions better contrast to the background, (d) the output of morphological filtering, (e) the blob with the largest size is selected (a binary version of the lesion), and (f) the finally segmented lesion area, which is the output of mapping (e) to (a).

TABLE II
LIST OF THE COMPUTED FEATURES ON ROI AREA
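As a rough illustration of Section II-C, the bound in (6) and the projection $\tilde{X} = XR$ can be sketched in dependency-free Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: the sparse sign matrix with sparsity parameter `s = 3`, the target dimension `k = 50`, the toy data, and all function names are hypothetical choices made for the example.

```python
import math
import random


def jl_min_dim(n_points, eps):
    """Smallest integer k that satisfies equation (6) for n_points cases."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return math.ceil(4.0 * math.log(n_points) / (eps ** 2 / 2.0 - eps ** 3 / 3.0))


def sparse_projection_matrix(d, k, rng, s=3.0):
    """d x k matrix with entries in {-a, 0, +a}, where a = sqrt(s / k).

    An entry is nonzero with probability 1/s, so E[R_ij^2] = 1/k and squared
    distances are preserved in expectation. s = 3 is an assumed sparsity.
    """
    a = math.sqrt(s / k)
    half = 1.0 / (2.0 * s)
    matrix = []
    for _ in range(d):
        row = []
        for _ in range(k):
            u = rng.random()
            if u < half:
                row.append(a)
            elif u < 2.0 * half:
                row.append(-a)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def project(X, R):
    """Compute X_tilde = X R for matrices stored as lists of rows."""
    k = len(R[0])
    return [[sum(x[j] * R[j][c] for j in range(len(R))) for c in range(k)] for x in X]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


rng = random.Random(0)
# Toy data: 5 cases with 181 features already normalized to [0, 1].
X = [[rng.random() for _ in range(181)] for _ in range(5)]
R = sparse_projection_matrix(d=181, k=50, rng=rng)  # k = 50 is arbitrary
X_tilde = project(X, R)

# Ratio |f(u) - f(v)|^2 / |u - v|^2 for one pair; (7) bounds it by 1 +/- eps.
ratio = squared_distance(X_tilde[0], X_tilde[1]) / squared_distance(X[0], X[1])
print(jl_min_dim(1487, 0.5), round(ratio, 3))
```

For N = 1487 cases and ε = 0.5, (6) evaluates to k ≥ 351, which is larger than the 181 features available here. The bound is a worst-case sufficient condition, so a smaller k may still preserve pairwise distances adequately in practice; that has to be checked empirically.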

D. Experiment of Feature Combination and Dimensionality Reduction

First, the proposed CAD scheme applies an image preprocessing step that reads the images in the dataset one by one and, based on the lesion centers pre-marked by the radiologists, extracts a square ROI whose center coincides with the lesion center. In order to identify the optimal size of the ROIs, a heuristic method is applied to select and analyze the ROI size. Specifically, different ROI sizes (in the range from 128×128 to 180×180 pixels) are examined and compared. From the experiments, we observe that ROIs of 150×150 pixels generate the best classification results on this large and diverse dataset, which indicates that this is the most efficient size to cover all mass lesions included in the dataset. This size corresponds to an ROI of 52.5 × 52.5 mm². Fig. 1 shows examples of 4 ROIs depicting two malignant lesions and two benign lesions. After ROI determination, all the images in the dataset are saved in Portable Network Graphics (PNG) format with 16 bits in the lossless mode for the feature extraction phase.

Next, the CAD scheme is applied to segment the lesion from the background. For this process, CAD applies an unsharp masking method in which a low-pass filter with a window size of 30 is first applied to filter the whole ROI. Next, CAD computes the absolute pixel value difference between the original ROI and the filtered ROI to produce a new image map that highlights the lesion and other regions (or blobs) with locally higher and heterogeneous tissue density. Then, CAD applies morphological filters (i.e., opening and closing) to delete the small and isolated blobs (with fewer than 50 pixels) and to repair the boundary contour of the lesion and the other remaining blobs with higher tissue density. Since in this study the user clicks the lesion center and the ROI is extracted around this clicked point, the blob located in the center of the ROI represents the segmented lesion. Fig. 2 shows an example of applying this algorithm to locate and segment a suspicious lesion from the surrounding tissue background.

After image segmentation, the CAD scheme computes several sets of relevant image features. The first group of features consists of the pixel value (or density) related statistics features summarized in Table II. These 20 statistics features are repeatedly computed from three types of images, namely 1) the entire ROI of the original images (as shown in Fig. 2(a)), 2) the segmented lesion region (as shown in Fig. 2(f)), and 3) all highly dense and heterogeneous tissue blobs (as shown in Fig. 2(d)). Thus, this group of features includes 60 statistics features.
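The idea of repeating the same statistics over three regions can be sketched as follows. The 20 statistics of Table II are not reproduced in this text, so the six descriptors below (mean, standard deviation, skewness, kurtosis, histogram energy, entropy) are only an illustrative stand-in drawn from the feature types named in Section II-B. The function names, the toy ROI, and the toy masks are hypothetical.

```python
import math
from collections import Counter


def region_statistics(pixels):
    """A few first-order statistics of a flat list of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n

    def central_moment(order):
        return sum((p - mean) ** order for p in pixels) / n

    std = math.sqrt(central_moment(2))
    # Normalized gray-level histogram of the region.
    probs = [count / n for count in Counter(pixels).values()]
    return {
        "mean": mean,
        "std": std,
        "skewness": central_moment(3) / std ** 3 if std > 0 else 0.0,
        "kurtosis": central_moment(4) / std ** 4 if std > 0 else 0.0,
        "energy": sum(q * q for q in probs),
        "entropy": -sum(q * math.log2(q) for q in probs),
    }


def masked_pixels(image, mask):
    """Collect the pixels of `image` where `mask` is truthy."""
    return [v for img_row, m_row in zip(image, mask)
            for v, keep in zip(img_row, m_row) if keep]


def statistics_feature_vector(roi, lesion_mask, blob_mask):
    """Same statistics on the whole ROI, the lesion, and all dense blobs."""
    whole = [[True] * len(row) for row in roi]
    regions = {"roi": whole, "lesion": lesion_mask, "blobs": blob_mask}
    features = {}
    for name, mask in regions.items():
        stats = region_statistics(masked_pixels(roi, mask))
        for key, value in stats.items():
            features[f"{name}_{key}"] = value
    return features


# Toy 4 x 4 ROI with three gray levels; the masks are hypothetical.
roi = [[0, 0, 1, 1],
       [0, 2, 2, 1],
       [0, 2, 2, 1],
       [0, 0, 1, 1]]
lesion_mask = [[v == 2 for v in row] for row in roi]
blob_mask = [[v >= 1 for v in row] for row in roi]
features = statistics_feature_vector(roi, lesion_mask, blob_mask)
```

With the 20 statistics of Table II in place of these six, the same loop over three regions would produce the 60 features described above.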


Fig. 3. Wavelet based feature extraction. Wavelet decomposition is applied three times to compress the images as much as possible. Then PCA is adopted as another way of data compression.

TABLE III
LIST OF WAVELET-BASED FEATURES

The second group of features is computed from the GLRLM of the ROI area. For this purpose, 16 different quantization levels are considered to calculate all probability functions in four different directions from the histograms. After combining the probability functions into a rotation-invariant version, the following group of features is computed: short-run emphasis, long-run emphasis, gray level non-uniformity, run percentage, run-length non-uniformity, low gray level run emphasis, and high gray level run emphasis. Hence, this group of features includes seven GLRLM-based features.
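The seven run-length features listed above can be illustrated with a small pure-Python sketch. It assumes the ROI intensities are normalized to [0, 1] before quantization, uses the commonly published definitions of the seven features, and normalizes run percentage by the pixel count times the number of directions; these choices and all names are assumptions, not details taken from this paper.

```python
def quantize(roi, n_levels=16):
    """Map intensities assumed to lie in [0, 1] to levels 0..n_levels-1."""
    return [[min(n_levels - 1, int(v * n_levels)) for v in row] for row in roi]


# (row step, column step) for the 0, 45, 90 and 135 degree directions.
DIRECTIONS = ((0, 1), (-1, 1), (-1, 0), (-1, -1))


def glrlm(levels, n_levels):
    """Run-length matrix summed over the four directions.

    M[g][r - 1] counts the runs of gray level g that have length r.
    """
    h, w = len(levels), len(levels[0])
    M = [[0] * max(h, w) for _ in range(n_levels)]
    for dy, dx in DIRECTIONS:
        for y in range(h):
            for x in range(w):
                g = levels[y][x]
                py, px = y - dy, x - dx
                if 0 <= py < h and 0 <= px < w and levels[py][px] == g:
                    continue  # this pixel continues a run counted earlier
                run, ny, nx = 1, y + dy, x + dx
                while 0 <= ny < h and 0 <= nx < w and levels[ny][nx] == g:
                    run, ny, nx = run + 1, ny + dy, nx + dx
                M[g][run - 1] += 1
    return M


def glrlm_features(M, n_pixels):
    """Seven run-length features; gray levels and lengths are 1-based."""
    n_runs = sum(map(sum, M))
    cells = [(g + 1, r + 1, c)
             for g, row in enumerate(M) for r, c in enumerate(row) if c]
    level_sums = [sum(row) for row in M]
    length_sums = [sum(col) for col in zip(*M)]
    return {
        "short_run_emphasis": sum(c / r ** 2 for _, r, c in cells) / n_runs,
        "long_run_emphasis": sum(c * r ** 2 for _, r, c in cells) / n_runs,
        "gray_level_nonuniformity": sum(s * s for s in level_sums) / n_runs,
        "run_length_nonuniformity": sum(s * s for s in length_sums) / n_runs,
        "run_percentage": n_runs / (len(DIRECTIONS) * n_pixels),
        "low_gray_level_run_emphasis": sum(c / g ** 2 for g, _, c in cells) / n_runs,
        "high_gray_level_run_emphasis": sum(c * g ** 2 for g, _, c in cells) / n_runs,
    }


# Toy example: a 2 x 2 image that is already quantized to two levels.
toy = [[0, 0],
       [1, 1]]
M_toy = glrlm(toy, n_levels=2)
toy_features = glrlm_features(M_toy, n_pixels=4)
```

In the toy image, each gray level forms one horizontal run of length 2 and six runs of length 1 across the other three directions, so the merged matrix is `[[6, 1], [6, 1]]`.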
The third group of features includes GLDM based features computed from the entire ROI. Specifically, we select a distance value of 11 pixels for the inter-sample distance calculation. CAD computes four different probability density functions (PDFs) based on the image histogram calculation in different directions. For each PDF (p), the mean of the population (μ), standard deviation, root mean square level, and the first four statistical moments (n = 1, 2, 3, 4), defined by the following equation, are calculated as features.

$$\hat{m}_n = \sum_{i=1}^{N} p_i \,(x_i - \mu)^n \tag{9}$$

It is an unbiased estimate of the nth moment, which can be calculated by:

$$m_n = \int_{-\infty}^{\infty} p(x)\, x^n \, dx \tag{10}$$

As shown in equation (10), p(x) is weighted by $x^n$. Hence, any change in p(x) is polynomially reinforced in the statistical moments. Thus, any difference in the four PDFs computed from malignant lesions is likely to be polynomially reinforced in the statistical moments of the computed coefficients. Six features from each of the four GLDM based PDFs make up this feature group, which has a total of 24 features.

The fourth group of features computes GLCM based texture features. Based on the method proposed in the previous study [30], our CAD scheme generates a matrix of 44 textural features computed from the GLCM matrix based on all GLCM based equations proposed in [19]. In this way, any property of the GLCM matrix suitable for the classification purpose is captured. Hence, this group contains 44 features computed from the entire ROI.

TABLE IV
LIST OF GEOMETRICAL FEATURES

The fifth group of features includes wavelet-based features. The Daubechies wavelet decomposition is performed on the original ROI (i.e., Fig. 2(a)). Fig. 3 shows a block diagram of the wavelet-based feature extraction procedure. The last four sub-bands of the wavelet transform are used to build a matrix of four sub-bands, from which the principal components are derived for feature extraction and computation. The computed features are listed in Table III. We also repeat the same process to compute wavelet-based features from the segmented lesion (i.e., Fig. 2(f)). As a result, this feature group includes 26 wavelet-based image features.

Last, to address the differences between morphological and structural characteristics of benign and malignant lesions, another group of geometrical based features is derived and computed from the segmented lesion region. For this purpose, a binary version of the lesion, like the one shown in Fig. 2(e), is first segmented from the ROI area. Then, all the properties listed in Table IV are calculated from the segmented lesion region in the image using the equations reported in [31].
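Returning to the GLDM moment features of equation (9), the following is a minimal sketch of how the per-PDF statistics could be computed from a discrete histogram. It is an illustration under stated assumptions, not the authors' code. The function name and the toy histogram are hypothetical, the "root mean square level" is assumed to mean the RMS of x under p, and the sketch does not resolve how the paper arrives at exactly six features per PDF.

```python
import math


def pdf_moment_features(values, probs):
    """Mean, standard deviation, RMS level and central moments n = 1..4
    of a discrete PDF, following the form of equation (9)."""
    total = sum(probs)
    p = [q / total for q in probs]  # normalise so the PDF sums to 1
    mu = sum(pi * xi for pi, xi in zip(p, values))
    moments = {
        n: sum(pi * (xi - mu) ** n for pi, xi in zip(p, values))
        for n in (1, 2, 3, 4)
    }
    # Assumption: RMS level taken as sqrt(E[x^2]) under the PDF.
    rms = math.sqrt(sum(pi * xi**2 for pi, xi in zip(p, values)))
    return {
        "mean": mu,
        "std": math.sqrt(moments[2]),
        "rms": rms,
        "moments": moments,
    }


# Hypothetical 4-bin histogram of gray-level differences 0..3.
feats = pdf_moment_features(values=[0, 1, 2, 3], probs=[0.4, 0.3, 0.2, 0.1])
```

Because the moments are taken about the mean, the first moment is zero up to rounding, so in practice it carries no discriminative information on its own.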


Fig. 4. Illustration of the overall classification flow of the CAD scheme developed and tested in this study.
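One stage of this flow is the sparse random projection whose matrix elements are defined in equation (11). The sketch below draws such a matrix and projects one feature vector; it is a plain-Python illustration under stated assumptions, not the software used in the study. The function names, the seed, and the random input vector are hypothetical. The sizes 181 and 80 follow the feature counts reported in the paper, and the default density follows the minimum recommended in [32].

```python
import math
import random


def sparse_random_matrix(n_features, n_components, density=None, seed=0):
    """Draw an n_components x n_features sparse random matrix whose entries
    follow the three-valued distribution of equation (11)."""
    if density is None:
        density = 1.0 / math.sqrt(n_features)  # minimum density, as in [32]
    s = 1.0 / density
    scale = math.sqrt(s / n_components)
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_components):
        row = []
        for _ in range(n_features):
            u = rng.random()
            if u < 1.0 / (2.0 * s):      # probability 1/(2s)
                row.append(-scale)
            elif u < 1.0 / s:            # probability 1/(2s)
                row.append(scale)
            else:                        # probability 1 - 1/s
                row.append(0.0)
        matrix.append(row)
    return matrix


def project(matrix, x):
    """Map a feature vector x to the lower-dimensional space."""
    return [sum(w * xi for w, xi in zip(row, x) if w != 0.0) for row in matrix]


R = sparse_random_matrix(n_features=181, n_components=80)
rng = random.Random(1)
x = [rng.random() for _ in range(181)]  # hypothetical 181-feature vector
z = project(R, x)                       # projected 80-dimensional vector
```

With this density only about 7% of the entries are non-zero, which is what makes the projection cheap in memory and computation.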

By combining all features computed in the above 6 groups, the CAD scheme creates an initial pool of 181 image features. Then, RPA is applied to reduce feature dimensionality and generate an optimal feature vector. For this purpose, we utilize a sparse random matrix as the projection function to achieve the criteria defined in equation (7). A sparse random matrix is a memory-efficient and computationally fast way of projecting data that still guarantees the embedding quality. To do so, we define s = 1/density, in which density is the ratio of non-zero components in the RPA matrix; the random matrix elements (RME) are then:

$$RME = \begin{cases} -\sqrt{\dfrac{s}{n_{components}}}, & \text{with probability } \dfrac{1}{2s} \\ 0, & \text{with probability } 1 - \dfrac{1}{s} \\ +\sqrt{\dfrac{s}{n_{components}}}, & \text{with probability } \dfrac{1}{2s} \end{cases} \tag{11}$$

In this process, we select $n_{components}$, which is the size of the projected subspace. As recommended in [32], we set the density of non-zero elements to the minimum density, which is $1/\sqrt{n_{features}}$.

E. Development and Evaluation of Machine Learning Model

After processing images and computing image features from all 1197 ROIs depicting malignant lesions and 1302 ROIs depicting benign lesions, we build a machine learning model to classify between malignant and benign lesions by taking the following steps or measures. Fig. 4 shows a block diagram of the machine learning model along with the training and testing process. First, although many machine learning models (e.g., artificial neural networks, K-nearest neighborhood network, Bayesian belief network, support vector machine) have been investigated and used to develop CAD schemes, based on our previous research experience [14], we adopt the support vector machine (SVM) to train a multi-feature fusion based machine learning model to predict the likelihood of lesions being malignant in this study. Under a grid search and hyperparameter analysis, a linear kernel implemented in the SVM model can achieve a low computational cost and high robustness in prediction results as well.

Second, we apply the RPA to reduce the dimensionality of the image feature space and map it to the most efficient feature vector as input features of the SVM model. To demonstrate the potential advantages of using RPA in developing machine learning models, we build and compare 5 SVM models: one using all 181 image features included in the initial feature pool, and four embedding feature dimensionality reduction methods, namely (1) random projection algorithm (RPA), (2) principal component analysis (PCA), (3) nonnegative matrix factorization (NMF), and (4) Chi-squared (Chi2).

Third, to increase the size and diversity of training cases, as well as reduce the potential bias in case partitions, we use a leave-one-case-out (LOCO) based cross-validation method to train the SVM model and evaluate its performance. All feature dimensionality reduction methods discussed in the second step are also embedded in this LOCO iteration process to train the SVM. This can diminish the potential bias in the process of feature dimensionality reduction and machine learning model training, as demonstrated in our previous study [33]. When the RPA is embedded in the LOCO based model training process, it helps generate a feature vector independent of the test case. Thus, the test case is unknown to both the RPA and the SVM model training process. In this way, in each LOCO iteration cycle, the trained SVM model is tested on a truly independent test case by generating an unbiased classification score for the test case. As a result, all SVM-generated classification scores are independent of the training data. In addition, other N-fold cross-validation methods (i.e., N = 3, 5, 10) are also tested and compared with the LOCO method in the study.

Fourth, since the majority of lesions are detected in two ROIs from CC and MLO view mammograms, in the LOCO process, two ROIs representing the same lesion are grouped together to be used for either training or validation to avoid potential bias. After training, the ROIs in the one remaining case are used to test the machine learning model, which generates a classification score to indicate the likelihood of each testing ROI depicting a malignant lesion. The score ranges from 0 to 1. A higher score indicates a higher risk of being malignant. In addition to the classification score of each ROI, a case-based likelihood score is also generated by fusion of the two scores of the two ROIs representing the same lesion depicted on CC and MLO view mammograms.

TABLE V
ACCURACY OF THE SVM MODELS FOR CASE-BASED CLASSIFICATION BASED ON SIX DIFFERENT CATEGORIES OF THE ORIGINAL FEATURES

Fig. 5. A malignant case annotated by radiologists in both CC and MLO views. The annotated mass is squared in each view.
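The case grouping and score fusion described in the fourth step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The record layout, the function names, and `toy_fit_and_score` are hypothetical; the toy scorer is only a placeholder where the trained SVM (with its embedded dimensionality reduction) would go, and fusion is taken as the average of the ROI scores, as the paper describes for case-based evaluation.

```python
import math
from collections import defaultdict
from statistics import mean


def leave_one_case_out(rois):
    """Yield (case, train, test) so that all ROIs of one case (CC and MLO
    views) are held out together and never split across train and test."""
    by_case = defaultdict(list)
    for roi in rois:
        by_case[roi["case"]].append(roi)
    for case, test in by_case.items():
        train = [r for r in rois if r["case"] != case]
        yield case, train, test


def toy_fit_and_score(train, test):
    """Hypothetical stand-in for the SVM: a logistic score around the midpoint
    between the malignant and benign training means of a single feature."""
    pos = mean(r["x"] for r in train if r["label"] == 1)
    neg = mean(r["x"] for r in train if r["label"] == 0)
    mid = (pos + neg) / 2.0
    return [1.0 / (1.0 + math.exp(-(r["x"] - mid))) for r in test]


def case_based_scores(rois, fit_and_score):
    """Fuse the ROI scores of each held-out case by averaging."""
    fused = {}
    for case, train, test in leave_one_case_out(rois):
        fused[case] = mean(fit_and_score(train, test))
    return fused


# Hypothetical data: cases A and B have two views, C and D only one.
rois = [
    {"case": "A", "view": "CC", "x": 2.1, "label": 1},
    {"case": "A", "view": "MLO", "x": 1.4, "label": 1},
    {"case": "B", "view": "CC", "x": -0.7, "label": 0},
    {"case": "B", "view": "MLO", "x": -1.2, "label": 0},
    {"case": "C", "view": "CC", "x": 1.0, "label": 1},
    {"case": "D", "view": "MLO", "x": -0.3, "label": 0},
]
scores = case_based_scores(rois, toy_fit_and_score)
predicted_malignant = {case: s > 0.5 for case, s in scores.items()}
```

Any model fitting, including the feature reduction step, belongs inside `fit_and_score` so that it only ever sees the training ROIs of the current iteration.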
Fifth, a receiver operating characteristic (ROC) method is applied in the data analysis. The area under the ROC curve (AUC) is computed and utilized as an evaluation index to evaluate and compare the performance of each SVM model in classifying between malignant and benign lesions. Then, we also apply an operating threshold of T = 0.5 on the SVM-generated classification scores to divide all testing cases into two classes of malignant and benign cases. By comparing to the available ground-truth, a confusion matrix of the classification results is determined for each SVM. From the confusion matrix, we compute classification accuracy, sensitivity, specificity, and odds ratio (OR) of each SVM model based on both lesion region and case. In the region-based performance evaluation, all lesion regions are considered independent, while in the case-based performance evaluation, the average classification score of two matched lesion regions (if the lesions are detected and marked by radiologists in both CC and MLO view) is computed and used. In this study, all pre-processing and feature extraction steps to make the matrix of features are conducted using the MATLAB R2019a package.

III. RESULTS

Fig. 5 shows a malignant case as an example in which the lesion center is annotated by radiologists in both CC and MLO view mammograms. Based on the marked center, we plot two square areas on two images in which image features are computed by the CAD scheme. Using the whole feature vector of 181 image features, the SVM model generates the following classification scores to predict the likelihood of the two lesion regions on two view images being malignant: S_CCview = 0.685 and S_MLOview = 0.291. The case-based classification score is S_Case = 0.488. When using the feature vectors generated by the RPA, the SVM model generates two new classification scores of these two lesion regions, which are S_CCview = 0.817 and S_MLOview = 0.375. Thus, the case-based classification score is S_Case = 0.596. As a result, the SVM model trained using all 181 image features misclassifies this malignant lesion as benign when an operating threshold (T = 0.5) is applied, while the SVM model trained using the embedded RPA increases the classification scores for both lesion regions depicted on CC and MLO view images. Consequently, it is correctly classified as malignant with the case-based classification score greater than the operating threshold.

Table V summarizes the performance of using the original features computed in 6 categories to classify between the malignant and benign lesions. As shown in this table, using the group of statistical features yields the highest classification accuracy among the 6 categories of features. Fig. 6 shows a curve indicating the variation trend of the AUC values of the SVM models trained and tested using different numbers of features (ranging from 50 to 100) generated by the proposed RPA. The trend indicates that using a reduced feature dimensionality with 80 features, the SVM yields the highest AUC value of 0.84.

Fig. 6. A trend of the case-based classification AUC values generated by the SVM models trained using different number of features (NF) generated by the RPA.

Table VI shows and compares the average number of the input features used to train 5 SVM models with and without embedding different feature dimensionality reduction methods, as well as the lesion region-based and case-based classification performance in terms of AUC values. When embedding a feature dimensionality reduction algorithm, the size of feature vectors in different LOCO-based SVM model training and validation cycles may vary. Table VI shows that the average number of features is reduced from the original 181 features to 100 or less. When using RPA, the average number
of features is 80. From both Table VI and Fig. 7, which show and compare the corresponding AUC values and ROC curves, we observe that the SVM model trained using an embedded RPA feature dimensionality reduction method produces statistically significantly higher classification performance, including a case-based AUC value of 0.84 ± 0.01, compared to all other SVM models (p < 0.05), including the SVM trained using the initial feature pool of 181 features and the SVM models embedded with the other three feature dimensionality reduction methods, namely principal component analysis (PCA), nonnegative matrix factorization (NMF), and Chi-squared (Chi2), in the classification model training process. In addition, the data in Table VI and the ROC curves in Fig. 7 also indicate that the case-based lesion classification yields higher performance than the region-based classification, which indicates that using and combining image features computed from two-view mammograms has advantages.

TABLE VI
SUMMARY OF AVERAGE NUMBER OF IMAGE FEATURES USED IN 5 DIFFERENT SVM MODELS AND CLASSIFICATION PERFORMANCE (AUC) BASED ON BOTH REGION AND CASE-BASED LESION CLASSIFICATION. p VALUE COMPARES RESULTS OF EACH MODEL TO THE LAST ONE (RPA) AS THE OPTIMAL ONE

Fig. 7. Comparison of 10 ROC curves generated using 5 SVM models and 2 scoring (region and case-based) methods to classify between malignant and benign lesion regions or cases.

Table VII presents 5 confusion matrices of lesion case-based classification using 5 SVM models after applying the operating threshold (T = 0.5). Based on this table, several lesion classification performance indices, such as sensitivity, specificity, and odds ratio, are measured and shown in Table VIII. This table also shows that the SVM model trained on the feature vector generated by the RPA yields the highest classification accuracy compared to the other 4 SVM models trained using feature vectors generated either by the other three feature dimensionality reduction methods or from the original feature pool of 181 features.

TABLE VII
FIVE CONFUSION MATRICES OF CASE-BASED LESION CLASSIFICATION USING 5 DIFFERENT SVM MODELS TO CLASSIFY BETWEEN BENIGN AND MALIGNANT CASES

TABLE VIII
SUMMARY OF THE LESION CASE-BASED CLASSIFICATION ACCURACY, SENSITIVITY, SPECIFICITY, AND ODDS RATIO OF USING 5 SVMS TRAINED USING DIFFERENT GROUPS OF OPTIMIZED FEATURES

Table IX shows and compares the classification results using four different cross-validation methods (N = 3, 5, 10 and LOCO). The results show two trends of performance decrease
and standard deviation increase (in both AUC and accuracy) as the number of folds decreases from the maximum (LOCO) to the smallest (N = 3). This indicates that using LOCO yields not only the highest performance, but also probably the highest robustness due to the smallest standard deviation.

TABLE IX
SUMMARY OF THE CASE-BASED LESION CLASSIFICATION FOR THE PROPOSED METHOD (RPA) UNDER DIFFERENT CROSS VALIDATION (CV) TECHNIQUES

Additionally, to assess the reduction of feature redundancy after applying RPA, we create a feature correlation matrix, corr(i, j), for the M features. Then, we compute a mean absolute value of the correlation matrix:

$$\text{mean of correlation} = \frac{1}{M \times M} \sum_{i,j=1}^{M} \left| \mathrm{corr}(i, j) \right| \tag{12}$$

The two mean values of correlation computed from the two correlation matrices generated using the feature space (or pools) before and after applying RPA are 0.49 and 0.31, respectively, which indicates that the feature correlation coefficients are reduced after using RPA. Thus, using RPA can reduce not only the dimensionality of the feature space, but also its redundancy.

Last, the computational processing tasks of applying RPA to generate optimal features and train the SVM model are performed using a Dell computer (Processor: Intel(R) Xeon CPU E5-1603 v3, 2.8 GHz, and 16 GB RAM) and a Python-based software package. For the cross-validation process we use the Sklearn-model library. For example, in the 10-fold cross-validation, the average computation time to complete one cross-validation iteration is approximately 38.12 seconds.

IV. DISCUSSION

Mammography is a popular imaging modality used in breast cancer screening and early cancer detection. However, due to the heterogeneity of breast lesions and dense fibro-glandular tissue, it is difficult for radiologists to accurately predict or determine the likelihood of the detected suspicious lesions being malignant. As a result, mammography screening generates high false-positive recall rates and the majority of biopsies are proved to be benign [34]. Thus, to help increase the specificity of breast lesion classification and reduce unnecessary biopsies, developing CAD schemes to assist radiologists in more accurately and consistently classifying between malignant and benign breast lesions remains an active research topic [35]. In this study, we develop and assess a new CAD scheme of mammograms to predict the likelihood of the detected suspicious breast lesions being malignant. This study has the following unique characteristics compared to other previous CAD studies reported in the literature.

First, previous CAD schemes of mammograms computed image features from either the segmented lesion regions or regions with a fixed size (i.e., squared ROIs to cover lesions with varying sizes). Both approaches have advantages and disadvantages. Due to the difficulty of accurately segmenting subtle lesions with fuzzy boundaries, the image features computed from the automatically segmented lesions may not be accurate or reproducible, which reduces the accuracy of the computed image features in representing actual lesion regions. Using fixed ROIs (as in most deep learning based CAD schemes [17], [36]) avoids the potential error in lesion segmentation, but it may lose or reduce the weight of the image features that are more relevant to the lesions due to the potentially heavy influence of irregular fibro-glandular tissue distribution surrounding lesions of varying sizes. In this study, we tested a new approach that combines image features computed from both a fixed ROI and the segmented lesion region. In addition, compared to most previous CAD studies surveyed in the previous study [37], which used several hundred malignant and benign lesion regions, we assemble a much larger image dataset with 1847 cases or 2499 lesion regions (including 1197 malignant lesion regions and 1302 benign lesion regions). Despite using a much larger image dataset, this new CAD scheme yields a higher classification performance (AUC = 0.84 ± 0.01) compared to the AUC of 0.78 to 0.82 reported in our previous CAD studies that used much smaller image datasets (<500 malignant and benign ROIs or images) [17], [38]. Thus, although it may be difficult to directly compare the performance of CAD schemes tested using different image datasets as surveyed in [37], we believe that our new approach of combining image features computed from both a fixed ROI and the segmented lesion region has the advantage of partially compensating for potential lesion segmentation error and misrepresentation of the lesion-related image features, and makes it possible to achieve an improved or very comparable classification performance.

Second, since identifying a small, but effective and non-redundant image feature vector plays an important role in CAD development to train machine learning classifiers or models, many feature selection or dimensionality reduction methods have been investigated and applied in previous studies. Although these methods can exclude many redundant, low-performing, or irrelevant features in the initial pool of features, the challenge of how to build a small feature vector with orthogonal feature components to represent the complex and non-linear image feature space remains. For the first time, we introduce in this study the RPA to the medical imaging informatics field to develop a CAD scheme. RPA is a technique that maximally preserves the distance between the sub-set of points in the lower-dimension space. As explained in the Introduction section, in the lower-dimension space where the distance between points is preserved, classification is much more robust with a low risk of overfitting. This is not only supported by the simulation or application results reported in previous studies, it is also confirmed by this study. The results in Table VI show that by using the optimal feature vectors generated by RPA, the SVM model yields significantly higher classification performance in comparison with other SVM models trained using either all initial features or
other feature vectors generated by the other three popular feature selection and dimensionality reduction methods. Using the RPA boosts the AUC value from 0.72 to 0.78 in comparison with the original feature vector in the lesion region-based analysis, and from 0.74 to 0.84 in the lesion case-based evaluation. It also enhances the classification accuracy from 69.3% to 75.2% and approximately doubles the odds ratio from 4.85 to 8.86 (Table VIII). Thus, the study results confirm that RPA is a promising technique applicable to generating optimal feature vectors for training machine learning models used in CAD of medical images.

Third, since breast lesions and the surrounding fibro-glandular tissues are heterogeneous and distributed in 3D volumetric space, the segmented lesion shape and computed image features often vary significantly between the two projection images (CC and MLO view). We therefore investigate and evaluate CAD performance based on single lesion regions and on the combined lesion cases if two images of CC and MLO views were available and the lesions are detectable on both view images. Table VI shows and compares the lesion region-based and case-based classification performance of the 5 SVM models. The result data clearly indicate that instead of just selecting one lesion region for likelihood prediction, it is much more accurate when the scheme processes and examines the two lesion regions depicted on both CC and MLO view images. For example, when using the SVM trained with the feature vectors generated by the RPA, the lesion case-based classification performance increases by 7.7% in AUC value, from 0.78 to 0.84, compared to the region-based performance evaluation.

Last, although the study has tested a new CAD development method using an RPA to generate the optimal feature vector and yielded encouraging results in classifying between malignant and benign breast lesions, we realize that the reported study results are obtained from a laboratory-based retrospective image data analysis process with several limitations. First, although the dataset used in this study is relatively large and diverse, whether this dataset can sufficiently represent the real clinical environment or breast cancer population is unknown or not tested. All FFDM images were acquired using one type of digital mammography machine. Due to the difference in image characteristics (i.e., contrast-to-noise ratio) between FFDM machines made by different vendors, the CAD scheme developed in this study may not be directly and optimally applicable to mammograms produced by other types of FFDM machines. However, we believe that the concept demonstrated in this study is valid. Thus, similar CAD schemes can be easily retrained or fine-tuned using a new set of digital mammograms acquired using other types of FFDM machines of interest. Second, in this retrospective study, the image dataset has a higher ratio between malignant and benign lesions, which is different from the false-positive recall rates in clinical practice. Thus, the reported AUC values may also be different from real clinical practice, which needs to be further tested in future prospective clinical studies. Third, in the initial pool of features, we only extracted a limited number of 181 statistical, textural and geometrical features, which is much less than the number of features computed based on the recently developed radiomics concept and technology [3], [4]. Thus, more texture features can be explored in future studies to increase the diversity of the initial feature pool, which may also increase the chance of selecting or generating more optimal features. Additionally, many deep transfer learning models have been recently tested as feature extractors in the medical imaging field, which produce a much larger number of features than the radiomics approaches. Thus, whether using RPA can also help significantly reduce the dimensionality of the features produced by these extractors, so as to more effectively and robustly train or build the final classification layer of the deep learning models, should be investigated in future studies.

V. CONCLUSION

In summary, due to the difference between human vision and computer vision, it is often difficult to accurately identify a small set of optimal and non-redundant features computed by the CAD schemes of medical images. In this study, we investigate the feasibility of applying a new approach based on the random projection algorithm (RPA) to generate the optimal feature vectors for training machine learning models implemented in the CAD schemes of mammograms to classify between malignant and benign breast lesions. Study results indicate that applying this RPA approach creates a more compact feature space that can reduce feature correlation or redundancy. By comparing with the other three popular feature dimensionality reduction methods, the study results also demonstrate that using RPA makes it possible to generate an optimal feature vector to build a machine learning model, which yields significantly higher classification performance. In addition, since building an optimal feature vector is an important precondition of building optimal machine learning models, the new method demonstrated in this study is not limited to CAD schemes of mammograms; it can also be adopted and used by researchers to develop and optimize CAD schemes of other types of medical images to detect and diagnose different types of cancers or diseases in the future.

ACKNOWLEDGMENT

The authors would like to acknowledge the support received from the Peggy and Charles Stephenson Cancer Center, University of Oklahoma, USA.

REFERENCES

[1] J. Katzen and K. Dodelzon, “A review of computer aided detection in mammography,” Clin. Imag., vol. 52, no. 6, pp. 305–309, Nov. 2018.
[2] R. M. Nishikawa and D. Gur, “CADe for early detection of breast cancer – current status and why we need to continue to explore new approaches,” Acad. Radiol., vol. 21, no. 10, pp. 1320–1321, Oct. 2014.
[3] J. Yin et al., “A radiomics signature to identify malignant and benign liver tumors on plain CT images,” J. X-Ray Sci. Technol., vol. 28, no. 4, pp. 683–694, Aug. 2020.
[4] Z. Q. Sun et al., “Radiomics study for differentiating gastric cancer from gastric stromal tumor based on contrast-enhanced CT images,” J. X-Ray Sci. Technol., vol. 27, no. 6, pp. 1021–1031, Dec. 2019.
[5] M. Kuhn and K. Johnson, “An introduction to feature selection,” in Appl. Predictive Model. New York, NY, USA: Springer, 2013, pp. 487–519.
[6] M. Tan, J. Pu, and B. Zheng, “Optimization of breast mass classification using sequential forward floating selection (SFFS) and a support vector machine (SVM) model,” Int. J. Comput.-Assist. Radiol. Surg., vol. 9, no. 6, pp. 1005–1020, Mar. 2014.
[7] M. Heidari et al., “Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm,” Phys. Med. Biol., vol. 63, no. 3, Jan. 2018, Art. no. 035020.
[8] Q. Wang et al., “Hierarchical feature selection for random projection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 5, pp. 1581–1586, Sep. 2018.
[9] L. Qiao, S. Chen, and X. Tan, “Sparsity preserving projections with applications to face recognition,” Pattern Recognit., vol. 43, no. 1, pp. 331–341, Jan. 2010.
[10] Y. Gao et al., “Extended compressed tracking via random projection based on MSERs and online LS-SVM learning,” Pattern Recognit., vol. 59, no. 1, pp. 245–254, Nov. 2016.
[11] M. L. Mekhalfi et al., “Fast indoor scene description for blind people with multiresolution random projections,” J. Vis. Commun. Image Representation, vol. 44, no. 100, pp. 95–105, Apr. 2017.
[12] J. Tang, C. Deng, and G. Huang, “Extreme learning machine for multilayer perceptron,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 4, pp. 809–821, May. 2015.
[13] B. Zheng et al., “Computer-aided detection of breast masses depicting on full-field digital mammograms: A performance assessment,” Brit. J. Radiol., vol. 85, no. 1014, pp. e153–e161, Jun. 2012.
[14] M. Heidari et al., “Development and assessment of a new global mammographic image feature analysis scheme to predict likelihood of malignant cases,” IEEE Trans. Med. Imag., vol. 39, no. 4, pp. 1235–1244, Apr. 2020.
[15] G. Danala et al., “Classification of breast masses using a computer-aided diagnosis scheme of contrast enhanced digital mammograms,” Ann. Biomed. Eng., vol. 46, no. 9, pp. 1419–1431, Sep. 2018.
[16] X. Wang et al., “An interactive system for computer-aided diagnosis of breast masses,” J. Digit. Imag., vol. 25, no. 5, pp. 570–579, Oct. 2012.
[17] Y. Qiu et al., “A new approach to develop computer-aided diagnosis scheme of breast mass classification using deep learning technology,” J. X-Ray Sci. Technol., vol. 25, no. 5, pp. 751–763, Jan. 2017.
[18] J. S. Weszka, C. R. Dyer, and A. Rosenfeld, “A comparative study of texture measures for terrain classification,” IEEE Trans. Syst., Man, Cybern., vol. 6, no. 4, pp. 269–285, Apr. 1976.
[19] R. M. Haralick, K. Shanmugam, and I. H. Dinstein, “Textural features for image classification,” IEEE Trans. Syst., Man, Cybern., vol. 3, no. 6, pp. 610–621, Nov. 1973.
[20] M. M. Galloway, “Texture classification using gray level run length,” Comput. Graph. Image Process., vol. 4, no. 2, pp. 172–179, Jun. 1975.
[21] M. Z. Do Nascimento et al., “Classification of masses in mammographic image using wavelet domain features and polynomial classifier,” Expert Syst. Appl., vol. 40, no. 15, pp. 6213–6221, Nov. 2013.
[22] N. R. Mudigonda, R. Rangayyan, and J. E. Leo Desautels, “Gradient and texture analysis for the classification of mammographic masses,” IEEE Trans. Med. Imag., vol. 19, no. 10, pp. 1032–1043, Oct. 2000.
[23] C. C. Aggarwal, A. Hinneburg, and D. A. Keim, “On the surprising behavior of distance metrics in high dimensional space,” in Proc. Int. Conf. Database Theory. Berlin, Heidelberg: Springer, Jan. 2001, vol. 1973, pp. 420–434.
[24] R. Vershynin, High-Dimensional Probability: An Introduction With Applications in Data Science. Cambridge, England: Cambridge Univ. Press, vol. 47, 2018.
[25] C. Saunders et al., “Subspace, latent structure and feature selection,” in Proc. Stat. Optim. Perspectives Workshop, 1st ed., SLSFS 2005 Bohinj, Slovenia, Feb. 23–25, 2005, pp. 523–2005.
[26] A. Gupta and S. Dasgupta, “An elementary proof of the Johnson-Lindenstrauss Lemma,” Random Struct. Algorithms, vol. 22, no. 1, pp. 60–65, 2002.
[27] P. Saurabh et al., “Random projections for linear support vector machines,” ACM Trans. Knowl. Discov. Data, vol. 8, no. 4, pp. 1–25, 2014.
[28] P. Saurabh et al., “Random projections for support vector machines,” in Proc. Artif. Intell. Statist., 2013, pp. 498–506.
[29] S. Karthik, S. Shalev-Shwartz, and N. Srebro, “Fast rates for regularized objectives,” in Proc. 21st Int. Conf. Neural Inf. Process. Syst., Dec. 2008, pp. 1545–1552.
[30] W. Gómez, W. C. A. Pereira, and A. F. C. Infantosi, “Analysis of co-occurrence texture statistics as a function of gray-level quantization for classifying breast ultrasound,” IEEE Trans. Med. Imag., vol. 31, no. 10, pp. 1889–1899, Jun. 2012.
[31] M. J. Zdilla et al., “Circularity, solidity, axes of a best fit ellipse, aspect ratio, and roundness of the foramen ovale: A morphometric analysis with neurosurgical considerations,” J. Craniofacial Surg., vol. 27, no. 1, pp. 222–228, Jan. 2016.
[32] P. Li, T. J. Hastie, and K. W. Church, “Very sparse random projections,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, Aug. 2006, pp. 287–296.
[33] F. Aghaei et al., “Applying a new quantitative global breast MRI feature analysis scheme to assess tumor response to chemotherapy,” J. Magn. Reson. Imag., vol. 44, no. 5, pp. 1099–1106, Nov. 2016.
[34] H. D. Nelson et al., “Factors associated with rates of false-positive and false-negative results from digital mammography screening: An analysis of registry data,” Ann. Intern. Med., vol. 164, no. 4, pp. 226–235, Feb. 2016.
[35] H. P. Chan, R. K. Samala, and L. M. Hadjiiski, “CAD and AI for breast cancer – recent development and challenges,” Brit. J. Radiol., vol. 93, no. 1108, Dec. 2019, Art. no. 20190580.
[36] C. Tao et al., “New one-step model of breast tumor locating based on deep learning,” J. X-Ray Sci. Technol., vol. 27, no. 5, pp. 839–856, Oct. 2019.
[37] Y. Wang et al., “Computer-aided classification of mammographic masses using visually sensitive image features,” J. X-Ray Sci. Technol., vol. 25, no. 1, pp. 171–186, Jan. 2017.
[38] X. Chen et al., “Applying a new quantitative image analysis scheme based on global mammographic features to assist diagnosis of breast cancer,” Comput. Methods Programs Biomed., vol. 179, Oct. 2019, Art. no. 104995.
