Applying A Random Projection Algorithm To Optimize Machine Learning Model For Breast Lesion Classification
their classification performance may be quite comparable with an appropriate training and optimization process. Thus, since this study focuses on investigating the feasibility and potential advantages of a new feature dimensionality reduction method of RPA, we use a simple approach to compute the initial image features from both the fixed ROI and the segmented lesion regions.

Classification between malignant and benign lesions is a difficult task, which depends on an optimal fusion of many image features related to tissue density heterogeneity, spiculation of the lesion boundary, as well as variation of the surrounding tissues. Previous studies have demonstrated that statistics and texture features can be used to model these valuable image characteristics, including intensity, energy, uniformity, entropy, and statistical moments. Thus, like most CAD schemes using ROIs with a fixed size as classification targets (including the schemes using deep learning approaches [17]), this CAD scheme also focuses on using the statistics and texture-based image features computed from the defined ROIs and the segmented lesion regions. For this purpose, the following methods are used to compute the image features that are included in the initial feature pool.

First, from a ROI of an input image, the gray level difference method (GLDM) is used to compute the occurrence of the absolute difference between pairs of gray levels divided by a particularly defined distance in several directions. It is a practical way for modeling analytical texture features. The output of this function is four different probability distributions. For an image I(m, n), we consider a displacement in different directions like δ(d_x, d_y); then Î(m, n) = |I(m, n) − I(m + d_x, n + d_y)| estimates the absolute difference between gray levels, where d_x, d_y are integer values. It is then possible to determine an estimated probability density function for Î(m, n) like f(·|δ), in which f(i|δ) = P(Î(m, n) = i). This means that for an image with L gray levels, the probability density function is L-dimensional. The component at each index of the function shows the probability of Î(m, n) taking the same value as the index. In the method implemented in this CAD study, we consider d_x = d_y = 11, which is determined heuristically [18]. The probability functions are computed in four directions (ϕ = 0, π/4, π/2, 3π/4), which signifies that four probability functions are computed to provide the absolute differences in the four primary directions, each of which is used for feature extraction.

Second, a gray-level co-occurrence matrix (GLCM) estimates the second-order joint conditional probability density function. The GLCM carries information about the locations of pixels having similar gray level values, as well as the distance and angular spatial correlation over an image sub-region. To establish the occurrence probability of pixels with the gray levels i, j over an image along a given distance d and a specific orientation ϕ, we have P(i, j, d, ϕ). In this way, the output matrix has the dimension of the gray levels (L) of the image [19]. Like GLDM, we compute four co-occurrence matrices in four cardinal directions (ϕ = 0, π/4, π/2, 3π/4). GLCM is rotation invariant. We combine the results of the different angles in a summation mode to obtain the following probability density function for feature extraction, which is also normalized to reduce image dependence.

$$P(i, j) = \sum_{\phi \in \{0,\ \pi/4,\ \pi/2,\ 3\pi/4\}} P(i, j, d = 2, \phi) \qquad (1)$$

$$P(i, j) = \frac{P(i, j)}{\sum_{i} \sum_{j} P(i, j)}; \qquad i, j = 1, 2, 3, \ldots, L$$

Third, a gray level run length matrix (GLRLM) is another popular way to extract textural features. In each local area depicting a suspicious breast lesion, sets of pixel values are searched within a predefined interval of the gray levels in several directions. These are defined as gray level runs. The GLRLM calculates the length of gray-level runs, where the length of a run is the number of pixels within the run. In the ROI, the spatial variation of the pixel values for benign and malignant lesions may be different, and the gray level run is a proper way to delineate this variation. The output of a GLRLM is a matrix whose elements express the number of runs in a particular gray level interval with a distinct length. Depending on the orientation of the run, different matrices can be formed [20]. In this study we consider four different directions (ϕ = 0, π/4, π/2, 3π/4) for the GLRLM calculations. Then, just like the GLCM, the GLRLM is also made rotation invariant. Thus, the output matrices of the different angles are merged in a summation mode to generate one matrix.

Fourth, in addition to computing texture features from the ROI of the original image in the spatial domain, we also explore and conduct multiresolution analysis, which is a reliable way to perform the zooming concept through a wide range of sub-bands in more detail [21]. Hence, textural features extracted from the multiresolution sub-bands manifest the difference in texture more clearly. Specifically, a wavelet transform is performed to extract image texture features. The wavelet decomposes an image into sub-bands made with high-pass and low-pass filters in the horizontal and vertical directions, followed by a down-sampling process. While down-sampling is suitable for noise cancelation and data compression, high-pass filters are beneficial to focus on edges, variations, and deviations, which can show and quantify the texture difference between benign and malignant lesions. For this purpose, we apply the 2D Daubechies (Db4) wavelet on each ROI to get the approximation and detail coefficients. From the computed wavelet maps, a wide range of texture features is extracted from the principal components of this domain.

Moreover, analyzing the geometry and boundary of the breast lesions and the neighboring area is another way to distinguish benign and malignant lesions. In general, benign lesions are typically round, smooth, convex shaped, with a well-circumscribed boundary, while malignant lesions tend to be much blurrier, irregular, rough, with non-convex shapes [22]. Hence, we also extract and compute a group of features that represent the geometry and shape of the lesion boundary contour. Then, we add all computed features as described above to create the initial pool of image features.
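As a brief illustration of the GLCM computation summarized in equation (1), the following sketch (a minimal example assuming scikit-image and NumPy are available; the quantization to 16 levels and the chosen texture properties are illustrative choices, not the authors' exact implementation) sums the co-occurrence matrices over the four directions and normalizes the result before deriving a few Haralick-type features:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(roi_gray, levels=16, distance=2):
    """Rotation-invariant GLCM features: sum the four directional
    matrices (0, pi/4, pi/2, 3pi/4), normalize, then compute texture props."""
    # Quantize the ROI to a small number of gray levels first.
    bins = np.linspace(roi_gray.min(), roi_gray.max(), levels)
    roi_q = (np.digitize(roi_gray, bins) - 1).astype(np.uint8)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(roi_q, distances=[distance], angles=angles, levels=levels)
    # Merge the four directional matrices in a summation mode (eq. (1)).
    merged = glcm.sum(axis=3, keepdims=True).astype(float)
    merged = merged / merged.sum()           # normalization step
    return {prop: graycoprops(merged, prop)[0, 0]
            for prop in ("contrast", "energy", "homogeneity", "correlation")}
```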
C. Applying Random Projection Algorithm (RPA) to Generate Optimal Feature Vector

Before using RPA to generate an optimal feature vector from the initial image feature pool, we first normalize each feature to make its value distribution lie between [0, 1], in order to reduce case-based dependency and weight all features equally. Thus, for each case, we have a feature vector of size d, which allows us to represent that case, based on the extracted features, as a point in a d-dimensional space. For two points X = (x_1, ..., x_d) and Y = (y_1, ..., y_d), the distance in the d-dimensional space is defined as:

$$|X - Y| = \sqrt{\sum_{j=1}^{d} (x_j - y_j)^2} \qquad (2)$$

In addition, it is also possible to define the volume V of a sphere in a d-dimensional space as a function of its radius (r) and the dimension of the space, as in (3). This equation is proved in [23].

$$V(d) = \frac{r^{d}\, \pi^{d/2}}{\frac{1}{2}\, d\, \Gamma\!\left(\frac{d}{2}\right)} \qquad (3)$$

The matrix of features is normalized between [0, 1]. This means a sphere with r = 1 can encompass all the data. An interesting fact about a unit-radius sphere is that, as equation (4) shows, as the dimension increases the volume goes to zero, since π^{d/2} is exponential in d/2 while Γ(d/2) grows like a factorial of d/2. At the same time, the maximum possible distance between two points stays at 2.

$$\lim_{d \to \infty} \frac{\pi^{d/2}}{\frac{d}{2}\, \Gamma\!\left(\frac{d}{2}\right)} \sim 0 \qquad (4)$$

Moreover, based on the heavy-tailed distribution theorem, for a case like X = (x_1, ..., x_d) in the feature space, suppose that, with an acceptable approximation, the features are independent, or nearly perpendicular variables as mapped to different axes, with E(x_i) = p_i, Σ_{i=1}^{d} p_i = μ, and E|(x_i − p_i)^k| ≤ p_i for k = 2, 3, ..., t²/6μ; then the previous study [24] has proven that:

$$\operatorname{prob}\left(\left|\sum_{i=1}^{d} x_i - \mu\right| \ge t\right) \le \operatorname{Max}\left(3e^{-t^{2}/12\mu},\ 4 \times 2^{-t/e}\right) \qquad (5)$$

We can perceive that the larger the value of t, the smaller the chance of having a point outside that distance, which means that X is concentrated around the mean value. Overall, based on equations (4) and (5), with an acceptable approximation all data are encompassed in a sphere of size one, and they are concentrated around their mean value. As a result, if the dimensionality is high, the volume of the sphere is close to zero. Hence, the contrast between the cases is not enough for a proper classification.

The above analysis also indicates that the more features are included in the initial feature vector, the higher the dimension of the space is, and the more the data are concentrated around the center, which makes it more difficult to have enough contrast between the features. A powerful technique that reduces the dimensionality while approximately preserving the distance between the points, which implies approximate preservation of the highest amount of information, is the key point that we are looking for. If we adopt a typical feature selection method and randomly select a k-dimensional subspace of the initial feature vector, it is possible to prove that all the projected distances in the new space are within a determined scale-factor of the initial d-dimensional space [25]. Hence, although some redundant features are removed, the final accuracy may not increase, since the contrast between the points may still not be enough to produce a robust model.

To address this issue, we take advantage of the Johnson-Lindenstrauss Lemma to optimize the feature space. Based on the idea of this lemma, for any 0 < ε < 1 and any number of cases N, which are like points in the d-dimensional space (R^d), a positive integer k can be computed as:

$$k \ge 4\, \frac{\ln N}{\varepsilon^{2}/2 - \varepsilon^{3}/3} \qquad (6)$$

Then, for any set V of N points in R^d and for all u, v ∈ V, it is possible to prove that there is a map, or random projection function, f : R^d → R^k which preserves the distance in the following approximation [26], known as the Restricted Isometry Property (RIP):

$$(1 - \varepsilon)\, |u - v|^{2} \le |f(u) - f(v)|^{2} \le (1 + \varepsilon)\, |u - v|^{2} \qquad (7)$$

Another arrangement of this formula is:

$$\frac{|f(u) - f(v)|^{2}}{1 + \varepsilon} \le |u - v|^{2} \le \frac{|f(u) - f(v)|^{2}}{1 - \varepsilon} \qquad (8)$$

As these formulas show, the distance between the set of points in the lower-dimensional space is approximately close to the distance in the high-dimensional space. This Lemma states that it is possible to project a set of points from a high-dimensional space into a lower-dimensional space while the distances between the points are nearly preserved.

It implies that if we project the initial group of features into a lower-dimensional subspace using the random projection method, the distances between points are preserved under better contrast. This may help better classify between the two feature classes representing benign and malignant lesions with a low risk of overfitting.
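For intuition on how the bound in equation (6) behaves, a short sketch (assuming scikit-learn is available; the sample sizes and the ε value below are illustrative, not values from this study) evaluates the formula directly and with scikit-learn's built-in helper:

```python
import numpy as np
from sklearn.random_projection import johnson_lindenstrauss_min_dim

def jl_min_dim(n_cases: int, eps: float) -> int:
    """Minimum target dimension k from equation (6): k >= 4 ln(N) / (eps^2/2 - eps^3/3)."""
    return int(np.ceil(4.0 * np.log(n_cases) / (eps**2 / 2.0 - eps**3 / 3.0)))

for n in (100, 1000, 2500):          # illustrative numbers of cases
    k_manual = jl_min_dim(n, eps=0.5)
    k_sklearn = johnson_lindenstrauss_min_dim(n_samples=n, eps=0.5)
    print(f"N={n:5d}: k >= {k_manual} (manual), {k_sklearn} (sklearn)")
```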
It should be noted that, for an input matrix of features X ∈ R^{n×d}, n and d represent the number of training samples and features, respectively. Unlike principal component analysis (PCA), which assumes that the relationships among feature variables are linear and intends to generate new orthogonal features, RPA aims to preserve the distances between the points (training samples) while reducing the space dimensionality. Thus, using RPA creates a subspace X̃ = XR in which R satisfies the RIP condition, with R ∈ R^{d×k} and X̃ ∈ R^{n×k}. Since the subspace's geometry is preserved, previous studies [27], [28] proved that an SVM-based machine learning classifier could better preserve the characteristics of the image dataset to build the optimal hyperplane and thus reduce the generalization error. In other words, if an SVM classifier achieves the margin γ* = 1/||w*||₂ for its optimal hyperplane (w*) after solving the optimization problem on the initial feature space X, then on the subspace X̃ it achieves the margin γ̃* = 1/||w̃*||₂ for the respective optimized hyperplane (w̃*). Another study [29] proved that the hinge loss (for margin γ̃*) of the classifier trained on the subspace data (X̃) is less than that (γ*) of the classifier trained on the original data (X). Strictly speaking, the trained classifier's error rate on the optimized subspace generated using RPA is lower than that of the classifier trained on the original space. This indicates that training a machine learning classifier using an optimal subspace under the RIP condition can build a more accurate and robust model for the classification purpose.

In this study, we investigate and demonstrate whether using RPA can yield better results compared to other popular feature dimensionality reduction approaches (i.e., PCA).

D. Experiment of Feature Combination and Dimensionality Reduction

First, the proposed CAD scheme applies an image preprocessing step to read all images in the dataset one by one and, based on the lesion centers pre-marked by the radiologists, to extract a square ROI area in which the centers of the lesion and the ROI overlap. In order to identify the optimal size of the ROIs, a heuristic method is applied to select and analyze the ROI size. Basically, different ROI sizes (i.e., in the range from 128×128 to 180×180 pixels) are examined and compared. From the experiments, we observe that ROIs with a size of 150×150 pixels generate the best classification results when applied to this large and diverse dataset, which reveals that this is the most efficient size to cover all mass lesions included in our diverse dataset, corresponding to an ROI of 52.5 × 52.5 mm². Fig. 1 shows examples of 4 ROIs depicting two malignant lesions and two benign lesions. After ROI determination, all the images in the dataset are saved in Portable Network Graphics (PNG) format with 16 bits in the lossless mode for the feature extraction phase.

Next, the CAD scheme is applied to segment the lesion from the background. For this process, CAD applies an unsharp masking method in which a low-pass filter with a window size of 30 is first applied to filter the whole ROI. Next, CAD computes the absolute pixel value difference between the original ROI and the filtered ROI to produce a new image map that highlights the lesion and other regions (or blobs) with locally higher and heterogeneous tissue density. Then, CAD applies morphological filters (i.e., opening and closing) to delete the small and isolated blobs (with fewer than 50 member pixels), and to repair the boundary contour of the lesion and the other remaining blobs with higher tissue density. Since in this study the user clicks the lesion center and the ROI is extracted around this clicked point, the blob located in the center of the ROI represents the segmented lesion. Fig. 2 shows an example of applying this algorithm to locate and segment a suspicious lesion from the surrounding tissue background.

TABLE II
LIST OF THE COMPUTED FEATURES ON ROI AREA

After image segmentation, the CAD scheme computes several sets of relevant image features. The first group of features are the pixel value (or density) related statistics features, as summarized in Table II. These 20 statistics features are repeatedly computed from three types of images, namely 1) the entire ROI of the original images (as shown in Fig. 2(a)), 2) the segmented lesion region (as shown in Fig. 2(f)), and 3) all highly dense and heterogeneous tissue blobs (as shown in Fig. 2(d)). Thus, this group of features includes 60 statistics features.
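The unsharp-masking-based lesion segmentation described above can be sketched as follows (a simplified illustration using NumPy/SciPy; the 30-pixel filter window and the 50-pixel blob threshold follow the text, while the thresholding rule and helper names are assumptions rather than the authors' exact implementation):

```python
import numpy as np
from scipy import ndimage

def segment_central_lesion(roi: np.ndarray, window: int = 30, min_blob: int = 50) -> np.ndarray:
    """Return a binary mask of the blob covering the ROI center (the clicked lesion)."""
    low_pass = ndimage.uniform_filter(roi.astype(float), size=window)   # low-pass filtered ROI
    diff_map = np.abs(roi.astype(float) - low_pass)                     # unsharp-mask difference map
    mask = diff_map > diff_map.mean() + diff_map.std()                  # assumed threshold rule
    # Morphological opening/closing to remove noise and repair blob contours.
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
    mask = ndimage.binary_closing(mask, structure=np.ones((3, 3)))
    labels, n_blobs = ndimage.label(mask)
    # Drop small, isolated blobs (fewer than min_blob member pixels).
    sizes = ndimage.sum(mask, labels, index=range(1, n_blobs + 1))
    for i, size in enumerate(sizes, start=1):
        if size < min_blob:
            labels[labels == i] = 0
    center_label = labels[roi.shape[0] // 2, roi.shape[1] // 2]          # blob at the ROI center
    return labels == center_label if center_label != 0 else np.zeros(roi.shape, dtype=bool)
```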
Fig. 3. Wavelet-based feature extraction. Wavelet decomposition is applied three times to compress the images as much as possible. Then PCA is adopted as another way of data compression.

The second group of features is computed from the GLRLM matrix of the ROI area. For this purpose, 16 different quantization levels are considered to calculate all probability functions in four different directions from the histograms. After combining the probability functions into a rotation-invariant version, the following group of features is computed: short-run emphasis, long-run emphasis, gray level non-uniformity, run percentage, run-length non-uniformity, low gray level run emphasis, and high gray level run emphasis. Hence, this group includes seven GLRLM-based features.

The third group of features includes GLDM-based features computed from the entire ROI. Specifically, we select a distance value of 11 pixels for the inter-sample distance calculation. CAD computes four different probability density functions (PDFs) based on the image histogram calculation in different directions. From each PDF (p), with μ as the mean of the population, the standard deviation, the root mean square level, and the first four statistical moments (n = 1, 2, 3, 4) computed with the following equation are calculated as features.

$$\hat{m}_n = \sum_{i=1}^{N} p_i (x_i - \mu)^n \qquad (9)$$

This is an estimate of the nth moment, which can also be calculated by:

$$m_n = \int_{-\infty}^{\infty} p(x)\, x^{n}\, dx \qquad (10)$$

As shown in equation (10), p(x) is weighted by x^n. Hence, any change in p(x) is polynomially reinforced in the statistical moments. Thus, any difference in the four PDFs computed from malignant lesions is likely to be polynomially reinforced in the statistical moments of the computed coefficients. Six features from each of the four GLDM-based PDFs make up this feature group, which has 24 features in total.

The fourth group of features comprises GLCM-based texture features. Based on the method proposed in the previous study [30], our CAD scheme generates a matrix of 44 textural features computed from the GLCM based on all GLCM-based equations proposed in [19]. In this way, any property of the GLCM relevant for the classification purpose is captured. Hence, this group contains 44 features computed from the entire ROI.

TABLE III
LIST OF WAVELET-BASED FEATURES

TABLE IV
LIST OF GEOMETRICAL FEATURES

The fifth group of features includes wavelet-based features. The Daubechies wavelet decomposition is applied to the original ROI (i.e., Fig. 2(a)). Fig. 3 shows a block diagram of the wavelet-based feature extraction procedure. The last four sub-bands of the wavelet transform are used to build a matrix of four sub-bands, from which the principal components are derived for feature extraction and computation. The computed features are listed in Table III. We also repeat the same process to compute wavelet-based features from the segmented lesion (i.e., Fig. 2(f)). As a result, this feature group includes 26 wavelet-based image features.

Last, to address the differences between the morphological and structural characteristics of benign and malignant lesions, another group of geometry-based features is derived and computed from the segmented lesion region. For this purpose, a binary version of the lesion, like the one shown in Fig. 2(e), is first segmented from the ROI area. Then, all the properties listed in Table IV are calculated from the segmented lesion region in the image using the equations reported in [31].
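To make the wavelet-based (fifth) feature group more concrete, the sketch below is a rough illustration assuming PyWavelets and scikit-learn are available; the 3-level "db4" decomposition follows Fig. 3, but the stacking of the last four sub-bands, the number of retained components, and the summary statistics are illustrative assumptions, not the authors' exact pipeline:

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA

def wavelet_pca_features(roi: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Three-level db4 decomposition; PCA over the last four sub-bands (cA3, cH3, cV3, cD3)."""
    coeffs = pywt.wavedec2(roi.astype(float), wavelet="db4", level=3)
    cA3, (cH3, cV3, cD3) = coeffs[0], coeffs[1]            # coarsest approximation + details
    # Stack the four sub-bands as rows of one matrix (one flattened sub-band per row).
    band_matrix = np.stack([b.ravel() for b in (cA3, cH3, cV3, cD3)])
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(band_matrix)                 # principal components of the sub-band matrix
    # Simple summary statistics of the projected sub-bands serve as texture features.
    return np.concatenate([scores.mean(axis=0), scores.std(axis=0),
                           pca.explained_variance_ratio_])
```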
Fig. 4. Illustration of the overall classification flow of the CAD scheme developed and tested in this study.
By combining all features computed in the above six groups, the CAD scheme creates an initial pool of 181 image features. Then, RPA is applied to reduce the feature dimensionality and generate an optimal feature vector. For this purpose, we utilize a sparse random matrix as the projection function to achieve the criteria defined in equation (7). The sparse random matrix is a memory-efficient and fast way of projecting data which guarantees the embedding quality of this idea. To do so, if we define s = 1/density, in which density defines the ratio of non-zero components in the RPA, the components of the matrix as random matrix elements (RME) are:

$$RME = \begin{cases} -\sqrt{\dfrac{s}{n_{\text{components}}}}, & \text{with probability } \dfrac{1}{2s} \\[6pt] 0, & \text{with probability } 1 - \dfrac{1}{s} \\[6pt] +\sqrt{\dfrac{s}{n_{\text{components}}}}, & \text{with probability } \dfrac{1}{2s} \end{cases} \qquad (11)$$

In this process, we select n_components, which is the size of the projected subspace. As recommended in [32], we set the density of non-zero elements to the minimum density, which is 1/√(n_features).

E. Development and Evaluation of Machine Learning Model

After processing the images and computing image features from all 1197 ROIs depicting malignant lesions and 1302 ROIs depicting benign lesions, we build a machine learning model to classify between malignant and benign lesions by taking the following steps. Fig. 4 shows a block diagram of the machine learning model along with the training and testing process. First, although many machine learning models (i.e., artificial neural networks, K-nearest neighbor networks, Bayesian belief networks, support vector machines) have been investigated and used to develop CAD schemes, based on our previous research experience [14] we adopt the support vector machine (SVM) to train a multi-feature fusion based machine learning model to predict the likelihood of lesions being malignant in this study. Under a grid search and hyperparameter analyses, the linear kernel implemented in the SVM model achieves a low computational cost as well as high robustness in the prediction results.

Second, we apply the RPA to reduce the dimensionality of the image feature space and map it to the most efficient feature vector used as the input features of the SVM model. To demonstrate the potential advantages of using RPA in developing machine learning models, we build and compare 5 SVM models: one using all 181 image features included in the initial feature pool, and four embedding different feature dimensionality reduction methods, namely (1) the random projection algorithm (RPA), (2) principal component analysis (PCA), (3) nonnegative matrix factorization (NMF), and (4) Chi-squared (Chi2).

Third, to increase the size and diversity of the training cases, as well as to reduce the potential bias in case partitions, we use a leave-one-case-out (LOCO) based cross-validation method to train the SVM model and evaluate its performance. All feature dimensionality reduction methods discussed in the second step are also embedded in this LOCO iteration process to train the SVM. This can diminish the potential bias in the process of feature dimensionality reduction and machine learning model training, as demonstrated in our previous study [33]. When the RPA is embedded in the LOCO based model training process, it helps generate a feature vector independent of the test case. Thus, the test case is unknown to both the RPA and the SVM model training process. In this way, in each LOCO iteration cycle, the trained SVM model is tested on a truly independent test case by generating an unbiased classification score for the test case. As a result, all SVM-generated classification scores are independent of the training data. In addition, other N-fold cross-validation methods (i.e., N = 3, 5, 10) are also tested and compared with the LOCO method in the study.
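As a rough illustration of how the sparse random projection of equation (11) can be embedded inside a leave-one-case-out loop with a linear SVM, the sketch below uses scikit-learn on synthetic data; the feature count of 181 matches the text, but the dataset, group labels, projected dimension, and hyperparameters are placeholders, not the authors' implementation:

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.random((200, 181))                 # placeholder: 200 ROIs x 181 initial features
y = rng.integers(0, 2, size=200)           # placeholder: benign (0) / malignant (1)
groups = np.repeat(np.arange(100), 2)      # two ROIs (CC/MLO) per case share a group id

scores = np.zeros(len(y))
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    scaler = MinMaxScaler().fit(X[train_idx])                      # normalize features to [0, 1]
    rp = SparseRandomProjection(n_components=80,                   # assumed projected subspace size
                                density=1.0 / np.sqrt(X.shape[1]), # minimum density 1/sqrt(n_features)
                                random_state=0)
    Xt_train = rp.fit_transform(scaler.transform(X[train_idx]))    # RPA fitted on training cases only
    svm = SVC(kernel="linear", probability=True).fit(Xt_train, y[train_idx])
    Xt_test = rp.transform(scaler.transform(X[test_idx]))
    scores[test_idx] = svm.predict_proba(Xt_test)[:, 1]            # likelihood of malignancy per ROI
```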
Fourth, since the majority of lesions are detected in two ROIs from the CC and MLO view mammograms, in the LOCO process the two ROIs representing the same lesion are grouped together to be used for either training or validation, to avoid potential bias. After training, the ROIs in the one remaining case are used to test the machine learning model, which generates a classification score to indicate the likelihood of each testing ROI depicting a malignant lesion. The score ranges from 0 to 1, and a higher score indicates a higher risk of being malignant. In addition to the classification score of each ROI, a case-based likelihood score is also generated by fusion of the two scores of the two ROIs

TABLE V
ACCURACY OF THE SVM MODELS FOR CASE-BASED CLASSIFICATION BASED ON SIX DIFFERENT CATEGORIES OF THE ORIGINAL FEATURES
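Continuing the ROI-level sketch above, the case-based likelihood score can be illustrated by fusing the two ROI scores of each case (here simply averaged, which is an assumption since the exact fusion rule is not specified in the extracted text) and evaluating it with an ROC AUC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# `scores`, `y`, and `groups` come from the previous leave-one-case-out sketch.
case_ids = np.unique(groups)
case_scores = np.array([scores[groups == c].mean() for c in case_ids])   # fuse two ROI scores per case
case_labels = np.array([y[groups == c].max() for c in case_ids])          # case label from its ROIs
case_auc = roc_auc_score(case_labels, case_scores)
print(f"Case-based AUC: {case_auc:.2f}")
```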
Fig. 7. Comparison of 10 ROC curves generated using 5 SVM models and 2 scoring (region-based and case-based) methods to classify between malignant and benign lesion regions or cases.

TABLE VIII
SUMMARY OF THE LESION CASE-BASED CLASSIFICATION ACCURACY, SENSITIVITY, SPECIFICITY, AND ODDS RATIO OF USING 5 SVMS TRAINED USING DIFFERENT GROUPS OF OPTIMIZED FEATURES

of features is 80. From both Table VI and Fig. 7, which show and compare the corresponding AUC values and ROC curves, we observe that the SVM model trained using the embedded RPA feature dimensionality reduction method produces statistically significantly higher or improved classification performance, including a case-based AUC value of 0.84 ± 0.01, compared to all other SVM models (p < 0.05), including the SVM trained using the initial feature pool of 181 features and the SVM models embedded with the other three feature dimensionality reduction methods, namely principal component analysis (PCA), nonnegative matrix factorization (NMF), and Chi-squared (Chi2), in the classification model training process. In addition, the data in Table VI and the ROC curves in Fig. 7 also indicate that the case-based lesion classification yields higher performance than the region-based classification, which indicates that using and combining image features computed from two-view mammograms has advantages.

Table VII presents 5 confusion matrices of the lesion case-based classification using the 5 SVM models after applying the operating threshold (T = 0.5). Based on this table, several lesion classification performance indices such as sensitivity, specificity, and odds ratio are measured and shown in Table VIII. This table also shows that the SVM model trained based on the feature vector generated by the RPA yields the highest classification accuracy compared to the other 4 SVM models trained using feature vectors generated either with the other three feature dimensionality reduction methods or from the original feature pool of 181 features.

Table IX shows and compares the classification results using four different cross-validation methods (N = 3, 5, 10 and LOCO). The results show two trends of performance decrease
other feature vectors generated by the other three popular feature selection and dimensionality reduction methods. Using the RPA boosts the AUC value from 0.72 to 0.78 in comparison with the original feature vector in the lesion region-based analysis, and from 0.74 to 0.84 in the lesion case-based evaluation, which also enhances the classification accuracy from 69.3% to 75.2% and approximately doubles the odds ratio from 4.85 to 8.86 (Table VIII). Thus, the study results confirm that RPA is a promising technique applicable to generate optimal feature vectors for training machine learning models used in CAD of medical images.

Third, since the heterogeneity of breast lesions and the surrounding fibro-glandular tissues is distributed in a 3D volumetric space, the segmented lesion shape and the computed image features often vary significantly between the two projection images (CC and MLO views). We therefore investigate and evaluate CAD performance based on single lesion regions and on the combined lesion cases when two images of the CC and MLO views were available and the lesions are detectable on both view images. Table VI shows and compares the lesion region-based and case-based classification performance of the 5 SVM models. The result data clearly indicate that, instead of just selecting one lesion region for likelihood prediction, it is much more accurate when the scheme processes and examines two lesion regions depicted on both CC and MLO view images. For example, when using the SVM trained with the feature vectors generated by the RPA, the lesion case-based classification performance increases 7.7% in AUC value, from 0.78 to 0.84, compared to the region-based performance evaluation.

Last, although the study has tested a new CAD development method using a RPA to generate an optimal feature vector and yielded encouraging results in classifying between the malignant and benign breast lesions, we realize that the reported study results are based on a laboratory-based retrospective image data analysis process with several limitations. First, although the dataset used in this study is relatively large and diverse, whether this dataset can sufficiently represent the real clinical environment or breast cancer population is unknown or not tested. All FFDM images were acquired using one type of digital mammography machine. Due to the differences in image characteristics (i.e., contrast-to-noise ratio) between FFDM machines made by different vendors, the CAD scheme developed in this study may not be directly and optimally applicable to mammograms produced by other types of FFDM machines. However, we believe that the concept demonstrated in this study is valid. Thus, similar CAD schemes can be easily retrained or fine-tuned using a new set of digital mammograms acquired using other types of FFDM machines of interest. Second, in this retrospective study, the image dataset has a higher ratio between the malignant and benign lesions, which is different from the false-positive recall rates in clinical practice. Thus, the reported AUC values may also be different from those in real clinical practice, which needs to be further tested in future prospective clinical studies. Third, in the initial pool of features, we only extracted a limited number of 181 statistics, textural and geometrical features, which are much fewer than the number of features computed based on the recently developed radiomics concept and technology [3], [4]. Thus, more texture features can be explored in future studies to increase the diversity of the initial feature pool, which may also increase the chance of selecting or generating more optimal features. Additionally, many deep transfer learning models have been recently tested as feature extractors in the medical imaging field, which produce a much larger number of features than the radiomics approaches. Thus, whether using RPA can also help significantly reduce the dimensionality of these feature extractors to more effectively and robustly train or build the final classification layer of deep learning models should be investigated in future studies.

V. CONCLUSION

In summary, due to the difference between human vision and computer vision, it is often difficult to accurately identify a small set of optimal and non-redundant features computed by the CAD schemes of medical images. In this study, we investigate the feasibility of applying a new approach based on the random projection algorithm (RPA) to generate the optimal feature vectors for training machine learning models implemented in CAD schemes of mammograms to classify between malignant and benign breast lesions. Study results indicate that applying this RPA approach creates a more compact feature space that can reduce feature correlation or redundancy. By comparison with three other popular feature dimensionality reduction methods, the study results also demonstrate that using RPA makes it possible to generate an optimal feature vector to build a machine learning model that yields significantly higher classification performance. In addition, since building an optimal feature vector is an important precondition of building optimal machine learning models, the new method demonstrated in this study is not limited to CAD schemes of mammograms; it can also be adopted and used by researchers to develop and optimize CAD schemes of other types of medical images to detect and diagnose different types of cancers or diseases in the future.

ACKNOWLEDGMENT

The authors would like to acknowledge the support received from the Peggy and Charles Stephenson Cancer Center, University of Oklahoma, USA.

REFERENCES

[1] J. Katzen and K. Dodelzon, "A review of computer aided detection in mammography," Clin. Imag., vol. 52, no. 6, pp. 305–309, Nov. 2018.
[2] R. M. Nishikawa and D. Gur, "CADe for early detection of breast cancer – current status and why we need to continue to explore new approaches," Acad. Radiol., vol. 21, no. 10, pp. 1320–1321, Oct. 2014.
[3] J. Yin et al., "A radiomics signature to identify malignant and benign liver tumors on plain CT images," J. X-Ray Sci. Technol., vol. 28, no. 4, pp. 683–694, Aug. 2020.
[4] Z. Q. Sun et al., "Radiomics study for differentiating gastric cancer from gastric stromal tumor based on contrast-enhanced CT images," J. X-Ray Sci. Technol., vol. 27, no. 6, pp. 1021–1031, Dec. 2019.
[5] M. Kuhn and K. Johnson, "An introduction to feature selection," in Appl. Predictive Model. New York, NY, USA: Springer, 2013, pp. 487–519.
[6] M. Tan, J. Pu, and B. Zheng, "Optimization of breast mass classification using sequential forward floating selection (SFFS) and a support vector machine (SVM) model," Int. J. Comput.-Assist. Radiol. Surg., vol. 9, no. 6, pp. 1005–1020, Mar. 2014.
[7] M. Heidari et al., "Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm," Phys. Med. Biol., vol. 63, no. 3, Jan. 2018, Art. no. 035020.
[8] Q. Wang et al., "Hierarchical feature selection for random projection," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 5, pp. 1581–1586, Sep. 2018.
[9] L. Qiao, S. Chen, and X. Tan, "Sparsity preserving projections with applications to face recognition," Pattern Recognit., vol. 43, no. 1, pp. 331–341, Jan. 2010.
[10] Y. Gao et al., "Extended compressed tracking via random projection based on MSERs and online LS-SVM learning," Pattern Recognit., vol. 59, no. 1, pp. 245–254, Nov. 2016.
[11] M. L. Mekhalfi et al., "Fast indoor scene description for blind people with multiresolution random projections," J. Vis. Commun. Image Representation, vol. 44, no. 100, pp. 95–105, Apr. 2017.
[12] J. Tang, C. Deng, and G. Huang, "Extreme learning machine for multilayer perceptron," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 4, pp. 809–821, May 2015.
[13] B. Zheng et al., "Computer-aided detection of breast masses depicting on full-field digital mammograms: A performance assessment," Brit. J. Radiol., vol. 85, no. 1014, pp. e153–e161, Jun. 2012.
[14] M. Heidari et al., "Development and assessment of a new global mammographic image feature analysis scheme to predict likelihood of malignant cases," IEEE Trans. Med. Imag., vol. 39, no. 4, pp. 1235–1244, Apr. 2020.
[15] G. Danala et al., "Classification of breast masses using a computer-aided diagnosis scheme of contrast enhanced digital mammograms," Ann. Biomed. Eng., vol. 46, no. 9, pp. 1419–1431, Sep. 2018.
[16] X. Wang et al., "An interactive system for computer-aided diagnosis of breast masses," J. Digit. Imag., vol. 25, no. 5, pp. 570–579, Oct. 2012.
[17] Y. Qiu et al., "A new approach to develop computer-aided diagnosis scheme of breast mass classification using deep learning technology," J. X-Ray Sci. Technol., vol. 25, no. 5, pp. 751–763, Jan. 2017.
[18] J. S. Weszka, C. R. Dyer, and A. Rosenfeld, "A comparative study of texture measures for terrain classification," IEEE Trans. Syst., Man, Cybern., vol. 6, no. 4, pp. 269–285, Apr. 1976.
[19] R. M. Haralick, K. Shanmugam, and I. H. Dinstein, "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. 3, no. 6, pp. 610–621, Nov. 1973.
[20] M. M. Galloway, "Texture classification using gray level run length," Comput. Graph. Image Process., vol. 4, no. 2, pp. 172–179, Jun. 1975.
[21] M. Z. Do Nascimento et al., "Classification of masses in mammographic image using wavelet domain features and polynomial classifier," Expert Syst. Appl., vol. 40, no. 15, pp. 6213–6221, Nov. 2013.
[22] N. R. Mudigonda, R. Rangayyan, and J. E. Leo Desautels, "Gradient and texture analysis for the classification of mammographic masses," IEEE Trans. Med. Imag., vol. 19, no. 10, pp. 1032–1043, Oct. 2000.
[23] C. C. Aggarwal, A. Hinneburg, and D. A. Keim, "On the surprising behavior of distance metrics in high dimensional space," in Proc. Int. Conf. Database Theory. Berlin, Heidelberg: Springer, Jan. 2001, vol. 1973, pp. 420–434.
[24] R. Vershynin, High-Dimensional Probability: An Introduction With Applications in Data Science. Cambridge, England: Cambridge Univ. Press, vol. 47, 2018.
[25] C. Saunders et al., "Subspace, latent structure and feature selection," in Proc. Stat. Optim. Perspectives Workshop, 1st ed., SLSFS 2005 Bohinj, Slovenia, Feb. 23–25, 2005, pp. 523–2005.
[26] A. Gupta and S. Dasgupta, "An elementary proof of the Johnson-Lindenstrauss Lemma," Random Struct. Algorithms, vol. 22, no. 1, pp. 60–65, 2002.
[27] P. Saurabh et al., "Random projections for linear support vector machines," ACM Trans. Knowl. Discov. Data, vol. 8, no. 4, pp. 1–25, 2014.
[28] P. Saurabh et al., "Random projections for support vector machines," in Proc. Artif. Intell. Statist., 2013, pp. 498–506.
[29] S. Karthik, S. Shalev-Shwartz, and N. Srebro, "Fast rates for regularized objectives," in Proc. 21st Int. Conf. Neural Inf. Process. Syst., Dec. 2008, pp. 1545–1552.
[30] W. Gómez, W. C. A. Pereira, and A. F. C. Infantosi, "Analysis of co-occurrence texture statistics as a function of gray-level quantization for classifying breast ultrasound," IEEE Trans. Med. Imag., vol. 31, no. 10, pp. 1889–1899, Jun. 2012.
[31] M. J. Zdilla et al., "Circularity, solidity, axes of a best fit ellipse, aspect ratio, and roundness of the foramen ovale: A morphometric analysis with neurosurgical considerations," J. Craniofacial Surg., vol. 27, no. 1, pp. 222–228, Jan. 2016.
[32] P. Li, T. J. Hastie, and K. W. Church, "Very sparse random projections," in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, Aug. 2006, pp. 287–296.
[33] F. Aghaei et al., "Applying a new quantitative global breast MRI feature analysis scheme to assess tumor response to chemotherapy," J. Magn. Reson. Imag., vol. 44, no. 5, pp. 1099–1106, Nov. 2016.
[34] H. D. Nelson et al., "Factors associated with rates of false-positive and false-negative results from digital mammography screening: An analysis of registry data," Ann. Intern. Med., vol. 164, no. 4, pp. 226–235, Feb. 2016.
[35] H. P. Chan, R. K. Samala, and L. M. Hadjiiski, "CAD and AI for breast cancer – recent development and challenges," Brit. J. Radiol., vol. 93, no. 1108, Dec. 2019, Art. no. 20190580.
[36] C. Tao et al., "New one-step model of breast tumor locating based on deep learning," J. X-Ray Sci. Technol., vol. 27, no. 5, pp. 839–856, Oct. 2019.
[37] Y. Wang et al., "Computer-aided classification of mammographic masses using visually sensitive image features," J. X-Ray Sci. Technol., vol. 25, no. 1, pp. 171–186, Jan. 2017.
[38] X. Chen et al., "Applying a new quantitative image analysis scheme based on global mammographic features to assist diagnosis of breast cancer," Comput. Methods Programs Biomed., vol. 179, Oct. 2019, Art. no. 104995.