Article
SCOLIONET: An Automated Scoliosis Cobb Angle
Quantification Using Enhanced X-ray Images and Deep
Learning Models
Renato R. Maaliw III
College of Engineering, Southern Luzon State University, Lucban 4328, Quezon, Philippines;
[email protected]
Abstract: The advancement of medical prognoses hinges on the delivery of timely and reliable
assessments. Conventional methods of assessment and diagnosis, often reliant on human expertise,
lead to inconsistencies due to professionals’ subjectivity, knowledge, and experience. To address these
problems head-on, we harnessed artificial intelligence’s power to introduce a transformative solution.
We leveraged convolutional neural networks to engineer our SCOLIONET architecture, which can
accurately identify Cobb angle measurements. Empirical testing on our pipeline demonstrated a
mean segmentation accuracy of 97.50% (Sorensen–Dice coefficient) and 96.30% (Intersection over
Union), indicating the model’s proficiency in outlining vertebrae. The level of quantification accuracy
was attributed to the state-of-the-art design of the atrous spatial pyramid pooling to better segment
images. We also compared physician’s manual evaluations against our machine driven measurements
to validate our approach’s practicality and reliability further. The results were remarkable, with a
p-value (t-test) of 0.1713 and an average acceptable deviation of 2.86 degrees, suggesting insignificant
difference between the two methods. Our work holds the premise of enabling medical practitioners
to expedite scoliosis examination swiftly and consistently in improving and advancing the quality of
patient care.
Keywords: atrous spatial pyramid pooling; computer vision; image enhancement; image processing;
machine learning; medical image analysis; segmentation; spatial Wiener filter
1. Introduction
The spine is the central pillar of the body's structure that provides support, stability,
and facilitates communication between various bodily systems. It comprises thirty-three
(33) individual vertebrae, divided into five areas: the coccyx (CO), sacrum (SA), lumbar (LU),
thoracic (TH), and cervical (CE) [1]. Scoliosis is a medical condition distinguished by an
irregular spine curvature, either inborn or developed throughout life due to other underlying
factors [2]. The condition, if left untreated, can profoundly impact body posture, causing
discomfort, pain, and, in severe cases, paralysis. Moreover, it can affect cardiopulmonary
function, compressing the lungs and ribcage, leading to breathing difficulties. Statistical
figures report its prevalence from 510 to 5350 per 100,000 cases globally, most commonly
during adolescence [3]. Although the abnormality's origin is unknown and it usually presents
as a mild condition, some patients experience extreme situations, such as organ damage.
The spine's alignment should be straight (normal) and positioned centrally over the pelvis.
Scoliosis deviates from this norm, with a lateral curvature (left or right) that measures
significantly more than 10 degrees [4]. Figure 1a compares the two instances. Doctors
measure each case's degree angles based on magnetic resonance imaging (MRI), computerized
tomography (CT) scans, and X-rays to classify its severity using the Cobb angle (CA).
Proposed predominantly by orthopedic surgeon John Robert Cobb and adopted by the Scoliosis
Research Society (SRS), the CA is derived from selecting the two most tilted vertebrae,
as depicted in Figure 1b.
Figure 1. (a) Normal and abnormal spine and (b) detailed description of the method to acquire the CA.
The malformation is classified as mild, moderate, or severe. Table 1 shows the categorization per severity in terms of degrees.

Table 1. CA measurements per severity (source: Cobb method, adapted from ref. [4]).
Pasha et al. (2018) [15] utilized K-Means clustering for curvature modeling and regression for a vertebra's
corner detection; both required numerous preprocessing steps. Moura et al. (2016) [16]
proposed techniques to recognize vertebrae’s lateral boundaries, including spine iso-
lation. They removed other bone structures via progressive thresholding using tree
data. Okashi et al. (2017) [11] used mice X-ray images to automatically subdivide
and estimate curvatures with a three-stage process involving the Otsu algorithm,
grayscale morphology, and polynomial fitting to refine spinal edges. Although in-
novative, the main disadvantage lies in its execution complexity without precisely
measuring the CA. Mukherjee et al. (2014) [17] evaluated four denoising (bilateral,
nonlocal means, principal neighborhood, and block) filters to enhance radiograph
contrasts. Otsu thresholds and Hough transformation were employed for Canny edge
points and vertebra endplate line overlays. Another experiment [18] incorporated
scale-invariant features and support vector machines (SVM) for vertebral anterior cor-
ner tracking. The approach was promising. However, it was computationally intensive
due to the intricate operations causing sizable errors. For the last few years, convolu-
tional neural networks (CNNs) have been at the forefront of medical image processing
(MIP) [19–21]. Unlike traditional machine learning (ML), they do not rely on hand-
crafted features for training. In detail, this means that neural networks (NNs) can
intuitively extract and learn complex patterns with different levels of abstraction di-
rectly from the input data rather than requiring human experts to design attributes
manually [22–24]. Additionally, an NN is end-to-end trainable; it can autonomously optimize
its layers for the task at hand, whether for object detection, semantic segmentation (SS),
or classification [25–27], often with better performance. Modern biomedical
SS science has advanced significantly through the U-Net architecture [28]. In a nut-
shell, it uses unique encoder–decoder modules with a central bottleneck to capture
local and global features ideal for image fragmentation. Arif et al. (2017) [29] applied
different U-Net configurations to cluster cervical vertebrae with a Dice similarity co-
efficient (DSC) of 0.944 (shape aware) and 0.943 (standard). The outcome was a clear
benchmark improvement against active shape model (ASM) segmentation types with
ASM-G (0.774), ASM-RF (0.883), and ASM-M (0.877). A similar study was conducted
by [30] using anteroposterior (AP) X-ray images. Results showed that the Residual
U-Net (RU-Net) yielded better accuracy than the Dense U-Net (DU-Net), obtaining
0.951 DSC using RU-Net pitted against DU-Net’s 0.942. The use of non-standard deep
learning models for image processing is crucial as it enables the tailoring of models
to the unique requirements of specific domains, ensuring more accurate and practi-
cal solutions in various applications. It allows the customization of architectures to
capture feature-specific patterns to address the many challenges more effectively than
generic architectures. In resource-constrained environments, non-standard models
can be designed to optimize resources, making them more suitable for deployment
with computational efficiency [31]. As a contribution to data science advancement,
we proposed a non-standard pipeline (Figure 2) codenamed SCOLIONET, composed
of extensive preprocessing (cropping, color adjustments, and image enhancements)
and a robust modified segmentation architecture with a new atrous spatial pyramid
pooling (ASPP) structure to quantify the CA accurately. Our initiative can contribute to
swift and consistent scoliosis severity diagnosis.
Figure 2. The automated scoliosis Cobb angle (CA) measurement's comprehensive pipeline comprising preprocessing, feature extraction, testing, and angle estimation.
2. Methodology
This section presents comprehensive details of the procedures involved in automatically acquiring CA measurements.
2.1. Data Collection
Our dataset contained 318 two-dimensional (2D) spinal X-ray scans, specifically in the
anterior–posterior (AP) view, showcasing scoliosis. In grayscale format, these photos had
various resolutions. We meticulously curated the samples from various public repositories
without traces of personal information, in compliance with the state's data privacy decrees
(see data availability statement). The samples encompass visuals of the lumbar and thoracic
parts, a prerequisite for the execution of the designed processing procedures, with ten-fold
validation, a 10% test set (20 images), and a 90% training set (288 images). Every dataset had
its corresponding observed CA measurements annotated by experts for benchmarking of our
deep learning (DL) approach against conventional methods.

2.2. Spinal Region Isolation
Determining the focus area or the ROI (region of interest) is crucial in reaching our
intended outcome. This essential method significantly reduces and eliminates substantial
noise, improving results. We trimmed the size to approximately thirty percent of its original
dimension, focusing on the thoracic and lumbar vertebrae; these parts, based on statistics,
are the most susceptible to scoliosis. To accomplish the task, we utilized an aggregated
channel feature (ACF) in LUV mode, enabling the extraction of pixel-based features directly
from color channels and gradient magnitudes. Utilizing the scheme offers a clear advantage.
Primarily, we employed an adaptive boost classifier, commonly known as AdaBoost. By
incorporating the classifier, we discerned the patterns associated with spinal images. As a
final step, the procedure concludes with a cropping operation based on the ROI, isolating the
locality for further analysis. This not only optimizes computational efficiency by narrowing
down the spatial emphasis but also ensures that the segmentation process is conducted in
relevant areas only. Figure 3 provides the flow of operation.
Figure 3. The process of spine region isolation using ACF and AdaBoost to reduce the inputs' dimension.
2.3. Color Standardization and Image Enhancement
Regarding image processing, one must recognize the significance of color shifting, as it
improves visual quality. It increases interpretability by emphasizing essential features and
suppressing noise, enabling algorithms to analyze and examine images more effectively.
In preparation for image enhancement, we refined the ideal color settings using specific
values: red = 0.21, green = 0.59, and blue = 0.20. Figure 4 illustrates the comparison of
color channels.
Figure 4. Prioritization of the green channel (b) had shown minor detail improvements.

Distinguishing anatomical differences in chest X-rays can be challenging due to the
overlapping structures and intersecting details (e.g., bones and organs). We performed
image enhancement procedures to address this issue and to recover contrasts degraded by
noise and blurring, using a spatial Wiener filter (SWF). Equation (1) expresses an original
image (OI) a(x, y), containing noise or blurring n(x, y), and a noisy image (NI) z(x, y) [32]:

$z(x, y) = a(x, y) + n(x, y)$    (1)
The noise, which is assumed to be stationary, is described by a zero mean and variance
$\delta_n^2$. Also, the noise is independent of the OI described by Equation (2) [32]:

$o(x, y) = m_s + \delta_s w(x, y)$    (2)

Localized entities $m_s$ and $\delta_s$ represent the mean and standard deviation in proximity,
with $w(x, y)$ denoting zero-mean noise variance. The SWF efficiently minimizes the mean
squared error between the OI and the enhanced image $\acute{z}(x, y)$ calculated from
Equation (3) [32]:

$\acute{z}(x, y) = m_s + \frac{\delta_s^2}{\delta_s^2 + \delta_r^2}\,[a(x, y) - m_s]$    (3)

At each pixel, $m_s$ and $\delta_s$ are updated using Equations (4)–(6), which estimate their
values based on the NI [32]:

$\hat{m}_s(x, y) = \frac{1}{(2e+1)(2f+1)} \sum_{k=i-e}^{i+e} \sum_{l=a-f}^{a+f} v(k, l)$    (4)

$\hat{\delta}_a^2(x, y) = \frac{1}{(2e+1)(2f+1)} \sum_{k=i-e}^{i+e} \sum_{l=a-f}^{a+f} \left[v(k, l) - \hat{m}_s(x, y)\right]^2$    (5)

$\hat{\delta}_s^2(x, y) = \max\left\{0,\; \hat{\delta}_a^2(x, y) - \delta_r^2\right\}$    (6)

Afterward, Equation (3) integrates the substitutions of $\hat{m}_s(x, y)$ and $\hat{\delta}_s(x, y)$ as part
of each iteration, leading to:

$\hat{o}(x, y) = \hat{m}_s(x, y) + \frac{\hat{\delta}_s^2(x, y)}{\hat{\delta}_s^2(x, y) + \delta_r^2}\,[z(x, y) - \hat{m}_s(x, y)]$    (7)
Figure 5. The application of the SWF revealed structural and finer enhancements (b,d) from color-shifted images.
Figure 6. Step-by-step procedures for spinal limit detection such as superimposition (a), midline polynomial fitting (b), delineations (c), and boundary polynomial fit (d).
For the exploration of the spine's delineation points in a downward direction, we utilized
small window sections (12 × 5 px), traversing (x) the identified CS depicted in Figure 6c.
To ascertain the spine's boundaries, we selected the midpoints with the most substantial
intensity difference between window frames. The process iteratively examined all potential
touch points (r) along the CS. The CS's endpoint matching window was reconstructed,
enabling sequential spinal limit detection. A fourth-degree polynomial fit on each side
assists the process (Figure 6d). For the identification of the edges, we experimentally set
the following hyperparameters: W = 12, H = 52, x = 36, r = 5, q = 12, and p = 11.

2.5. Initial Vertebra Identification
Once the spine's edges' definition was in place (Figure 7a), we isolated the foreground
region displayed in Figure 7b to eliminate unwanted anatomical structures. This critical
process facilitates the preliminary vertebra selection by using four equal sub-spaced lines
showcased in Figure 7c,d to generate sets of values and thresholds. In addition, we noticed
greater luminosities within the vertebrae's outer borders and mathematically represented
their histogram projection ($p_t$) using Equation (8) [33]:
$f_t(h) = \begin{cases} 0, & \text{if } p_t(h) > 0 \\ 1, & \text{otherwise} \end{cases}$    (8)
where h is the histogram value with a constant B for the histogram's ($p_t$) bin dimension; the
summed histogram (S) is the subtotal of each feature $f_t$ illustrated in Equation (9) [33]:
$S(h) = \sum_{i=1}^{n} f_t(h)$    (9)
The facet of S's calculation lies in the active involvement of adjacent disc pixels, with
the predominant presence of zero (0) values. By selecting significant ascending shifts
in S, the algorithm identified various reference points. Next, we configured 18-bin
sub-histograms (non-overlapping) beginning from the lower boundary. The final vertebral
ROI was enclosed by the contiguous straight lines shown in Figure 7e.
2.6. SCOLIONET's Detailed Core Network Architecture
With the initial individual vertebra determination completed, a finer ROI was extracted.
It is worth highlighting that each vertebra's intensity varies considerably in its AP
projection. The lumbar portion exhibits higher intensity, whereas the cervical part displays
lower. CNNs are robust to different lighting conditions as they can effectively extract
features from images, making them less sensitive to saturation than other techniques. We
customized and tweaked a Standard U-Net because, after multiple tests, its general-purpose
segmentation design proved unsuitable for our task as-is. Our architecture is composed of
three parts (Figure 8). On the left side, a four-block (BLK) encoder takes the input image,
and each subsequent BLK receives the previous output subsampled at a lower rate. The first
two convolution (CNV) layers have thirty-two (32) feature maps (FM), while the third
contains sixty-four (64) FM. Like its predecessor, the second unit follows a similar
configuration; it features sixty-four (64) and one-hundred-twenty-eight (128) FM across its
layers. The third and fourth BLK seamlessly integrate two CNVs, doubling subsampled FM,
reaching 128 and 256. At the end of each BLK, maxpooling (MP) downsamples the spatial
dimension of the FM; this helps reduce computational complexity and memory usage. It
retains the essential information from the original FM while helping the network to focus
on the most discriminative features and to discard spatial redundancies. At the onset of
each MP, the network creates a copy as a skip connection to ensure that high-resolution FM
from the contracting path is passed directly to the corresponding layer. Doing so retains
the final spatial information lost during downsampling, leading to segmentation precision.
The decoder on the right follows the same architecture as the encoder, concluding with a 2D
upsampling technique as it needs to restore the information produced by the segmentation
mask. In Figure 8, the red arrow is indispensable for reconstructing stored FM in the
encoder block's cluster layer. These serve as the basis for comparison with the decoder's
oversampled outputs, increasing its ability to segment in great detail each vertebra through
concatenation. A sigmoid activation function and a 1 × 1 kernel convolution complete the
process and the FM's outcome.
Figure 8. SCOLIONET's architecture composed of encoder, bottleneck with atrous spatial pyramid pooling (ASPP), and decoder for spine's vertebra segmentation.
The bottleneck in the middle is a bridge between the two pathways with fewer channels
than the encoder and decoder. It contains a MP, a 3 × 3 kernel applied to the input to
extract features, and a 2 × 2 kernel with stride extrapolated to 2 × 2 regions with a
pooling window of two pixels at each step, thus enriching the FM and helping the network
to learn intricate and abstract features. A rectified linear unit (ReLU) activation function
helps introduce non-linearity for learning of complex relationships in data while mitigating
vanishing gradient problems. Batch normalization (BN) helps prevent extensive activation
and decrease covariate shifts to provide stability during the training. As a prime
customization, we integrated atrous spatial pyramid pooling (ASPP) to improve training
speed, increase the receptive field, capture fine details, and strengthen the architecture's
segmentation efficiency without added parameter overheads. The performance of our
SCOLIONET, U-Net, and RU-Net was rigorously compared. To assess the quantitative
efficiency of segmentation, we performed a 10-fold cross-validation. For the
hyperparameters, the following were set based on trial and error: batch size = 12,
epochs = 120, learning rate = 0.01, dropout rate = 0.20, and an L2-norm loss function to
improve segmentation further.

2.7. Atrous Spatial Pyramid Pooling (ASPP) Structure
A feature pyramid network's (FPN) core strength lies in its ability to seamlessly integrate
semantic insights from low-resolution feature maps with the intricate spatial details
extracted from high-resolution feature maps. The combination of semantic information from
lower-resolution levels and spatial intricacies from higher-resolution levels creates a
holistic representation of the visual content. This synergy is significant in terms of
object detection and segmentation, where objects of interest can vary significantly in size
within an image. By fusing these distinct types of information, FPN equips neural networks
with a clear understanding of both the global context and fine-grained details present in
the visual data. It mimics human vision, where our brain effortlessly integrates coarse and
detailed information to form a complete perception. FPN, in its computational parallel,
enables machines to achieve a similar level of perceptual completeness. Furthermore, FPN
enhances the efficiency of information flow through the network, optimizing computational
resources and facilitating more accurate and context-aware predictions. This not only
improves the accuracy of tasks such as object recognition but also contributes to the
model's ability to generalize well across diverse datasets. In this study, we proposed an
FPN incorporating atrous modules (AM) consisting of atrous convolution and an image
pooling layer (Figure 9). The design philosophy behind these modules is akin to creating a
dynamic, multi-scale receptive field by allowing the neural network to perceive and analyze
features at varying granularities. Through this integration, the resultant feature
representation becomes adaptive to discern complex structures within the input data. The
module is an intentional departure from conventional, uniform convolutions, showcasing our
commitment to robust feature extraction.
Figure 9. The structure of the proposed atrous modules (AM) with the atrous convolution layer and image pooling layer.
The primary objective is to leverage both spatial and semantic information more effectively
within the network architecture to enhance the model's ability to comprehend and interpret
intricate patterns and structures within the data. We aim to strike a balance between
capturing fine-grained spatial details and grasping the broader context of semantic
meaning. This is pivotal, especially in a task where understanding both the details and the
overarching semantics of the input data is essential. The implemented adjustments ensure
that spatial nuances are preserved and integrated cohesively, with semantic understanding
at various levels of the feature hierarchy. Integrating the AMs into the FPN, X1 to X4
undergo a 1 × 1 convolution to optimize them for the subsequent fusion process. Next, C1 to
C4 go through the AMs with atrous rates of 2, 4, and 6. When dealing with relatively small
scales, the use of rates becomes instrumental because a 3 × 3 filter convolution can lose
its efficiency and degenerate into a 1 × 1 filter if the rates become larger. Understanding
and optimizing these atrous rates are essential for preserving the hierarchical information
within feature maps. The integration of high-level and low-level features, specifically
from F1 and F4, is achieved through additional edges, which are then subjected to a
summation process facilitated by a 1 × 1 convolution layer. Importantly, this fusion is
executed to increase the model's capacity without inflating computational complexity. By
using a 1 × 1 convolution layer, the fusion of features is performed with a minimal
increase in the number of parameters, contributing to a streamlined and effective neural
network architecture. Finally, C1 to C4 undergo a 3 × 3 convolution independently. The
purpose of this operation is twofold: first, it serves to remove the aliasing effect during
upsampling, ensuring fidelity of features; second, by conducting these convolutions
independently, we preserve the unique characteristics of each channel. Its design not only
mitigates potential instability but also aligns with the principle of preserving essential
information as it traverses through the network. Figure 10 shows our pyramid network.
2.8. Cobb Angle Reference and Calculation
After a fine-tuned segmentation, the contours were extracted to represent boundaries
through a bounding box method (BBM). These boxes were then stored in an array (maximum x,
minimum x, maximum y, and minimum y) illustrated in Figure 11. Using the array's values,
the method identifies the lower and upper borders of the vertebrae. Then, the angles of the
endplates (flat surfaces at the top and bottom of each vertebra) were stored in another
array. These angles were significant in identifying tilted vertebrae as they convey
reference points for the analysis. The process was iterative, comparing the adjacent
endplate angle differences and ensuring the determination of the largest angle. The CA is
then calculated using Equation (10) [33]:

$CA = \max\left\{\left|\tan^{-1}\left(\frac{z_i - z_j}{1 + z_i z_j}\right)\right|\right\}$    (10)

where $z_i$ and $z_j$ are the slopes of the upper and lower edges of the identified reference
vertebrae, respectively.
Figure 11. Border-box methods (BBM) for determining the most tilted endplates of the referenced upper and lower vertebra's border.
Another mathematical framework evaluation is the SDC. At its core, the SDC measures
the extent of the agreement by identifying the common elements between two objects by
dividing the size of the overlap score by the sum of the dimensions of two segmented
regions (Equation (13)). As an advantage, it is scale invariant, robust to class imbalances,
threshold independent, and highly interpretable.
$SDC = \frac{2\,|GT \cap PR|}{|GT| + |PR|}$    (13)
Lastly, we used the MSE to quantify the squared differences between GT and PR at
the pixel level (Equation (14)). The MSE regards minor and major differences, penalizing
larger deviations and making the calculation sensitive to significant changes for spotting
subtle variations. Furthermore, it can offer consistency on a large-scale evaluation.
$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(GT_i - PR_i\right)^2$    (14)
In the t-test statistic used for the comparison, $g_1$ and $g_2$ are the means of the two
groups, $n_1$ and $n_2$ are the groups' sample sizes, and $S_p$ is the pooled standard
deviation; in the MAPE, $a_i$ and $x_i$ are the actual and predicted values, and N is the
total number of samples.
3. Results
We conducted our experiment on a computer with an AMD Ryzen 9 5900X processor
(4.8 GHz, 64M cache), 64 GB DDR4, and a GeForce RTX 3080 graphics processing unit
(1.71 GHz, 20 GB). It also boasts a 1 TB NVMe solid-state drive (SSD) and a 4 TB hard disk
drive (HDD). Although there are two storage devices at our disposal, we opted to use the
SSD for its near instantaneous access times, leading to quicker loading of training images
and faster processing. Moreover, we used TensorFlow and other related libraries for the
deep learning network’s construction. Subsequent sections present the detailed results.
Table 3. Computing performance of different deep neural networks with reference to the number
of processors.
Memory constraints can significantly impact the efficiency of the training process. In
scenarios where models need to scale across distributed systems or edge devices, efficient
memory usage is important. We use the nvidia-smi command to retrieve vital information
from the NVIDIA graphics processing unit (GPU). It is imperative to note that, during
model inference, the memory consumption of the GPU can be accurately ascertained by
examining the results from the application programming interface (API). This serves as a
reliable indicator of the GPU's memory usage throughout the inference process regarding
resource allocation and overall system performance monitoring. Table 4 depicts the memory
consumption of the selected models based on the number of batches. The memory utilization
for Standard U-Net remains relatively stable across different batch sizes, ranging from
0.63 to 0.68, suggesting that the model does not exhibit sensitivity to changes in batch
size. On the other hand, the Residual U-Net, while starting with a slightly higher baseline
of 0.72, maintains consistency within a reasonable range of 0.66 to 0.78. However, there is
a noticeable increase in memory usage as the batch grows, indicating that the model is
resource demanding with larger batches. SCOLIONET's consumption is fairly similar to the
Residual U-Net at lower batches but demonstrates a distinct pattern. Its memory consumption
is relatively low at a batch size of 1 (0.72) and gradually increases. The model appears
efficient in handling individual instances but experiences a proportional rise in memory
demands as the batch size expands.
Table 4. Computing performance of different deep neural networks with reference to memory consumptions.

Memory Consumption Based on Number of Batches
Deep Neural Network Model | 1 | 2 | 4 | 8 | 16 | 32 | 64
Standard U-Net | 0.67 | 0.63 | 0.67 | 0.67 | 0.67 | 0.66 | 0.68
Residual U-Net | 0.72 | 0.66 | 0.72 | 0.72 | 0.71 | 0.73 | 0.78
SCOLIONET | 0.72 | 0.65 | 0.71 | 0.70 | 0.70 | 0.72 | 0.77
Note: eight active processor cores.

3.3. Segmentation Performance and Visual Confirmation Evaluation
Table 5 presents the evaluation results of SDC, IoU, and MSE obtained from the CNN
segmentation models' cross-validation. The results disclose that SCOLIONET (0.975, 0.963,
and 0.025) performed better than a RU-Net (0.950, 0.942, and 0.030) and U-Net (0.941,
0.926, and 0.032) in achieving overall vertebra segmentation accuracy. Moreover, Figure 12
complements and reinforces the quantitative measures by providing an excerpt of the visual
representation produced by the three architectures with references against the GT.
Figure 12. Visual segmentation excerpt results of the three models showing SCOLIONET's capability.
Table 5. Cross-validated segmentation performance of the three neural network architectures based on various metrics.
3.4. Cobb Angle Performance Evaluation
Table 6 demonstrates the detailed outcome of our deep learning approach versus manual
measurements. Notably, the t-test emphasized no significant differences between the two
groups, highlighted by a p-value = 0.8659 and a t-value (degrees of freedom = 18) of
0.1713. This convergence is evident through the MAPE of 3.86%, or 96.13% accuracy. The
findings affirm our model's close alignment with the actual observed annotated values,
showcasing an impressive 2.86-degree discrepancy (very small). Although the angle
deviations are small for minor scoliosis cases, it can be observed that, as the curvature
increases, it can become challenging for the algorithm to precisely identify the end
vertebrae, which can be susceptible to errors [36], as can be seen on X-ray IDs 0203 and
0253 with a difference of a 2.20 degree angle from the actual values (inter- and
intra-observer variability). Furthermore, an excerpt of the visual representation depicted
in Figure 13 exhibits our artificial intelligence-based technique for various angles.

Table 6. Comparative calculation of Cobb angles among healthcare professionals (manual) vs. SCOLIONET (automated).
X-ray ID | SCOLIONET's Cobb Angle (Most Tilted Upper Vertebrae, Most Tilted Lower Vertebrae, Degree) | Experts' (Observed) Cobb Angle (Most Tilted Upper Vertebrae, Most Tilted Lower Vertebrae, Degree) | Vertebral References (SCOLIONET vs. Expert) | Absolute Difference of Cobb Angle Degree
0021 | TH08, LU03, 23.80 | TH08, LU03, 23.50 | TH08—0, LU03—0 | 0.30
0055 | TH12, LU02, 12.50 | TH12, LU02, 13.70 | TH12—0, LU02—0 | 1.20
0071 | TH09, LU04, 13.60 | TH09, LU03, 15.20 | TH09—0, LU04/LU03—1 | 1.60
0085 | TH05, TH11, 25.30 | TH05, TH11, 26.20 | TH05—0, TH11—0 | 0.90
0103 | TH06, TH12, 24.60 | TH06, TH12, 24.50 | TH06—0, TH12—0 | 0.10
0123 | TH10, LU03, 23.10 | TH10, LU03, 22.90 | TH10—0, LU03—0 | 0.20
0203 | TH03, TH09, 41.20 | TH02, LU01, 43.70 | TH03/TH07—3, TH09/LU01—3 | 3.50
0233 | TH05, LU04, 32.60 | TH05, LU04, 32.50 | TH05—0, LU04—0 | 0.10
0253 | TH05, LU02, 40.60 | TH06, LU04, 42.50 | TH03/TH06—2, LU02/LU04—2 | 3.00
0313 | TH06, TH12, 16.70 | TH06, TH12, 17.20 | TH06—0, TH12—0 | 0.50
Legend: TH01–TH12 (thoracic), LU01–LU05 (lumbar) [1].
T-test (SCOLIONET vs. Experts): t = 0.1713, p-value = 0.8659 (not significant at p < 0.05).
MAPE (SCOLIONET vs. Experts): 3.86% (accuracy = 96.13%).
Mean absolute difference of measurements (SCOLIONET vs. Experts): 2.86 degrees.
Figure 13. Excerpt of Cobb angle measurements with computer generated reference lines.
Architectures/Approaches/Models/Mechanisms Accuracy
Standard U-Net (with different configurations) [22] 88.01%
Patch-wise portioning + minimum bounding boxes [14] 88.60%
K-Means clustering + regression [15] 88.20%
Residual U-Net [23] 88.30%
Lateral boundary detection [16] 88.50%
3-Stage process (Otsu algorithm, morphology, and polynomial fitting) [11] 90.20%
Otsu thresholds + Hough transformations [17] 90.30%
Corner tracking + support vector machines [18] 90.40%
Dense U-Net [23] 94.20%
Residual U-Net (polynomial fitting + minimum border box) [24] 95.10%
SCOLIONET (spinal isolation via AdaBoost + color shifting + SWF + polynomial fitting + bounding box method + modified U-Net with atrous spatial pyramid pooling) 97.50%
4. Discussion
The findings of our research represent a noteworthy breakthrough, establishing the
capability of our method to automatically quantify CA, pivotal information for determin-
ing scoliosis severity. Based on experimental results, our SCOLIONET beats U-Net and
RU-Net with segmentation accuracies of 97.50% (SDC), 96.30% (IoU), and 0.025 (MSE). A
reported gain in segmentation accuracy of 1.92%, or almost two percent, translates
to a marked improvement rate, especially considering the complexities and intricacies
often encountered in image processing. Empirical metrics also show a 96.13% accuracy
based on MAPE (3.86%). The insignificant difference between the AI-powered automated
method (t-test p-value = 0.8659) and the traditional technique validated the entire pipeline’s
robustness. Various factors collaboratively led to this outcome. First, the integration of color
shifting and SWF for image enhancement amplified the visual information inherent in the
raw X-ray without compromising intrinsic structural integrity. These added vibrancy and
depth to the images serve as crucial markers for deep learning. Second, modularizing the
procedures from spinal isolation, spinal edge detection, and vertebra segmentation dimin-
ished unnecessary overheads in the training and learning phase. Third, the customization
and inclusion of ASPP in the U-Net’s architecture increases segmentation accuracy and its
ability to capture multi-scale contextual information. It adds discriminative power to the
network to identify subtle or minute pixel classification in most spinal images. Lastly, our
examination unveiled that a network’s complexity (RU-Net) does not innately translate to
superior segmentation performance. This realization underscores a fundamental principle
in computer vision, that the model’s efficiency is tied to its alignment with the specific
demands of the application at hand. By drawing parallels between our findings and ex-
isting research on machine learning-based medical image diagnosis, we have collectively
made a conscious effort to build upon the knowledge amassed by our predecessors. We
also acknowledged the ongoing evolution of the field and positioned our work within the
continuum of advancements in automated medical image processing. Like most studies,
we confronted various challenges internal to scoliosis assessment.
While our research focused on moving the field forward, it is essential to divulge that
our efforts have limitations, mainly when dealing with images of highly deficient quality.
It is also important to note that our collected data are limited to a fully developed spine
with ages 16 and above, marking the end of significant longitudinal bone growth. In reality,
human vision is a highly dynamic sensory perception that understands depth, colors,
motion, complex patterns, and prior experiences. While computer vision is becoming
sophisticated, often focusing on object detection, image segmentation, or facial recognition,
these systems still lack the holistic and contextual understanding that human vision
possesses, such as creativity that cannot be captured by predefined rules and patterns
created by algorithms. When human expertise is combined with machine knowledge
to aid in diagnosis, the challenges of ambiguity can be blurred. Additionally, patient’s
unbalanced positions and postures during image scans could introduce deviations in the
CA. We recognized that these uncertainties were common to image processing and not
directly addressed by our current approach. These avenues present valuable opportunities
for further research and innovation to achieve a more accurate measurement in a broader
range of clinical scenarios.
5. Conclusions
Scoliosis, a spinal abnormality, poses an expanse of health-related adversities, from
short- to long-term complications. It includes posture deformities, balance issues, degener-
ative diseases, and potential harm to internal organs. For this context, accurately gauging
the severity of spinal curvature is paramount for medical practitioners, serving as valuable
informational support for effective treatment planning. Historically, measuring the CA—a
key indicator of scoliosis—has been a painstaking mechanical process. It is prone to dis-
parities linked to subjective factors of a physician’s training, experience, and case-specific
expertise. The complications of manual vertebra segmentation and referencing, often reliant
solely on the human’s naked eye, introduce further difficulties. Moreover, the implicit noise
with X-ray images with the intersection of anatomical structures such as the ribs, lungs,
and heart exacerbates the complexity. To solve this predicament, we have devised a sys-
tematic end-to-end pipeline harnessing deep learning using our SCOLIONET’s customized
CNN architecture. Our innovative approach aims to automate the CA identification, thus,
addressing the inconsistencies of traditional methods. We capitalized on the capabilities of
AI in assisting human perception.
Our overall findings have been profound, showing notable consistency between ex-
perts and machine learning estimation, with an average difference of 2.86 degrees, a remark-
ably reliable value that significantly reduces the standard manual variations. In essence,
our established framework has made a vital contribution to machine-driven medical imag-
ing examination. The application of this research directly impacts the clinical sphere for
rapid, accurate, and reliable means of scoliosis severity evaluations. Consequently, our
discoveries are a stride in offering a simplified and robust approach to empower medical
personnel with the modern tools needed to understand scoliosis and enhance patient care
comprehensively.
As for future work, we are committed to refining our models’ precision through
numerous strategies, such as enhancing the segmentation algorithm and evaluating the
performance of other CNN networks. The authors also plan to implement federated
learning approaches, where models are trained locally on distributed devices, as this will
allow for collaborative model training without sharing of sensitive (private) patient data
and to explore collaboration with medical research institutions to obtain more datasets in
accordance with data sharing ethical practices.
References
1. Vertebrae Column. Available online: https://www.britannica.com/science/vertebra/ (accessed on 25 June 2023).
2. Labrom, F.; Izatt, M.; Claus, A.; Little, J. Adolescent idiopathic scoliosis 3D vertebral morphology, progression and nomenclature:
A current concepts and review. Eur. Spine J. 2023, 30, 1823–1834. [CrossRef] [PubMed]
3. McAviney, J.; Roberts, C.; Sullivan, B.; Alevras, A.; Graham, P.; Brown, B. The prevalence of adult de novo scoliosis: A systematic
review and meta-analysis. Eur. Spine J. 2020, 29, 2960–2969. [CrossRef] [PubMed]
4. Scoliosis Degrees of Curvature Chart. Scoliosis Reduction Center. Available online: https://www.scoliosisreductioncenter.com/
blog/scoliosis-degrees-of-curvature-chart/ (accessed on 5 July 2023).
5. Victoria, M.; Lau, H.; Lee, T.; Alarcon, D.; Zheng, Y. Comparison of ultrasound scanning for scoliosis assessment: Robotic versus
manual. Int. J. Med. Robot. Comput. Assist. Surg. 2022, 19, e2468. [CrossRef] [PubMed]
6. Sun, Y.; Xing, Y.; Zhao, Z.; Meng, X.; Xu, G.; Hai, Y. Comparison of manual versus automated measurement of cobb angle in
idiopathic scoliosis based on a deep learning keypoint detection technology. Eur. Spine J. 2021, 31, 1969–1978. [CrossRef]
7. Maaliw, R.; Soni, M.; Delos Santos, M.; De Veluz, M.; Lagrazon, P.; Seño, M.; Salvatierra-Bello, D.; Danganan, R. AWFCNET: An
attention-aware deep learning network with fusion classifier for breast cancer classification using enhanced mammograms. In
Proceedings of the IEEE World Artificial Intelligence and Internet of Things Congress (AIIoT), Seattle, WA, USA, 7–10 June 2023.
8. Pradhan, N.; Sagar, S.; Singh, A. Analysis of MRI image data for Alzheimer disease detection using deep learning techniques.
Multimed. Tools Appl. 2023, 1–24. [CrossRef]
9. Maaliw, R.; Mabunga, Z.; De Veluz, M.; Alon, A.; Lagman, A.; Garcia, M.; Lacatan, L.; Dellosa, R. An enhanced segmentation
and deep learning architecture for early diabetic retinopathy detection. In Proceedings of the IEEE 13th Annual Computing and
Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 8–11 March 2023.
10. Tu, Y.; Wang, N.; Tong, F.; Chen, H. Automatic measurement algorithm of scoliosis Cobb angle based on deep learning. J. Phys.
Conf. Ser. 2019, 1187, 042100. [CrossRef]
11. Okashi, O.; Du, H.; Al-Assam, H. Automatic spine curvature estimation from X-ray images of a mouse model. Comput. Methods
Programs Biomed. 2017, 140, 175–184. [CrossRef]
12. Alharbi, R.; Alshaye, M.; Alhanhal, M.; Alharbi, N.; Alzahrani, M.; Alrehaili, O. Deep learning based algorithm for automatic
scoliosis angle measurement. In Proceedings of the IEEE 3rd International Conference on Computer Applications & Information
Security (ICCAIS), Riyadh, Saudi Arabia, 19–21 March 2020.
13. Zhang, K.; Xu, N.; Yang, G.; Wu, J.; Fu, X. An automated Cobb angle estimation method using convolutional neural network
with area limitation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted
Intervention (MICCAI), Shenzhen, China, 13–17 October 2019.
14. Huang, C.; Tang, H.; Fan, W.; Cheung, K.; To, M.; Qian, Z.; Terzopoulos, D. Fully-automated analysis of scoliosis from spinal
X-ray images. In Proceedings of the IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester,
MN, USA, 28–30 July 2020.
15. Pasha, S.; Flynn, J. Data-driven classification of the 3d spinal curve in adolescent idiopathic scoliosis with applications in surgical
outcome prediction. Sci. Rep. 2018, 8, 16296. [CrossRef]
16. Moura, C.; Correia, M.; Barbosa, J.; Reis, A.; Laranjeira, M.; Gomes, E. Automatic Vertebra Detection in X-ray Images. In
Proceedings of the International Symposium CompImage, Coimbra, Portugal, 20–21 October 2016.
17. Mukherjee, J.; Kundu, R.; Chakrabarti, A. Variability of Cobb angle measurement from digital X-ray image based on different
de-noising techniques. Int. J. Biomed. Eng. Technol. 2014, 16, 113–134. [CrossRef]
18. Lecron, F.; Benjelloun, M.; Mahmoudi, S. Fully automatic vertebra detection in X-ray images based on multi-class SVM. In
Proceedings of the Medical Imaging, San Diego, CA, USA, 8–9 February 2012.
19. Maaliw, R.; Alon, A.; Lagman, A.; Garcia, M.; Susa, J.; Reyes, R.; Fernando-Raguro, M.; Hernandez, A. A multistage transfer
learning approach for acute lymphoblastic leukemia classification. In Proceedings of the IEEE 13th Annual Ubiquitous Computing,
Electronics & Mobile Communication Conference, New York, NY, USA, 26–29 October 2022.
20. Vijh, S.; Gaurav, P.; Pandey, H. Hybrid bio-inspired algorithm and convolutional neural network for automatic lung tumor
detection. Comput. Math. Methods Med. 2020, 35, 23711–23724. [CrossRef]
21. Peng, C.; Wu, M.; Liu, K. Multiple levels perceptual noise backed visual information fidelity for picture quality assessment.
In Proceedings of the IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS),
Penang, Malaysia, 22–25 November 2022.
22. Tsuneki, M. Deep Learning Models in Medical Image Analysis. J. Oral Biosci. 2022, 64, 312–320. [CrossRef]
23. Chakraborty, S.; Mali, K. An Overview of Biomedical Image Analysis from the Deep Learning Perspective; IGI Global: Hershey, PA, USA,
2023; pp. 43–59.
24. Abdou, M. Literature review: Efficient deep neural networks techniques for medical image analysis. Neural Comput. Appl. 2022,
34, 5791–5812. [CrossRef]
25. Varoquaux, G.; Cheplygina, V. Machine learning for medical imaging: Methodological failures and recommendations for the
future. NPJ Digit. Med. 2022, 5, 48. [CrossRef] [PubMed]
26. Aljabri, M.; AlGhamdi, M. A review on the use of deep learning for medical images segmentation. Neurocomputing 2022, 506,
311–335. [CrossRef]
27. Karpiel, I.; Ziębiński, A.; Kluszczyński, M.; Feige, D. A survey of methods and technologies used for diagnosis of scoliosis. Sensors 2021, 21, 8410. [CrossRef] [PubMed]
28. Yin, X.; Sun, L.; Fu, Y.; Lu, R.; Zhang, Y. U-Net-Based Medical Image Segmentation. J. Healthc. Eng. 2022, 2022, 4189781. [CrossRef]
29. Arif, S.; Knapp, K.; Slabaugh, G. Shapeaware deep convolutional neural network for vertebrae segmentation. In Proceedings
of the International Workshop on Computational Methods and Clinical Applications in Musculoskeletal Imaging (MICCAI),
Quebec City, QC, Canada, 10 September 2017.
30. Zhang, J.; Li, H.; Lu, L.; Zhang, Y. Computer-aided Cobb measurement based on automatic detection of vertebral slope using
deep neural network. Int. J. Biomed. Imaging 2017, 2017, 9083916. [CrossRef]
31. Staritsyn, M.; Pogodaev, N.; Chertovshih, R.; Pereira, F. Feedback maximum principle for ensemble control of local continuity
equations: An application to supervised machine learning. IEEE Control Syst. Lett. 2021, 6, 1046–1051. [CrossRef]
32. Fan, W.; Ge, Z.; Wang, Y. Adaptive Weiner filter based on fast lifting wavelet transform for image enhancement. In Proceedings of
the 7th World Congress on Intelligent Control and Automation, Chongqing, China, 25–27 June 2008.
33. Horng, M.; Kuok, C.; Fu, M.; Lin, C.; Sun, Y. Cobb angle measurement of spine from X-ray images using convolutional neural
network. Comput. Math. Methods Med. 2019, 2019, 6357171. [CrossRef]
34. Prodan, M.; Vlasceanu, G.; Boiangiu, C. Comprehensive evaluation of metrics for image resemblance. J. Inf. Syst. Oper. Manag.
2023, 17, 161–185.
35. Ieremeiev, O.; Lukin, V.; Okarma, K.; Egiazarian, K. Full-Reference quality metric based on neural network to assess the visual
quality of remote sensing images. Remote Sens. 2020, 12, 2349. [CrossRef]
36. Aviles, J.; Medina, F.; Leon-Muñoz, V.; de Baranda, P.S.; Collazo-Diéguez, M.; Cabañero-Castillo, M.; Ponce-Garrido, A.B.;
Fuentes-Santos, V.E.; Santonja-Renedo, F.; González-Ballester, M.; et al. Validity and absolute reliability of the Cobb angle in
idiopathic scoliosis with TraumaMeter software. Int. J. Environ. Res. Public Health 2022, 19, 4655. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.