
Journal of Imaging

Article
SCOLIONET: An Automated Scoliosis Cobb Angle
Quantification Using Enhanced X-ray Images and Deep
Learning Models
Renato R. Maaliw III

College of Engineering, Southern Luzon State University, Lucban 4328, Quezon, Philippines;
[email protected]

Abstract: The advancement of medical prognoses hinges on the delivery of timely and reliable
assessments. Conventional methods of assessments and diagnosis, often reliant on human expertise,
lead to inconsistencies due to professionals’ subjectivity, knowledge, and experience. To address these
problems head-on, we harnessed artificial intelligence’s power to introduce a transformative solution.
We leveraged convolutional neural networks to engineer our SCOLIONET architecture, which can
accurately identify Cobb angle measurements. Empirical testing on our pipeline demonstrated a
mean segmentation accuracy of 97.50% (Sorensen–Dice coefficient) and 96.30% (Intersection over
Union), indicating the model’s proficiency in outlining vertebrae. The level of quantification accuracy
was attributed to the state-of-the-art design of the atrous spatial pyramid pooling to better segment
images. We also compared physicians' manual evaluations against our machine-driven measurements
to validate our approach's practicality and reliability further. The results were remarkable, with a
p-value (t-test) of 0.1713 and an average acceptable deviation of 2.86 degrees, suggesting an
insignificant difference between the two methods. Our work holds the promise of enabling medical
practitioners to expedite scoliosis examinations swiftly and consistently, improving and advancing
the quality of patient care.

Keywords: atrous spatial pyramid pooling; computer vision; image enhancement; image processing;
machine learning; medical image analysis; segmentation; spatial Wiener filter
Citation: Maaliw, R.R., III. SCOLIONET: An Automated Scoliosis Cobb Angle Quantification Using Enhanced X-ray Images and Deep Learning Models. J. Imaging 2023, 9, 265. https://doi.org/10.3390/jimaging9120265

Academic Editors: Gerardo Cazzato and Francesca Arezzo

Received: 18 October 2023; Revised: 20 November 2023; Accepted: 27 November 2023; Published: 30 November 2023

Copyright: © 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

The spine is the central pillar of the body's structure that provides support, stability, and facilitates communication between various bodily systems. It comprises thirty-three (33) individual vertebrae, divided into five areas: the coccyx (CO), sacrum (SA), lumbar (LU), thoracic (TH), and cervical (CE) [1]. Scoliosis is a medical condition distinguished by an irregular spine curvature, either inborn or developed throughout life due to other underlying factors [2]. The condition, if left untreated, can profoundly impact body posture, causing discomfort, pain, and in severe cases, paralysis. Moreover, it can affect cardiopulmonary function, compressing the lungs and ribcage, leading to breathing difficulties. Statistical figures report its prevalence from 510 to 5350 per 100,000 cases globally, most commonly during adolescence [3]. Although the abnormality's origin is unknown and presents as mild conditions, some experience extreme situations, such as organ damage. The spine's alignment should be straight (normal) and positioned centrally over the pelvis. On the other hand, scoliosis deviates from this norm, with a lateral curvature (left or right) that measures significantly more than 10 degrees [4]. Figure 1a compares the two instances. Doctors measure each case's degree angles based on magnetic resonance imaging (MRI), computerized tomography (CT) scans, and X-rays to classify its severity using the Cobb angle (CA). Proposed predominantly by orthopedic surgeon John Robert Cobb and adopted by the Scoliosis Research Society (SRS), the CA is derived from selecting the two most tilted vertebrae depicted in Figure 1b.

Figure 1. (a) Normal and abnormal spine and (b) detailed description of the method to acquire the CA.
The malformation is classified as mild, moderate, or severe. Table 1 shows the categorization per severity in terms of degrees.

Table 1. CA measurements per severity (source: Cobb method, adapted from ref. [4]).

No.    Angle in Degrees    Spine Class
1      0–10                Normal
2      10–20               Mild
3      20–40               Moderate
4      >40                 Severe
A conventional method of analyzing images to diagnose scoliosis involves extracting specific features from the scan, which is tedious and labor intensive. Accurately determining the extent of curvature through mechanical measurements can be challenging for physicians due to individual anatomical differences. In addition, traditional methods are subject to human error, leading to result inconsistencies [5]. Low-quality images and variations in patient's posture and positioning during scans influence scoliosis assessments. Previous articles revealed three- to nine-degree deviations among medical professionals using manual methods [6]. Currently, artificial intelligence (AI) revolutionizes how we perceive and harness information [7–9] to improve image processing through its power to analyze vast amounts of data. It can assist in deciphering patterns for early and efficient diagnosis that proved difficult for humans to identify, paving the way for healthcare opportunities and discoveries. Despite various strategies for vertebral clustering and scoliosis measurement, the research domains are still in the early stages of development based on the literature. Most solutions are manual [10], patch-based, have parameter limitations [11,12], and do not consider each vertebra [13], thus losing crucial contexts. Huang et al. (2020) [14] published an article that utilized patch-wise portioning using minimum bounding boxes, allowing precise isolation. The study by Pasha et al. (2018) [15]

utilized K-Means clustering for curvature modeling and regression for a vertebra’s
corner detection; both required numerous preprocessing steps. Moura et al. (2016) [16]
proposed techniques to recognize vertebrae’s lateral boundaries, including spine iso-
lation. They removed other bone structures via progressive thresholding using tree
data. Okashi et al. (2017) [11] used mice X-ray images to automatically subdivide
and estimate curvatures with a three-stage process involving the Otsu algorithm,
grayscale morphology, and polynomial fitting to refine spinal edges. Although in-
novative, the main disadvantage lies in its execution complexity without precisely
measuring the CA. Mukherjee et al. (2014) [17] evaluated four denoising (bilateral,
nonlocal means, principal neighborhood, and block) filters to enhance radiograph
contrasts. Otsu thresholds and Hough transformation were employed for Canny edge
points and vertebra endplate line overlays. Another experiment [18] incorporated
scale-invariant features and support vector machines (SVM) for vertebral anterior cor-
ner tracking. The approach was promising. However, it was computationally intensive
due to the intricate operations causing sizable errors. For the last few years, convolu-
tional neural networks (CNNs) have been at the forefront of medical image processing
(MIP) [19–21]. Unlike traditional machine learning (ML), they do not rely on hand-
crafted features for training. In detail, this means that neural networks (NNs) can
intuitively extract and learn complex patterns with different levels of abstraction di-
rectly from the input data rather than requiring human experts to design attributes
manually [22–24]. Additionally, a NN is end-to-end trainable: with better performance, it can
autonomously optimize its layers for the task at hand, whether for object detection,
semantic segmentation (SS), or classification [25–27]. Modern biomedical
SS science has advanced significantly through the U-Net architecture [28]. In a nut-
shell, it uses unique encoder–decoder modules with a central bottleneck to capture
local and global features ideal for image fragmentation. Arif et al. (2017) [29] applied
different U-Net configurations to cluster cervical vertebrae with a Dice similarity co-
efficient (DSC) of 0.944 (shape aware) and 0.943 (standard). The outcome was a clear
benchmark improvement against active shape model (ASM) segmentation types with
ASM-G (0.774), ASM-RF (0.883), and ASM-M (0.877). A similar study was conducted
by [30] using anteroposterior (AP) X-ray images. Results showed that the Residual
U-Net (RU-Net) yielded better accuracy than the Dense U-Net (DU-Net), obtaining
0.951 DSC using RU-Net pitted against DU-Net’s 0.942. The use of non-standard deep
learning models for image processing is crucial as it enables the tailoring of models
to the unique requirements of specific domains, ensuring more accurate and practi-
cal solutions in various applications. It allows the customization of architectures to
capture feature-specific patterns to address the many challenges more effectively than
generic architectures. In resource-constrained environments, non-standard models
can be designed to optimize resources, making them more suitable for deployment
with computational efficiency [31]. As a contribution to data science advancement,
we proposed a non-standard pipeline (Figure 2) codenamed SCOLIONET, composed
of extensive preprocessing (cropping, color adjustments, and image enhancements)
and a robust modified segmentation architecture with a new atrous spatial pyramid
pooling (ASPP) structure to quantify CA accurately. Our initiative can contribute to
swift, consistent, and prompt scoliosis severity diagnosis.
Figure 2. The automated scoliosis Cobb angle (CA) measurement's comprehensive pipeline comprising preprocessing, feature extraction, testing, and angle estimation.

2. Methodology

This section presents comprehensive details of the procedures involved in automatically acquiring CA measurements.

2.1. Data Collection

Our dataset contained 318 two-dimensional (2D) spinal X-ray scans, specifically in the anterior–posterior (AP) view, showcasing scoliosis. In grayscale format, these photos had various resolutions. We meticulously curated the samples from various public repositories without traces of personal information, in compliance with the state's data privacy decrees (see data availability statement). The samples encompass visuals of lumbar and thoracic parts, a prerequisite for the execution of the designed processing procedures with ten validations, 10% test sets (20), and 90% training sets (288). Every dataset had its corresponding observed CA measurements annotated by experts for benchmarking of our deep learning (DL) approach against conventional methods.

2.2. Spinal Region Isolation

Determining the focus area or the ROI (region of interest) is crucial in reaching our intended outcome. This essential method significantly reduces and eliminates substantial noise, improving results. We trimmed the size to approximately thirty percent of its original dimension, focusing on the thoracic and lumbar vertebrae. These parts, based on statistics, were susceptible to scoliosis. To accomplish the tasks, we utilized an aggregated channel feature (ACF) in LUV mode, enabling the extraction of pixel-based features directly from color channels and gradient magnitudes. Utilizing the scheme offers a clear advantage. Primarily, we employed an adaptive boost, commonly known as AdaBoost. By incorporating the classifier, we discerned the patterns associated with spinal images. As a final step, it concludes with a cropping operation based on the ROI, isolating the locality for further analysis. This procedure not only optimizes computational efficiency by narrowing down the spatial emphasis but also ensures that the segmentation process was conducted in relevant areas only. Figure 3 provides the flow of operation.

Figure 3. The process of spine region isolation using ACF and AdaBoost to reduce the inputs' dimension.
2.3. Color Standardization and Image Enhancement

Regarding image processing, one must recognize the significance of color shifting, as it improves visual quality. It increases interpretability by emphasizing essential features and suppressing noise, enabling algorithms to analyze and examine images more effectively. In preparation for image enhancement, we refined the ideal color settings using specific values: red = 0.21, green = 0.59, and blue = 0.20. Figure 4 illustrates the comparison of color channels.

Figure 4. Prioritization of the green channel (b) had shown minor detail improvements.
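As a concrete illustration of the color shift, the snippet below collapses an RGB radiograph into a single channel using the stated weights (red = 0.21, green = 0.59, blue = 0.20). It is a minimal NumPy sketch: the weights come from the text, while everything else (dtype handling, clipping, the function name) is an assumption rather than the paper's implementation.

```python
import numpy as np

def color_shift(rgb):
    """Weighted channel shift prioritizing green (weights from the text)."""
    weights = np.array([0.21, 0.59, 0.20])
    gray = rgb.astype(np.float32) @ weights   # weighted sum over the channel axis
    return np.clip(gray, 0, 255).astype(np.uint8)
```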

Distinguishing anatomical differences in chest X-rays can be challenging due to the overlapping structures and intersecting details (e.g., bones and organs). We performed image enhancement procedures to address this issue to increase contrasts caused by noise and blurring using a spatial Wiener filter (SWF). Equation (1) expresses an original image (OI) a(x, y), containing noise or blurring n(x, y), and a noisy image (NI) z(x, y) [32]:

z(x, y) = a(x, y) + n(x, y)    (1)

The noise, which is assumed to be stationary, is described by a zero mean and variance δ_n². Also, the noise is independent of the OI described by Equation (2) [32]:

o(x, y) = m_s + δ_s w(x, y)    (2)

Localized entities m_s and δ_s represent the mean and standard deviation in proximity, with w(x, y) denoting zero-mean noise variance. The SWF efficiently minimizes the mean squared error between the OI and the enhanced image ź(x, y) calculated from Equation (3) [32]:

ź(x, y) = m_s + (δ_s² / (δ_s² + δ_r²)) [a(x, y) − m_s]    (3)

At each pixel, m_s and δ_s are updated using Equations (4)–(6), which estimate their values based on NI [32]:

m̂_s(x, y) = (1 / ((2e + 1)(2f + 1))) Σ_{k=i−e}^{i+e} Σ_{l=a−f}^{a+f} v(k, l)    (4)

δ̂_a²(x, y) = (1 / ((2e + 1)(2f + 1))) Σ_{k=i−e}^{i+e} Σ_{l=a−f}^{a+f} [v(k, l) − m̂_s(x, y)]²    (5)

δ̂_s(x, y) = max{0, δ̂_a²(x, y) − δ_r²}    (6)

Afterward, Equation (3) integrates the substitutions of m̂_s(x, y) and δ̂_s(x, y) as part of each iteration, leading to:

ô(x, y) = m̂_s(x, y) + (δ̂_s²(x, y) / (δ̂_s²(x, y) + δ_r²)) [z(x, y) − m̂_s(x, y)]    (7)

In the end, we established a fixed filter size of 3 × 3 (from 2e + 1 and 2f + 1) based on experimentation. Noteworthy improvements in structural spinal details are displayed in Figure 5. It presents a visual testament to the enhanced images used for model training.

Figure 5. The application of the SWF revealed structural and finer enhancements (b,d) from colored shifted images.
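For readers who want to experiment, the sketch below implements a local Wiener filter in the spirit of Equations (4)–(7) over a 3 × 3 window. It assumes a grayscale float image and a known noise variance δ_r², and uses a uniform filter as the window average; scipy.signal.wiener offers a comparable built-in.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_wiener(z, noise_var, size=3):
    """Local Wiener filter over a size x size window (cf. Equations (4)-(7))."""
    local_mean = uniform_filter(z, size=size)            # Equation (4): local mean
    local_sqr_mean = uniform_filter(z * z, size=size)
    local_var = local_sqr_mean - local_mean ** 2         # Equation (5): local variance
    signal_var = np.maximum(0.0, local_var - noise_var)  # Equation (6): clamp at zero
    gain = signal_var / (signal_var + noise_var + 1e-12) # Equation (7): shrink toward mean
    return local_mean + gain * (z - local_mean)

# Example: denoise a synthetic gradient image with a 3 x 3 window
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 64), (64, 1))
noisy = clean + rng.normal(0, 0.05, clean.shape)
enhanced = spatial_wiener(noisy, noise_var=0.05 ** 2, size=3)
```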

2.4. Spinal Boundary Detection


We used the ROI image for accurate vertebra localization upon successful spinal
region isolation. Intensity values capture the brightness of each pixel. We extract specific
color information to enhance edge detection, where distinct color features characterize
different anatomical structures. The initial stage involves crafting a specific vertebral section
serving as the center line (CS). Rectangular windows, defined by height (H) and width (W),
were overlaid horizontally in one-pixel increments from the apex of the spine [33]. The
computation involved determining the luminosity within each rectangle to establish the CS
reference point shown in Figure 6a. Subsequently, we initiate a downward displacement of
the rectangular frame that exhibits the highest intensity, changing its position by a pixel. We
aimed to pinpoint explicit pixels on each side, using distance (q). By conducting iterative
operations, we were able to identify multiple reference points (r). This r was then fitted
into CS via polynomial fitting, as illustrated in Figure 6b.

Figure 6. Step-by-step procedures for spinal limit detection such as superimposition (a), midline polynomial fitting (b), delineations (c), and boundary polynomial fit (d).

For the exploration of the spine's delineation points in a downward direction, we utilized small window sections (12 × 5 px), traversing (x) the identified CS depicted in Figure 6c. To ascertain the spine's boundaries, we selected the midpoints with the most substantial intensity difference between window frames. The process iteratively examined all potential touch points (r) along CS. The CS's endpoint matching window was reconstructed, enabling sequential spinal limit detection. A 4-degree polynomial fit on each side assists the process (Figure 6d). For the identification of the edges, we experimentally set the following hyperparameters: W = 12, H = 52, x = 36, r = 5, q = 12, and p = 11.

2.5. Initial Vertebra Identification

Once the spine's edges' definition was in place (Figure 7a), we isolated the foreground region displayed in Figure 7b to eliminate unwanted anatomical structures. This critical process facilitates the preliminary vertebra selection by using four equal sub-spaced lines showcased in Figure 7c,d to generate sets of values and thresholds. In addition, we noticed greater luminosities within the vertebrae's outer borders and mathematically represented their histogram projection (p_t) using Equation (8) [33]:

f_t(h) = { 0, if p_t(h) > 0; 1, otherwise }    (8)

where h is the histogram value with a constant B for the histogram's (p_t) bin dimension; the summed histogram (S) is the subtotal of each feature f_t illustrated in Equation (9) [33]:

S(h) = Σ_{i=1}^{n} f_t(h)    (9)

The facet of S's calculation lies in the active involvement of adjacent disc pixels, with the predominant presence of zero (0) values. By selecting significant ascending shifts in S, the algorithm identified various reference points. Next, we configured 18-bin sub-histograms (non-overlapping) beginning from the lower boundary. The final vertebral ROI was enclosed by the contiguous straight lines shown in Figure 7e.

Figure 7. The process starts with spine edge determination (a), foreground isolation (b), partition and offsets (c), luminance thresholds (d), and vertebra detection (e).
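The following is one plausible reading of Equations (8) and (9) as code: a binary occupancy feature per histogram bin, accumulated across rows of the isolated foreground. The row-wise traversal, the 0–255 intensity range, and the bin count are assumptions inferred from the surrounding text, not the author's published implementation.

```python
import numpy as np

def summed_histogram(patch, bins=18):
    """Summed binary histogram features, one plausible reading of Eqs. (8)-(9).

    patch : 2D uint8 array cropped around the isolated spine foreground.
    Returns S, whose significant ascending shifts hint at vertebra/disc
    transitions along the vertical axis.
    """
    features = []
    for row in patch:                                     # one projection per row
        p_t, _ = np.histogram(row, bins=bins, range=(0, 255))
        f_t = np.where(p_t > 0, 0, 1)                     # Equation (8)
        features.append(f_t)
    return np.sum(np.array(features), axis=0)             # Equation (9)
```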

2.6. SCOLIONET's Detailed Core Network Architecture

With the initial individual vertebra determination completed, a finer ROI was extracted. It is worth highlighting that each vertebra's intensity varies considerably in its AP projection. The lumbar portion exhibits higher intensity, whereas the cervical part displays lower. CNNs are robust to different lighting conditions as they can effectively extract features from images, making them less sensitive to saturation than other techniques. We customized and tweaked a Standard U-Net as a solution because it was designed for general segmentation tasks and was unsuitable for our purpose after multiple tests. Our architecture is composed of three parts (Figure 8). On the left side, a four-block (BLK) encoder takes the input image, and each subsequent BLK receives the previous output subsampled at a lower rate. The first two convolution (CNV) layers have thirty-two (32) feature maps (FM), while the third contains sixty-four (64) FM. Like its predecessor, the second unit follows a similar configuration. It features sixty-four (64) and one-hundred-twenty-eight (128) FM across its layers. The third and fourth BLK seamlessly integrate two CNVs, doubling subsampled FM, reaching 128 and 256. At the end of each
BLK, maxpooling (MP) downsampled the spatial dimension of FM; this helps reduce
computational complexity and memory usage. It retains the essential information from the
original FM while helping the network to focus on the most discriminative features and
to discard spatial redundancies. At the onset of each MP, the network creates a copy as
a skip connection to ensure that high-resolution FM from the contracting path is passed
directly to the corresponding layer. Doing so retains the final spatial information lost during
downsampling, leading to segmentation precision. The decoder on the right follows the
same architecture as the encoder, concluding with a 2D upsampling technique as it needs
to restore the information produced by the segmentation mask. In Figure 8, the red arrow
is indispensable for reconstructing stored FM in the encoder block’s cluster layer. These
serve as the basis for comparison with the decoder’s oversampled outputs, increasing
its
ability to segment in great detail each vertebra through concatenation. A sigmoid activation
function and a 1 × 1 kernel convolution complete the process and the FM’s outcome.

Figure 8. SCOLIONET's architecture composed of encoder, bottleneck with atrous spatial pyramid pooling (ASPP), and decoder for spine's vertebra segmentation.
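A minimal Keras sketch of one encoder BLK as described above (two CNVs at f1 FM, a third at f2 FM, then MP with a skip copy kept for the decoder). Padding, activations, and the input size are assumptions, since the text specifies only the feature-map counts.

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, f1, f2):
    """One SCOLIONET-style encoder BLK: CNV, CNV, CNV, then MP with a skip copy."""
    x = layers.Conv2D(f1, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(f1, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(f2, 3, padding="same", activation="relu")(x)
    skip = x                                   # copy kept for decoder concatenation
    return layers.MaxPooling2D(2)(x), skip     # MP downsamples the FM

inputs = tf.keras.Input(shape=(256, 256, 1))   # grayscale vertebra ROI (size assumed)
x, s1 = encoder_block(inputs, 32, 64)          # BLK 1: 32, 32, 64 FM
x, s2 = encoder_block(x, 64, 128)              # BLK 2: 64, 64, 128 FM
```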

The bottleneck in the middle is a bridge between the two pathways with fewer channels than the encoder and decoder. It contains a MP, a 3 × 3 kernel applied to the input to extract features, and a 2 × 2 kernel with stride extrapolated to 2 × 2 regions with a pooling window of two pixels at each step, thus enriching the FM and helping the network to learn intricate and abstract features. A rectified linear unit (ReLU) activation function helps introduce non-linearity for learning of complex relationships in data while mitigating vanishing gradient problems. Batch normalization (BN) helps prevent extensive activation and decrease covariate shifts to provide stability during the training. As a prime customization, we integrated atrous spatial pyramid pooling (ASPP) to improve training speed, increase the receptive field, capture fine details, and strengthen the architecture's segmentation efficiency without added parameter overheads. Performance of our SCOLIONET, U-Net, and RU-Net were rigorously compared. To assess the quantitative efficiency of segmentation, we performed a 10-fold cross-validation. For the hyperparameters, the following were set based on trial and error: batch size = 12, epoch = 120, learning rate = 0.01, dropout rate = 0.20, and an L2-norm loss function to improve segmentation further.

2.7. Atrous Spatial Pyramid Pooling (ASPP) Structure

Feature pyramid network (FPN) core strength lies in its ability to seamlessly integrate semantic insights from low-resolution feature maps with the intricate spatial details extracted from high-resolution feature maps. The combination of semantic information from lower-resolution levels and spatial intricacies from higher-resolution levels creates a holistic representation of the visual content. This synergy is significant in terms of object detection and segmentation, where objects of interest can vary significantly in size within an image. By fusing these distinct types of information, FPN equips neural networks with a clear understanding of both the global context and fine-grained details present in the visual data. It mimics the human visual system, where our brain effortlessly integrates coarse and detailed information to form a complete perception. FPN, in its computational parallel, enables machines to achieve a similar level of perceptual completeness. Furthermore, FPN enhances the efficiency of information flow through the network, optimizing computational resources and facilitating more accurate and context-aware predictions. This not only improves the accuracy of tasks such as object recognition, but also contributes to the model's ability to generalize well across diverse datasets. In this study, we proposed an FPN by incorporating atrous modules (AM) consisting of atrous convolution and an image pooling layer (Figure 9). The design philosophy behind these modules is akin to creating a dynamic, multi-scale receptive field by allowing the neural network to perceive and analyze features at varying granularities. Through this integration, the resultant feature representation becomes adaptive to discern complex structures within the input data. The module is an intentional departure from conventional, uniform convolutions, showcasing our commitment to a robust feature extraction.

Figure 9. The structure of the proposed atrous modules (AM) with the atrous convolution layer and image pooling layer.
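A sketch of one atrous module (AM) under the stated design: parallel atrous convolutions plus an image-pooling branch, fused by a 1 × 1 convolution. The dilation rates (2, 4, 6) follow the text; the branch width, bilinear resize, and static input size are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def atrous_module(x, filters=64, rates=(2, 4, 6)):
    """One AM: parallel atrous convolutions + image pooling, 1 x 1 fusion."""
    h, w = x.shape[1], x.shape[2]                 # assumes static spatial dims
    branches = [layers.Conv2D(filters, 3, dilation_rate=r, padding="same",
                              activation="relu")(x) for r in rates]
    pooled = layers.GlobalAveragePooling2D(keepdims=True)(x)   # image pooling branch
    pooled = layers.Conv2D(filters, 1, activation="relu")(pooled)
    pooled = layers.UpSampling2D((h, w), interpolation="bilinear")(pooled)
    fused = layers.Concatenate()(branches + [pooled])
    return layers.Conv2D(filters, 1, activation="relu")(fused)  # 1 x 1 fusion
```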

The primary objective is to leverage both spatial and semantic information more effec-
tively within the network architecture to enhance the model’s ability to comprehend and
interpret intricate patterns and structures within the data. We aim to strike a balance be-
tween capturing fine-grained spatial details and grasping the broader context of semantic
meaning. This is pivotal, especially in a task where understanding both the details and
the overarching semantics of the input data is essential. The implemented adjustments
ensure that spatial nuances are preserved and integrated cohesively, with semantic un-
derstanding at various levels of the feature hierarchy. Integrating the AMs into the FPN,
X1 to X4 undergoes a 1 × 1 convolution for optimizing them for the subsequent fusion
process. Next, C1 to C4 go through the AMs with atrous rates at 2, 4, and 6. When dealing
with relatively small scales, the use of rates becomes instrumental because a 3 × 3 filter
convolution can lose its efficiency and degenerate into a 1 × 1 filter if the rates become
larger. Understanding and optimizing these atrous rates are essential for preserving the
hierarchical information with feature maps. The integration of high-level and low-level
features, specifically from F1 and F4, is achieved through additional edges, which are then
Importantly, this fusion is executed to increase the model’s capacity without inflating which are then
subjected to a summation process facilitated by a 1 × 1 convolution layer.
computational complexity. By using a 1 × 1 convolution layer, the fusion of features is Importantly,
this fusion is
performed executed
with to increase
a minimal the model’s
increase capacityofwithout
in the number inflating
parameters, computational
contributing to a
complexity. By using a 1 ×
streamlined and effective neural network architecture. Finally, C1 to C4 undergo awith
1 convolution layer, the fusion of features is performed 3 × 3a
minimal increase
convolution in the number
independently. Theofpurpose
parameters, contributing
of this operation to is atwofold:
streamlined
First,and effective
it serves to
neural network architecture. Finally, C1 to C4 undergo a 3 × 3 convolution independently.
remove the aliasing effect during upsampling, ensuring fidelity of features. Second, by
The purpose of this operation is twofold: First, it serves to remove the aliasing effect dur-
conducting these convolutions independently, we preserve the unique characteristics of
ing upsampling, ensuring fidelity of features. Second, by conducting these convolutions
each channel. Its design not only mitigates potential instability but also aligns with the
independently, we preserve the unique characteristics of each channel. Its design not only
principle of preserving essential information as it traverses through the network. Figure
mitigates potential instability but also aligns with the principle of preserving essential
10 shows our pyramid network.
information as it traverses through the network. Figure 10 shows our pyramid network.

Figure 10. The proposed architecture of a feature pyramid network with atrous modules (AM) with additional edges; F1 to F4 are the fused output of M1 to M4 with the same dimensions.
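Continuing the Keras sketch above, the fusion path of Figure 10 could be wired as follows: 1 × 1 convolutions on X1 to X4, AMs on C1 to C4, an additional-edge summation of the first and last levels (possible here because, per the caption, F1 to F4 share the same dimensions), and an independent 3 × 3 convolution per level. Names mirror the figure; shapes and widths are assumed.

```python
from tensorflow.keras import layers

def fpn_with_am(x_levels, filters=64):
    """Hypothetical FPN wiring; reuses atrous_module from the previous sketch."""
    c_levels = [layers.Conv2D(filters, 1)(x) for x in x_levels]   # X1-X4 -> C1-C4
    f_levels = [atrous_module(c, filters) for c in c_levels]      # AMs at rates 2, 4, 6
    edge = layers.Add()([f_levels[0], f_levels[-1]])              # F1 + F4 additional edge
    f_levels[0] = layers.Conv2D(filters, 1)(edge)                 # low-cost 1 x 1 fusion
    # independent 3 x 3 convolutions counter aliasing after upsampling
    return [layers.Conv2D(filters, 3, padding="same")(f) for f in f_levels]
```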

2.8. Cobb Angle Reference and Calculation

After a fine-tuned segmentation, the contours were extracted to represent boundaries through a bounding box method (BBM). These boxes were then stored in an array (maximum x, minimum x, maximum y, and minimum y) illustrated in Figure 11. Using the array's values, the method identifies the lower and upper borders of the vertebrae. Then, the angles of the endplates (flat surfaces at the top and bottom of each vertebra) were re-stored in another array. These angles were significant in identifying tilted vertebrae as they convey reference points for the analysis. The process was iterative, comparing the adjacent endplate angle differences, ensuring the determination of the largest angle. CA is then calculated using Equation (10) [33]:

CA = max{ | tan⁻¹((z_i − z_j) / (1 + z_i z_j)) | }    (10)

where z_i and z_j are the slopes of the upper and lower edges of the identified reference vertebrae, respectively.

Figure 11. Border-box methods (BBM) for determining the most tilted endplates of the referenced upper and lower vertebra's border.
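Once the endplate slopes are in the array, Equation (10) reduces to a pairwise search for the largest angle; the sketch below is a direct transcription, with hypothetical slope values for illustration.

```python
import numpy as np

def cobb_angle(slopes):
    """Largest pairwise endplate angle per Equation (10), in degrees.

    slopes : endplate slopes (z values), one per detected vertebra.
    """
    best = 0.0
    for i in range(len(slopes)):
        for j in range(i + 1, len(slopes)):
            zi, zj = slopes[i], slopes[j]
            angle = abs(np.arctan((zi - zj) / (1.0 + zi * zj)))
            best = max(best, angle)
    return np.degrees(best)

# Example with hypothetical slopes for five vertebrae
print(cobb_angle([0.02, 0.18, 0.35, -0.10, -0.28]))
```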

2.9. Quantitative Image Enhancement Evaluation

Traditionally, the assessment of image quality improvement relied on manual visual inspection. However, this approach is subjective and inconsistent. To address this limitation, we used visual information fidelity (VIF) for an objective and concrete evaluation of graphics improvement. VIF is a specific metric designed to measure digital enhancement quality that is more rigorous and quantitative by accounting for various attributes [34]. Its process compares the original and processed images through overlapping blocks to form a single VIF score scaled from 0 to 1. A higher value indicates a better picture quality. This scaling ensures consistency and reliability, making it valuable where precision is essential. We utilized the correlation coefficient (CC) and Spearman rank order CC (SROCC) considering numerous factors influencing the perceived visual fidelity. Radiologists suggest a VIF score of 0.9 to 1 as an excellent improvement. The general formula is expressed by Equation (11) [35]:

VIF = [ Σ_{j ∈ sub-band} I(C̄^{N,j}; F̄^{N,j} | S^{N,j} = s^{N,j}) ] / [ Σ_{j ∈ sub-band} I(C̄^{N,j}; Ē^{N,j} | S^{N,j} = s^{N,j}) ]    (11)

where N are small overlapping blocks as part of the sub-band decomposition, denoted by j. The C̄^{N,j} represents resemblance variables associated with specific j ∈ sub-band.

2.10. Segmentation Assessment Measures
We calculated the segmentation’s accuracy of results using intersection over union
(IoU), the Sorensen–Dice coefficient (SDC), and mean squared error (MSE). IoU is a com-
monly applied metric to evaluate segmentation algorithms’ performances, specifically in
computer vision (CV) tasks. It calculates the degree of overlap between ground truth (GT)
and predicted regions (PR). For each image’s object, GT represents the actual location and
PR generated by the segmentation algorithm. As expressed in Equation (12), it is calculated
by dividing the intersection by the union’s area, where the value ranges between 0 (no
overlap) and 1 (perfect match).
IoU = |GT ∩ PR| / |GT ∪ PR|    (12)

Another mathematical framework evaluation is the SDC. At its core, the SDC measures
the extent of the agreement by identifying the common elements between two objects by
dividing the size of the overlap score by the sum of the dimensions of two segmented
regions (Equation (13)). As an advantage, it is scale invariant, robust to class imbalances,
threshold independent, and highly interpretable.

SDC = 2|GT ∩ PR| / (|GT| + |PR|)    (13)

Lastly, we used the MSE to quantify the squared differences between GT and PR at
the pixel level (Equation (14)). The MSE regards minor and major differences, penalizing
larger deviations and making the calculation sensitive to significant changes for spotting
subtle variations. Furthermore, it can offer consistency on a large-scale evaluation.
MSE = (1/n) Σ_{i=1}^{n} (GT_i − PR_i)²    (14)
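All three measures are a few lines of NumPy over binary masks; the helper below follows Equations (12)–(14) directly (the mask names and shapes are assumptions for illustration).

```python
import numpy as np

def segmentation_metrics(gt, pr):
    """IoU, SDC, and MSE per Equations (12)-(14) for binary masks of equal shape."""
    gt = gt.astype(bool)
    pr = pr.astype(bool)
    inter = np.logical_and(gt, pr).sum()
    union = np.logical_or(gt, pr).sum()
    iou = inter / union                                         # Equation (12)
    sdc = 2 * inter / (gt.sum() + pr.sum())                     # Equation (13)
    mse = np.mean((gt.astype(float) - pr.astype(float)) ** 2)   # Equation (14)
    return iou, sdc, mse
```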

2.11. Degree of Difference Evaluation Metrics


In this article, we compared the medical expert’s annotated (manual) and our AI
approach CA measurements. We conducted a t-test on normally distributed data, a well-
established statistical technique to ascertain the validity of our findings. The objective
is to identify substantial differences between two distinct and independent groups. We
used a p-value (pv) threshold of 0.05 and confidence interval of 95%. A pv lower than the
configured parameter strongly indicates notable divergence. At the same time, a higher
value suggests no significant difference. The formula is represented by Equation (15):
t = (g_1 − g_2) / (S_p √(1/n_1 + 1/n_2))    (15)

where g_1 and g_2 are the means of the two groups, n_1 and n_2 are the groups' sample sizes, with S_p as the pooled standard deviation.
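In practice, this comparison can be run with scipy's two-sample t-test; the angle lists below are hypothetical stand-ins for illustration, not the study's data.

```python
from scipy import stats

# Hypothetical Cobb angles (degrees): expert annotations vs. automated output
manual = [12.5, 27.0, 41.2, 18.9, 33.4]
automated = [13.1, 25.8, 43.0, 17.6, 35.0]

# Two-sample t-test at the 0.05 threshold used in the paper
t_stat, p_value = stats.ttest_ind(manual, automated)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")  # p > 0.05 -> no significant difference
```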

2.12. Cobb Angle Measurement’s Reliability Test


We assessed our angle measurements with a clear intention of enhancing reliability.
Our motivation for another metric was to ensure our calculations were consistent and
dependable. To accomplish this, we used a mean absolute percentage error (MAPE), known
for its relevance as a scale-independent (percentage-based) measure for most prediction
scenarios. In addition, it is less sensitive to outliers, allowing a straightforward, clear,
and understandable interpretation for general audiences. The formula is depicted by
Equation (16):
MAPE = [ Σ_{i=1}^{N} |(a_i − x_i) / x_i| × 100 ] / N    (16)

with a_i and x_i as the actual and predicted values, and N as the total number of samples.
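A one-function MAPE matching Equation (16), dividing by the predicted values x_i as written there (note that some definitions divide by the actual values instead):

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error per Equation (16)."""
    a = np.asarray(actual, dtype=float)
    x = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((a - x) / x)) * 100.0
```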

3. Results
We conducted our experiment on a computer with an AMD Ryzen 9 5900X processor
(4.8 GHz, 64 MB cache), 64 GB DDR4, and a GeForce RTX 3080 graphics processing unit
(1.71 GHz, 20 GB). It also boasts a 1 TB NVMe solid-state drive (SSD) and a 4 TB hard disk
drive (HDD). Although there are two storage devices at our disposal, we opted to use the
SSD for its near instantaneous access times, leading to quicker loading of training images
and faster processing. Moreover, we used TensorFlow and other related libraries for the
deep learning network’s construction. Subsequent sections present the detailed results.

3.1. Image Enhancement Performance Evaluation


To comprehensively evaluate our enhancement approach, we subjected 100 randomly
selected images to quality assessments. As outlined in Table 2, notable improvements were
observed in mean outcomes for raw input images processed with color shifting and SWF.
The result showcased a highly similar CC value of 0.973 and a positively monotonic SROCC
of 0.968. The preprocessed AP scans revealed more subtle details and distinguishable fea-
tures while preserving the overall integrity of the images’ structures (Figure 5), a precursor
for diagnostic reliability.

Table 2. Average quality enhancement of selected images (N = 100).

Method    Correlation Coefficient (CC)    Spearman Rank Order Correlation Coefficient (SROCC)
VIF       0.973                           0.968

3.2. Computing Performance Assessment


The measurement of computational performance in neural networks has far-reaching
implications in research and practical applications. By conducting this evaluation, we
can ascertain the extent to which computational resources are utilized and how they can
provide insights into the networks’ capability to operate within the constraints of the
computing environment. In the context of scalability, the metrics serve as a critical indicator
of a model’s adaptability to varying scales of data and computational infrastructure. Table 3
presents our SCOLIONET’s computational load based on processor cores. The data suggest
that, in general, increasing the number of active processor cores from four to eight results in
a reduced training time for all three models. This aligns with the parallel processing
power of multi-core systems. In terms of training time, it varies between the architecture’s
complexities. Residual U-Net and SCOLIONET, in this case, show longer training times
compared to the Standard U-Net.

Table 3. Computing performance of different deep neural networks with reference to the number
of processors.

Deep Neural Network Model    Training Time in Minutes (Four Active Cores)    Training Time in Minutes (Eight Active Cores)
Standard U-Net               21.16                                           18.35
Residual U-Net               23.81                                           20.15
SCOLIONET                    24.48                                           19.38

Memory constraints can significantly impact the efficiency of the training process. In
scenarios where models need to scale across distributed systems or edge devices, efficient
memory usage is important. We use the nvidia-smi command to retrieve vital information
from the NVIDIA graphics processing unit (GPU). It is imperative to note that, during
model inference, the memory consumption of the GPU can be accurately ascertained by
examining the results from the application programming interface (API). This serves as a
reliable indicator of the GPU’s memory usage throughout the inference process regarding
J. Imaging 2023, 9, x FOR PEER REVIEW 16 of
resource allocation and overall system performance monitoring. Table 4 depicts the memory
consumption of the selected models based on number of batches. The memory utilization
for Standard U-Net remains relatively stable across different sizes, ranging from 0.63 to
0.68. It suggestsathat
demonstrates the model
distinct doesIts
pattern. notmemory
exhibit sensitivity to changes
consumption in batchlow
is relatively size.atOn
a batch s
the other hand, the Residual U-Net, while starting with a slightly higher base
of 1 (0.72) and gradually increases. The model appears efficient in handling individ line of 0.72,
maintains consistency within a reasonable range with batch sizes of 0.66 to 0.78. However,
instances but experiences a proportional rise in memory demands as the batch size
there is a noticeable increase in memory usage as the batch grows, indicating that the model
pands.
is resource demanding with larger batches. SCOLIONET’s consumption is fairly similar
to the Residual U-Net at lower batches but demonstrates a distinct pattern. Its memory
Table 4. Computing
consumption performance
is relatively of different
low at a batch deepand
size of 1 (0.72) neural networks
gradually with The
increases. reference
model to mem
consumptions.
appears efficient in handling individual instances but experiences a proportional rise in
memory demands as the batch size expands.
Deep Neural Memory Consumption Based on Number of Batches
Table 4. ComputingModel
Network performance of different
1 deep neural
2 networks4with reference
8 to memory
16consumptions.
6432
Standard U-Net
Deep Neural
0.67 Consumption
Memory 0.63 0.67 0.67 of Batches
Based on Number 0.67 0.66 0.68
Residual
Network U-Net
Model 1 0.722 0.66
4 0.72
8 0.72
16 0.71
32 0.73
64 0.78
SCOLIONET
Standard U-Net 0.67 0.72
0.63 0.65
0.67 0.71
0.67 0.70
0.67 0.70
0.66 0.72
0.68 0.77
Note: eight active
Residual U-Net processor
0.72cores. 0.66 0.72 0.72 0.71 0.73 0.78
SCOLIONET 0.72 0.65 0.71 0.70 0.70 0.72 0.77
3.3. Segmentation Performance and Visual Confirmation Evaluation
Note: eight active processor cores.
3.3. Segmentation Performance and Visual Confirmation Evaluation

Table 5 presents the evaluation results of SDC, IoU, and MSE obtained from the CNN
segmentation models' cross-validation. The results disclose that SCOLIONET (0.975, 0.963,
and 0.025) performed better than the RU-Net (0.950, 0.942, and 0.030) and U-Net (0.941,
0.926, and 0.032) in achieving overall vertebra segmentation accuracy. Moreover, Figure 12
complements and reinforces the quantitative measures by providing an excerpt of the
visual representation produced by the three architectures with references against the GT.
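
For reference, the three metrics in Table 5 can be computed on binary vertebra masks as in the following sketch, an illustrative NumPy implementation of the standard definitions rather than our exact evaluation code:

import numpy as np

def sorensen_dice(pred, gt, eps=1e-7):
    # SDC = 2|A and B| / (|A| + |B|) for a predicted mask A and ground truth B.
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def intersection_over_union(pred, gt, eps=1e-7):
    # IoU = |A and B| / |A or B|.
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

def mean_squared_error(pred, gt):
    # Pixel-wise MSE between the binary masks.
    return float(np.mean((pred.astype(float) - gt.astype(float)) ** 2))

Each function takes boolean arrays of identical shape; the epsilon term guards against empty masks.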

Figure 12. Visual segmentation excerpt results of the three models showing SCOLIONET's capability.

Table 5. Cross-validated segmentation performance of the three neural network architectures based on various metrics.

                  SDC                                       IoU                                       MSE
Fold    U-Network   SCOLIONET   RU-Network    U-Network   SCOLIONET   RU-Network    U-Network   SCOLIONET   RU-Network
1 0.939 ± 0.034 0.978 ± 0.027 0.952 ± 0.025 0.923 ± 0.042 0.967 ± 0.028 0.940 ± 0.045 0.031 ± 0.018 0.021 ± 0.014 0.028 ± 0.018
2 0.943 ± 0.035 0.973 ± 0.025 0.953 ± 0.024 0.921 ± 0.045 0.963 ± 0.027 0.942 ± 0.040 0.032 ± 0.015 0.025 ± 0.015 0.029 ± 0.016
3 0.941 ± 0.031 0.975 ± 0.026 0.951 ± 0.029 0.922 ± 0.044 0.960 ± 0.028 0.939 ± 0.039 0.033 ± 0.016 0.024 ± 0.016 0.030 ± 0.018
4 0.942 ± 0.034 0.976 ± 0.027 0.949 ± 0.026 0.923 ± 0.045 0.969 ± 0.031 0.941 ± 0.041 0.034 ± 0.017 0.025 ± 0.014 0.032 ± 0.017
5 0.942 ± 0.035 0.977 ± 0.028 0.951 ± 0.027 0.921 ± 0.041 0.961 ± 0.029 0.942 ± 0.043 0.032 ± 0.015 0.027 ± 0.015 0.033 ± 0.019
6 0.943 ± 0.033 0.975 ± 0.024 0.949 ± 0.028 0.929 ± 0.046 0.964 ± 0.030 0.940 ± 0.043 0.033 ± 0.019 0.029 ± 0.020 0.030 ± 0.016
7 0.941 ± 0.032 0.976 ± 0.027 0.950 ± 0.029 0.925 ± 0.043 0.962 ± 0.030 0.948 ± 0.040 0.032 ± 0.017 0.028 ± 0.016 0.032 ± 0.018
8 0.942 ± 0.032 0.977 ± 0.028 0.951 ± 0.030 0.931 ± 0.045 0.965 ± 0.032 0.946 ± 0.039 0.034 ± 0.015 0.027 ± 0.013 0.031 ± 0.016
9 0.941 ± 0.035 0.978 ± 0.026 0.950 ± 0.028 0.932 ± 0.048 0.968 ± 0.033 0.945 ± 0.040 0.033 ± 0.016 0.023 ± 0.012 0.031 ± 0.018
10 0.940 ± 0.037 0.974 ± 0.025 0.949 ± 0.029 0.933 ± 0.049 0.964 ± 0.031 0.943 ± 0.041 0.032 ± 0.018 0.024 ± 0.017 0.029 ± 0.016
Mean ± standard deviation 0.941 ± 0.033 0.975 ± 0.026 0.950 ± 0.027 0.926 ± 0.044 0.963 ± 0.029 0.942 ± 0.041 0.032 ± 0.016 0.025 ± 0.015 0.030 ± 0.017
Training duration, eight processing cores (in minutes): SCOLIONET (19.38), RU-Network (20.15), and U-Network (18.35)
Test duration, eight processing cores (in seconds): SCOLIONET (0.04), RU-Network (0.03), and U-Network (0.02)

3.4. Cobb Angle Performance Evaluation

Table 6 demonstrates the detailed outcome of our deep learning approach versus manual
measurements. Notably, the t-test emphasized no significant differences between the two
groups, highlighted by a p-value of 0.8659 and a t-value (degrees of freedom = 18) of 0.1713.
This convergence is evident through the MAPE of 3.86%, or 96.13% accuracy. The findings
affirm our model's close alignment with the actual observed annotated values, showcasing
an impressive 2.86-degree discrepancy (very small). Although the angle deviations are
small for minor scoliosis cases, it can be observed that, as the curvature increases, it can
become challenging for the algorithm to precisely identify the end vertebrae, which can be
susceptible to errors [36], as can be seen on X-ray IDs 0203 and 0253 with a difference of a
2.20 degree angle from the actual values (inter- and intra-observer variability). Furthermore,
an excerpt of the visual representation depicted in Figure 13 exhibits our artificial
intelligence-based technique for various angles.
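
The comparison itself reduces to a few lines of standard statistics. The sketch below uses SciPy on the excerpt rows of Table 6 only; the reported summary statistics were computed over the full test set, and the stated degrees of freedom (18) indicate an independent two-sample t-test over the two groups:

import numpy as np
from scipy import stats

# Cobb angles in degrees, taken from the Table 6 excerpt.
scolionet = np.array([23.8, 12.5, 13.6, 25.3, 24.6, 23.1, 41.2, 32.6, 40.6, 16.7])
experts = np.array([23.5, 13.7, 15.2, 26.2, 24.5, 22.9, 43.7, 32.5, 42.5, 17.2])

t_stat, p_value = stats.ttest_ind(scolionet, experts)  # df = n1 + n2 - 2
mape = np.mean(np.abs(scolionet - experts) / experts) * 100
mean_abs_diff = np.mean(np.abs(scolionet - experts))
print(f"t = {t_stat:.4f}, p = {p_value:.4f}, "
      f"MAPE = {mape:.2f}%, mean |diff| = {mean_abs_diff:.2f} deg")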
Table 6. Comparative calculation of Cobb angles among healthcare professionals (manual) vs.
SCOLIONET (automated).

          SCOLIONET's Cobb Angle              Experts' Cobb Angle (Observed)      Vertebral References           Absolute
X-ray     Most Tilted  Most Tilted  Cobb      Most Tilted  Most Tilted  Cobb      (SCOLIONET vs. Expert)         Difference of
ID        Upper        Lower        Angle     Upper        Lower        Angle     Most Tilted    Most Tilted     Cobb Angle
          Vertebrae    Vertebrae    Degree    Vertebrae    Vertebrae    Degree    Upper          Lower           Degree
0021      TH08         LU03         23.80     TH08         LU03         23.50     TH08—0         LU03—0          0.30
0055      TH12         LU02         12.50     TH12         LU02         13.70     TH12—0         LU02—0          1.20
0071      TH09         LU04         13.60     TH09         LU03         15.20     TH09—0         LU04/LU03—1     1.60
0085      TH05         TH11         25.30     TH05         TH11         26.20     TH05—0         TH11—0          0.90
0103      TH06         TH12         24.60     TH06         TH12         24.50     TH06—0         TH12—0          0.10
0123      TH10         LU03         23.10     TH10         LU03         22.90     TH10—0         LU03—0          0.20
0203      TH03         TH09         41.20     TH02         LU01         43.70     TH03/TH07—3    TH09/LU01—3     3.50
0233      TH05         LU04         32.60     TH05         LU04         32.50     TH05—0         LU04—0          0.10
0253      TH05         LU02         40.60     TH06         LU04         42.50     TH03/TH06—2    LU02/LU04—2     3.00
0313      TH06         TH12         16.70     TH06         TH12         17.20     TH06—0         TH12—0          0.50
Legend: TH01–TH12 (thoracic), LU01–LU05 (lumbar) [1]
T-test (SCOLIONET vs. Experts): t = 0.1713, p-value = 0.8659 (not significant at p < 0.05)
MAPE (SCOLIONET vs. Experts): 3.86% (Accuracy = 96.13%)
Mean absolute difference of measurements (SCOLIONET vs. Experts): 2.86 degrees

Figure 13. Excerpt of Cobb angle measurements with computer generated reference lines.
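
The reference lines in Figure 13 follow the standard Cobb construction: the angle between the superior endplate of the most tilted upper vertebra and the inferior endplate of the most tilted lower vertebra. The geometric sketch below uses hypothetical landmark coordinates purely for illustration:

import math

def endplate_slope_deg(p1, p2):
    # Inclination of an endplate line through two (x, y) landmarks, in degrees.
    (x1, y1), (x2, y2) = p1, p2
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def cobb_angle(upper_endplate, lower_endplate):
    # Absolute angle between the two endplate lines, kept within [0, 90].
    a = endplate_slope_deg(*upper_endplate)
    b = endplate_slope_deg(*lower_endplate)
    angle = abs(a - b) % 180.0
    return min(angle, 180.0 - angle)

# Hypothetical pixel coordinates for the two endplates (illustration only).
upper = ((112.0, 240.0), (168.0, 228.0))  # superior endplate, upper vertebra
lower = ((118.0, 480.0), (172.0, 502.0))  # inferior endplate, lower vertebra
print(f"Cobb angle: {cobb_angle(upper, lower):.1f} degrees")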

3.5. Benchmarked Performance versus Existing State-of-the-Art Approaches


Table 7 presents the performance of different architectures, models, and special mech-
anisms for vertebra segmentation.

Table 7. Comparative vertebra segmentation accuracies using various methodologies.

Architectures/Approaches/Models/Mechanisms Accuracy
Standard U-Net (with different configurations) [22] 88.01%
Patch-wise portioning + minimum bounding boxes [14] 88.60%
K-Means clustering + regression [15] 88.20%
Residual U-Net [23] 88.30%
Lateral boundary detection [16] 88.50%
3-Stage process (Otsu algorithm, morphology, and polynomial fitting) [11] 90.20%
Otsu thresholds + Hough transformations [17] 90.30%
Corner tracking + support vector machines [18] 90.40%
Dense U-Net [23] 94.20%
Residual U-Net (polynomial fitting + minimum border box) [24] 95.10%
SCOLIONET (spinal isolation via AdaBoost + color shifting + SWF +
polynomial fitting + bounding box method + modified U-Net with atrous
spatial pyramid pooling)                                                    97.50%

Despite being a foundational architecture, the Standard U-Net achieves a reasonable
accuracy (88.01%). The variation in configurations suggests the importance of
hyperparameter tuning for optimal performance. A patch-wise approach demonstrates a
slight improvement in accuracy (88.60%). K-Means clustering with regression shows the
efficacy of an unsupervised method (clustering) refined by regression (88.20%). Residual
U-Net (88.30%) and lateral boundary detection (88.50%) highlight the importance of
boundary information for precise segmentation. A multistage Otsu process (90.20%) and
Otsu thresholds with Hough transformations (90.30%) showcase refined techniques where
precise delineation is crucial. Corner tracking combined with SVM demonstrates even
stronger machine learning performance for image segmentation (90.40%). With the
advancement of algorithms such as Dense U-Net, dense connectivity patterns substantially
improve accuracy, underscoring the importance of information flow between densely
connected blocks (94.20%). Pairing polynomial fitting and minimum border-box techniques
with a Residual U-Net leads to a further accuracy leap, demonstrating the synergistic
effect of combined approaches (95.10%). Lastly, our SCOLIONET architecture, with its
comprehensive preprocessing and non-standard techniques, achieves the highest accuracy
of 97.50% by incorporating different strategies for comprehensive segmentation.

4. Discussion
The findings of our research represent a noteworthy breakthrough, establishing the
capability of our method to automatically quantify CA, pivotal information for determin-
ing scoliosis severity. Based on experimental results, our SCOLIONET beats U-Net and
RU-Net with segmentation accuracies of 97.50% (SDC), 96.30% (IoU), and 0.025 (MSE). A
reported gain in segmentation accuracy of 1.92%, or almost two percent, translates
to a marked improvement, especially considering the complexities and intricacies
often encountered in image processing. Empirical metrics also show a 96.13% accuracy
based on MAPE (3.86%). The insignificant difference between the AI-powered automated
method (t-test p-value = 0.8659) and the traditional technique validated the entire pipeline’s
robustness. Various factors collaboratively led to this outcome. First, the integration of color
shifting and SWF for image enhancement amplified the visual information inherent in the
raw X-ray without compromising intrinsic structural integrity. The added vibrancy and
depth serve as crucial markers for deep learning (see the filtering sketch after this
paragraph). Second, modularizing the
procedures from spinal isolation, spinal edge detection, and vertebra segmentation dimin-
ished unnecessary overheads in the training and learning phase. Third, the customization
and inclusion of ASPP in the U-Net's architecture increase segmentation accuracy and the
network's ability to capture multi-scale contextual information (a module sketch also
follows this paragraph). It adds discriminative power to the network to identify subtle or
minute pixel classifications in most spinal images. Lastly, our
examination unveiled that a network’s complexity (RU-Net) does not innately translate to
superior segmentation performance. This realization underscores a fundamental principle
in computer vision, that the model’s efficiency is tied to its alignment with the specific
demands of the application at hand. By drawing parallels between our findings and ex-
isting research on machine learning-based medical image diagnosis, we have collectively
made a conscious effort to build upon the knowledge amassed by our predecessors. We
also acknowledged the ongoing evolution of the field and positioned our work within the
continuum of advancements in automated medical image processing. Like most studies,
we confronted various challenges internal to scoliosis assessment.
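
To make the first of these factors concrete, the sketch below illustrates spatial Wiener filtering with SciPy's local-statistics wiener function on synthetic data; the 5 x 5 window and the noise level are assumptions for the demonstration, not the pipeline's actual settings:

import numpy as np
from scipy.signal import wiener

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:256, 0:256]
clean = np.sin(xx / 25.0) * np.cos(yy / 25.0)  # smooth stand-in for anatomy
noisy = clean + rng.normal(0.0, 0.2, clean.shape)

# The adaptive Wiener filter estimates local mean and variance in each window
# and attenuates the noise where the local signal is flat.
denoised = wiener(noisy, mysize=(5, 5))
print(f"MSE before: {np.mean((noisy - clean) ** 2):.4f}, "
      f"after: {np.mean((denoised - clean) ** 2):.4f}")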
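
Likewise, for the third factor, a minimal PyTorch sketch of an ASPP block of the kind incorporated into the U-Net is shown next; the dilation rates (1, 6, 12, 18) and channel widths are illustrative assumptions rather than our exact configuration:

import torch
import torch.nn as nn

class ASPP(nn.Module):
    # Parallel atrous convolutions at several dilation rates, concatenated and
    # projected back to a single feature map, capturing multi-scale context.
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3 if r > 1 else 1,
                          padding=r if r > 1 else 0, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Sequential(
            nn.Conv2d(len(rates) * out_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))

# Example: a 256-channel encoder map at 32 x 32 spatial resolution.
out = ASPP(256, 64)(torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])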
While our research focused on moving the field forward, it is essential to divulge that
our efforts have limitations, mainly when dealing with images of highly deficient quality.
It is also important to note that our collected data are limited to a fully developed spine
with ages 16 and above, marking the end of significant longitudinal bone growth. In reality,
human vision is a highly dynamic sensory perception that understands depth, colors,
motion, complex patterns, and prior experiences. While computer vision is becoming
sophisticated, it often focuses narrowly on object detection, image segmentation, or facial
recognition. These systems still lack the holistic and contextual understanding that human
vision possesses, such as creativity that cannot be captured by predefined rules and
patterns created by algorithms. When human expertise is combined with machine
knowledge to aid in diagnosis, the challenges of ambiguity can be mitigated. Additionally, patients'
unbalanced positions and postures during image scans could introduce deviations in the
CA. We recognized that these uncertainties were common to image processing and not
directly addressed by our current approach. These avenues present valuable opportunities
for further research and innovation to achieve a more accurate measurement in a broader
range of clinical scenarios.

5. Conclusions
Scoliosis, a spinal abnormality, poses a wide range of health-related adversities, from
short- to long-term complications. These include posture deformities, balance issues,
degenerative diseases, and potential harm to internal organs. In this context, accurately gauging
the severity of spinal curvature is paramount for medical practitioners, serving as valuable
informational support for effective treatment planning. Historically, measuring the CA—a
key indicator of scoliosis—has been a painstaking mechanical process. It is prone to dis-
parities linked to subjective factors of a physician’s training, experience, and case-specific
expertise. The complications of manual vertebra segmentation and referencing, often reliant
solely on the human's naked eye, introduce further difficulties. Moreover, the implicit noise
in X-ray images, where anatomical structures such as the ribs, lungs, and heart intersect
the spine, exacerbates the complexity. To solve this predicament, we have devised a sys-
tematic end-to-end pipeline harnessing deep learning using our SCOLIONET’s customized
CNN architecture. Our innovative approach aims to automate CA identification, thus
addressing the inconsistencies of traditional methods. We capitalized on the capabilities of
AI in assisting human perception.
Our overall findings have been profound, showing notable consistency between ex-
perts and machine learning estimation, with an average difference of 2.86 degrees, a remark-
ably reliable value that significantly reduces the standard manual variations. In essence,
our established framework has made a vital contribution to machine-driven medical imag-
ing examination. The application of this research directly impacts the clinical sphere for
rapid, accurate, and reliable means of scoliosis severity evaluations. Consequently, our

discoveries are a stride in offering a simplified and robust approach to empower medical
personnel with the modern tools needed to understand scoliosis and enhance patient care
comprehensively.
As for future work, we are committed to refining our models’ precision through
numerous strategies, such as enhancing the segmentation algorithm and evaluating the
performance of other CNN networks. The authors also plan to implement federated
learning approaches, where models are trained locally on distributed devices, allowing
collaborative model training without sharing sensitive (private) patient data, and to explore
collaborations with medical research institutions to obtain more datasets in accordance
with ethical data sharing practices.

Funding: This research received no external funding.


Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are publicly available: (a) SpineWeb
(collaborative platform for research on spine imaging and image analysis: dataset 16: 609 spinal
anterior–posterior X-ray images); (b) a dataset of scoliosis, spondylolisthesis, and normal vertebra
X-ray images (DOI: 10.17632/xkt857dsxk.1).
Acknowledgments: The researcher gratefully acknowledges the invaluable technical support pro-
vided by Southern Luzon State University and the Commission on Higher Education’s Institutional
Development and Innovation Grant (IDIG) for the completion of this study.
Conflicts of Interest: The author declares no conflict of interest.

References
1. Vertebrae Column. Available online: https://www.britannica.com/science/vertebra/ (accessed on 25 June 2023).
2. Labrom, F.; Izatt, M.; Claus, A.; Little, J. Adolescent idiopathic scoliosis 3D vertebral morphology, progression and nomenclature:
A current concepts review. Eur. Spine J. 2021, 30, 1823–1834. [CrossRef] [PubMed]
3. McAviney, J.; Roberts, C.; Sullivan, B.; Alevras, A.; Graham, P.; Brown, B. The prevalence of adult de novo scoliosis: A systematic
review and meta-analysis. Eur. Spine J. 2020, 29, 2960–2969. [CrossRef] [PubMed]
4. Scoliosis Degrees of Curvature Chart. Scoliosis Reduction Center. Available online: https://www.scoliosisreductioncenter.com/
blog/scoliosis-degrees-of-curvature-chart/ (accessed on 5 July 2023).
5. Victoria, M.; Lau, H.; Lee, T.; Alarcon, D.; Zheng, Y. Comparison of ultrasound scanning for scoliosis assessment: Robotic versus
manual. Int. J. Med. Robot. Comput. Assist. Surg. 2022, 19, e2468. [CrossRef] [PubMed]
6. Sun, Y.; Xing, Y.; Zhao, Z.; Meng, X.; Xu, G.; Hai, Y. Comparison of manual versus automated measurement of cobb angle in
idiopathic scoliosis based on a deep learning keypoint detection technology. Eur. Spine J. 2021, 31, 1969–1978. [CrossRef]
7. Maaliw, R.; Soni, M.; Delos Santos, M.; De Veluz, M.; Lagrazon, P.; Seño, M.; Salvatierra-Bello, D.; Danganan, R. AWFCNET: An
attention-aware deep learning network with fusion classifier for breast cancer classification using enhanced mammograms. In
Proceedings of the IEEE World Artificial Intelligence and Internet of Things Congress (AIIoT), Seattle, WA, USA, 7–10 June 2023.
8. Pradhan, N.; Sagar, S.; Singh, A. Analysis of MRI image data for Alzheimer disease detection using deep learning techniques.
Multimed. Tools Appl. 2023, 1–24. [CrossRef]
9. Maaliw, R.; Mabunga, Z.; De Veluz, M.; Alon, A.; Lagman, A.; Garcia, M.; Lacatan, L.; Dellosa, R. An enhanced segmentation
and deep learning architecture for early diabetic retinopathy detection. In Proceedings of the IEEE 13th Annual Computing and
Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 8–11 March 2023.
10. Tu, Y.; Wang, N.; Tong, F.; Chen, H. Automatic measurement algorithm of scoliosis Cobb angle based on deep learning. J. Phys.
Conf. Ser. 2019, 1187, 042100. [CrossRef]
11. Okashi, O.; Du, H.; Al-Assam, H. Automatic spine curvature estimation from X-ray images of a mouse model. Comput. Methods
Programs Biomed. 2017, 140, 175–184. [CrossRef]
12. Alharbi, R.; Alshaye, M.; Alhanhal, M.; Alharbi, N.; Alzahrani, M.; Alrehaili, O. Deep learning based algorithm for automatic
scoliosis angle measurement. In Proceedings of the IEEE 3rd International Conference on Computer Applications & Information
Security (ICCAIS), Riyadh, Saudi Arabia, 19–21 March 2020.
13. Zhang, K.; Xu, N.; Yang, G.; Wu, J.; Fu, X. An automated Cobb angle estimation method using convolutional neural network
with area limitation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted
Intervention (MICCAI), Shenzhen, China, 13–17 October 2019.

14. Huang, C.; Tang, H.; Fan, W.; Cheung, K.; To, M.; Qian, Z.; Terzopoulos, D. Fully-automated analysis of scoliosis from spinal
X-ray images. In Proceedings of the IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester,
MN, USA, 28–30 July 2020.
15. Pasha, S.; Flynn, J. Data-driven classification of the 3d spinal curve in adolescent idiopathic scoliosis with applications in surgical
outcome prediction. Sci. Rep. 2018, 8, 16296. [CrossRef]
16. Moura, C.; Correia, M.; Barbosa, J.; Reis, A.; Laranjeira, M.; Gomes, E. Automatic Vertebra Detection in X-ray Images. In
Proceedings of the International Symposium CompImage, Coimbra, Portugal, 20–21 October 2016.
17. Mukherjee, J.; Kundu, R.; Chakrabarti, A. Variability of Cobb angle measurement from digital X-ray image based on different
de-noising techniques. Int. J. Biomed. Eng. Technol. 2014, 16, 113–134. [CrossRef]
18. Lecron, F.; Benjelloun, M.; Mahmoudi, S. Fully automatic vertebra detection in X-ray images based on multi-class SVM. In
Proceedings of the Medical Imaging, San Diego, CA, USA, 8–9 February 2012.
19. Maaliw, R.; Alon, A.; Lagman, A.; Garcia, M.; Susa, J.; Reyes, R.; Fernando-Raguro, M.; Hernandez, A. A multistage transfer
learning approach for acute lymphoblastic leukemia classification. In Proceedings of the IEEE 13th Annual Ubiquitous Computing,
Electronics & Mobile Communication Conference, New York, NY, USA, 26–29 October 2022.
20. Vijh, S.; Gaurav, P.; Pandey, H. Hybrid bio-inspired algorithm and convolutional neural network for automatic lung tumor
detection. Comput. Math. Methods Med. 2020, 35, 23711–23724. [CrossRef]
21. Peng, C.; Wu, M.; Liu, K. Multiple levels perceptual noise backed visual information fidelity for picture quality assessment.
In Proceedings of the IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS),
Penang, Malaysia, 22–25 November 2022.
22. Tsuneki, M. Deep Learning Models in Medical Image Analysis. J. Oral Biosci. 2022, 64, 312–320. [CrossRef]
23. Chakraborty, S.; Mali, K. An Overview of Biomedical Image Analysis from the Deep Learning Perspective; IGI Global: Hershey, PA, USA,
2023; pp. 43–59.
24. Abdou, M. Literature review: Efficient deep neural networks techniques for medical image analysis. Neural Comput. Appl. 2022,
34, 5791–5812. [CrossRef]
25. Varoquaux, G.; Cheplygina, V. Machine learning for medical imaging: Methodological failures and recommendations for the
future. NPJ Digit. Med. 2022, 5, 48. [CrossRef] [PubMed]
26. Aljabri, M.; AlGhamdi, M. A review on the use of deep learning for medical images segmentation. Neurocomputing 2022, 506,
311–335. [CrossRef]
27. Karpiel, I.; Ziębiński, A.; Kluszczyński, M.; Feige, D. A survey of methods and technologies used for diagnosis of scoliosis. Sensors
2021, 21, 8410. [CrossRef] [PubMed]
28. Yin, X.; Sun, L.; Fu, Y.; Lu, R.; Zhang, Y. U-Net-Based Medical Image Segmentation. J. Healthc. Eng. 2022, 2022, 4189781. [CrossRef]
29. Arif, S.; Knapp, K.; Slabaugh, G. Shape-aware deep convolutional neural network for vertebrae segmentation. In Proceedings
of the International Workshop on Computational Methods and Clinical Applications in Musculoskeletal Imaging (MICCAI),
Quebec City, QC, Canada, 10 September 2017.
30. Zhang, J.; Li, H.; Lu, L.; Zhang, Y. Computer-aided Cobb measurement based on automatic detection of vertebral slope using
deep neural network. Int. J. Biomed. Imaging 2017, 2017, 9083916. [CrossRef]
31. Staritsyn, M.; Pogodaev, N.; Chertovshih, R.; Pereira, F. Feedback maximum principle for ensemble control of local continuity
equations: An application to supervised machine learning. IEEE Control Syst. Lett. 2021, 6, 1046–1051. [CrossRef]
32. Fan, W.; Ge, Z.; Wang, Y. Adaptive Wiener filter based on fast lifting wavelet transform for image enhancement. In Proceedings of
the 7th World Congress on Intelligent Control and Automation, Chongqing, China, 25–27 June 2008.
33. Horng, M.; Kuok, C.; Fu, M.; Lin, C.; Sun, Y. Cobb angle measurement of spine from X-ray images using convolutional neural
network. Comput. Math. Methods Med. 2019, 2019, 6357171. [CrossRef]
34. Prodan, M.; Vlasceanu, G.; Boiangiu, C. Comprehensive evaluation of metrics for image resemblance. J. Inf. Syst. Oper. Manag.
2023, 17, 161–185.
35. Ieremeiev, O.; Lukin, V.; Okarma, K.; Egiazarian, K. Full-Reference quality metric based on neural network to assess the visual
quality of remote sensing images. Remote Sens. 2020, 12, 2349. [CrossRef]
36. Aviles, J.; Medina, F.; Leon-Muñoz, V.; de Baranda, P.S.; Collazo-Diéguez, M.; Cabañero-Castillo, M.; Ponce-Garrido, A.B.;
Fuentes-Santos, V.E.; Santonja-Renedo, F.; González-Ballester, M.; et al. Validity and absolute reliability of the Cobb angle in
idiopathic scoliosis with TraumaMeter software. Int. J. Environ. Res. Public Health 2022, 19, 4655. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
