
Neural Computing and Applications (2022) 34:18773–18785
https://doi.org/10.1007/s00521-022-07412-0

ORIGINAL ARTICLE

Attention mechanism-based deep learning method for hairline fracture detection in hand X-rays

Wenkong Wang¹ · Weijie Huang¹ · Quanli Lu² · Jiyang Chen²,³ · Menghua Zhang¹ · Jia Qiao¹ · Yong Zhang¹

Received: 13 January 2022 / Accepted: 9 May 2022 / Published online: 24 June 2022
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022

Abstract
Detection of wrist and finger fractures has long been a weak point in related studies, because X-rays contain small targets such as hairline fractures. In this paper, a dataset consisting of 4346 anteroposterior, lateral and oblique hand X-rays is built from many orthopedic cases; in particular, it contains a large number of hairline fractures. An automatic preprocessing based on a generative adversarial network (GAN) and a detection network, called WrisNet, are designed to improve the detection performance on wrist and finger fractures. In the preprocessing, an attention mechanism-based GAN is proposed to approximate manual windowing enhancement. A multiscale attention-module-based generator is proposed to increase the continuity between pixels. The discriminator and the generator can achieve 93% structural similarity (SSIM) with manual windowing enhancement, without any manual parameter adjustment. The designed WrisNet is composed of two components: a feature extraction module and a detection module. A group convolution and a lightweight but efficient triplet attention mechanism are carefully embedded into the feature extraction module, resulting in richer representations of hairline fractures. To obtain more accurate localization, the soft non-maximum suppression algorithm is employed as the post-processing method of the detection module. As shown in the experimental results, the designed method improves average precision (AP) by 7% or more over other mainstream frameworks. The automatic preprocessing and the detection network greatly reduce the degree of manual intervention, so the method is easy to deploy in a real clinical environment.

Keywords Attention mechanism · Generative adversarial network · Soft non-maximum suppression · Hairline fractures

Corresponding authors: Weijie Huang ([email protected]); Yong Zhang ([email protected])

1 School of Electrical Engineering, University of Jinan, No. 336, West Road of Nanxinzhuang, Jinan 250022, Shandong, China
2 Shandong Zhengzhong Information Technology Co., LTD, Jinan 250014, Shandong, China
3 Shandong University, Jinan 250061, Shandong, China

1 Introduction

Deep learning [1] is a branch of artificial intelligence that builds computer models to tackle vision tasks such as object detection [2] and image classification. The great success of deep learning in the field of computer vision has inspired many scholars to apply it to medical image analysis; a classic example is fracture detection in X-rays using deep learning-based methods. Current deep learning-based object detection algorithms are mainly divided into two branches: one-stage and two-stage [3]. The one-stage detector offers excellent real-time performance, but its detection precision is lower than that of the two-stage detector. In practice, fracture detection does not need to run in real time, so two-stage detectors are widely used in the diagnosis of X-rays [4].

The gray scale of X-rays is compressed into a small range, which is not conducive to differentiating crack features. To make the skeletal structure more conspicuous, manual windowing enhancement is adopted as preprocessing, but the window level and window width must be selected manually for each image [5].

Furthermore, during wrist and finger fracture detection, hairline fractures in X-rays are difficult to detect with state-of-the-art methods, since small object detection [6, 7] remains a challenging task for deep learning-based approaches. To solve these two problems, an automatic preprocessing based on a GAN [8] and a detection network, called WrisNet, are designed in this paper. The main contributions can be summarized as follows.

(1) A dataset, consisting of 4346 anteroposterior, lateral and oblique hand X-rays, is built from many orthopedic cases. It should be pointed out that hairline fractures account for more than 50 percent of the total targets in the dataset, far more than in published datasets.

(2) An attention mechanism-based GAN is proposed as the preprocessing step to expand the gray scale range. The goal of the proposed GAN is to approximate manual windowing enhancement. We design a novel generator, built from multiscale attention modules, to process the input image. The GAN can achieve 93% SSIM with manual windowing enhancement without manual parameter adjustment, greatly reducing the degree of manual intervention.

(3) To deal with the hairline fractures in the dataset, a novel network, called WrisNet, is proposed to improve the detection performance. WrisNet is formed by a feature extraction module and a detection module. In the feature extraction module, ResNeXt with triplet attention (TA) is designed to extract the features, while in the detection module, the soft non-maximum suppression (Soft-NMS) algorithm is used as the post-processing mechanism to reduce the omission of hairline fractures. The results show that the AP improves by 7% or more over state-of-the-art frameworks.

This paper is organized as follows. In Sect. 2, medical image preprocessing methods and deep learning-based fracture detection methods are reviewed. The proposed preprocessing and WrisNet are detailed in Sect. 3. In Sect. 4, several experimental results are presented to validate the improved detection performance. Finally, the conclusion is given in Sect. 5.

2 Previous work

2.1 GAN in medical image processing

GANs have great application potential in the field of medical image processing. The main tasks they can solve can be divided into image generation and image translation. In image generation, the structural information in the training dataset is used to generate new medical images; GANs are often used to enlarge training datasets and thereby improve the accuracy of classification tasks. A generation method called generative adversarial U-Net was developed by Chen et al. [9], which can generate various medical images to alleviate overfitting during training. A method using cycle-consistent adversarial networks to generate COVID-19 samples was suggested by Morís et al. [10] to improve classification accuracy. The applicability of GAN-generated images in oncology was demonstrated by Han et al. [11]. Image translation of medical images mainly includes super-resolution reconstruction, image denoising and so on. The conditional generative adversarial network (CGAN) [12] was used as a denoising algorithm in [13] for low-dose chest images, and the proposed method was shown to be superior to the traditional method. A new super-resolution generative adversarial network was proposed by Zhu et al. [14], which combines CGAN and the super-resolution generative adversarial network (SRGAN) to generate super-resolution images. By extracting useful information from different channels and paying more attention to meaningful pixels, a new convolutional neural network was proposed by Gu et al. [15] for super-resolution in medical imaging. Jiang et al. [16] proposed an improved loss function combining four loss functions, which achieved good results in super-resolution CT image reconstruction. In this paper, a GAN is used for the first time as medical image preprocessing to expand the gray scale range. Meanwhile, a multiscale attention-module-based generator is proposed to process the image. The GAN achieves 93% SSIM with manual windowing enhancement without manual parameter adjustment.
2.2 Fracture detection by deep learning-based methods

Considering the accuracy of fracture classification and fracture localization, Guan et al. [17] proposed an improved object detection algorithm for detecting arm fractures and obtained a model with a high AP. Qi et al. [18] trained an object detection model to locate femoral fractures using a framework based on Faster-RCNN [19] and achieved good results. In [20], a dilated convolutional feature pyramid network was designed and applied to thigh fracture detection. In [21], a deep learning method was employed to process CT images of the spine and to locate spinal fractures. In [22], the top layer of the original model was retrained using the Inception v2 network [23] for leg bone fracture detection. Nonetheless, the above methods cannot be applied directly to the proposed dataset because of their poor performance on hairline fractures. To better solve the problem of detecting small targets, a feature extraction module, called ResNeXt-TA, is proposed to make fracture features more prominent. In addition, Soft-NMS is designed as the specialized post-processing of the detection module to reduce the omission of hairline fractures.
3 Methodology

An automatic preprocessing based on a GAN and WrisNet are proposed for X-ray diagnosis of wrist and finger fractures, detailed in Sects. 3.1 and 3.2, respectively. The original image is fed into the GAN for gray stretching, and the output is passed to WrisNet to detect fractures in the X-rays.

3.1 GAN-based preprocessing

The X-ray gray value is compressed into a small range, which is not conducive to the identification of crack features. A very efficient way of gray stretching is manual windowing enhancement, but the window level and window width of each image need to be set manually. In this paper, a GAN is proposed to expand the gray scale automatically. Inspired by pix2pix [24], a multiscale attention-module-based generator and a discriminator are designed to form the GAN. The structure of the generator is shown in Fig. 1. The architecture consists of an encoding process and a decoding process, corresponding to 8 down-samplings and 8 up-samplings, respectively. A CBAM module [25] is embedded at each scale, so 16 CBAM modules and the encoding-decoding architecture form the generator. The discriminator of pix2pix is directly transplanted to the proposed GAN. Compared with pix2pix, the designed generator greatly increases the continuity between pixels of the generated image; the comparison results can be seen in Sect. 4. The gray scale of the output is controlled within a reasonable range, which helps the following WrisNet detect hairline fractures better.

Fig. 1 Network diagram of proposed GAN
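The following is a minimal PyTorch sketch of how a CBAM module can be attached at one encoder scale of such a generator. It is an illustration rather than the authors' implementation: the paper fixes 8 down-sampling and 8 up-sampling scales with a CBAM at each, while the channel widths, the reduction ratio of 16 and the 7 × 7 spatial kernel below follow the CBAM defaults [25] and are assumptions here.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal CBAM block [25]: channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: shared MLP over global average- and max-pooled vectors.
        gate = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                             self.mlp(x.amax(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * gate
        # Spatial attention: 7x7 conv over channel-wise mean and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

class EncoderStage(nn.Module):
    """One down-sampling scale of the generator: stride-2 conv, then CBAM."""
    def __init__(self, cin, cout):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(cin, cout, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(cout), nn.LeakyReLU(0.2))
        self.att = CBAM(cout)

    def forward(self, x):
        return self.att(self.down(x))

x = torch.randn(1, 1, 256, 256)          # a single-channel X-ray tile
print(EncoderStage(1, 64)(x).shape)      # torch.Size([1, 64, 128, 128])
```

A full generator would stack eight such stages, mirror them with up-sampling stages (each again followed by a CBAM), and connect matching scales with skip connections as in pix2pix.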
3.2 WrisNet-based fracture detection

The network diagram of WrisNet mainly consists of two components and is shown in Fig. 2. The first component is the feature extraction module, which extracts the feature maps of the X-rays and is detailed in Sect. 3.2.1. The second component is the detection module, which outputs the exact locations of the fractures by analyzing the feature maps obtained in the first component, and is detailed in Sect. 3.2.2.

Fig. 2 Network diagram of WrisNet

3.2.1 Feature extraction module

The proposed feature extraction module is inspired by Faster-RCNN and is mainly composed of ResNeXt-TA and FPN [26].

(1) ResNeXt-TA

ResNeXt-TA is the proposed backbone, composed of stages C1, C2, C3, C4 and C5. C1 is formed by a convolution layer, a batch normalization layer [27], a ReLU activation function [28] and a maxpool layer. C2, C3, C4 and C5 are designed with different numbers (3, 4, 23, and 3) of blocks. The structure of each block is inspired by the ResNet block [29], and one block is described in Fig. 2. Each block is formed by a residual connection and a ReLU layer. The residual connection contains the following components in order:

(a) a convolution layer,
(b) a batch normalization layer,
(c) a ReLU layer,
(d) a group convolution [30],
(e) a batch normalization layer,
(f) a ReLU layer,
(g) a convolution layer,
(h) a batch normalization layer,
(i) a TA module,
(j) a shortcut connection.

The group convolution: the input tensor is first divided into 64 groups along the channel dimension; these groups are then convolved with 64 different convolution layers, respectively. Finally, the results of the convolutions are concatenated along the channel dimension as the output of the group convolution. When the depth and width of the network have already been increased to a certain extent, increasing the number of groups can still improve the performance of the feature extraction module effectively.
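This grouping maps directly onto the `groups` argument of a standard convolution layer, as the short sketch below shows; the 256-channel width is illustrative, since the paper only fixes the number of groups at 64.

```python
import torch
import torch.nn as nn

# Grouped convolution: 256 input channels are split into 64 groups of 4;
# each group is convolved with its own 3x3 filters, and the 64 partial
# outputs are concatenated along the channel dimension.
grouped = nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=64)
dense = nn.Conv2d(256, 256, kernel_size=3, padding=1)

x = torch.randn(1, 256, 64, 64)          # dummy feature map
print(grouped(x).shape)                  # torch.Size([1, 256, 64, 64])

# The grouped layer needs far fewer parameters than the dense one:
print(sum(p.numel() for p in grouped.parameters()))   # 9472
print(sum(p.numel() for p in dense.parameters()))     # 590080
```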
The TA module: the TA module is adopted from [31], and its detailed structure is shown in Fig. 3; it is composed of three different sub-branches. The TA module can be expressed as Eq. (1):

M(F) = AVG(M_{0,1,2}(F) + M_{1,0,2}(F) + M_{1,2,0}(F))   (1)

The formulas of the three branches are expressed as Eqs. (2), (3) and (4):

M_{0,1,2}(F) = σ(f^{7×7}(Z-Pool(F)))   (2)
M_{1,0,2}(F) = P_{0,1,2}(σ(f^{7×7}(Z-Pool(P_{1,0,2}(F)))))   (3)
M_{1,2,0}(F) = P_{0,1,2}(σ(f^{7×7}(Z-Pool(P_{1,2,0}(F)))))   (4)

where F is the input tensor with size C × H × W. P_{0,1,2}(·), P_{1,0,2}(·) and P_{1,2,0}(·) refer to the dimensional transformation operations that convert the size of F to C × H × W, H × C × W and H × W × C, respectively. f^{7×7}(·) refers to the convolution operation with a 7 × 7 kernel, and σ(·) is the sigmoid operation. Z-Pool(·) in the above formulas can be expressed as Eq. (5):

Z-Pool(F) = [MaxPool(F); AvgPool(F)]   (5)

where MaxPool(·) and AvgPool(·) refer to the global maximum pooling and the global average pooling operations, respectively.

The lightweight TA module is located after the third BN layer of each block and adds few parameters. Although it contains few parameters, it still helps each block understand which information in the X-ray deserves more emphasis. In addition, spatial attention is combined with channel attention [32] so that the module can learn the interdependencies between different dimensions and generate more meaningful representations of wrist and finger fractures.

Fig. 3 Block diagram of ResNeXt-TA and the three-branch structure of TA module
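The following PyTorch sketch implements Eqs. (1)–(5), following the public formulation of triplet attention [31]. The batch normalization after the 7 × 7 convolution is taken from [31] and is an assumption here, as is the choice of dimension-wise (rather than truly global) pooling inside Z-Pool.

```python
import torch
import torch.nn as nn

class ZPool(nn.Module):
    """Z-Pool of Eq. (5): concatenate max- and average-pooling along dim 1."""
    def forward(self, x):
        return torch.cat([x.amax(dim=1, keepdim=True),
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    """One branch gate: sigma(f_7x7(Z-Pool(x))), as in Eqs. (2)-(4)."""
    def __init__(self):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False),
            nn.BatchNorm2d(1))

    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))

class TripletAttention(nn.Module):
    """Eq. (1): each branch permutes the tensor so a different pair of
    dimensions interacts, gates it, permutes back, and the three results
    are averaged."""
    def __init__(self):
        super().__init__()
        self.gate_hw = AttentionGate()   # identity branch, C x H x W
        self.gate_cw = AttentionGate()   # P_{1,0,2}: swap C and H
        self.gate_ch = AttentionGate()   # P_{1,2,0}: move C to the last axis

    def forward(self, x):                          # x: (B, C, H, W)
        y0 = self.gate_hw(x)
        y1 = self.gate_cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        y2 = self.gate_ch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        return (y0 + y1 + y2) / 3.0

x = torch.randn(2, 64, 32, 32)
print(TripletAttention()(x).shape)   # torch.Size([2, 64, 32, 32])
```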
(2) Multi-scale feature extraction for small targets

According to the analysis of the statistical data (shown in Figs. 4 and 5), we find that the size distribution of ground truth boxes is scattered and that there are a large number of hairline fractures. FPN is used in the feature extraction module to prevent the features of small fractures from being lost during feature extraction. As shown in Fig. 2, feature maps of different scales are extracted from the outputs of C2, C3, C4 and C5 in ResNeXt-TA. These feature maps are fused from top to bottom to obtain more meaningful feature maps.

Fig. 4 Size distribution of ground truth boxes in train dataset
Fig. 5 Size distribution of ground truth boxes in test dataset
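A compact sketch of this top-down fusion is given below. The channel widths match the usual C2–C5 outputs of a ResNeXt-101 backbone and the 256-channel pyramid width of FPN [26]; both are assumptions, since the paper does not list them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """FPN-style fusion: 1x1 convs align channels, coarser maps are
    upsampled and added to finer ones, and a 3x3 conv smooths each sum."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), width=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, width, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(width, width, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):                        # feats = [C2, C3, C4, C5]
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):   # top-down pass
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]

feats = [torch.randn(1, c, 256 // 2 ** i, 256 // 2 ** i)
         for i, c in enumerate((256, 512, 1024, 2048))]
print([p.shape[-1] for p in SimpleFPN()(feats)])     # [256, 128, 64, 32]
```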

3.2.2 Detection module

In the detection module, a large number of regular anchors are preset in the RPN, and proposal coordinates representing foreground regions are obtained through selection and regression. The proposals are then projected onto the multi-scale feature maps generated in Sect. 3.2.1. The feature matrices are cropped from the feature maps according to the corresponding proposals and flattened by the ROI pooling layer. Next, the predicted locations and the label information are obtained through the regression layer and the Softmax layer, respectively. Finally, the post-processing method Soft-NMS [33] is used to filter the redundant output of the network. The execution process is defined in Algorithm 1. Set B contains N detected boxes, and each detection box has its own confidence. Soft-NMS reduces the confidence of possibly redundant detection boxes instead of removing them directly. First, the confidences in set S are sorted from high to low. The detected box b_m with the highest confidence is added to the set M, which is merged into D, and b_m is removed from B. Then, the remaining boxes in B are checked one by one, and their confidence scores are reduced by the function f(iou(M, b_i)) shown in Eq. (6). The procedure loops until all the boxes in B have been put into D. Finally, the boxes in D with confidence lower than the threshold are considered repeated fracture localizations. Through this score-reduction mechanism, Soft-NMS can greatly improve the detection performance in the special case described above.

s_i = s_i,                      if iou(M, b_i) < N_t
s_i = s_i (1 − iou(M, b_i)),    if iou(M, b_i) ≥ N_t     (6)

where N_t is the NMS threshold.
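A NumPy sketch of the linear Soft-NMS variant of Eq. (6), following [33], is given below; the two threshold values are placeholders, not values reported by the paper.

```python
import numpy as np

def soft_nms_linear(boxes, scores, Nt=0.5, score_thresh=0.001):
    """Linear Soft-NMS of Eq. (6): a box overlapping the current best box M
    by IoU >= Nt is not deleted; its score is scaled by (1 - IoU)."""
    boxes = boxes.astype(float).copy()
    scores = scores.astype(float).copy()
    kept_boxes, kept_scores = [], []
    while scores.size > 0:
        m = scores.argmax()                       # highest-confidence box M
        box_m = boxes[m]
        kept_boxes.append(box_m); kept_scores.append(scores[m])
        boxes = np.delete(boxes, m, axis=0)
        scores = np.delete(scores, m)
        if scores.size == 0:
            break
        # IoU between M and every remaining box (x1, y1, x2, y2 layout).
        x1 = np.maximum(box_m[0], boxes[:, 0]); y1 = np.maximum(box_m[1], boxes[:, 1])
        x2 = np.minimum(box_m[2], boxes[:, 2]); y2 = np.minimum(box_m[3], boxes[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_m = (box_m[2] - box_m[0]) * (box_m[3] - box_m[1])
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (area_m + areas - inter)
        scores = np.where(iou < Nt, scores, scores * (1.0 - iou))   # Eq. (6)
    kept_boxes, kept_scores = np.array(kept_boxes), np.array(kept_scores)
    keep = kept_scores >= score_thresh            # decayed boxes are discarded
    return kept_boxes[keep], kept_scores[keep]

boxes = np.array([[10, 10, 60, 60], [12, 12, 62, 62], [100, 100, 150, 150]])
scores = np.array([0.9, 0.8, 0.7])
print(soft_nms_linear(boxes, scores))
```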

4 Experiment

4.1 Dataset

4346 X-rays of wrist and finger fractures, including distal radius fractures, scaphoid fractures, phalanx fractures and other types, are used in the experiment; they were collected from real medical environments in regular hospitals. The ground truth boxes were labeled by experienced radiologists using LabelImg over one month. The annotations are stored as XML files in PASCAL VOC format. This dataset brings extra challenges because many X-rays show steel nails, plates or plaster on the hand. The 4346 X-rays are randomly divided into a train dataset and a test dataset at a ratio of 8 : 2, which is maintained throughout the experiment. We compile statistics on the size of the X-rays and the number of targets of different sizes in the test dataset (see Table 1). In this paper, a target with a size smaller than 32 × 32 is defined as a small target [34]; such targets account for more than 53.7% of the test dataset. Targets with a size between 32 × 32 and 96 × 96 are considered medium targets. The distribution of targets in the train dataset and test dataset is shown in Figs. 4 and 5.

Table 1 Data statistics

Maximum height   Maximum width   Number of targets   Number of small targets   Number of medium targets
512 (pixel)      512 (pixel)     1116                600                       511
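To illustrate how such annotations can be consumed, the sketch below performs the 8 : 2 split and buckets the ground truth boxes by the size thresholds used above. The annotations/ folder layout and the helper name are hypothetical, not part of the paper.

```python
import glob
import random
import xml.etree.ElementTree as ET

random.seed(0)
xml_files = sorted(glob.glob("annotations/*.xml"))   # hypothetical folder
random.shuffle(xml_files)
cut = int(0.8 * len(xml_files))                      # 8 : 2 train/test split
train_files, test_files = xml_files[:cut], xml_files[cut:]

def bucket_targets(xml_path):
    """Count PASCAL VOC ground truth boxes per COCO-style size bucket [34]."""
    counts = {"small": 0, "medium": 0, "large": 0}
    for obj in ET.parse(xml_path).getroot().iter("object"):
        bb = obj.find("bndbox")
        w = int(float(bb.find("xmax").text)) - int(float(bb.find("xmin").text))
        h = int(float(bb.find("ymax").text)) - int(float(bb.find("ymin").text))
        if w * h < 32 * 32:
            counts["small"] += 1
        elif w * h < 96 * 96:
            counts["medium"] += 1
        else:
            counts["large"] += 1
    return counts
```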

4.2 Training details of GAN

4.2.1 Manual image preprocessing

The manual windowing technique is generally used to preprocess the X-ray image. First, a certain range is selected, whose maximum and minimum are set as thresholds. Pixel values greater than max are set to 255, and pixel values less than min are set to 0. Then, the pixel values within the range are mapped to 0–255 using a linear conversion. The formula for pixel value mapping is shown in Eq. (7):

P = 255 × (P_o − min) / (max − min)   (7)

where P_o is the original pixel value and P is the pixel value after linear conversion. The X-rays with manual window adjustment are used as the ground truths of the GAN.
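A sketch of this windowing step is shown below. Expressing the clipping range through a window level and width is the usual radiological convention and an assumption about the interface; Eq. (7) itself only requires the min and max thresholds.

```python
import numpy as np

def window(image, level, width):
    """Manual windowing per Eq. (7): clip to [min, max] and map the
    retained range linearly onto 0-255."""
    lo, hi = level - width / 2.0, level + width / 2.0   # min and max thresholds
    clipped = np.clip(image.astype(float), lo, hi)
    return (255.0 * (clipped - lo) / (hi - lo)).astype(np.uint8)

raw = np.random.randint(0, 4096, size=(512, 512), dtype=np.uint16)  # dummy 12-bit X-ray
enhanced = window(raw, level=2048, width=1024)
print(enhanced.min(), enhanced.max())    # gray values now span 0-255
```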

4.2.2 Training process of GAN

The GAN model is trained on an NVIDIA GeForce RTX 3090 GPU with the following settings. The Adam gradient descent algorithm [35] is adopted. The batch size is set to 1, and a total of 200 epochs are trained. The initial learning rate is set to 0.0002, and a linear learning rate decay strategy is applied from the 100th epoch.

In the training process, to ensure that the automatically preprocessed images are similar to the manually preprocessed images, an L1 loss is used in the loss function of the generator to guide image generation. The change of the L1 loss during training shows the pixel values of the automatically preprocessed image gradually approaching those of the manually preprocessed image, as shown in Fig. 6.

Fig. 6 Changes of L1 loss in training GAN
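A sketch of such a generator objective is given below. The relative weight of 100 on the L1 term is the pix2pix [24] default and an assumption here, since the paper does not report the value it used.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
LAMBDA_L1 = 100.0   # assumed weight, pix2pix default

def generator_loss(disc_logits_fake, fake_image, target_image):
    """pix2pix-style objective: fool the discriminator while keeping the
    output close, in L1, to the manually windowed ground truth."""
    adv = bce(disc_logits_fake, torch.ones_like(disc_logits_fake))
    rec = l1(fake_image, target_image)   # the L1 term tracked in Fig. 6
    return adv + LAMBDA_L1 * rec
```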

4.3 Training details of WrisNet

4.3.1 Data augmentation

In the experiment, two data augmentation [36] strategies are used to improve performance. In the first strategy, the data are tripled by flipping the images in random directions. In the second, the data are increased tenfold using random flips, brightness transformations, affine transformations and image sharpening. Some transformed images are shown in Fig. 7.

Fig. 7 The data augmentation process includes random flips, brightness transformations, affine transformations, and image sharpening, designed to enhance the X-rays of the train dataset. The input X-rays are randomly subjected to the above four transformations, and the relevant parameters are randomly selected within a certain range. (a) is the original X-ray, while the data-augmented results are shown in (b–j)
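The second strategy can be sketched with torchvision as below. The parameter ranges are placeholders, since the paper does not report them, and a real detection pipeline must apply the geometric transforms to the ground truth boxes as well, which this image-only sketch omits.

```python
import torchvision.transforms as T

# Random flips, brightness changes, affine warps and sharpening, each applied
# with some probability and with parameters drawn from a range.
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.ColorJitter(brightness=0.3),
    T.RandomAffine(degrees=10, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    T.RandomAdjustSharpness(sharpness_factor=2.0, p=0.5),
])
# augmented = augment(pil_image)  # applied repeatedly to multiply the data
```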
4.3.2 Training process of WrisNet

Weights pretrained on ImageNet [37] are used to initialize the backbone. The model is trained end-to-end on four NVIDIA GeForce RTX 3090 GPUs. The hyperparameters are shown in Table 2. A warm-up strategy is used in the first 500 iterations, and SGD gradient descent is adopted. The training process is shown in Fig. 8.

Table 2 Hyperparameters for training WrisNet

Learning rate   Batch size   Momentum (SGD)   Weight decay   Total epochs
0.02            16           0.9              0.0001         23

Fig. 8 Process of training. WrisNet loads the images and annotations of the train dataset, updating the weights of the network in repeated iterations
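The sketch below wires the Table 2 settings into an optimizer with a 500-iteration warm-up. The linear warm-up shape is an assumption; the paper states only the warm-up length.

```python
import torch

model = torch.nn.Conv2d(3, 8, 3)   # stand-in for WrisNet
optimizer = torch.optim.SGD(model.parameters(), lr=0.02,
                            momentum=0.9, weight_decay=1e-4)   # Table 2

WARMUP_ITERS = 500
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda it: min(1.0, (it + 1) / WARMUP_ITERS))

# Training loop skeleton: scheduler.step() once per iteration, so the
# learning rate ramps linearly up to 0.02 over the first 500 iterations.
for iteration in range(1000):
    optimizer.step()
    scheduler.step()
```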

4.4 Results and analyses

4.4.1 GAN-based preprocessing

The proposed GAN is compared with the Unet-based [38] pix2pix in two different ways, described as follows.

(1) SSIM value

The distribution of SSIM values over the test dataset using pix2pix is shown in Table 3. The SSIM values below 90% can be greatly improved by using the proposed GAN. The SSIM comparison between pix2pix and the proposed GAN is shown in Table 4, where the SSIM of a single image increases from 77.75 to 95.53%, which shows that the proposed GAN can approximate manual windowing enhancement. A comparison of the generated images is shown in Fig. 9. The results show that the image generated by the proposed GAN is more similar to the ground truth than that of pix2pix (see circles in Fig. 9), demonstrating that the attention-module-based generator greatly improves the correlations between pixels.

Table 3 Similarity distribution of test dataset using pix2pix

Threshold of SSIM   Number of images in test dataset
SSIM < 90%          212
SSIM < 80%          45
SSIM < 50%          3

Table 4 SSIM comparison between pix2pix and the proposed GAN

                                         pix2pix (%)   The proposed GAN (%)
Average of test dataset (SSIM < 80%)     69.86         74.14
Maximum of test dataset (SSIM < 80%)     79.99         95.53
Average of test dataset (SSIM < 90%)     82.84         86.17
Maximum of test dataset (SSIM < 90%)     89.99         99.12
Average of test dataset (SSIM < 100%)    92.62         92.90
Maximum of test dataset (SSIM < 100%)    99.53         99.59
Maximum improvement of single image      77.75         95.53

Fig. 9 Comparison of the generated images. (a) is the original X-ray image, (b) is the ground truth, and (c, d) are generated by pix2pix and the proposed GAN, respectively
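The per-image SSIM values can be reproduced with scikit-image as sketched below; the random arrays are placeholders standing in for a generated image and its manually windowed ground truth.

```python
import numpy as np
from skimage.metrics import structural_similarity

generated = np.random.randint(0, 256, (512, 512), dtype=np.uint8)  # placeholder
manual = np.random.randint(0, 256, (512, 512), dtype=np.uint8)     # placeholder
score = structural_similarity(generated, manual, data_range=255)
print(f"SSIM = {score:.4f}")
```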

(2) AP value

As shown in Table 5, the proposed generator ensures consistency of the detection results between the generated images and the manual images, compared with Unet. The detection performance on the generated images even exceeds that on the manual test dataset, owing to the elimination of subjective factors in preprocessing.

Table 5 Effect of manual and generated images on object detection

Algorithm                    Manual image (AP%)   Generated by Unet (AP%)   Generated by proposed generator (AP%)
Faster R-CNN (ResNet50)      47.4                 47.1                      47.4
Faster R-CNN (ResNeXt101)    49.2                 48.9                      49.4
4.4.2 Comparison of detection effect

3476 X-rays are used to train WrisNet, and some detection results on the test dataset are shown in Figs. 10 and 11. The green boxes in the figures are the ground truth boxes marked by the doctors, and the blue boxes are detected by WrisNet. As shown in Fig. 10, WrisNet achieves excellent results in detecting fractures of the phalanx, scaphoid and distal radius, reflected in the large overlap between the detection boxes and the corresponding ground truth boxes. At the same time, as shown in Fig. 11, the model also performs well in complex settings such as X-rays with nails or plaster. These results demonstrate that its effectiveness is very close to the diagnosis of radiologists.

Fig. 10 Our model has a good effect on the detection of phalanx, hand scaphoid and distal radius. The first to third results from the upper left include the detection of phalangeal fractures; the others include scaphoid and distal radius fractures

Fig. 11 Under the influence of plaster and steel nails, the model still achieves considerable detection performance. The first to third results from the upper left include detections with steel nails; the others include detections with plasters

The detection performance of representative object detection frameworks is compared with WrisNet, and the results are shown in Table 6, where the significant improvement of our method is marked in bold. All the frameworks use 3476 X-rays as the train dataset and 870 X-rays as the test dataset. The same image preprocessing method and the first data augmentation strategy are used in this part. Furthermore, the weights pretrained on ImageNet are used in all frameworks to initialize the backbone network, and the hyperparameters are tuned to achieve the best effect, ensuring the validity of the comparative experiment. AP is used as the evaluation criterion of the detection results, which is the most reliable and commonly used criterion in the current object detection field. The AP of each framework is obtained at an IOU of 0.5. As shown in Table 6, our network achieves 54.7% AP, an improvement of at least 5.5% over the other frameworks. With the second data augmentation strategy and Soft-NMS, the AP of WrisNet reaches 56.6%.

Table 6 Comparison of different frameworks

Algorithm                Backbone      AP (%)
Faster R-CNN             ResNet50      47.4
Faster R-CNN             ResNeXt101    49.2
Cascade R-CNN [39]       ResNet50      48.2
Cascade R-CNN            ResNet101     48.4
Cascade R-CNN+DCN [40]   ResNet101     48.3
WrisNet                  ResNeXt-TA    54.7
WrisNet (best effect)    ResNeXt-TA    56.6

4.4.3 Ablation experiment

A simple ablation experiment is performed, and the results are shown in Table 7, where the significant improvement of our method is marked in bold. The impacts of the proposed data preprocessing, the proposed backbone network, the data augmentation and the proposed post-processing are tested incrementally. The ablation results demonstrate that the proposed WrisNet yields an obvious AP improvement of up to 8.6%. As shown in Table 8, the improvement mainly comes from better small target detection, where WrisNet improves the AP of small targets by up to 9.4%.

Table 7 Ablation experiment

Data preprocessing   Improved backbone   Data augmentation (10×)   Soft-NMS   AP (%)
–                    –                   –                         –          48.0
✓                    –                   –                         –          49.2
✓                    ✓                   –                         –          53.7
✓                    –                   ✓                         –          53.3
✓                    ✓                   ✓                         –          54.0
✓                    ✓                   ✓                         ✓          56.6

Table 8 Comparison of AP of different size targets

Algorithm                   AP%    AP% (small targets)   AP% (medium targets)
Faster R-CNN (ResNeXt101)   48.0   27.8                  67.6
WrisNet                     56.6   37.2                  73.4

5 Conclusion

In this paper, an automatic GAN-based preprocessing and WrisNet are proposed for X-ray diagnosis of wrist and finger fractures. Because a generator incorporating an attention mechanism is designed, the results of the proposed GAN show high similarity to manual processing for X-ray enhancement: an SSIM of 93% indicates that manual windowing enhancement can be replaced by the automatic GAN-based preprocessing. The preprocessed images are fed into WrisNet for wrist and finger fracture detection. To better handle hairline fractures, ResNeXt-TA and Soft-NMS are used to improve the backbone and the post-processing. ResNeXt-TA is constructed using group convolution and an attention strategy to extract richer feature maps, while Soft-NMS is used to filter redundant bounding boxes. The AP of the proposed method improves by 7% compared with current mainstream frameworks when the IOU threshold is 0.5. We believe that WrisNet will perform even better after being trained on larger amounts of data and has the potential to help doctors in diagnosis.

Acknowledgements This work was supported in part by the Key R&D Project of Shandong Province under Grant No. 2022CXGC010503, the Youth Foundation of Shandong Province under Grant No. ZR202102230323, the National Natural Science Foundation for Young Scientists of China under Grant No. 61903155, and the Doctoral Scientific Fund Project under Grant No. xbs1910.

Funding Information Not applicable.

Declarations

Conflicts of interest The authors declare that they have no conflict of interest.

References

1. Abdou MA (2022) Literature review: efficient deep neural networks techniques for medical image analysis. Neural Comput Applic 34:5791–5812. https://doi.org/10.1007/s00521-022-06960-9
2. Zhao ZQ, Zheng P, Xu ST, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865
3. Xiao Y, Tian Z, Yu J, Zhang Y, Liu S, Du S, Lan X (2020) A review of object detection based on deep learning. Multimed Tools Appl 79(33):23729–23791. https://doi.org/10.1007/s11042-020-08976-6
4. Ren M, Paul HY (2022) Deep learning detection of subtle fractures using staged algorithms to mimic radiologist search pattern. Skeletal Radiol 51:345–353. https://doi.org/10.1007/s00256-021-03739-2
5. Mourya GK, Gogoi M, Talbar SN, Dutande PV, Baid U (2021) Cascaded dilated deep residual network for volumetric liver segmentation from CT image. Int J E-Health Med Commun 12(1):34–45. https://doi.org/10.4018/IJEHMC.2021010103
6. Liu Y, Sun P, Wergeles N, Shang Y (2021) A survey and performance evaluation of deep learning methods for small object detection. Expert Syst Appl 172:114602. https://doi.org/10.1016/j.eswa.2021.114602
7. Lim JS, Astrid M, Yoon HJ, Lee SI (2021) Small object detection using context and attention. In: 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). IEEE, pp 181–186. https://doi.org/10.1109/ICAIIC51459.2021.9415217
8. Singh NK, Raza K (2021) Medical image generation using generative adversarial networks: a review. Health Inform Comput Perspect Healthc 932:77–96. https://doi.org/10.1007/978-981-15-9735-0_5
9. Chen X, Li Y, Yao L, Adeli E, Zhang Y (2021) Generative adversarial U-Net for domain-free medical image augmentation. arXiv preprint arXiv:2101.04793
10. Morís DI, de Moura RJJ, Buján JN, Hortas MO (2021) Data augmentation approaches using cycle-consistent adversarial networks for improving COVID-19 screening in portable chest X-ray images. Expert Syst Appl 185:115681. https://doi.org/10.1016/j.eswa.2021.115681
11. Han C (2021) Pathology-aware generative adversarial networks for medical image augmentation. arXiv preprint arXiv:2106.01915
12. Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784
13. Kim HJ, Lee D (2020) Image denoising with conditional generative adversarial networks (CGAN) in low dose chest images. Nucl Instrum Methods Phys Res Sect A 954:161914. https://doi.org/10.1016/j.nima.2019.02.041
14. Zhu Y, Zhou Z, Liao G, Yuan K (2020) CSRGAN: medical image super-resolution using a generative adversarial network. In: 2020 IEEE 17th International Symposium on Biomedical Imaging Workshops (ISBI Workshops). IEEE, pp 1–4. https://doi.org/10.1109/ISBIWorkshops50223.2020.9153436
15. Gu Y, Zeng Z, Chen H, Wei J, Zhang Y, Chen B et al (2020) MedSRGAN: medical images super-resolution using generative adversarial networks. Multimed Tools Appl 79:21815–21840. https://doi.org/10.1007/s11042-020-08980-w
16. Jiang X, Liu M, Zhao F, Liu X, Zhou H (2020) A novel super-resolution CT image reconstruction via semi-supervised generative adversarial network. Neural Comput Appl 32:14563–14578. https://doi.org/10.1007/s00521-020-04905-8
17. Guan B, Zhang G, Yao J, Wang X, Wang M (2020) Arm fracture detection in X-rays based on improved deep convolutional neural network. Comput Electr Eng 81:106530. https://doi.org/10.1016/j.compeleceng.2019.106530
18. Qi Y, Zhao J, Shi Y, Zuo G, Zhang H et al (2020) Ground truth annotated femoral X-ray image dataset and object detection based method for fracture types classification. IEEE Access 8:189436–189444. https://doi.org/10.1109/ACCESS.2020.3029039
19. Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
20. Guan B, Yao J, Zhang G, Wang X (2019) Thigh fracture detection using deep learning method based on new dilated convolutional feature pyramid network. Pattern Recognit Lett 125:521–526. https://doi.org/10.1016/j.patrec.2019.06.015
21. Sha G, Wu J, Yu B (2020) Detection of spinal fracture lesions based on improved Yolov2. In: 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). IEEE, pp 235–238. https://doi.org/10.1109/ICAICA50127.2020.9182582

22. Abbas W, Adnan SM, Javid MA, Majeed F, Ahsan T, Hassan SS (2020) Lower leg bone fracture detection and classification using faster RCNN for X-rays images. In: 2020 IEEE 23rd International Multitopic Conference (INMIC). IEEE, pp 1–6. https://doi.org/10.1109/INMIC50486.2020.9318052
23. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2818–2826. https://doi.org/10.1109/cvpr.2016.308
24. Isola P, Zhu JY, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1125–1134. https://doi.org/10.1109/cvpr.2017.632
25. Woo S, Park J, Lee JY, Kweon IS (2018) CBAM: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19. https://doi.org/10.1007/978-3-030-01234-2_1
26. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2117–2125. https://doi.org/10.1109/cvpr.2017.106
27. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning (ICML). PMLR, pp 448–456
28. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: International conference on machine learning (ICML), pp 807–814
29. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/cvpr.2016.90
30. Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1492–1500. https://doi.org/10.1109/cvpr.2017.634
31. Misra D, Nalamada T, Arasanipalai AU, Hou Q (2021) Rotate to attend: convolutional triplet attention module. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 3139–3148. https://doi.org/10.1109/WACV48630.2021.00318
32. Niu Z, Zhong G, Yu H (2021) A review on the attention mechanism of deep learning. Neurocomputing 452:48–62. https://doi.org/10.1016/j.neucom.2021.03.091
33. Bodla N, Singh B, Chellappa R, Davis LS (2017) Soft-NMS—improving object detection with one line of code. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 5561–5569. https://doi.org/10.1109/iccv.2017.593
34. Lin TY, Maire M, Belongie S, Hays J, Perona P et al (2014) Microsoft COCO: common objects in context. In: European conference on computer vision. Springer, Cham, pp 740–755. https://doi.org/10.1007/978-3-319-10602-1_48
35. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
36. Chlap P, Min H, Vandenberg N, Dowling J, Holloway L, Haworth A (2021) A review of medical image data augmentation techniques for deep learning applications. J Med Imaging Radiat Oncol 65:545–563. https://doi.org/10.1111/1754-9485.13261
37. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 248–255. https://doi.org/10.1109/CVPR.2009.5206848
38. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, Cham, pp 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
39. Cai Z, Vasconcelos N (2018) Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6154–6162. https://doi.org/10.1109/cvpr.2018.00644
40. Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 764–773. https://doi.org/10.1109/ICCV.2017.89

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.