JMC19013 36137
Deep learning technologies have become a leading tool for supporting disease diagnosis and timely treatment. Many
approaches have been developed for the detection and classification of fractures in human bones. These approaches vary
in several parameters, producing different detection and classification characteristics. Fracture detection, along with
recognition of its category, helps radiologists and doctors analyze and handle fracture cases effectively. In this
study, we employed a Faster R-CNN transfer-learning technique. The model was retrained using 50 X-ray images of
tibia-fibula bone fractures. For the evaluation of this study, we used parameters such as the Kappa coefficient and the
mean average precision. The overall accuracy of the proposed method is 97%. It is comprehensively compared with
earlier work on bone fractures in terms of training, detection, classification, and efficiency. The proposed work has
a strong impact on accurate classification and detection of fractures. Moreover, here we
analyzed six bone fracture classes: transverse, spiral, oblique, linear, comminuted, and normal. The best configuration
shows that the method is highly accurate and economical. This work shows that the proposed approach is an effective
and useful technique for the dynamic detection, classification, and analysis of various types of fractures. Furthermore,
this approach improved the results, the run-time performance, and the detection quality compared with the
state-of-the-art techniques used in this area.
1. INTRODUCTION
Machine learning is an emerging field of CAD (computer-aided diagnosis) that boasts high acceptance in the health-
care industry and provides an impressive way to help in disease diagnosis and its treatment. In most developed
countries, bone fracture is a common problem and is rapidly increasing. There are a total of 206 bones in the human
body; tibia-fibula fractures occur more frequently than other bone fractures because of the thin periosteum lining
and because the front part of the tibia-fibula, lying directly under the skin, is covered only by skin, as discussed
in Myint et al. (2018). Bone fractures occur due to accidents or other causes such as trauma, repetitive stress, or a
specific medical condition that weakens the bones; for more details, see Muchtar et al. (2018). The overall incidence
of tibia-fibula fractures reported in the Swedish Fracture Register (SFR) is 51.7 per 100,000 per year, as discussed
in Wennergren et al. (2018). A bone fracture is either complete or partial; a complete fracture happens when the
parts of the bone are wholly separated from each other, while an incomplete fracture extends only partly across the
bone. Fractures commonly occur in the wrist, ankle, hip, rib, tibia-fibula, chest, and knee.
At present, several state-of-the-art machines are available to generate digital images of living organisms and organs;
these include ultrasound, X-ray, computed tomography (CT), and magnetic resonance imaging (MRI). Still, X-ray
technology is the most common and oldest; it is also an economical and painless detection method. It is used to make X-ray
images of bones in the human body. Bone fractures are classified into different classes such as transverse, oblique, spi-
ral, comminuted, avulsed, segmented, impacted, torus, and greenstick, for more detail see Al-Ayyoub and Al-Zghool
(2013). Diagnosis and treatment of the bone fracture are important as the wrong diagnoses lead to dissatisfaction of
the patient and tend to have dangerous consequences, which tarnish the reputation of a diagnostic institution as well.
Moreover, manual analysis of medical images is error-prone and time-consuming. Therefore, computer vision plays
a tremendous role in automatically providing the visual semantics of medical images. Consequently, it has helped to
analyze a large amount of medical data with high accuracy of diagnosis with minimum time and effort.
Recently, machine learning techniques have improved medical diagnosis; for example, the tedious task of screen-
ing for identical findings is no longer performed by doctors and radiologists. Hence, time is saved that allows them to
interact with more patients. Past studies used hand-engineered, static techniques to identify fractures, as discussed
by Badgeley et al. (2019). However, these are inefficient because they require several preprocessing steps such as noise
removal, segmentation, region growing, region merging, edge detection, feature extraction, and threshold setting;
more detail is available in Castro-Gutierrez et al. (2019). Moreover, these static approaches are weak in
locating the exact region of interest (bounding box) of bone fractures.
In this study, we employed a Faster R-CNN transfer-learning technique and configured it with its default param-
eters. In addition, we retrained it on 50 X-ray images of tibia-fibula bone fractures. Moreover, here we analyzed six
bone fracture classes: transverse, spiral, oblique, linear, comminuted, and normal. For the evaluation of this study,
we used the mean average precision (mAP) and the Kappa coefficient as evaluation parameters for defining accuracy.
The best results show that this study is highly precise. The results of this research show that this is an efficient
technique, well suited for dynamic detection and classification of fracture classes. Furthermore, this approach
improves the run-time performance and detection quality, even with a limited dataset, as compared to earlier techniques.
2. LITERATURE REVIEW
Various artificial intelligence techniques are being used for automatic feature detection and analysis in different
fields of life, from plant disease detection to COVID-19 detection from X-ray images. In one study, Tanzi et al.
(2020) used artificial intelligence techniques to analyze bone fractures. Their technique used the scale-invariant
feature transform (SIFT) and Haar wavelet transforms. The SIFT method detects feature points that are robust to
compression, rotation, and scaling, while the Haar wavelet transform is used to save memory space. The study used a
dataset with a total of 100 X-ray images. The model was trained with only 30 X-ray images and tested
with 70 X-ray images. The model was evaluated using state-of-the-art evaluation parameters: area under the curve,
sensitivity, specificity, and accuracy. The study claims an average accuracy of 94.30%. In another study, Jin et al.
(2020) developed a customized 3D UNet architecture named the FracNet model to analyze rib fractures. This model
is composed of encoder-decoder blocks, 3D convolutions, batch normalization, nonlinearities, and max pooling. This model was
trained using the RibFrac dataset, where 420 images were used for training, and 120 images for testing. This method
achieved a detection sensitivity of 92.9% and a segmentation Dice of 71.5% on the test cohort.
Zhou et al. (2020) presented a technique for automatic fracture detection and classification of rib bones based
on Faster R-CNN. This technique was used to achieve three goals: model robustness, fracture detection and
classification, and an efficient mechanism. The study claims that the Faster R-CNN performed
better than YOLO V3 in detection accuracy and detection speed. This study achieved a high precision of 91.1% and
sensitivity of 86.3%. In another study, Hržić et al. (2019) proposed a fracture detection and classification method
using X-ray images. This method used local entropy to remove noise from X-ray images. The local Shannon entropy
was computed for each pixel of the image by using a sliding 2D window. First, image segmentation was performed
on the original image, then a graph theory technique was applied to images for removing negative bone contours
and enhancing the edge detection. Finally, the difference of the extracted and estimated contour was calculated for
detection and classification of the fracture. The study reports an overall accuracy of 86.22% with a 91.16% detection
rate.
In another study, Yang et al. (2019) proposed two line-based fracture detection schemes including standard
line-based detection and adaptive differential parameter optimized (ADPO) line-based fracture detection using X-ray
images. It differentiates the fracture line from a nonfracture line using extracted features of recognized patterns, and
the fractures are classified with the artificial neural network (ANN). The ADPO-based fracture detection technique
performed better than the standard line-based fracture detection, with an average accuracy of 72.89%. In another study,
Castro-Gutierrez et al. (2019) proposed a local binary pattern (LBP) based feature extractor and SVM for acetabulum
fracture detection and classification. This approach deals with low-resolution images in a better way by improving
image quality in the preprocessing phase. The study claims an overall accuracy of 80%.
Kim and MacKinnon (2018) aimed to identify the extent to which deep learning models pretrained on nonmedi-
cal images can be used for fracture detection and classification. In this technique the top layer of the Inception version
3 network was retrained using wrist radiographs and used for classification of fractures. The model was fully trained
using 11,112 X-ray images, obtained through eightfold augmentation, and achieved an accuracy of 88%. In another study,
Dimililer (2017) developed a classification system that can detect and classify bone fractures. The system comprises two principal
stages, namely, preprocessing of images using image enhancement techniques and a classification phase using a neu-
ral network. The system was tested on bone fracture images and claimed a high classification rate. In another similar
study, Al-Ayyoub and Al-Zghool (2013) developed a system which can detect bone fracture and fracture type. The
developed system extracts features after preprocessing of the X-ray images, and then they used different classification
algorithms to detect the existence of a fracture along with its type. The shared results show that the proposed system
is accurate and efficient.
In summary, computer-aided fracture detection and classification is an active field of research. However, an
efficient, accurate, and cost-effective diagnosis is still a challenge, and its outcome depends on the qualification of
the radiologist. It is evident from the literature that previous works lack accuracy and reliability. Moreover, there
is no systematic approach that localizes the fracture in an efficient manner, unlike the proposed technique. The main
contributions of this research are as follows:
• We employed the Faster R-CNN deep transfer-learning technique for fast, accurate, and automatic detection and
classification of fractures.
• We analyzed six classes of tibia-fibula bone fractures.
• The method dynamically detects fractures.
• Experimental analysis proved that the proposed technique delivers better results than the state-of-the-art
existing techniques.
network is less time-consuming as compared to other networks and has a high detection efficiency. The first layer
of the proposed model is the input layer, which accepts an X-ray image of dimension P × Q and automatically
converts it to M × N. The X-ray image is cropped to a specific size and fed to the convolution layer. In the
convolutional layer, a kernel of a specific size convolves over the received image. The proposed model uses the default
configuration. The convolutional layer reduces the size of the input X-ray image, which reduces the effort and
computational cost of the model. The output size of the convolution layer is given in Eq. (1).
$$\mathrm{output_{size}} = \frac{(i - k) + 1}{s}, \qquad (1)$$
where i is the input dimension, k is the kernel dimension, and s is the stride dimension.
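As a quick illustration (a sketch, not code from the paper), Eq. (1) can be evaluated directly; note that many deep learning frameworks instead compute floor((i − k)/s) + 1, which coincides with Eq. (1) when the stride is 1:

```python
def conv_output_size(i: int, k: int, s: int = 1) -> int:
    """Convolution output dimension per Eq. (1): ((i - k) + 1) / s,
    where i is the input size, k the kernel size, and s the stride."""
    return ((i - k) + 1) // s

# A 224-pixel input convolved with a 5x5 kernel at stride 1 yields 220 pixels.
print(conv_output_size(224, 5, 1))  # -> 220
```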
The pooling layer reduces the X-ray image size for further input to the next layer; it works as down-sampling.
Max pooling was used in the whole model for the best results. The pooling layer output is shown in Eq. (2).
$$\mathrm{output_{size}} = \frac{(i - p) + 1}{s}, \qquad (2)$$
where i is the input dimension, p is the pooling dimension, and s is the stride size. ReLU is used as the activation
function; its gradient is zero or one, and it ignores negative values, providing much faster computation for the
proposed work compared to other activation functions. The ReLU activation function is shown in Eq. (3).
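The pooling formula of Eq. (2) and the ReLU activation can be sketched in a few lines; this is an illustrative snippet under the definitions above, not the paper's implementation:

```python
import numpy as np

def pool_output_size(i: int, p: int, s: int) -> int:
    """Pooling output dimension per Eq. (2): ((i - p) + 1) / s."""
    return ((i - p) + 1) // s

def relu(x: np.ndarray) -> np.ndarray:
    """ReLU: keeps positive values, zeroes out negatives (gradient 0 or 1)."""
    return np.maximum(0.0, x)

print(pool_output_size(220, 2, 2))        # downsampled feature dimension
print(relu(np.array([-3.0, 0.0, 2.5])))   # negatives are suppressed
```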
or not. The regression bounding box locates where the fracture is present. Localization information is provided
by the position of a sliding window, and finer localization is supplied by bounding-box regression. The RPN layer
locates the region of interest by creating a bounding box. The RPN is a fully convolutional network that predicts
the target area in each input X-ray image and assigns it a score. The real target value is represented by a
probability, and to create high-value region-of-interest boxes, the RPN is trained end-to-end for classification
and detection.
The convolutional features of the region of interest are used as the input to the ROI pooling layer, which produces
a bounding box around the fracture with its class name (type). Like average pooling and max pooling, it reduces the
error of the feature map. ROI pooling splits the feature map into regions of equal size and then applies max pooling
to each region, so the output size is fixed regardless of the input size. The classifier layer is the final layer.
It is used to classify the input X-ray image with its class name by generating a bounding box around it. There exist
two output layers; i.e., the Soft-max activation layer is used for an object classification of fracture types and the linear
activation function for bounding boxes coordinates regression.
The deep, fully convolutional network (DFCN) is used for detection and segmentation. The DFCN is an ordinary
CNN, except that the last fully connected layer is replaced with another convolution layer. A box is generated
around the fracture that plays a vital role in the identification and classification of input X-ray images.
Multiple bounding boxes are used at each position; the position of the bounding box is 320 × 320 for an image size
of 600 × 800. There are three box scales, 128 × 128, 256 × 256, and 512 × 512, represented by three colors, with
aspect ratios of 1:1, 1:2, and 2:1. Each bounding box makes its prediction from single-scale and multiscale
features.
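The anchor scheme described above, three scales times three aspect ratios for nine boxes per position, can be sketched as follows; the function name and the (x1, y1, x2, y2) box format are illustrative assumptions, not the paper's code:

```python
import itertools

def generate_anchors(cx: float, cy: float,
                     scales=(128, 256, 512),
                     ratios=(1.0, 0.5, 2.0)):
    """Build the 9 anchor boxes (3 scales x 3 aspect ratios) centered
    at (cx, cy), returned as (x1, y1, x2, y2) tuples."""
    anchors = []
    for scale, ratio in itertools.product(scales, ratios):
        w = scale * ratio ** 0.5   # keep the anchor area close to scale**2
        h = scale / ratio ** 0.5
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

print(len(generate_anchors(300, 400)))  # -> 9
```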
Each bounding box is labeled positive or negative depending on its IoU (intersection over union) value. If the IoU
with a ground-truth box is greater than 0.7, the detection is labeled positive; on the other hand, if the IoU is less
than 0.3 with all ground-truth boxes, the label is assigned as negative. A clean set of boxes increases speed and
accuracy.
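The IoU-based labeling rule above can be expressed as a short sketch; `iou` and `label_box` are hypothetical helper names, and the 0.7/0.3 thresholds are the ones stated in the text:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def label_box(box, gt_boxes):
    """Positive (1) if IoU > 0.7 with some ground truth, negative (0) if
    IoU < 0.3 with all ground truths, else None (excluded from training)."""
    best = max(iou(box, gt) for gt in gt_boxes)
    if best > 0.7:
        return 1
    if best < 0.3:
        return 0
    return None

print(label_box((0, 0, 10, 10), [(0, 0, 10, 10)]))  # -> 1 (perfect overlap)
```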
$$\mathrm{mAP} = \frac{1}{|\mathrm{classes}|} \sum_{x \in \mathrm{classes}} \frac{TP(x)}{TP(x) + FP(x)}. \qquad (7)$$
TP(x) is a true positive: pred-bb is a predicted bounding box for class x, gt-bb is a ground-truth bounding box of
class x, and the intersection over union satisfies IoU(pred-bb, gt-bb) ≥ 0.5. FP(x) is a false positive: pred-bb is a
predicted bounding box for class x for which there is no ground-truth box gt-bb of class x with IoU(pred-bb, gt-bb)
≥ 0.5. For a class x, the IoU between a predicted box and the ground truth is computed by taking the best overlap,
i.e., the highest overlap between the predicted box and any ground-truth box.
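Under these definitions, Eq. (7) reduces to averaging the per-class precision; the dictionary-based interface below is an illustrative assumption, not the paper's evaluation code:

```python
def mean_average_precision(tp: dict, fp: dict) -> float:
    """Eq. (7): mean over classes of the precision TP(x) / (TP(x) + FP(x)).
    tp and fp map each class name to its true/false positive counts."""
    precisions = [tp[x] / (tp[x] + fp[x]) for x in tp]
    return sum(precisions) / len(precisions)

counts_tp = {"transverse": 9, "spiral": 8}   # illustrative counts
counts_fp = {"transverse": 1, "spiral": 2}
print(round(mean_average_precision(counts_tp, counts_fp), 4))  # -> 0.85
```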
4. RESULTS
4.1 Training
In this stage, the top layer of Faster R-CNN is retrained using the Inception v2 (version 2) network. Training
continues until the loss reaches 0.0005%. We used stochastic gradient descent (SGD) to train the proposed method,
learning the convolution layer filters, the region proposal weights, and the fully connected layer weights.
Stochastic gradient descent performs an update of the parameters for each training example x(i) and label y(i).
$$\beta = \beta - \eta \cdot \nabla_{\beta} k(\beta; x^{(i)}, y^{(i)}), \qquad (8)$$
where β denotes the model parameters, η is the learning rate, and k is the loss function. The learning
rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving
toward a minimum of the loss function.
$$\eta_n = \eta_0 \cdot d^{\lfloor (1+n)/r \rfloor}. \qquad (9)$$
In the above equation, η_n is the learning rate at iteration n, η_0 is the initial learning rate, d is the decay
factor, and r is the drop rate. Here the floor function truncates the exponent to an integer, so values less than 1
become zero.
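The update of Eq. (8) and the step-decay schedule of Eq. (9) can be sketched as follows; this is an illustrative snippet in which beta denotes the parameter vector and eta the learning rate:

```python
import math
import numpy as np

def sgd_step(beta: np.ndarray, grad: np.ndarray, eta: float) -> np.ndarray:
    """Eq. (8): move the parameters beta against the gradient of the loss
    on a single training example, scaled by the learning rate eta."""
    return beta - eta * grad

def step_decay(eta0: float, d: float, r: int, n: int) -> float:
    """Eq. (9): learning rate at iteration n, starting from eta0 and
    multiplied by decay factor d once every r iterations (drop rate)."""
    return eta0 * d ** math.floor((1 + n) / r)

print(sgd_step(np.array([1.0, 2.0]), np.array([0.5, -0.5]), 0.1))
print(step_decay(0.1, 0.5, 10, 9))  # first drop: 0.1 * 0.5 = 0.05
```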
In the above equation, I_j^obj = 1 when an object is present in cell j, and 0 otherwise. ẑ_j(b) denotes the
predicted probability of class b in cell j.
$$\lambda_{coord} \sum_{j=0}^{S^2} \sum_{k=0}^{B} I_{jk}^{obj} \left[ (y_j - \hat{y}_j)^2 + (z_j - \hat{z}_j)^2 \right] + \lambda_{coord} \sum_{j=0}^{S^2} \sum_{k=0}^{B} I_{jk}^{obj} \left[ \left( \sqrt{g_j} - \sqrt{\hat{g}_j} \right)^2 + \left( \sqrt{m_j} - \sqrt{\hat{m}_j} \right)^2 \right]. \qquad (11)$$
In the above equation, I_{jk}^{obj} = 1 when the kth bounding box in cell j is responsible for detecting the object,
and 0 otherwise. λ_coord increases the weight of the loss on the bounding-box coordinates.
The RPN loss of the model describes whether an object is detected in each bounding box. When we train the RPN,
we assign a class label to each box. A box that has an IoU higher than 0.7 with a ground-truth box is considered a
positive detection of a fracture, and if the IoU value is less than 0.3 it is considered a negative detection. If
the box is neither negative nor positive, it does not contribute to the training objective. The first part of
Eq. (12) is the classification loss over two classes, i.e., object or not object. The second part of the equation
is the bounding-box regression loss, applied only if an object is detected. The RPN loss is shown in Fig. 5.
$$L(\{q_i\}, \{r_i\}) = \frac{1}{Z_{cls}} \sum_i L_{cls}(q_i, q_i^*) + \lambda \frac{1}{Z_{reg}} \sum_i q_i^* L_{reg}(r_i, r_i^*). \qquad (12)$$
In the above equation, i is the index of a box, and q_i is the predicted probability of box i being an object. The
ground-truth label q_i* equals 1 if the box contains an object and 0 otherwise. r_i denotes the coordinates of the
detected box, and r_i* denotes the ground-truth box coordinates associated with a positive box. L_cls is the
classification loss, which is the log loss over two classes (object vs. not object). L_reg(r_i, r_i*) = R(r_i − r_i*)
is the regression loss, where R is the smooth L1 loss function. The regression term q_i* L_reg is active only for
positive boxes (q_i* = 1) and vanishes otherwise (q_i* = 0). The cls and reg layers output {q_i} and {r_i},
respectively. Z_cls and Z_reg are normalization terms that give the two parts equal weight. The model is trained
in such a way that if an object is detected with a correct bounding box, it has a high IoU score; otherwise, a
smaller score.
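A minimal sketch of Eq. (12), assuming binary log loss for the classification term and smooth L1 for the regression term; normalizing by the batch size (Z_cls) and the number of positive anchors (Z_reg) is one common choice, not necessarily the paper's:

```python
import numpy as np

def smooth_l1(x: np.ndarray) -> np.ndarray:
    """Smooth L1: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def rpn_loss(q, q_star, r, r_star, lam=1.0):
    """Sketch of Eq. (12): log loss over objectness plus smooth-L1 box
    regression, the latter counted only for positive anchors (q* = 1).
    q: predicted objectness (N,); q_star: labels (N,);
    r, r_star: predicted and ground-truth box offsets (N, 4)."""
    q = np.clip(q, 1e-7, 1 - 1e-7)
    l_cls = -(q_star * np.log(q) + (1 - q_star) * np.log(1 - q)).mean()
    n_pos = max(q_star.sum(), 1.0)  # avoid division by zero
    l_reg = (q_star[:, None] * smooth_l1(r - r_star)).sum() / n_pos
    return l_cls + lam * l_reg
```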
$$\lambda_{obj} \sum_{j=0}^{S^2} \sum_{k=0}^{B} I_{jk}^{obj} (b_j - \hat{b}_j)^2. \qquad (13)$$
In the above equation, b̂_j is the predicted confidence score for the object in cell j and b_j is the ground-truth
score; I_{jk}^{obj} = 1 if the kth bounding box in cell j is responsible for detecting the object, and 0 otherwise.
The objectness loss is given by Eq. (13) and is shown in Fig. 6.
$$\lambda_{noobj} \sum_{j=0}^{S^2} \sum_{k=0}^{B} I_{jk}^{noobj} (b_j - \hat{b}_j)^2. \qquad (14)$$
In the above equation, I_{jk}^{noobj} is the complement of I_{jk}^{obj}, and b̂_j is the confidence score of box k in
cell j. The weight λ_noobj applies when only background is detected, i.e., when no object is present; its default
value is 0.5.
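Equations (13) and (14) differ only in the mask and weight applied to the squared confidence error, which a single function can illustrate; the function and argument names are assumptions for this sketch:

```python
import numpy as np

def confidence_loss(b, b_hat, obj_mask, lam_obj=1.0, lam_noobj=0.5):
    """Sketch of Eqs. (13)-(14): squared error on the confidence score,
    weighted by lam_obj where a cell is responsible for an object
    (I^obj = 1) and by lam_noobj (default 0.5) where only background is
    present (I^noobj = 1 - I^obj)."""
    sq = (b - b_hat) ** 2
    return float((lam_obj * obj_mask * sq).sum()
                 + (lam_noobj * (1.0 - obj_mask) * sq).sum())

b = np.array([1.0, 0.0])       # ground-truth confidence per cell
b_hat = np.array([0.0, 1.0])   # predicted confidence
mask = np.array([1.0, 0.0])    # first cell contains the object
print(confidence_loss(b, b_hat, mask))  # -> 1.5 (1.0 + 0.5)
```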
5. DISCUSSION
This study considers the tibia-fibula bone fracture problem for detection, classification, and analysis. We employed a
Faster R-CNN deep learning technique with transfer learning for tibia-fibula bone fracture analysis. We evaluated the
proposed method with the Kappa coefficient, as shown in Table 1. The overall accuracy has been found to be 98%, and
the Kappa coefficient is computed as 97%, as shown in Table 1. The mean average precision (mAP), computed from the
values in Tables 2 and 3, also indicates good accuracy.
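For reference, a Kappa coefficient of the kind reported here can be computed from a confusion matrix as follows; this generic sketch is not the paper's evaluation script, and the example matrix is illustrative:

```python
import numpy as np

def cohens_kappa(confusion) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement (diagonal of the confusion matrix) and p_e the agreement
    expected by chance (from the row and column marginals)."""
    c = np.asarray(confusion, dtype=float)
    total = c.sum()
    p_o = np.trace(c) / total
    p_e = (c.sum(axis=0) * c.sum(axis=1)).sum() / total ** 2
    return (p_o - p_e) / (1 - p_e)

# Two-class example: 80% raw agreement, 50% expected by chance -> kappa 0.6.
print(round(cohens_kappa([[4, 1], [1, 4]]), 4))  # -> 0.6
```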
The high accuracy of the proposed fracture detection technique provides higher efficiency than earlier methods for
detection, classification, and analysis of different fracture types, as discussed in Muchtar et al. (2018),
Castro-Gutierrez et al. (2019), and Yang et al. (2019). The proposed model was pretrained on a nonmedical dataset
and is employed here for medical purposes on bone fracture X-ray images. Earlier techniques for bone fracture
analysis have been static, but here, for the first time, we introduce a dynamic method. Moreover, the proposed
technique also detects fractures from live streaming video. The proposed method provides better results than
earlier ones, as shown in Figs. 7–12.
The proposed technique shows excellent performance in the detection, classification, and analysis of the fracture
and its different classes. The findings show that the tibia-fibula fracture study has enhanced interpretation, which
helps radiologists identify and examine patients with tibia-fibula fractures. Our proposed study yields an effective
approach: small datasets, which are not publicly available, can be used for training. The overall performance of the
proposed method is excellent from every aspect. During training, the learning rate remains constant, as shown in
Fig. 2, while the losses are much higher at the start of training but decrease over time, as shown in Figs. 3–6.
The model is fully trained in 40k steps, as shown in Fig. 7, and the loss remains only 0.0005%. Similarly, Hržić
et al. (2019), using a local-entropy-based segmentation technique for fracture classification (not based on deep
learning), reported an accuracy of 91.16% for fracture classification in X-ray images of a child's
FIG. 7: Oblique displaced fracture
FIG. 8: Comminuted fracture
FIG. 9: Oblique nondisplaced fracture
ulna and radius bones. That study detects fractures manually, whereas the proposed technique classifies them
automatically, as shown in Fig. 11. This demonstrates transfer learning: the model was earlier trained on the
ImageNet dataset, which is nonmedical, and we retrained it on medical images using the Inception v2 network.
Furthermore, this study minimizes the radiologist's involvement in detecting, classifying, and analyzing fractures.
This level of accuracy benefits the radiologist through workflow optimization and minimization of the errors that
occur in manual detection and classification. It improves the diagnosis and productivity of the radiologist by
saving time. This study reduces the losses that patients face due to reporting errors or delays. This research is
FIG. 10: Linear fracture
FIG. 11: Multiple fracture
primarily for the detection of fractures whose identifying features were missed rather than misdiagnosed. The proposed
technique provides accurate detection, classification, and analysis of fracture types. In these comparative studies,
the proposed technique is robust in fracture analysis under various conditions of the X-ray images, comparable to
those handled by the traditional methods.
This technique successfully detects the fracture area of the bone and encloses it in a rectangle within the input
image. Although there are many techniques to identify the fracture area of a bone using image processing and machine
learning, these are not sufficient. The proposed technique detects six classes of fractures. We also compare our
model's accuracy with the other state-of-the-art techniques, as shown in Table 4. To summarize, the technique
presented in this paper improves on the earlier work because we employed Faster R-CNN, retrained its top layer, and
operated the deep neural network end-to-end. Therefore, in distinction to the earlier techniques, the proposed
technique offers potential benefits such as efficiency, high accuracy, consistent interpretation, instantaneous
reporting of results, reproducibility, and accurate analysis in this domain.
FIG. 12: Normal
6. CONCLUSIONS
This study presents a Faster R-CNN-based method for automatic identification of tibia-fibula bone fractures along
with classification of the different fracture types (classes). The study analyzed six classes of fractures, i.e.,
transverse, oblique, spiral, linear, comminuted, and normal. We employed a deep transfer-learning technique, which
can detect fractures with a bounding box and identify their type. This method eliminates preprocessing and reduces
the training complexity of detection and classification. The proposed method was found to be accurate, with an
average accuracy of 97%. Future work includes localizing the dimensions of the fracture and investigating the
applicability of this approach to other long bones, such as those of the legs and arms.
REFERENCES
Al-Ayyoub, M. and Al-Zghool, D., Determining the Type of Long Bone Fractures in X-Ray Images, WSEAS Trans. Inf. Sci. Appl.,
vol. 10, no. 8, pp. 261–270, 2013.
Badgeley, M.A., Zech, J.R., Oakden-Rayner, L., Glicksberg, B.S., Liu, M., Gale, W., McConnell, M.V., Percha, B., Snyder, T.M.,
and Dudley, J.T., Deep Learning Predicts Hip Fracture Using Confounding Patient and Healthcare Variables, NPJ Dig. Med.,
vol. 2, no. 1, pp. 1–10, 2019.
Castro-Gutierrez, E., Estacio-Cerquin, L., Gallegos-Guillen, J., and Obando, J.D., Detection of Acetabulum Fractures Using X-
ray Imaging and Processing Methods Focused on Noisy Images, in Proc. of 2019 Amity International Conference on Artificial
Intelligence (AICAI), IEEE, pp. 296–302, 2019.
Dimililer, K., IBFDS: Intelligent Bone Fracture Detection System, Procedia Comput. Sci., vol. 20, pp. 260–267, 2017.
Hržić, F., Štajduhar, I., Tschauner, S., Sorantin, E., and Lerga, J., Local-Entropy Based Approach for X-Ray Image Segmentation
and Fracture Detection, Entropy, vol. 21, no. 4, p. 338, 2019.
Jin, L., Yang, J., Kuang, K., Ni, B., Gao, Y., Sun, Y., Gao, P., Ma, W., Tan, M., and Kang, H., Deep-Learning-Assisted Detection
and Segmentation of Rib Fractures from CT Scans: Development and Validation of FracNet, EBioMedicine, vol. 62, p. 103106,
2020.
Kim, D. and MacKinnon, T., Artificial Intelligence in Fracture Detection: Transfer Learning from Deep Convolutional Neural
Networks, Clin. Radiol., vol. 73, no. 5, pp. 439–445, 2018.
Muchtar, M., Simanjuntak, S., Rahmat, R., Mawengkang, H., Zarlis, M., Sitompul, O., Winanto, I., Andayani, U., Syahputra,
M., and Siregar, I., Identification of Tibia and Fibula Bone Fracture Location Using Scanline Algorithm, J. Phys.: Conf. Ser.,
vol. 978, no. 1, p. 012043, 2018.
Myint, W.W., Tun, K.S., and Tun, H.M., Analysis on Leg Bone Fracture Detection and Classification Using X-Ray Images, Mach.
Learn. Res., vol. 3, no. 3, pp. 49–59, 2018.
Ranganathan, P., The Effects of Land Use Change on Carnivore Use of Wildlife Dispersal Routes in Ranthambhore Tiger Reserve,
India, PhD, Duke University, Durham, NC, 2017.
Tanzi, L., Vezzetti, E., Moreno, R., and Moos, S., X-Ray Bone Fracture Classification Using Deep Learning: A Baseline for
Designing a Reliable Approach, Appl. Sci., vol. 10, no. 4, p. 1507, 2020.
Tomita, N., Cheung, Y.Y., and Hassanpour, S., Deep Neural Networks for Automatic Detection of Osteoporotic Vertebral Fractures
on CT Scans, Comput. Biol. Med., vol. 98, pp. 8–15, 2018.
Wennergren, D., Bergdahl, C., Ekelund, J., Juto, H., Sundfeldt, M., and Möller, M., Epidemiology and Incidence of Tibia Fractures
in the Swedish Fracture Register, Injury, vol. 49, no. 11, pp. 2068–2074, 2018.
Yang, A.Y., Cheng, L., Shimaponda-Nawa, M., and Zhu, H.Y., Long-Bone Fracture Detection Using Artificial Neural Networks
Based on Line Features of X-Ray Images, in Proc. of 2019 IEEE Symposium Series on Computational Intelligence (SSCI),
IEEE, pp. 2595–2602, 2019.
Zhou, Q.Q., Wang, J., Tang, W., Hu, Z.C., Xia, Z.Y., Li, X.S., Zhang, R., Yin, X., Zhang, B., and Zhang, H., Automatic Detection
and Classification of Rib Fractures on Thoracic CT Using Convolutional Neural Network: Accuracy and Feasibility, Kor. J.
Radiol., vol. 21, no. 7, pp. 869–879, 2020.