
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of Biomedical and Health Informatics.

Robust Methods for Real-time Diabetic Foot Ulcer Detection and Localization on Mobile Devices

Manu Goyal, Student Member, IEEE, Neil D. Reeves, Satyan Rajbhandari, Moi Hoon Yap, Member, IEEE

Abstract—Current practice for Diabetic Foot Ulcers (DFU) screening involves detection and localization by podiatrists. Existing automated solutions focus on either segmentation or classification. In this work, we design deep learning methods for real-time DFU localization. To produce a robust deep learning model, we collected an extensive database of 1775 images of DFU. Two medical experts produced the ground truths of this dataset by outlining the regions of interest of DFU with annotator software. Using 5-fold cross-validation, Faster R-CNN with the InceptionV2 model using two-tier transfer learning achieved an overall mean average precision of 91.8%, a speed of 48 ms for inference on a single image, and a model size of 57.2 MB. To demonstrate the robustness and practicality of our solution for real-time prediction, we evaluated the performance of the models on an NVIDIA Jetson TX2 and a smartphone app. This work demonstrates the capability of deep learning for real-time localization of DFU, which can be further improved with a more extensive dataset.

Index Terms—Diabetic foot ulcers, deep learning, convolutional neural networks, DFU localization, real-time localization.

M. Goyal and M. H. Yap are with the School of Computing, Mathematics and Digital Technology, Manchester Metropolitan University, John Dalton Building, M1 5GD, Manchester, UK (e-mail: [email protected]). N. D. Reeves is with the Musculoskeletal Science & Sports Medicine Research Centre, School of Healthcare Science, Faculty of Science & Engineering, Manchester Metropolitan University, John Dalton Building, M1 5GD, Manchester, UK. S. Rajbhandari is with Lancashire Teaching Hospital, PR2 9HT, Preston, UK.

I. INTRODUCTION

Diabetic Foot Ulcers (DFU) that affect the lower extremities are a major complication of diabetes. According to the global prevalence data of the International Diabetes Federation in 2015, DFU develop annually in 9.1 million to 26.1 million people with diabetes worldwide [1]. It has been estimated that patients with diabetes have a lifetime risk of 15% to 25% of developing DFU, with infected and non-healing DFU contributing to nearly 85% of lower limb amputations [2], [3]. In a more recent study, when additional data are considered, the risk is suggested to be between 19% and 34% [4].

Due to the proliferation of Information Communication Technology, intelligent automated telemedicine systems are often tipped as one of the most cost-effective solutions for remote detection and prevention of DFU. Telemedicine systems can integrate with current healthcare services to provide more cost-effective, efficient and higher-quality treatment for DFU. In recent years, there has been rapid development in computer vision, especially towards the difficult and vital issues of understanding images from different domains such as spectral and medical imaging, object detection [5] and human motion analysis [6]. Computer vision and deep learning algorithms are extensively used for the analysis of medical imaging of various modalities, such as MRI, CT scans, X-ray, dermoscopy and ultrasound [7]. Recently, computer vision algorithms have been extended to assess different types of skin condition, such as skin cancer and DFU [8], [9].

From a computer vision and medical imaging perspective, there are three common tasks that can be performed for the detection of abnormalities on medical images: 1) Classification, 2) Localization and 3) Segmentation. These tasks on DFU are illustrated in Fig. 1. Various researchers have made contributions related to computer vision methods for the detection of DFU. We divide these contributions into four categories:

1) Algorithms based on basic image processing and traditional machine learning techniques
2) Algorithms based on deep learning techniques
3) Research based on different modalities of images
4) Smartphone applications for DFU

Several studies suggested computer vision methods based on basic image processing approaches and supervised traditional machine learning for the detection of DFU/wounds. Mainly, these studies performed the segmentation task by extracting texture and color descriptors on small patches of wound/DFU images, followed by traditional machine learning algorithms to classify them into normal and abnormal skin patches [11], [12], [13], [14]. In conventional machine learning, the hand-crafted features are usually affected by skin shade, illumination and image resolution, and these techniques struggle to segment the irregular contours of ulcers or wounds. On the other hand, unsupervised approaches rely upon image processing techniques, edge detection, morphological operations and clustering algorithms in different color spaces to segment the wounds from images [15], [16], [17]. Wang et al. [18] used an image capture box to capture image data and determined the area of DFU using cascaded two-stage SVM-based classification. They proposed the use of a superpixel technique for segmentation and extracted a number of features to perform the two-stage classification. Although this system reported promising results, it has not been validated on a more substantial dataset. In addition, the image capture box is impractical for data collection, as the patient's bare foot needs to be placed directly in contact with the screen of the box; in healthcare, such a setting would not be allowed due to concerns regarding infection control.


Fig. 1. Examples of three common tasks for abnormalities inspection on a DFU image: (a) Classification, (b) Localization and (c) Segmentation of DFU (green) and surrounding skin (red) [10].

The majority of these methods involve manual tuning of parameters according to different input images, and multi-stage processing, which makes them hard to implement in clinical settings. These state-of-the-art methods were validated on relatively small datasets, ranging from 10 to 172 images. Current state-of-the-art methods based on basic image processing and traditional machine learning techniques are not robust, due to their reliance on specific regulators and rules under certain assumptions.

In contrast to traditional machine learning, deep learning methods do not require such strong assumptions and have demonstrated superiority in object localization and segmentation of DFU, which suggests that robust, fully automated detection of DFU may be achieved by adopting such approaches [10], [9], [19]. In the field of deep learning, several researchers have made contributions to the classification and segmentation of DFU. Goyal et al. [9] proposed a new deep learning framework called DFUNet, which classifies skin lesions of the foot region into two classes, i.e. normal skin (healthy skin) and abnormal skin (DFU). In addition, they used deep learning methods for the semantic segmentation of DFU and its surrounding skin with a limited dataset of 600 images [10]. Wang et al. [19] proposed a new deep learning architecture based on an encoder-decoder to perform wound segmentation and analysis to measure the healing progress of wounds. To date, this paper is the first attempt to develop deep learning methods for the DFU localization task.

Fig. 2. Illustration of high-resolution full feet images of our DFU dataset.

In a separate line of study from computer vision techniques, van Netten et al. [20] proposed the detection of DFU using a different modality, infra-red thermal imaging. They found a significant temperature difference between the DFU and the surrounding healthy skin of the foot, and used this considerable temperature difference on a heat-map to detect the DFU. Liu et al. presented a preliminary case study to evaluate the effectiveness of infra-red dermal thermography on diabetic foot soles to identify pre-signs of ulceration [21]. Harding et al. [22] performed a study to assess infra-red imaging for the prevention of secondary osteomyelitis. Similarly, infra-red thermography has been used in various studies to detect complications related to DFU [23], [24].

Health applications on the smartphone are fast becoming popular in monitoring essential aspects of the human body. Yap et al. [25], [26] developed an app called FootSnap, which is used to produce a standardized dataset of DFU images. This application used basic image processing techniques such as edge detection to provide ghost images of the foot, which is useful for monitoring the progress of DFU. Since it was designed to standardize image capture conditions, it did not perform any automated detection function. Recently, Brown et al. [27] developed a smartphone application called MyFootCare, which provides useful guidance to DFU patients as well as keeping a record of foot images. In this application, the end-user needs to crop a patch of the captured image, from which basic color clustering algorithms produce a DFU segmentation. However, previous research [10] has already shown that basic clustering algorithms are not robust enough to provide accurate DFU segmentation on full foot images.

The major challenges of the DFU localization task are as follows: 1) data collection and expert labelling of the DFU dataset are expensive; 2) there is high inter-class similarity between DFU lesions and intra-class variation depending upon the classification of DFU [29]; and 3) lighting conditions and the patient's ethnicity vary.


In this work, we provide a large-scale annotated DFU dataset and propose an end-to-end mobile solution for DFU localisation. The key contributions of this paper include:

1) We present one of the largest DFU datasets, consisting of 1775 images with annotated bounding boxes indicating the ground truth of DFU location. To date, the largest dataset we have encountered consists of 600 DFU images, used for the semantic segmentation of DFU and its surrounding skin [10].
2) We propose the use of convolutional neural networks (CNNs) to localize DFU in real time with two-tier transfer learning. To our best knowledge, this is the first time CNNs have been used for this task. Since our main focus is on mobile devices, we emphasize light-weight object localization models.
3) Finally, we demonstrate the application of our proposed methods on two types of mobile devices: the Nvidia Jetson TX2 and an Android mobile application.

II. METHODOLOGY

This section describes the preparation of the dataset and the expert labeling of DFU on foot images, details the CNNs used for DFU localization, and reports the performance metrics used for validation.

A. DFU Dataset

We received NHS Research Ethics Committee approval (REC reference 15/NW/0539) to use foot images of DFU for our research. Foot images with DFU were collected from the Lancashire Teaching Hospitals over the past few years, and all participants signed consent for their images to be used for research purposes. The DFU dataset has a total of 1775 foot images with DFU. Three cameras were mainly used for capturing the foot images: a Kodak DX4530, a Nikon D3300 and a Nikon COOLPIX P100. Whenever possible, the images were acquired as close-ups of the full foot from a distance of around 30-40 cm, with the camera oriented parallel to the plane of the ulcer. The use of flash as the primary light source was avoided; instead, adequate room lighting was used to obtain consistent colors in the images. Sample foot images from the dataset are shown in Fig. 2. To test the specificity of the algorithms, we also included 105 healthy foot images from the FootSnap application [26] in the DFU dataset.

In this dataset, the size of the images varies between 1600×1200 and 3648×2736 pixels. We resized all images to 640×640 to improve performance and reduce computational cost. We used the Brett et al. [28] annotation tool to produce the ground truths in the form of bounding boxes, as shown in Fig. 3. The ground truth was produced by two healthcare professionals (a podiatrist and a consultant physician specializing in the diabetic foot), both specialized in diabetic wounds and ulcers; when they disagreed, the final decision was settled mutually with the consent of both. In the DFU dataset, approximately 90% of the images contain a single bounding box, 7% contain two, and the remaining 3% contain more than two. The medical experts delineated a total of 2080 DFUs (some images contain more than one ulcer) using the annotator software. As shown in Fig. 4, approximately 88% of DFUs occupy less than 10% of the image area, and the size varies considerably across the DFUs in the dataset.

Fig. 3. Example of delineating ground truth on the DFU dataset using the Brett et al. annotation tool [28].

Fig. 4. Comparison of the size of DFU against the size of the image.

B. Conventional Methods for DFU Localization

In this section, we assess the performance of conventional methods for the localization of DFU. For traditional machine learning, we delineated 2028 normal skin patches and 2080 abnormal skin patches for feature extraction and the training of a classifier using 5-fold cross-validation [9]. We also used data augmentation techniques such as flipping, rotation, random cropping and color channel shifts to make a total of 28392 normal and 29120 abnormal patches. 80% of the image data was used to train the classifier and the remaining 20% was used as test images. Since these two classes of skin (normal and abnormal) have significant textural differences, we investigated various feature extraction techniques, including low-level features such as edge and corner detection [30], texture descriptors such as Local Binary Patterns (LBP) [31], Gabor filters [32] and Histogram of Oriented Gradients (HOG) [33], shape-based descriptors such as the Hough transform [34], and color descriptors such as normalized RGB, HSV, and L*u*v features [35].


Fig. 5. Stage 1: The feature map extracted by a CNN that acts as the backbone for the object localization network. Conv refers to a convolutional layer.

Fig. 6. Stage 2: Detected proposal boxes with translate/scale operations to fit the object. There can be several proposals on a single object.
Using an exhaustive feature selection technique, we settled on LBP, HOG and the color descriptors to extract features from skin patches of both normal and abnormal classes; for a single patch, 209 features were extracted with the above-mentioned techniques. After feature extraction, we used a quadratic support vector machine [36] as the classifier. Then, to perform the DFU localization task at multiple scales, we used a sliding window approach, masking each box whose corresponding patch was detected as an ulcer by the trained classifier.

This technique achieved a good evaluation score of 70.3% in Mean Average Precision. However, conventional machine learning methods require many intermediate steps, such as pre-processing of images, extraction of hand-crafted features and multiple stages to obtain the final result, which makes them very slow. Deep learning, by contrast, provides faster end-to-end models on various computing platforms that simply take images as input and produce the final localization results as output.
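As a rough illustration of this conventional pipeline, the sketch below extracts LBP, HOG and color-histogram features from skin patches, trains a quadratic SVM (a degree-2 polynomial kernel), and slides a window over a test image. The patch size, feature settings and helper names are our assumptions for the sketch; they do not reproduce the 209-feature configuration reported above.

```python
import numpy as np
from skimage.color import rgb2gray, rgb2hsv
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import SVC

PATCH = 64  # assumed patch size; the paper does not state one

def patch_features(patch_rgb):
    """LBP + HOG + color-histogram descriptor for one skin patch."""
    gray = rgb2gray(patch_rgb)
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(16, 16),
                  cells_per_block=(2, 2))
    hsv = rgb2hsv(patch_rgb)
    color_hist = [np.histogram(hsv[..., c], bins=8, range=(0, 1),
                               density=True)[0] for c in range(3)]
    return np.concatenate([lbp_hist, hog_vec, *color_hist])

# Quadratic SVM, i.e. an SVM with a degree-2 polynomial kernel [36].
clf = SVC(kernel="poly", degree=2)

def fit(train_patches, labels):  # labels: 0 = normal skin, 1 = ulcer
    clf.fit(np.stack([patch_features(p) for p in train_patches]), labels)

def localize(image, stride=32):
    """Sliding window: return boxes whose patch is classified as ulcer."""
    boxes = []
    h, w, _ = image.shape
    for y in range(0, h - PATCH, stride):
        for x in range(0, w - PATCH, stride):
            feat = patch_features(image[y:y + PATCH, x:x + PATCH])
            if clf.predict(feat[None, :])[0] == 1:
                boxes.append((x, y, x + PATCH, y + PATCH))
    return boxes
```

The nested loop over window positions makes the slowness described above concrete: every candidate box requires a full feature extraction and classifier call.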
C. Deep Learning Methods for DFU Localization

CNNs have proved their superiority over conventional machine learning techniques in image recognition tasks such as the ImageNet [37] and MS-COCO challenges [38]. They are very capable of classifying images into different classes of objects, in both non-medical and medical imaging, by extracting hierarchies of features. One of the important tasks in computer vision is object localization, where algorithms need to localize and identify multiple objects in an image. Object localization networks mainly consist of three stages, described in the following subsections.

1) CNN as feature extractor: In Stage 1, the convolutional layers of a standard CNN such as MobileNet or InceptionV2 extract features from the input images as feature maps. These feature maps are used to identify the objects in the image, with particular attention focused on DFU regions, as shown in Fig. 5. They serve as input for the later stages: the generation of proposals in the second stage and the classification and regression of RoIs in the third stage.

2) Generation of proposals and refinement: In Stage 2, the network scans the image in a sliding-window fashion and finds specific areas that may contain objects, using the feature map extracted in Stage 1. These areas are known as proposals and correspond to different boxes distributed over the image. In general, around 200,000 proposals of different sizes and aspect ratios are generated to cover as many objects as possible; with a GPU, Faster R-CNN produces these anchors in about 10 ms [39]. Stage 2 generates two outputs for each proposal:
• Proposal Class: either foreground or background. The foreground class means there is likely an object in that proposal, which is then known as a positive proposal.
• Proposal Refinement: a positive proposal might not capture the object perfectly, so the network estimates a delta (% change in x, y, width, height) to refine the proposal box and center the object better, as illustrated in Fig. 6 (a minimal sketch of this delta update follows below).
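The refinement step can be written in a few lines. The sketch below applies predicted deltas to a proposal box; the center/size encoding is the standard R-CNN parameterization, which we assume here, since the paper itself describes the deltas only informally.

```python
import numpy as np

def refine_box(box, deltas):
    """Apply predicted deltas (dx, dy, dw, dh) to a proposal box.

    box:    (x1, y1, x2, y2) proposal corners.
    deltas: fractional shifts of the center (dx, dy) and log-scale
            changes of width/height (dw, dh) predicted by the network.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    dx, dy, dw, dh = deltas
    cx += dx * w                # shift the center by a fraction of the size
    cy += dy * h
    w *= np.exp(dw)             # rescale width and height
    h *= np.exp(dh)

    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

# Example: nudge a 100x100 proposal right and grow it by about 10%.
print(refine_box((50, 50, 150, 150), (0.05, 0.0, 0.1, 0.1)))
```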
3) RoI Classifier and Bounding Box Regressor: Stage 3 consists of the classification of the RoI boxes provided by Stage 2 and their further refinement, as shown in Fig. 7. First, all RoI boxes are fed into the RoI pooling layer to resize them to a fixed input size for the classifier, since RoI boxes can have different sizes. Similar to Stage 2, this stage generates two outputs for each RoI:
• RoI Class: the softmax layer classifies each region into a specific class (if there is more than one class); if the RoI is classified as background, it is discarded.
• Bbox Refinement: further refines the location of the RoI boxes.

We considered three types of object localization networks for the DFU dataset. The first is Faster R-CNN [39], the successor of Fast R-CNN [40] for object localization in terms of speed. It consists of all three stages of the object localization network, as shown in Fig. 8. It has a two-stage loss function: the first-stage loss covers parameters such as the position, scale and aspect ratio of the proposals, while the second stage re-runs the proposal crops produced by the first stage through the feature extractor to produce more accurate box proposals for classification.


Fig. 7. Illustration of Stage 3: the classification and further box refinement of RoI boxes from the second-stage proposals with softmax and Bbox regression, where FC refers to a fully-connected layer.

Fig. 8. Faster R-CNN architecture for DFU localization, which consists of all three stages discussed earlier.

Dai et al. [41] proposed Region-based Fully Convolutional Networks (R-FCN) to produce faster box proposals by considering crops only from the last layer of features, with accuracy comparable to Faster R-CNN; it crops features from the same layer where the region proposals are predicted, as shown in Fig. 9. Because cropping is limited to the last layer, it minimizes the time needed for box refinement.

Fig. 9. R-FCN architecture, which considers only the feature map from the last convolutional layer, speeding up the three-stage network.

The Single Shot Multibox Detector (SSD) [42] is a newer architecture for object localization that uses a single-stage CNN to predict classes and anchor offsets directly, without the need for a second-stage proposal generator, unlike Faster R-CNN [39] and R-FCN [41], as shown in Fig. 10. The SSD meta-architecture produces anchors much faster than other object localization networks, which makes it more suitable for mobile platforms.

Fig. 10. The architecture of the Single Shot Multibox Detector (SSD). It keeps only two stages, eliminating the last stage to produce faster box proposals.
There are six popular state-of-the-art object localization models based on these three detector meta-architectures, i.e. the Single Shot Multibox Detector [42], R-FCN [41] and Faster R-CNN [39]. These meta-architectures use state-of-the-art classification networks such as MobileNet [43], InceptionV2 [44], ResNet101 [45] and Inception-ResNetV2 [46] to obtain anchor boxes from the feature maps and, finally, classify these anchors into different classes. Table I summarises the size, speed (inference per image) and accuracy (mAP) of these models trained on the MS-COCO dataset with 90 classes [47], [38].

Since our work is constrained by the hardware of mobile devices and the need for real-time prediction, we only considered lightweight models (very small, low latency) in terms of model size and inference speed. We used the first three models in Table I (SSD-MobileNet, SSD-InceptionV2 and Faster R-CNN with InceptionV2) on the DFU dataset. These small models are specifically chosen to match the resource restrictions (latency, size) of mobile devices for this application. To evaluate the performance of DFU localization using a heavy model, we also included R-FCN with ResNet101 in our experiments.

Inception-V2 is an iteration of the original Inception architecture (GoogleNet) with new features such as the factorization of bigger convolution kernels into multiple smaller convolution kernels and improved normalization. For the first time, this network used depth-wise separable convolutions to reduce the computation in the first few layers. It also introduced the batch normalization layer, which decreases internal covariate shift and combats the vanishing gradient problem to improve convergence during training [44].


TABLE I
PERFORMANCE OF STATE-OF-THE-ART OBJECT LOCALIZATION MODELS ON THE MS-COCO DATASET [38]

Model Name                             Speed (ms)   Size of Model (MB)   COCO mAP
SSD-MobileNet                          30           29.2                 21
SSD-InceptionV2                        42           102.2                24
Faster R-CNN with InceptionV2          58           57.2                 28
R-FCN with ResNet101                   92           218.3                30
Faster R-CNN with ResNet101            106          196.9                32
Faster R-CNN with Inception-ResNetV2   620          247.5                37

MobileNet is a recent lightweight CNN that uses depth-wise separable convolutions to build small, low-latency models with a reasonable amount of accuracy, matching the limited resources of mobile devices. The basic block of a depth-wise separable convolution consists of a depth-wise convolution and a pointwise convolution: the 3×3 depth-wise convolution applies a single filter per input channel, whereas the pointwise convolution is a simple 1×1 convolution used to create a linear combination of the depth-wise convolution output. Batchnorm and ReLU layers follow both layers [43].
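The depth-wise separable block described above can be written in a few lines. This Keras sketch is our construction, not the authors' code: it chains a 3×3 depth-wise convolution, batch norm and ReLU with a 1×1 point-wise convolution, batch norm and ReLU, mirroring the MobileNet building block [43].

```python
import tensorflow as tf
from tensorflow.keras import layers

def separable_block(x, out_channels, stride=1):
    """MobileNet-style block: 3x3 depth-wise conv + 1x1 point-wise conv,
    each followed by batch normalization and ReLU."""
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same",
                               use_bias=False)(x)   # one filter per channel
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(out_channels, 1, use_bias=False)(x)  # linear combination
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return x

inputs = tf.keras.Input(shape=(640, 640, 3))
outputs = separable_block(inputs, out_channels=64, stride=2)
model = tf.keras.Model(inputs, outputs)
```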
ResNet101 is one of the residual learning networks, which won first place in the ILSVRC 2015 classification task [45]. As the name suggests, ResNet101 is a very deep network consisting of 101 layers, about 5 times deeper than VGG nets but with lower complexity. The core idea of ResNet is to provide shortcut connections between layers, which makes it safe to train very deep networks to gain maximal representation power without suffering from the degradation problem, i.e., the learning difficulties introduced by deep layers.
D. The Transfer Learning Approach

CNNs require a considerable dataset to learn features and obtain positive results for the detection of objects in images [5]. When training on a limited dataset, it is vital to use transfer learning from massive datasets in non-medical domains, such as the ImageNet and MS-COCO datasets, to converge the weights associated with the convolutional layers of the network [48], [49], [10]. The main reason for using two-tier transfer learning in this work is that medical imaging datasets are very limited; when CNNs are trained from scratch on such datasets, they do not produce useful results. There are two types of transfer learning: partial transfer learning, in which only the features from a few convolutional layers are transferred, and full transfer learning, in which features are transferred from all layers of a previously pre-trained model. We used both types, together known as two-tier transfer learning [10]. In the first tier, we used partial transfer learning by transferring features only from the convolutional layers trained on the most significant classification challenge dataset, ImageNet, which consists of more than 1.5 million images with 1000 classes [37]. In the second tier, we used full transfer learning to transfer features from a model trained on the object localization dataset MS-COCO, which consists of more than 80000 images with 90 classes [38]. Hence, we used the two-tier transfer learning technique to produce the pre-trained model for all frameworks in our DFU localization task.
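The sketch below illustrates the partial/full distinction in Keras terms; it is our illustration rather than the authors' training code, and it uses InceptionV3 only because keras.applications does not ship InceptionV2. Tier one copies just the ImageNet-trained convolutional layers; tier two would instead restore every weight of a detector already trained on MS-COCO (the checkpoint name below is a placeholder).

```python
import tensorflow as tf
from tensorflow.keras import layers

# Tier 1 (partial transfer): reuse only the convolutional features learned
# on ImageNet; everything after the backbone is randomly initialized.
backbone = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(640, 640, 3))

# Toy detection head standing in for the localization stages.
x = layers.Conv2D(256, 3, padding="same", activation="relu")(backbone.output)
x = layers.GlobalAveragePooling2D()(x)
boxes = layers.Dense(4, name="box")(x)         # one box per image, for brevity
score = layers.Dense(1, activation="sigmoid", name="ulcer")(x)
detector = tf.keras.Model(backbone.input, [boxes, score])

# Tier 2 (full transfer): restore *all* layers from a detector already
# trained on MS-COCO, then fine-tune end-to-end on the DFU images.
# "coco_detector_weights.h5" is a hypothetical checkpoint name.
# detector.load_weights("coco_detector_weights.h5")

detector.compile(optimizer="sgd",
                 loss={"box": tf.keras.losses.Huber(),
                       "ulcer": "binary_crossentropy"})
```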
E. Performance Measures of Deep Learning Methods

We used four performance metrics: Speed, Size of the model, mean average precision (mAP), and Overlap Percentage. Speed is the time the model takes to perform inference on a single image, whereas Size of the model is the total size of the frozen model used for inference on test images; both are crucial factors for real-time prediction on mobile platforms. The mAP is an important performance metric extensively used for the evaluation of object localization, with an "overlap criterion" of intersection-over-union greater than 0.5: for a prediction to be considered a correct detection, the area of overlap $A_o$ between the predicted bounding box $B_p$ and the ground-truth bounding box $B_g$ must exceed 0.5 (50%) [50], where

$$A_o = \frac{\mathrm{area}(B_p \cap B_g)}{\mathrm{area}(B_p \cup B_g)} \qquad (1)$$

The last evaluation metric, Overlap Percentage, is the mean average of the intersection over union over all correct detections.
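Eq. (1) translates directly into a few lines of code. The helper below is ours; it computes $A_o$ for corner-format boxes and applies the 0.5 correct-detection threshold [50].

```python
def iou(box_p, box_g):
    """Intersection over union (Eq. 1) of a predicted and a ground-truth
    box, both given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    return inter / (area_p + area_g - inter)

def is_correct_detection(box_p, box_g, threshold=0.5):
    """A prediction counts as a correct detection when A_o exceeds 0.5."""
    return iou(box_p, box_g) > threshold
```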
III. EXPERIMENT AND RESULT

As mentioned previously, we used the deep learning models based on three meta-architectures for the DFU localization task. The Tensorflow object detection API [47] provides an open source framework that makes it very convenient to design and build various object localization models. The experiments were carried out on the DFU dataset and evaluated with the 5-fold cross-validation technique. First, we randomly split the whole dataset into 5 testing sets (20% each) for 5-fold cross-validation, to ensure that the whole dataset was evaluated on testing sets. For each testing set (20%), the remaining images were randomly split into 70% for the training set and 10% for the validation set. Hence, for each fold, we divided the whole dataset of 1775 images into approximately 1242 images in the training set, 178 in the validation set and 355 in the testing set. This was repeated over the 5 folds so that the whole dataset was included in testing.
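One way to realize this protocol is sketched below using scikit-learn (our sketch, not the authors' code): KFold yields the five 20% test partitions, and each remaining 80% is further split so that roughly 70%/10% of the whole dataset go to training/validation.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

image_ids = np.arange(1775)                 # one id per dataset image
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for train_val_idx, test_idx in kfold.split(image_ids):
    # 0.125 of the remaining 80% is 10% of the whole dataset.
    train_idx, val_idx = train_test_split(
        train_val_idx, test_size=0.125, random_state=0)
    print(len(train_idx), len(val_idx), len(test_idx))
    # -> roughly 1242 training, 178 validation, 355 testing images
```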


TABLE II
PERFORMANCE MEASURES OF OBJECT LOCALIZATION MODELS ON THE DFU DATASET

Model Name                      Speed (ms)   Size of Model (MB)   Ulcer mAP   Overlap Percentage (%)
SSD-MobileNet                   28           22.6                 84.9        89.4
SSD-InceptionV2                 37           53.5                 87.2        92.6
Faster R-CNN with InceptionV2   48           52.2                 91.8        95.8
R-FCN with ResNet101            90           199.1                90.6        96.1
a) Configuration of GPU Machine for Experiments: (1) Hardware: CPU - Intel i7-6700 @ 4.00GHz, GPU - NVIDIA TITAN X 12GB, RAM - 32GB DDR4. (2) Software: Tensorflow [47].

We tested four state-of-the-art deep convolutional networks for our proposed object localization task, as described in Section II-C. We trained the models with an input size of 640×640 using stochastic gradient descent with different learning rates on an Nvidia GeForce GTX TITAN X card. We initialised the networks with pre-trained weights using transfer learning rather than random initialization, for better convergence. We tested multiple learning rates, decreasing the original learning rates by factors of 10 and 100 and applying multiplication factors from 1 to 5, to find the overall minimal validation loss. For example, where the original Inception-V2 learning rate was 0.001, for training on the DFU dataset we used 10 learning rates: 0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.00001, 0.00002, 0.00003, 0.00004 and 0.00005.

We used 100 epochs for training each reported model, which we found sufficient for the DFU dataset, as both training and validation losses finally converge to their optimal lowest values. We selected the models for evaluation on the basis of minimum validation loss. We tried different hyper-parameters, such as learning rate, number of steps and data augmentation options, for each model to minimize both training and validation losses. Below, we report the network hyper-parameters and configurations for each model used for evaluation on the DFU dataset.

We set the hyper-parameters appropriate to each meta-architecture to train the models on the DFU dataset. For SSD, with the two CNNs MobileNet and Inception-V2 (both of which use depth-wise separable convolutions), we set the weight for l2_regularizer to 0.00004, an initializer that generates a truncated normal distribution with standard deviation 0.03 and mean 0.0, and batch_norm with decay 0.9997 and epsilon 0.001. For training, we used a batch size of 24 and the RMS_Prop optimizer with a learning rate of 0.004 and decay factor 0.95. The momentum optimizer value was set at 0.9 with a decay of 0.9 and epsilon of 0.1. We also used two types of data augmentation: random horizontal flip and random crop. For Faster-RCNN, we set the weight for l2_regularizer to 0.0, an initializer that generates a truncated normal distribution with standard deviation 0.01, and batch_norm with decay 0.9997 and epsilon 0.001. For training, we used a batch size of 2 and a momentum optimizer with a manual-step learning rate: an initial rate of 0.0002, then 0.00002 from epoch 40 and 0.000002 from epoch 60. The momentum optimizer value was set at 0.9. For training R-FCN, we used the same hyper-parameters as Faster-RCNN, changing only the learning rate, set to 0.0005. For data augmentation, we used only random horizontal flip for these two meta-architectures.
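The manual-step schedule quoted above for Faster R-CNN can be expressed as a simple callback; this Keras sketch is our rendering of those rates, not the authors' training configuration.

```python
import tensorflow as tf

def faster_rcnn_lr(epoch, lr):
    """Manual-step schedule: 2e-4 initially, 2e-5 from epoch 40,
    2e-6 from epoch 60 (the rates reported above)."""
    if epoch >= 60:
        return 2e-6
    if epoch >= 40:
        return 2e-5
    return 2e-4

schedule = tf.keras.callbacks.LearningRateScheduler(faster_rcnn_lr)
# model.fit(train_ds, epochs=100, callbacks=[schedule])
```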
In Table II, we report the performance evaluation of the object localization networks on the DFU dataset under 5-fold cross-validation. Overall, all the models achieved promising localization results with high confidence on the DFU dataset. Instances of accurate localization by all trained models are shown in Fig. 11. SSD-MobileNet ranked first on the Size of Model and Speed indices, mainly due to SSD's simpler architecture for generating anchor boxes [42]. In Ulcer mAP and Overlap Percentage, R-FCN with ResNet101 and Faster R-CNN with InceptionV2 were almost equally competitive. In Ulcer mAP, Faster R-CNN with InceptionV2 ranked first with an overall mAP of 91.8%, slightly better than R-FCN with ResNet101 at 90.6%. In Overlap Percentage, however, R-FCN with ResNet101 achieved a score of 96.1%, slightly better than Faster R-CNN with InceptionV2. SSD-InceptionV2 ranked third in both of these measures, 4.6% behind the leader in Ulcer mAP and 3.5% behind in Overlap Percentage. Overall, Faster R-CNN with InceptionV2 was the best performer, and the most lightweight model, SSD-MobileNet, emerged as the worst performer in terms of accuracy. Finally, we tested the models on the dataset of 105 healthy foot images as a specificity measure: none of the above-mentioned models produced any DFU localization on these healthy images.

A. Inaccurate DFU Localization Cases

In this work, we explored different object localization meta-architectures to localize DFU on full foot images. Although the performance of all models is quite accurate, as shown in Fig. 11, this section examines inaccurate localization cases by the trained models on the DFU dataset in 5-fold cross-validation, as shown in Fig. 12. We found that the trained models, especially SSD-MobileNet and SSD-InceptionV2, struggled to localize DFU of very small size or with a skin tone similar to the rest of the foot. There are cases of DFU with such subtle features that not even the most accurate models, Faster R-CNN with InceptionV2 and R-FCN with ResNet101, were able to detect them.


Fig. 11. Accurate localization results to visually compare the performance of the object localization networks on the DFU dataset, where GT is the ground truth, SSD-MobNet is SSD-MobileNet, SSD-IncV2 is SSD-InceptionV2, FRCNN-IncV2 is Faster R-CNN with InceptionV2, and RFCN-Res101 is R-FCN with ResNet101.

IV. INFERENCE OF TRAINED MODELS ON NVIDIA JETSON TX2 DEVELOPER KIT

The Nvidia Jetson TX2 is a recent mobile computing platform with an onboard 5-megapixel camera and a GPU for remote deep learning applications, as shown in Fig. 13. However, it is not capable of training large deep learning models. We installed the Tensorflow build designed specifically for this hardware to run inference with the DFU localization models that we trained on the GPU machine. The Jetson TX2 is a very compact and portable device that can be used in various remote locations.

a) Configuration of Jetson TX2 for Inference: (1) Hardware: CPU - dual-core NVIDIA Denver2 + quad-core ARM Cortex-A57, GPU - 256-core Pascal GPU, RAM - 8GB LPDDR4. (2) Software: Ubuntu Linux 16.04 & Tensorflow.

We did not find any difference between the predictions of the models on the Jetson TX2 hardware and on the GPU machine; the only drawback is the slower inference speed on the Jetson TX2, which is due to its limited hardware compared to the GPU machine. For example, the speed of SSD-MobileNet was 70 ms per inference on the Jetson TX2 compared to 30 ms on the GPU machine. For real-time localization, the models can produce visualization at a maximum of 5 fps using the onboard camera with the lightweight model. Fig. 14 demonstrates inference using the Jetson TX2.
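For reference, a typical TensorFlow 1.x inference loop over a frozen detection graph looks like the sketch below. The tensor names are the standard exports of the Tensorflow Object Detection API [47], which we assume the deployed models follow; the zero-filled image is a stand-in for a camera frame.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style graph inference

tf.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    image = np.zeros((1, 640, 640, 3), dtype=np.uint8)  # stand-in frame
    boxes, scores = sess.run(
        ["detection_boxes:0", "detection_scores:0"],
        feed_dict={"image_tensor:0": image})
    keep = scores[0] > 0.5           # confidence threshold (our choice)
    print(boxes[0][keep])            # normalized [ymin, xmin, ymax, xmax]
```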


Fig. 12. Incorrect localization results to visually compare the performance of the object localization networks on the DFU dataset, where GT is the ground truth, SSD-MobNet is SSD-MobileNet, SSD-IncV2 is SSD-InceptionV2, FRCNN-IncV2 is Faster R-CNN with InceptionV2, and RFCN-Res101 is R-FCN with ResNet101.

V. REAL-TIME DFU LOCALIZATION WITH SMARTPHONE APPLICATION

Training and inference of deep learning frameworks on a smartphone are challenging due to its limited resources. Hence, we trained the object localization frameworks on a desktop with a GPU card. For these further experiments we utilized the whole dataset of 1775 DFU images, randomly splitting 90% of the data into the training set and the remaining 10% into the validation set. We trained only Faster R-CNN with InceptionV2 on this dataset because it offers the best trade-off between accuracy and speed. With Android Studio and the Tensorflow deep learning mobile library, we deployed the model on a Samsung A5 2017 (Android phone) to create real-time object localization for DFU, finalizing Faster R-CNN with InceptionV2 for the prototype Android application.

Fig. 13. Nvidia Jetson TX2.


Fig. 14. DFU localization on the Nvidia Jetson TX2 using Faster R-CNN with InceptionV2 on Tensorflow.

We tested our prototype application in real-time healthcare settings, as shown in Fig. 15. In this preliminary test, we evaluated the application on 30 people, of whom 10 had DFU. Out of the 10 people with DFU, our application detected 8 DFU, and out of the 20 people with normal feet, it produced no false detections (a sensitivity of 80% and a specificity of 100% in this preliminary test). Furthermore, more user-friendly features, care and guidance will be added to this application to make it a complete package of DFU care for diabetic patients.

VI. DISCUSSION AND CONCLUSION

Diagnosis and detection of DFU by computerized methods has been an emerging research area with the evolution of computer vision, especially deep learning methods. In this work, we investigated the use of both conventional machine learning and deep learning for the DFU localization task. We achieved relatively good performance with the conventional machine learning technique but, due to its multiple intermediate steps, this approach is very slow for the DFU localization task. In deep learning, we used different object localization meta-architectures to train end-to-end models on the DFU dataset with different hyper-parameter settings and two-tier transfer learning, localizing DFU on full foot images with high accuracy. As shown in Fig. 11, these methods are capable of localizing multiple DFU at high inference speed. We also found that, though the SSD meta-architecture produced the fastest inference due to its simpler two-stage design, Faster R-CNN produced the most accurate results in our task. We then demonstrated how these methods can be easily transferred to a portable device, the Nvidia Jetson TX2, to produce inference remotely. Finally, these deep learning methods were used in an Android application to provide real-time DFU localization. In this work, we developed mobile systems that can assist both medical experts and patients in DFU diagnosis and follow-up in remote settings.

At present, manual inspection by podiatrists remains the ideal solution for the diagnosis of DFU. However, van Netten et al. [51] reported that human observers achieved low validity and reliability for remote assessment of DFU. Therefore, computerized methods could be used as a tool to improve human performance. Developing a remote, computerized and innovative DFU diagnosis system that matches the medical classification systems and exactness achieved by podiatrists demands a significant amount of research. For computerized foot analysis to assist podiatrists in the near future, the following issues need to be addressed:

1) The detection of DFU on foot images with computerized methods is a difficult task due to high inter-class similarities and intra-class variations in terms of color, size, shape, texture and site amongst different classes of DFU. Although detection and localization of DFU on full foot images is a valuable study, further analysis of each DFU is required according to the medical classification systems followed by podiatrists, such as the Texas Classification of DFU [29] and the SINBAD Classification System [52]. Most state-of-the-art computerized imaging methods rely on supervised learning; hence, there is a need for laborious manual annotation by medical experts according to these popular classification systems. For example, the Texas classification system classifies DFU into 16 classes depending on the condition of the DFU in terms of ischemia, infection, area and depth. These methods can be extended to produce the localization of DFU and determine the outcome of DFU according to the Texas classification system, given substantial image data belonging to each class and expert annotations.
2) Deep learning methods require a considerable amount of data to learn the features of abnormalities in medical imaging. To achieve accurate DFU detection according to different classification systems, multiple images of the same DFU are needed, covering key conditions such as lighting, the distance of image capture from the foot and the orientation of the camera relative to the foot. To our best knowledge, there is no publicly available standardized DFU dataset with descriptions and annotations; hence, there is a requirement for a publicly available annotated DFU dataset with essential diagnostic information. A standardized dataset can help produce even more accurate results with these methods.
3) Early detection of the key pathological changes in the diabetic foot that lead to the development of a DFU is really important. Hence, a time-line dataset of patients from the early signs of DFU until diagnosis is required to achieve this objective. With these methods and a time-line dataset, the early prediction, healing progress and other potential outcomes of DFU could be determined.
4) The combination of image features with diagnostic features such as the patient's ethnicity, the presence of ischemia, the depth of DFU to the tendon, and neuropathy would aid a more robust DFU diagnosis system.
5) The DFU diagnosis system should be scalable to multiple devices, platforms and operating systems.

With limited human resources and facilities in healthcare systems, DFU diagnosis is a significant workload and burden for the government. Computer-based systems have huge potential to assist healthcare systems in DFU assessment. New technologies such as the Internet of Things (IoT), cloud computing, computer vision and deep learning can enable computer systems to remotely assess wounds and provide faster feedback with good accuracy. However, such an integrated system should be tested and validated rigorously by podiatrists and medical experts before it is implemented in real healthcare settings and deployed as a mobile application.


Fig. 15. Real-time localization using the smartphone Android application. The first row shows images captured by the default camera; the second row shows snapshots of real-time localization by our prototype Android application.

VII. ACKNOWLEDGEMENTS

The authors express their gratitude to Lancashire Teaching Hospitals and to Jennifer Spragg, Podiatrist/Chiropodist at the Rossendale Practice, Rawtenstall, Lancashire, for their extensive support and contribution in carrying out this research. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research.

REFERENCES

[1] K. Ogurtsova, J. da Rocha Fernandes, Y. Huang, U. Linnenkamp, L. Guariguata, N. Cho, D. Cavan, J. Shaw, and L. Makaroff, "IDF diabetes atlas: Global estimates for the prevalence of diabetes for 2015 and 2040," Diabetes Research and Clinical Practice, vol. 128, pp. 40–50, 2017.
[2] S. D. Ramsey, K. Newton, D. Blough, D. K. McCulloch, N. Sandhu, G. E. Reiber, and E. H. Wagner, "Incidence, outcomes, and cost of foot ulcers in patients with diabetes," Diabetes Care, vol. 22, no. 3, pp. 382–387, 1999.
[3] R. E. Pecoraro, G. E. Reiber, and E. M. Burgess, "Pathways to diabetic limb amputation: basis for prevention," Diabetes Care, vol. 13, no. 5, pp. 513–521, 1990.
[4] D. G. Armstrong, A. J. Boulton, and S. A. Bus, "Diabetic foot ulcers and their recurrence," New England Journal of Medicine, vol. 376, no. 24, pp. 2367–2375, 2017.
[5] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[6] D. Leightley, J. S. McPhee, and M. H. Yap, "Automated analysis and quantification of human mobility using a depth sensor," IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 4, pp. 939–948, 2017.
[7] M. H. Yap, G. Pons, J. Martí, S. Ganau, M. Sentís, R. Zwiggelaar, A. K. Davison, and R. Martí, "Automated breast ultrasound lesions detection using convolutional neural networks," IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 4, 2018.
[8] M. Goyal and M. H. Yap, "Multi-class semantic segmentation of skin lesions via fully convolutional networks," arXiv preprint arXiv:1711.10449, 2017.
[9] M. Goyal, N. D. Reeves, A. K. Davison, S. Rajbhandari, J. Spragg, and M. H. Yap, "DFUNet: Convolutional neural networks for diabetic foot ulcer classification," IEEE Transactions on Emerging Topics in Computational Intelligence, 2018.
[10] M. Goyal, M. H. Yap, N. D. Reeves, S. Rajbhandari, and J. Spragg, "Fully convolutional networks for diabetic foot ulcer segmentation," in 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oct 2017, pp. 618–623.
[11] M. Kolesnik and A. Fexa, "Multi-dimensional color histograms for segmentation of wounds in images," in International Conference Image Analysis and Recognition. Springer, 2005, pp. 1014–1022.
[12] M. Kolesnik and A. Fexa, "How robust is the SVM wound segmentation?" in Signal Processing Symposium, 2006. NORSIG 2006. Proceedings of the 7th Nordic. IEEE, 2006, pp. 50–53.
[13] E. S. Papazoglou, L. Zubkov, X. Mao, M. Neidrauer, N. Rannou, and M. S. Weingarten, "Image analysis of chronic wounds for determining the surface area," Wound Repair and Regeneration, vol. 18, no. 4, pp. 349–358, 2010.
[14] F. Veredas, H. Mesa, and L. Morente, "Binary tissue classification on wound images with neural networks and Bayesian classifiers," IEEE Transactions on Medical Imaging, vol. 29, no. 2, pp. 410–427, 2010.
[15] M. K. Yadav, D. D. Manohar, G. Mukherjee, and C. Chakraborty, "Segmentation of chronic wound areas by clustering techniques using selected color space," Journal of Medical Imaging and Health Informatics, vol. 3, no. 1, pp. 22–29, 2013.
[16] A. Castro, C. Bóveda, and B. Arcay, "Analysis of fuzzy clustering algorithms for the segmentation of burn wounds photographs," in International Conference Image Analysis and Recognition. Springer, 2006, pp. 491–501.


[17] D. H. Chung and G. Sapiro, "Segmenting skin lesions with partial-differential-equations-based image processing algorithms," IEEE Transactions on Medical Imaging, vol. 19, no. 7, pp. 763–767, 2000.
[18] L. Wang, P. Pedersen, E. Agu, D. Strong, and B. Tulu, "Area determination of diabetic foot ulcer images using a cascaded two-stage SVM based classification," IEEE Transactions on Biomedical Engineering, 2016.
[19] C. Wang, X. Yan, M. Smith, K. Kochhar, M. Rubin, S. M. Warren, J. Wrobel, and H. Lee, "A unified framework for automatic wound segmentation and analysis with deep convolutional neural networks," in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE. IEEE, 2015, pp. 2415–2418.
[20] J. J. van Netten, J. G. van Baal, C. Liu, F. van der Heijden, and S. A. Bus, "Infrared thermal imaging for automated detection of diabetic foot complications," 2013.
[21] C. Liu, F. van der Heijden, M. E. Klein, J. G. van Baal, S. A. Bus, and J. J. van Netten, "Infrared dermal thermography on diabetic feet soles to predict ulcerations: a case study," in Advanced Biomedical and Clinical Diagnostic Systems XI, vol. 8572. International Society for Optics and Photonics, 2013, p. 85720N.
[22] J. Harding, D. Wertheim, R. Williams, J. Melhuish, D. Banerjee, and K. Harding, "Infrared imaging in diabetic foot ulceration," in Engineering in Medicine and Biology Society, 1998. Proceedings of the 20th Annual International Conference of the IEEE, vol. 2. IEEE, 1998, pp. 916–918.
[23] D. Hernandez-Contreras, H. Peregrina-Barreto, J. Rangel-Magdaleno, and J. Gonzalez-Bernal, "Narrative review: Diabetic foot and infrared thermography," Infrared Physics & Technology, vol. 78, pp. 105–117, 2016.
[24] M. Adam, E. Y. Ng, J. H. Tan, M. L. Heng, J. W. Tong, and U. R. Acharya, "Computer aided diagnosis of diabetic foot using infrared thermography: A review," Computers in Biology and Medicine, vol. 91, pp. 326–336, 2017.
[25] M. H. Yap, C.-C. Ng, K. Chatwin, C. A. Abbott, F. L. Bowling, A. J. Boulton, and N. D. Reeves, "Computer vision algorithms in the detection of diabetic foot ulceration: a new paradigm for diabetic foot care?" Journal of Diabetes Science and Technology, p. 1932296815611425, 2015.
[26] M. H. Yap, K. E. Chatwin, C.-C. Ng, C. A. Abbott, F. L. Bowling, S. Rajbhandari, A. J. Boulton, and N. D. Reeves, "A new mobile application for standardizing diabetic foot images," Journal of Diabetes Science and Technology, vol. 12, no. 1, pp. 169–173, 2018.
[27] R. Brown, B. Ploderer, L. S. D. Seng, J. J. van Netten, and P. A. Lazzarini, "MyFootCare: A mobile self-tracking tool to promote self-care amongst people with diabetic foot ulcers," 2017.
[28] B. Hewitt, M. H. Yap, and R. Grant, "Manual whisker annotator (MWA): A modular open-source tool," Journal of Open Research Software, vol. 4, no. 1, 2016.
[29] L. A. Lavery, D. G. Armstrong, and L. B. Harkless, "Classification of diabetic foot wounds," The Journal of Foot and Ankle Surgery, vol. 35, no. 6, pp. 528–531, 1996.
[30] W. Förstner, "A framework for low level feature extraction," in European Conference on Computer Vision. Springer, 1994, pp. 383–394.
[31] Z. Guo, L. Zhang, and D. Zhang, "A completed modeling of local binary pattern operator for texture classification," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1657–1663, 2010.
[32] J. P. Jones and L. A. Palmer, "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex," Journal of Neurophysiology, vol. 58, no. 6, pp. 1233–1258, 1987.
[33] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1. IEEE, 2005, pp. 886–893.
[34] D. H. Ballard, "Generalizing the Hough transform to detect arbitrary shapes," Pattern Recognition, vol. 13, no. 2, pp. 111–122, 1981.
[35] Y.-I. Ohta, T. Kanade, and T. Sakai, "Color information for region segmentation," Computer Graphics and Image Processing, vol. 13, no. 3, pp. 222–241, 1980.
[36] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[37] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[38] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision. Springer, 2014, pp. 740–755.
[39] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[40] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[41] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: Object detection via region-based fully convolutional networks," in Advances in Neural Information Processing Systems, 2016, pp. 379–387.
[42] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in European Conference on Computer Vision. Springer, 2016, pp. 21–37.
[43] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[44] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[45] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[46] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," in AAAI, 2017, pp. 4278–4284.
[47] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama et al., "Speed/accuracy trade-offs for modern convolutional object detectors," arXiv preprint arXiv:1611.10012, 2016.
[48] A. Van Opbroek, M. A. Ikram, M. W. Vernooij, and M. De Bruijne, "Transfer learning improves supervised image segmentation across imaging protocols," IEEE Transactions on Medical Imaging, vol. 34, no. 5, pp. 1018–1030, 2015.
[49] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[50] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The Pascal visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.
[51] J. J. van Netten, D. Clark, P. A. Lazzarini, M. Janda, and L. F. Reed, "The validity and reliability of remote diabetic foot ulcer assessment using mobile phone images," Scientific Reports, vol. 7, no. 1, p. 9480, 2017.
[52] P. Ince, Z. G. Abbas, J. K. Lutale, A. Basit, S. M. Ali, F. Chohan, S. Morbach, J. Möllenberg, F. L. Game, and W. J. Jeffcoate, "Use of the SINBAD classification system and score in comparing outcome of foot ulcer management on three continents," Diabetes Care, vol. 31, no. 5, pp. 964–967, 2008.

