Goyal 2018
Goyal 2018
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
1
Abstract—Current practice for Diabetic Foot Ulcers (DFU) [5] and human motion analysis [6]. The computer vision and
screening involves detection and localization by podiatrists. deep learning algorithms are extensively used for the analysis
Existing automated solutions either focus on segmentation or of medical imaging of various modalities such as MRI, CT
classification. In this work, we design deep learning methods for
real-time DFU localization. To produce a robust deep learning scan, X-ray, dermoscopy, and ultrasound [7]. Recently, com-
model, we collected an extensive database of 1775 images of DFU. puter vision algorithms are extended to assess different types
Two medical experts produced the ground truths of this dataset of skin condition such as skin cancer and DFU [8], [9].
by outlining the region of interest of DFU with an annotator From a computer vision and medical imaging perspective,
software. Using 5-fold cross-validation, overall, Faster R-CNN
with InceptionV2 model using two-tier transfer learning achieved there are three common tasks can be performed for the
a mean average precision of 91.8%, the speed of 48 ms for detection of abnormalities on medical images, which are 1)
inferencing a single image and with a model size of 57.2 MB. Classification 2) Localization 3) Segmentation. These tasks
To demonstrate the robustness and practicality of our solution on DFU are illustrated by Fig. 1. Various researchers have
to real-time prediction, we evaluated the performance of the made contributions related to computer vision methods for the
models on a NVIDIA Jetson TX2 and a smartphone app. This
work demonstrates the capability of deep learning in real-time detection of DFU. We divided these contributions into four
localization of DFU, which can be further improved with a more categories:
extensive dataset.
1) Algorithms development based on basic image process-
Index Terms—Diabetic foot ulcers, deep learning, convolutional ing and traditional machine learning techniques
neural networks, DFU localization, real-time localization. 2) Algorithms development based on deep learning tech-
niques
I. I NTRODUCTION 3) Research based on different modalities of images
4) Smartphone applications for DFU
D IABETIC Foot Ulcers (DFU) that affect the lower ex-
tremities is a major complication of Diabetes. Accord-
ing to the global prevalence data of International Diabetes
Several studies suggested computer vision methods based on
basic image processing approaches and supervised traditional
Federation in 2015, annually, DFU develop in 9.1 million to machine learning for the detection of DFU/wound. Mainly,
26.1 million people with diabetes worldwide [1]. It has been these studies have performed the segmentation task by extract-
estimated that patients with diabetes have a lifetime risk of ing texture descriptors and color descriptors on small patches
15% to 25% in developing DFU with nearly contributing to of wound/DFU images, followed by traditional machine learn-
85% of the lower limb amputation due to infected and non- ing algorithms to classify them into normal and abnormal
healing DFU [2], [3]. In a more recent study, when additional skin patches [11], [12], [13], [14]. In conventional machine
data is considered, the risk is suggested to be in-between 19% learning, the hand-crafted features are usually affected by
to 34% [4]. skin shades, illumination, and image resolution. Also, these
Due to the proliferation of Information Communication techniques struggled to segment the irregular contour of the
Technology, the intelligent automated telemedicine systems ulcers or wounds. On the other hand, the unsupervised ap-
are often tipped as one of the most cost-effective solutions proaches rely upon image processing techniques, edge detec-
for remote detection and prevention of DFU. Telemedicine tion, morphological operations and clustering algorithms using
systems along with current healthcare services can integrate different color space to segment the wounds from images [15],
with each other to provide more cost-effective, efficient and [16], [17]. Wang et al. [18] used an image capture box to
quality treatment for DFU. In recent years, there has been capture image data and determined the area of DFU using
a rapid development in computer vision, especially towards cascaded two-stage SVM-based classification. They proposed
the difficult and vital issues of understanding images from the use of superpixel technique for segmentation and extracted
different domains such as spectral, medical, object detection the number of features to perform two-stage classification.
Although this system reported promising results, it has not
M. Goyal and M. H. Yap are with the School of Computing, Mathematics been validated on a more substantial dataset. In addition,
and Digital Technology, Manchester Metropolitan University, John Dalton
Building, M1 5GD, Manchester, UK. (e-mail: [email protected]) the image capture box is very impractical for data collection
N.D. Reeves is with the Musculoskeletal Science & Sports Medicine as there is a need for the patient’s barefoot to be placed
Research Centre, School of Healthcare Science, Faculty of Science & En- directly in contact with the screen of image capture box.
gineering, Manchester Metropolitan University, John Dalton Building, M1
5GD, Manchester, UK In healthcare, such setting would not be allowed due to the
S. Rajbhandari is at Lancashire Teaching Hospital, PR2 9HT, Preston, UK. concerns regarding infection control.
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
2
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
3
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
4
Fig. 5. Stage 1: The feature map extracted by CNN that acts as backbone
for object localization network. Conv refers convolutional layer.
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
5
Fig. 7. Illustration of Stage 3: The classification and further box refinement of RoI boxes from the second stage proposal with softmax and Bbox regression.
Where FC refers to Fully-connected layer
Fig. 10. The architecture of Single Shot Multibox Detector (SSD). It considers
only two stage by eliminating the last stage to produce faster box proposals.
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
6
TABLE I
P ERFORMANCE OF STATE - OF - THE - ART OBJECT LOCALIZATION MODELS ON MS-COCO DATASET. [38]
SSD-MobileNet 30 29.2 21
SSD-InceptionV2 42 102.2 24
problem to improve the convergence during training [44]. 1.5 million images with 1000 classes [37]. In the second tier,
MobileNet is a recent lightweight CNN which uses depth- we used full transfer learning to transfer the features from a
wise separable convolutions to build small, low latency models model trained on object localization dataset called MS-COCO
with a reasonable amount of accuracy that matches the limited that consists of more than 80000 images with 90 classes [38].
resource on mobile devices. The basic block of depth-wise Hence, we used the two-tier transfer learning technique to
separable convolution consists of depth-wise convolution and produce the pre-trained model for all frameworks in our DFU
pointwise convolution. The 3 × 3 depth-wise convolution is localization task.
used to apply a single filter per each input channel whereas
pointwise convolution is just simple 1 × 1 convolution used E. Performance Measures of Deep Learning Methods
to create the linear combination of the depth-wise convolution
We used four performance metrics i.e. Speed, Size of the
output. Also, it uses both batchnorm layers as well as RELU
model, mean average precision (mAP), and Overlap Percent-
layers after both layers [43].
age. The Speed determines the time model takes to perform
ResNet101 is one of the residual learning networks which
inference on single image whereas Size of the model is the
won the first place on ILSVRC 2015 classification task [45].
total size of the frozen model that is used for the inference
As suggested by the name, ResNet101 is a very deep network
of test images. These are crucial factors for the real-time
consists of 101 layers which is about 5 times much deeper than
prediction on mobile platforms. The mAP has an ”overlap
VGG nets but still having lower complexity. The core idea
criterion” of intersection-over-union greater than 0.5. The mAP
of ResNet is providing shortcut connection between layers,
is an important performance metric extensively used for the
which make it safe to train very deep network to gain maximal
evaluation of the object localization task. The prediction by
representation power without worrying about the degradation
model to be considered a correct detection, the area of overlap
problem, i.e., learning difficulties introduced by deep layers.
Ao between the bounding box of prediction Bp and bounding
box of ground truth Bg must exceed 0.5 (50%) [50]. The last
D. The Transfer Learning Approach evaluation metric is called Overlap Percentage, which is mean
average of intersection over union for all correct detection.
CNNs requires a considerable dataset to learn the features
to get the positive results for detection of objects in images area(Bp ∩ Bg )
[5]. It is vital to use transfer learning from massive datasets Ao = (1)
area(Bp ∪ Bg )
in non-medical backgrounds such as ImageNet and MS-
COCO dataset to converge the weights associated with each
convolutional layers of network [48], [49], [10] for training III. E XPERIMENT AND R ESULT
the limited dataset. The main reason for using two-tier transfer As mentioned previously, we used the deep learning models
learning in this work is because, the medical imaging datasets based on three meta-architectures for the DFU localization
are very limited. Hence, when CNNs are trained from scratch task. Tensorflow object detection API [47] provides an open
on these datasets, they do not produce useful results. There source framework which makes very convenient to design
are two types of transfer learning i.e. partial transfer learning and build various object localization models. The experiments
in which only the features from few convolutional layers were carried out on the DFU dataset and evaluated with
are transferred and full transfer learning in which features 5-fold cross-validation technique. First, we randomly split
are transferred from all the layers of previous pre-trained the whole dataset into 5 testing sets (20% each) for 5-fold
models. We used both types of transfer learning known as cross validation. This is to ensure that the whole dataset was
two-tier transfer learning [10]. In the first tier, we used partial evaluated on testing sets. For each testing set (20%), the
transfer learning by transferring the features only from the remaining images was randomly split into 70% for training
convolutional layers trained on most significant classification set and 10% validation set. Hence, for each fold, we divided
challenge dataset called ImageNet which consists of more than the whole dataset of 1775 images into approximately 1242
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
7
TABLE II
P ERFORMANCE MEASURES OF OBJECT LOCALIZATION MODELS ON DFU DATASET
Model Name Speed (ms) Size of Model (MB) Ulcer mAP Overlap Percentage (%)
images in training set, 178 in validation set and 355 in testing truncated normal distribution with standard deviation of 0.01,
set. This was repeated for 5-fold to ensure the whole dataset batch_norm with decay of 0.9997 and epsilon of 0.001. For
was included in testing set. training, we used a batch size of 2, optimizer as momentum
a) Configuration of GPU Machine for Experiments: (1) with manual step learning rate with an initial rate as 0.0002,
Hardware: CPU - Intel i7-6700 @ 4.00Ghz, GPU - NVIDIA 0.00002 at epoch 40 and 0.000002 at epoch 60. The momen-
TITAN X 12GB, RAM - 32GB DDR4 (2) Software: Tensor- tum optimizer value is set at 0.9. For training RFCN, we used
flow [47]. same hyper-parameters as Faster-RCNN with only change in
We tested four state-of-the-art deep convolutional networks the learning rate set as 0.0005. For data augmentation, we used
for our proposed object localization task as described in only random horizontal flip for these two meta-architectures.
Section III B. We train the models with input-size of 640x640 In Table II, we report the performance evaluation of object
using stochastic gradient descent with different learning rate localization networks for DFU dataset on 5-fold cross valida-
on Nvidia GeForce GTX TITAN X card. We initialised the tion. Overall, all the models achieved promising localization
network with pre-trained weights using transfer learning rather results with high confidence on DFU dataset. Few instances of
than randomly initialized weights for the better convergence accurate localization by all trained models are demonstrated by
of the network. We tested the multiple learning rates by the Fig. 11. SSD-MobileNet ranked first in the Size of Model
decreasing the original learning rates with the 10 and 100 and Average Speed performance index. This is mainly due
times as well as multiplication factor from 1 to 5 to check the to the simpler architecture to generate anchor boxes in SSD
overall minimal validation loss. For example, if the original [42]. Whereas in Ulcer mAP and Overlap Percentage, R-FCN
Inception-V2 learning rate was set at 0.001. Then, for training with ResNet101 and Faster R-CNN with InceptionV2 were
on DFU dataset, we used 10 learning rates of 0.0001, 0.0002, almost equally competitive in these performance measures.
0.0003, 0.0004, 0.0005, 0.00001, 0.00002, 0.00003, 0.00004, In Ulcer mAP, Faster R-CNN with InceptionV2 ranked first
0.00005. with overall mAP of 91.8%, just slightly better than R-
We used 100 epochs for training of each reported model, FCN with ResNet101 with mAP of 90.6%. But, in Overlap
which we found are sufficient to train the DFU dataset as Percentage, R-FCN-Resnet101 achieved a score of 96.1%,
both training and validation loss finally converge to optimal which was slightly better than Faster R-CNN with Inception.
lowest. We selected the models on the basis of minimum SSD-InceptionV2 ranked third in both of these performance
validation losses for the evaluation. We tried different hyper- measure categories with difference of 4.6% in Ulcer mAP and
parameters such as learning rate, number of steps and data 3.5% in Overlap Percentage from the first position. In perfor-
augmentation options for each model to minimize both training mance measures, overall Faster R-CNN with InceptionV2 was
and validation losses. In next section, we report the different the best performer, and the most lightweight SSD-MobileNet
network hyper-parameters and configurations for each model emerged as the worst performer in terms of accuracy. Finally,
used for evaluation on the DFU dataset. we tested models on the dataset of 105 healthy foot images
We set the appropriate hyper-parameters on the basis of for specificity measure. None of the above-mentioned models
meta-architecture to train the models on DFU dataset. For produce any DFU localization on these healthy images.
SSD, we used two CNNs, MobileNet and Inception-V2 (both
of them use depth-wise separable convolutions), we set the
weight for l2_regularizer as 0.00004, initializer that A. Inaccurate DFU Localization Cases
generates a truncated normal distribution with standard de- In this work, we explored different object localization meta-
viation of 0.03 and mean of 0.0, batch_norm with decay of architectures to localize DFU on full foot images. Although
0.9997 and epsilon of 0.001. For training, we used a batch size the performance of all models is quite accurate as shown in
of 24, optimizer as RMS_Prop with a learning rate of 0.004 the Fig. 11, this section explores inaccurate localization cases
and decay factor of 0.95. The momentum optimizer value is by trained models on DFU dataset in 5-fold cross-validation
set at 0.9 with a decay of 0.9 and epsilon of 0.1. We also as shown in the Fig. 12. We found that trained models were
used two types of data augmentation as random horizontal struggled to localize the DFU of very small size and that has
flip and random crop. For Faster-RCNN, we set the weight the similar skin tone of the foot especially, SSD-MobileNet
for l2_regularizer as 0.0, initializer that generates a and SSD-InceptionV2. There are cases of DFU that have very
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
8
Fig. 11. The accurate localization results to visually compare the performance of object localization networks on DFU dataset. Where SSD-MobNet is
SSD-MobileNet, SSD-IncV2 is SSD-InceptionV2, FRCNN-IncV2 is Faster R-CNN with InceptionV2, and RFCN-Res101 is R-FCN with ResNet101.
subtle features, not even, most accurate models such as Faster- a) Configuration of Jetson TX2 for Inference: (1)
RCNN with InceptionV2 and R-FCN with ResNet101 were Hardware: CPU - dual-core NVIDIA Denver2 + quad-core
able to detect these conditions. ARM Cortex-A57, GPU - 256-core Pascal GPU, RAM - 8GB
LPDDR4 (2) Software: Ubuntu Linux 16.04 & Tensor-flow.
IV. I NFERENCE OF T RAINED M ODELS ON NVIDIA
J ETSON TX2 D EVELOPER K IT
We did not find any difference in the prediction of the
Nvidia Jetson TX2 is the latest mobile computer hardware models on Jetson TX2 hardware and the GPU machine; the
with an onboard 5-megapixel camera and a GPU card for only let-off is the slow inference speed on the Jetson TX2. It
the remote deep learning applications as shown in the Fig. is obviously due to limited hardware compared to the GPU
13. However, it is not capable of training large deep learning machine. For example, the speed of SSD-MobileNet was 70
models. We installed tensor-flow specifically designed for this ms per inference on Jetson TX2 as compared to 30 ms on
hardware to produce inference from the DFU localization GPU machine. Also, for real-time localization, models can
models that we trained on the GPU machine. Jetson TX2 is a produce the visualization of maximum 5 fps using the on-
very compact and portable device that can be used in various board camera with lightweight model. Fig 14 demonstrates
remote locations. the inference using Jetson TX2.
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
9
Fig. 12. Incorrect localization results to visually compare the performance of object localization networks on DFU dataset. Where SSD-MobNet is SSD-
MobileNet, SSD-IncV2 is SSD-InceptionV2, FRCNN-IncV2 is Faster R-CNN with InceptionV2, and RFCN-Res101 is R-FCN with ResNet101.
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
10
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
11
Fig. 15. Real-time localization using smartphone android application. In the first row, images are captured by default camera. In the second row, the snapshot
of real-time localization by our prototype android application.
new technologies like the Internet of Things (IoT), cloud [5] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521,
computing, computer vision and deep learning can enable no. 7553, pp. 436–444, 2015.
[6] D. Leightley, J. S. McPhee, and M. H. Yap, “Automated analysis and
computer systems to remotely assess the wounds, provide quantification of human mobility using a depth sensor,” IEEE journal of
faster feedback with good accuracy. But, this integrated system biomedical and health informatics, vol. 21, no. 4, pp. 939–948, 2017.
should be tested and validated rigorously by podiatrists and [7] M. H. Yap, G. Pons, J. Martı́, S. Ganau, M. Sentı́s, R. Zwiggelaar, A. K.
Davison, and R. Martı́, “Automated breast ultrasound lesions detection
medical experts, before it is implemented in the real healthcare using convolutional neural networks,” IEEE journal of biomedical and
setting and deployed as a mobile application. health informatics, vol. 22, no. 4, 2018.
[8] M. Goyal and M. H. Yap, “Multi-class semantic segmentation
of skin lesions via fully convolutional networks,” arXiv preprint
VII. ACKNOWLEDGEMENTS arXiv:1711.10449, 2017.
[9] M. Goyal, N. D. Reeves, A. K. Davison, S. Rajbhandari, J. Spragg,
The authors express their gratitude to Lancashire Teaching and M. H. Yap, “Dfunet: Convolutional neural networks for diabetic
Hospitals and Jennifer Spragg who is Podiatrist / Chiropodist foot ulcer classification,” IEEE Transactions on Emerging Topics in
in the Rossendale Practice, Rawtenstall, Lancashire for their Computational Intelligence, 2018.
[10] M. Goyal, M. H. Yap, N. D. Reeves, S. Rajbhandari, and J. Spragg,
extensive support and contribution in carrying out this re- “Fully convolutional networks for diabetic foot ulcer segmentation,” in
search. We gratefully acknowledge the support of NVIDIA 2017 IEEE International Conference on Systems, Man, and Cybernetics
Corporation with the donation of the GPU used for this (SMC), Oct 2017, pp. 618–623.
[11] M. Kolesnik and A. Fexa, “Multi-dimensional color histograms for
research. segmentation of wounds in images,” in International Conference Image
Analysis and Recognition. Springer, 2005, pp. 1014–1022.
R EFERENCES [12] M. Kolesnik and A. Fexa, “How robust is the svm wound segmentation?”
in Signal Processing Symposium, 2006. NORSIG 2006. Proceedings of
[1] K. Ogurtsova, J. da Rocha Fernandes, Y. Huang, U. Linnenkamp, the 7th Nordic. IEEE, 2006, pp. 50–53.
L. Guariguata, N. Cho, D. Cavan, J. Shaw, and L. Makaroff, “Idf diabetes [13] E. S. Papazoglou, L. Zubkov, X. Mao, M. Neidrauer, N. Rannou, and
atlas: Global estimates for the prevalence of diabetes for 2015 and 2040,” M. S. Weingarten, “Image analysis of chronic wounds for determining
Diabetes research and clinical practice, vol. 128, pp. 40–50, 2017. the surface area,” Wound repair and regeneration, vol. 18, no. 4, pp.
[2] S. D. Ramsey, K. Newton, D. Blough, D. K. McCulloch, N. Sandhu, 349–358, 2010.
G. E. Reiber, and E. H. Wagner, “Incidence, outcomes, and cost of [14] F. Veredas, H. Mesa, and L. Morente, “Binary tissue classification on
foot ulcers in patients with diabetes.” Diabetes care, vol. 22, no. 3, pp. wound images with neural networks and bayesian classifiers,” IEEE
382–387, 1999. transactions on medical imaging, vol. 29, no. 2, pp. 410–427, 2010.
[3] R. E. Pecoraro, G. E. Reiber, and E. M. Burgess, “Pathways to diabetic [15] M. K. Yadav, D. D. Manohar, G. Mukherjee, and C. Chakraborty,
limb amputation: basis for prevention,” Diabetes care, vol. 13, no. 5, “Segmentation of chronic wound areas by clustering techniques using
pp. 513–521, 1990. selected color space,” Journal of Medical Imaging and Health Informat-
[4] D. G. Armstrong, A. J. Boulton, and S. A. Bus, “Diabetic foot ulcers and ics, vol. 3, no. 1, pp. 22–29, 2013.
their recurrence,” New England Journal of Medicine, vol. 376, no. 24, [16] A. Castro, C. Bóveda, and B. Arcay, “Analysis of fuzzy clustering
pp. 2367–2375, 2017. algorithms for the segmentation of burn wounds photographs,” in
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2868656, IEEE Journal of
Biomedical and Health Informatics
12
International Conference Image Analysis and Recognition. Springer, [38] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan,
2006, pp. 491–501. P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in
[17] D. H. Chung and G. Sapiro, “Segmenting skin lesions with partial- context,” in European conference on computer vision. Springer, 2014,
differential-equations-based image processing algorithms,” IEEE trans- pp. 740–755.
actions on Medical Imaging, vol. 19, no. 7, pp. 763–767, 2000. [39] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time
[18] L. Wang, P. Pedersen, E. Agu, D. Strong, and B. Tulu, “Area determina- object detection with region proposal networks,” in Advances in neural
tion of diabetic foot ulcer images using a cascaded two-stage svm based information processing systems, 2015, pp. 91–99.
classification,” IEEE Transactions on Biomedical Engineering, 2016. [40] R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international
[19] C. Wang, X. Yan, M. Smith, K. Kochhar, M. Rubin, S. M. Warren, conference on computer vision, 2015, pp. 1440–1448.
J. Wrobel, and H. Lee, “A unified framework for automatic wound [41] J. Dai, Y. Li, K. He, and J. Sun, “R-fcn: Object detection via region-
segmentation and analysis with deep convolutional neural networks,” based fully convolutional networks,” in Advances in neural information
in Engineering in Medicine and Biology Society (EMBC), 2015 37th processing systems, 2016, pp. 379–387.
Annual International Conference of the IEEE. IEEE, 2015, pp. 2415– [42] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C.
2418. Berg, “Ssd: Single shot multibox detector,” in European conference on
[20] J. J. van Netten, J. G. van Baal, C. Liu, F. van Der Heijden, and S. A. computer vision. Springer, 2016, pp. 21–37.
Bus, “Infrared thermal imaging for automated detection of diabetic foot [43] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang,
complications,” 2013. T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convo-
[21] C. Liu, F. van der Heijden, M. E. Klein, J. G. van Baal, S. A. Bus, and lutional neural networks for mobile vision applications,” arXiv preprint
J. J. van Netten, “Infrared dermal thermography on diabetic feet soles to arXiv:1704.04861, 2017.
predict ulcerations: a case study,” in Advanced Biomedical and Clinical [44] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking
Diagnostic Systems XI, vol. 8572. International Society for Optics and the inception architecture for computer vision,” in Proceedings of the
Photonics, 2013, p. 85720N. IEEE Conference on Computer Vision and Pattern Recognition, 2016,
[22] J. Harding, D. Wertheim, R. Williams, J. Melhuish, D. Banerjee, and pp. 2818–2826.
K. Harding, “Infrared imaging in diabetic foot ulceration,” in Engineer- [45] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
ing in Medicine and Biology Society, 1998. Proceedings of the 20th recognition,” in Proceedings of the IEEE conference on computer vision
Annual International Conference of the IEEE, vol. 2. IEEE, 1998, pp. and pattern recognition, 2016, pp. 770–778.
916–918. [46] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4,
[23] D. Hernandez-Contreras, H. Peregrina-Barreto, J. Rangel-Magdaleno, inception-resnet and the impact of residual connections on learning.” in
and J. Gonzalez-Bernal, “Narrative review: Diabetic foot and infrared AAAI, 2017, pp. 4278–4284.
thermography,” Infrared Physics & Technology, vol. 78, pp. 105–117, [47] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi,
2016. I. Fischer, Z. Wojna, Y. Song, S. Guadarrama et al., “Speed/accuracy
[24] M. Adam, E. Y. Ng, J. H. Tan, M. L. Heng, J. W. Tong, and U. R. trade-offs for modern convolutional object detectors,” arXiv preprint
Acharya, “Computer aided diagnosis of diabetic foot using infrared arXiv:1611.10012, 2016.
thermography: A review,” Computers in biology and medicine, vol. 91, [48] A. Van Opbroek, M. A. Ikram, M. W. Vernooij, and M. De Brui-
pp. 326–336, 2017. jne, “Transfer learning improves supervised image segmentation across
[25] M. H. Yap, C.-C. Ng, K. Chatwin, C. A. Abbott, F. L. Bowling, A. J. imaging protocols,” IEEE transactions on medical imaging, vol. 34,
Boulton, and N. D. Reeves, “Computer vision algorithms in the detection no. 5, pp. 1018–1030, 2015.
of diabetic foot ulceration a new paradigm for diabetic foot care?” [49] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma,
Journal of diabetes science and technology, p. 1932296815611425, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “Imagenet large
2015. scale visual recognition challenge,” International Journal of Computer
Vision, vol. 115, no. 3, pp. 211–252, 2015.
[26] M. H. Yap, K. E. Chatwin, C.-C. Ng, C. A. Abbott, F. L. Bowling,
[50] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisser-
S. Rajbhandari, A. J. Boulton, and N. D. Reeves, “A new mobile
man, “The pascal visual object classes (voc) challenge,” International
application for standardizing diabetic foot images,” Journal of diabetes
journal of computer vision, vol. 88, no. 2, pp. 303–338, 2010.
science and technology, vol. 12, no. 1, pp. 169–173, 2018.
[51] J. J. van Netten, D. Clark, P. A. Lazzarini, M. Janda, and L. F. Reed,
[27] R. Brown, B. Ploderer, L. S. D. Seng, J. J. van Netten, and P. A.
“The validity and reliability of remote diabetic foot ulcer assessment
Lazzarini, “Myfootcare: A mobile self-tracking tool to promote self-care
using mobile phone images,” Scientific Reports, vol. 7, no. 1, p. 9480,
amongst people with diabetic foot ulcers,” 2017.
2017.
[28] B. Hewitt, M. H. Yap, and R. Grant, “Manual whisker annotator (mwa): [52] P. Ince, Z. G. Abbas, J. K. Lutale, A. Basit, S. M. Ali, F. Chohan,
A modular open-source tool,” Journal of Open Research Software, vol. 4, S. Morbach, J. Möllenberg, F. L. Game, and W. J. Jeffcoate, “Use of
no. 1, 2016. the sinbad classification system and score in comparing outcome of foot
[29] L. A. Lavery, D. G. Armstrong, and L. B. Harkless, “Classification of ulcer management on three continents,” Diabetes care, vol. 31, no. 5,
diabetic foot wounds,” The Journal of Foot and Ankle Surgery, vol. 35, pp. 964–967, 2008.
no. 6, pp. 528–531, 1996.
[30] W. Förstner, “A framework for low level feature extraction,” in European
Conference on Computer Vision. Springer, 1994, pp. 383–394.
[31] Z. Guo, L. Zhang, and D. Zhang, “A completed modeling of local binary
pattern operator for texture classification,” IEEE Transactions on Image
Processing, vol. 19, no. 6, pp. 1657–1663, 2010.
[32] J. P. Jones and L. A. Palmer, “An evaluation of the two-dimensional
gabor filter model of simple receptive fields in cat striate cortex,” Journal
of neurophysiology, vol. 58, no. 6, pp. 1233–1258, 1987.
[33] N. Dalal and B. Triggs, “Histograms of oriented gradients for human
detection,” in 2005 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR’05), vol. 1. IEEE, 2005, pp.
886–893.
[34] D. H. Ballard, “Generalizing the hough transform to detect arbitrary
shapes,” Pattern recognition, vol. 13, no. 2, pp. 111–122, 1981.
[35] Y.-I. Ohta, T. Kanade, and T. Sakai, “Color information for region
segmentation,” Computer graphics and image processing, vol. 13, no. 3,
pp. 222–241, 1980.
[36] C. J. Burges, “A tutorial on support vector machines for pattern recogni-
tion,” Data mining and knowledge discovery, vol. 2, no. 2, pp. 121–167,
1998.
[37] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” in Advances in neural infor-
mation processing systems, 2012, pp. 1097–1105.
2168-2194 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.