Medical Image Retrieval Using Deep Convolutional Neural Network
Adnan Qayyum (1), Syed Muhammad Anwar (2,a,*), Muhammad Awais (3,b), Muhammad Majid (1,a)
(1) Department of Computer Engineering, University of Engineering and Technology Taxila, Pakistan
(2) Department of Software Engineering, University of Engineering and Technology Taxila, Pakistan
(3) Center for Vision, Speech and Signal Processing, University of Surrey, Surrey, UK
(a) Signal, Image, Multimedia Processing and Learning (SIMPLe) Group, University of Engineering and Technology Taxila, Pakistan
(b) Cerebrai Artificial Intelligence, Surrey, UK
* Corresponding author: [email protected]
Abstract
With the widespread use of digital imaging data in hospitals, the size of medical image repositories is increasing rapidly. This causes difficulty in managing and querying these large databases, leading to the need for content-based medical image retrieval (CBMIR) systems. A major challenge in CBMIR systems is the semantic gap that exists between the low-level visual information captured by imaging devices and the high-level semantic information perceived by humans. The efficacy of such systems therefore hinges on feature representations that can fully characterize the high-level information. In this paper, we propose a deep learning framework for CBMIR systems by using a deep convolutional neural network (CNN) trained for the classification of medical images. An intermodal dataset that contains twenty-four classes and five modalities is used to train the network. The learned features and the classification results are used to retrieve medical images. For retrieval, the best results are achieved when class-based predictions are used. An average classification accuracy of 99.77% and a mean average precision of 0.69 are achieved for the retrieval task. The proposed method is well suited to retrieving multimodal medical images of different body organs.
Keywords—Content Based Medical Image Retrieval (CBMIR); Convolutional Neural Networks (CNNs); Similarity Metric; Deep Learning
1. Introduction
In recent years, the rapid growth of digital computers, multimedia, and storage systems has resulted in large image and multimedia content repositories. Clinical and diagnostic studies are also benefiting from these advances in digital storage and content processing. Hospitals with diagnostic and investigative imaging facilities produce large amounts of imaging data, causing a huge increase in the size of medical image collections. Therefore, the development of an effective medical image retrieval system is required to aid clinicians in browsing these large datasets. To facilitate the production and management of such large medical image databases, many algorithms for the automatic analysis of medical images have been proposed in the literature [1-5]. A content-based medical image retrieval (CBMIR) system can be an effective way of supplementing the diagnosis and treatment of various diseases, as well as an efficient management tool [6] for handling large amounts of data.
Content-based image retrieval (CBIR) is a computer vision technique that provides a way of searching for relevant images in large databases. The search is based on image features such as color, texture, and shape, or any other features derived from the image itself. The performance of a CBIR system mainly depends on these selected features [7]. The images are first represented in terms of features in a high-dimensional feature space. Then, the similarity between the images stored in the database and a query image is measured in the feature space by using a distance metric, e.g., the Euclidean distance. Hence, for CBIR systems, the representation of image data in terms of features and the selection of a similarity measure are the most critical components. Although many researchers have studied these areas extensively [8], the most challenging issue that remains in CBIR systems is reducing the "semantic gap": the information lost by representing an image in terms of its features, i.e., in moving from high-level semantics to low-level features [9]. This gap exists between the visual information captured by the imaging device and the visual information perceived by the human vision system (HVS). It can be reduced either by embedding domain-specific knowledge or by using machine learning techniques to develop intelligent systems that can be trained to act like the HVS.
There has been significant growth in machine learning research, and one breakthrough is the deep learning framework. Deep learning comprises various machine learning algorithms for modelling high-level abstractions in data by employing deep architectures composed of multiple non-linear transformations [10]. Deep learning mimics the human brain [8], which has a deep architecture in which information is processed through multiple layers of transformation. By learning features from data automatically at multiple levels of abstraction through deep architectures, deep learning techniques provide a direct way to obtain feature representations, allowing the system (a deep network) to learn complex features from raw images without using handcrafted features. Recent studies have reported that deep learning methods have been successfully applied to many application areas, e.g., image and video classification [11-13], visual tracking [14], speech recognition [15], and natural language processing [16].
Deep learning methods have been applied to the CBIR task in recent studies [8, 17, 18], but less attention has been paid to exploring them for the CBMIR task. In this paper, inspired by the success of deep learning methods in bridging the semantic gap, their application to the CBMIR task is investigated. A deep learning technique, the convolutional neural network (CNN), is adapted for learning feature representations for different imaging modalities and body organs. Generally, in medical imaging, 3D volumetric images are obtained consisting of a series of 2D slices acquired from the target body organ. This paper focuses on the retrieval of these 2D slices; the classes were formulated at a global level, i.e., images from different body parts were divided into separate classes with the respective body part label. In this way, the supervision is very weak and labelling takes very little time, decreasing the annotation effort required in the training phase. For medical imaging, this type of annotation is particularly useful, since annotations usually require expert advice and come at high cost. The CNN model is trained to classify medical images in the first phase, and the learned feature representations are then used for CBMIR in the second phase. An in-depth analysis of the proposed system in terms of retrieval quality is presented for a collection of images belonging to different imaging modalities. The major contributions of this work are threefold:
I. A dataset is carefully collected that is multimodal and covers a wide range of medical imaging
target areas.
II. A deep learning framework is modelled and trained on a collection of medical images.
III. The learned features are used to build a highly efficient medical image retrieval system that works on a large multimodal dataset.
The rest of the paper is organized as follows. Related work is presented in Section 2, the proposed methodology is discussed in Section 3, experimental results are shown and discussed in Section 4, and conclusions are presented in Section 5.
2. Related Work
In this section, existing work related to our research is briefly discussed.
2.1 Content Based Image Retrieval (CBIR)
A basic block diagram of a generic CBIR system is illustrated in Fig. 1. In CBIR, images are retrieved from large databases based on feature representations extracted or derived from the image content [9]. There are typically two phases in any CBIR system: an offline phase and an online phase. In the offline phase, features are extracted from a large collection of images (used to train the system) to establish a local feature database. This phase is generally time consuming, and its duration depends on the number of training images. In the online phase, the same features are extracted from the query image, and a distance metric between the features of the query image and those of the database images is computed as a similarity measure. The images with the highest similarity, i.e., the lowest distance, are then presented to the user as retrieval results. The procedure used for pre-processing and feature extraction is the same in both phases.
Fig. 1. A basic block diagram of a generic CBIR system, showing the offline phase (feature extraction from the image database) and the online phase (query feature extraction, similarity measure, and retrieved images)
Fig. 2. The proposed framework for content based medical image retrieval using deep convolutional neural network
3. Methodology
In this work, a classification-driven framework for retrieving similar images from a medical database is proposed. A detailed representation of the proposed retrieval system is shown in Fig. 2. The underlying DCNN model aims to learn filter kernels by producing a more abstract representation of the data in each layer. Despite its simple mathematics, the DCNN is currently among the most powerful tools in vision systems. DCNN models generally have three types of layers, i.e., convolutional layers, pooling layers, and fully connected layers. The output layer is generally treated separately as a special layer, and the model receives input samples at the input layer. Each convolutional layer produces feature maps by convolving a kernel with its input feature maps. A pooling layer is designed to down-sample the feature maps produced by the convolutional layers, which is often accomplished by finding local maxima in a neighborhood. Pooling also provides translational invariance while reducing the number of neurons to be processed in subsequent layers. In fully connected layers, each neuron has denser connections than in the convolutional layers. The part of the DCNN before the fully connected layers is known as the feature extractor and the part after it as the classifier. A detailed description of the framework used is presented in the following subsections.
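For concreteness, the following is a minimal sketch of such a model written in PyTorch (the experiments in this paper were implemented in Torch7; PyTorch is used here purely for illustration). The convolutional widths and kernel sizes are illustrative assumptions, not the paper's exact architecture; only the 256 × 256 grayscale input, the three fully connected layers FCL1-FCL3, and the 24-class output follow the text.

import torch
import torch.nn as nn

class MedicalCNN(nn.Module):
    """Illustrative DCNN: a conv/pool feature-extractor part followed by
    a classifier part of three fully connected layers (FCL1-FCL3)."""
    def __init__(self, num_classes=24):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),   # 256 -> 128; adds translational invariance
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),   # 128 -> 64
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(4),   # 64 -> 16
        )
        self.fcl1 = nn.Linear(128 * 16 * 16, 1024)   # FCL1
        self.fcl2 = nn.Linear(1024, 512)             # FCL2
        self.fcl3 = nn.Linear(512, num_classes)      # FCL3 / class scores

    def forward(self, x, return_features=False):
        z = self.features(x).flatten(1)
        f1 = torch.relu(self.fcl1(z))
        f2 = torch.relu(self.fcl2(f1))
        f3 = self.fcl3(f2)
        if return_features:
            return f1, f2, f3   # representations reused for retrieval
        return f3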
3.1 Phase 1: Classification
The first phase in our proposed CBMIR framework is the classification phase, in which a deep convolutional neural network is trained to classify medical images following a supervised learning approach. For this purpose, the medical images are divided into various classes based on body part or organ information; more details on the dataset can be found in Section 4.1. Since 2D images are used for analysis, the task is to assign each image to a class, which formulates a multiclass image classification problem. Typically, image classification algorithms have two modules, i.e., a feature extraction module and a classification module. A DCNN learns both a hierarchy of deep convolutional features and a classifier from the training image data in an end-to-end learning framework. A deep learning algorithm learns low-level, mid-level, and abstract features directly from the images, as opposed to making domain-specific assumptions, which is the case for handcrafted features. Hence, it can identify the class of a query image more effectively, and the learned features can therefore be used for the image retrieval task. Inspired by this property, a DCNN model is trained and optimized for the multiclass classification problem, and the learned features are extracted from the trained model for retrieval. The details about the model architecture and training are as follows.
Backpropagation is the algorithm most frequently used for training artificial neural networks and is coupled with an optimization technique, e.g., stochastic gradient descent (SGD). In backpropagation, gradients are calculated with respect to all parameters. SGD takes the computed gradients as input and updates the parameters so as to minimize the objective function. To compute the gradients of the loss, backpropagation requires the known target of each input, i.e., the actual class label. The chain rule is used in an iterative manner to compute the gradients for each layer with respect to the loss function. There are three main steps of backpropagation in each layer, i.e., the forward pass, the backward pass, and the derivative of the objective function with respect to the layer's parameters, if the layer has parameters, e.g., a convolutional layer. Some layers, such as pooling layers, have no parameters, and hence this derivative does not need to be computed.
Forward Pass
In the forward pass, a forward message is sent to compute all $z$'s, where $z$ is a function of the input $x$; Eq. 2 gives its mathematical notation:

$z^{l+1} = f(z^{l})$, (2)

where $l$ is the layer number, and $z = f(x_i)$.
Backward Pass
In the backward pass, a backward message is sent to compute all $\delta$'s, where $\delta$ is the derivative of the cost function with respect to the $z$'s; mathematically it is given as

$\delta_i^{l} = \frac{dE}{dz_i^{l}} = \sum_j \frac{dE}{dz_j^{l+1}} \cdot \frac{dz_j^{l+1}}{dz_i^{l}} = \sum_j \delta_j^{l+1} \frac{dz_j^{l+1}}{dz_i^{l}}$, (3)

where $i$ indexes the units of layer $l$, $j$ indexes the units of layer $l+1$, and $E$ denotes the loss function. Eq. 3 is recursive, and hence backpropagation with SGD attempts to minimize the loss function recursively.
Fig. 4. An illustration of the backpropagation algorithm showing different layers and parameters
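Continuing the illustrative PyTorch sketch above, a single training step couples these forward and backward passes with an SGD parameter update; the learning rate and momentum values are assumptions for illustration only, not the paper's settings.

import torch
import torch.nn as nn

model = MedicalCNN()                 # illustrative network from the sketch above
criterion = nn.CrossEntropyLoss()    # loss E; requires the actual class labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(images, labels):
    optimizer.zero_grad()
    scores = model(images)        # forward pass: compute all z's layer by layer
    loss = criterion(scores, labels)
    loss.backward()               # backward pass: compute all deltas dE/dz
    optimizer.step()              # SGD update of the layers' parameters
    return loss.item()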
3.2 Phase 2: Features Extraction for CBMIR
Once the DCNN model is successfully optimized and trained to classify the multimodal medical images, feature representations are extracted from the last three fully connected layers of the trained model, i.e., from FCL1 to FCL3. The image retrieval task requires a locally established feature database for the whole training set. To create this database, each image $x_i$ from the training set is fed forward through the trained DCNN model, and the feature representations $F_1^i$, $F_2^i$, and $F_3^i$ associated with that image are extracted from fully connected layers 1 to 3, respectively. $F_1^i$ represents the feature database extracted from FCL1, and similarly $F_2^i$ and $F_3^i$ represent the feature databases extracted from FCL2 and FCL3, where $i = 1 \ldots P$ and $P$ is the number of samples in the training set. Whenever a query is formulated, images similar to the query image are retrieved by comparing the feature representation extracted for the query image (by passing it through the same trained model) with the feature database, using the Euclidean distance metric given as

$D(a, b) = \sqrt{\sum_i (a_i - b_i)^2}$,

where $a_i$ and $b_i$ represent the query and database image features, respectively. In addition, the predicted class label (dashed line in Fig. 2) is used to limit the search area in the database, reducing the number of computations and eliminating irrelevant images from the retrieval results. The images with the lowest distance, i.e., the highest similarity, are displayed to the user as retrieval results. Finally, a comparative analysis of the feature representations extracted from FCL1 to FCL3 is performed in terms of retrieval quality.
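A minimal sketch of this retrieval procedure, reusing the hypothetical model above (function names are illustrative): FCL1 features are stored for every training image, and a query is answered by a Euclidean nearest-neighbour search, optionally restricted to the predicted class (the dashed path in Fig. 2).

import numpy as np
import torch

def build_feature_db(model, train_images):
    """Feed each training image forward and store its FCL1 representation."""
    db = []
    with torch.no_grad():
        for x in train_images:                     # x: 1 x 256 x 256 tensor
            f1, f2, f3 = model(x.unsqueeze(0), return_features=True)
            db.append(f1.numpy().ravel())
    return np.stack(db)

def retrieve(query_feat, db_feats, db_labels, predicted_class=None, k=10):
    """Rank database images by Euclidean distance to the query feature
    (itself taken from FCL1 of the same trained model); if a predicted
    class is given, search only within that class."""
    idx = np.arange(len(db_feats))
    if predicted_class is not None:
        idx = idx[db_labels == predicted_class]    # limit the search area
    d = np.sqrt(((db_feats[idx] - query_feat) ** 2).sum(axis=1))
    return idx[np.argsort(d)[:k]]                  # k most similar images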
4. Experimental Results
In this paper, the popular and widely used deep learning tool Torch7 [30] has been used for developing and training the proposed deep learning framework. The simulations were performed on a Dell Inspiron 5520 laptop running Ubuntu 14.04, with an Intel Core i3 CPU clocked at 2.40 GHz and 6.00 GB of RAM. The proposed method has been evaluated in terms of classification and retrieval results.
Fig. 5. Example images from each class in the dataset showing interclass variations: (1) Brain, (2) Liver, (3) Stomach, (4) Soft Tissue, (5) Chest, (6) Breast, (7) Renal, (8) Thyroid, (9) Phantom, (10) Rectum, (11) Bladder, (12) Uterus, (13) Head Neck, (14) Esophagus, (15) Cervix, (16) Prostate, (17) Ovary, (18) Colon, (19) Lymph, (20) Pancreas, (21) Kidney, (22) Knee, (23) Lungs, (24) Eye
Fig. 6. Example images from different classes showing intra-class variations: (a) Brain, (b) Liver, and (c) Stomach
4.1 Dataset
The dataset used for the proposed CBMIR task was collected from publicly available medical databases, and classes were formed according to body organ, e.g., lungs, brain, liver, etc. There is a total of 24 classes in the dataset used in this research, of which the data for 22 classes was acquired from various public databases available at the Cancer Imaging Archive (www.cancerimagingarchive.net). The other two classes contain data from Messidor [46] and an open-access website for knee images [47]. A total of 300 images was taken at random from each class, giving a dataset of 7200 images. The data from each class was divided randomly into a training and a testing set, with 70% of the images used for training and 30% for testing, giving 5040 images in the training set and 2160 in the testing set. The training and testing sets did not share any images. All images were in DICOM (Digital Imaging and Communications in Medicine) format except those from Messidor, which are in TIF (Tagged Image File) format. All images were resized to 256 × 256, and the color images from the Eye class were converted to grayscale. Numeric class labels were assigned to the classes for supervised learning. Example images from the classes in the dataset are shown in Fig. 5 and Fig. 6, which illustrate the interclass and intra-class variance among the images, respectively.
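A small sketch of this per-class random split (the function and seed are illustrative assumptions; the text specifies only 300 images per class split 70/30):

import numpy as np

def split_per_class(images_by_class, train_frac=0.7, seed=0):
    """Randomly split each class, keeping the 70/30 ratio per class."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for label, imgs in images_by_class.items():
        order = rng.permutation(len(imgs))
        n_train = int(train_frac * len(imgs))    # 210 of 300 per class
        train += [(imgs[i], label) for i in order[:n_train]]
        test += [(imgs[i], label) for i in order[n_train:]]
    return train, test                           # 5040 / 2160 images in total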
4.2 Classification Performance
The performance of the proposed framework for the classification task is evaluated in terms of the average precision (AP), average recall (AR), accuracy, and F1 measure, which are calculated as

$AP = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FP_i}$, (6)

$AR = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FN_i}$, (7)

$Accuracy\ (Acc) = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}$, (8)

$F1\ measure = 2 \times \frac{AP \times AR}{AP + AR}$, (9)
where $TP_i$ (true positives) denotes the number of images from class $i$ that are correctly classified, $FP_i$ (false positives) denotes the number of images not from class $i$ that are misclassified as class $i$, $TN_i$ (true negatives) denotes the number of images correctly classified as not belonging to class $i$, $FN_i$ (false negatives) denotes the images from class $i$ that are misclassified, and $N$ represents the total number of classes, which equals 24 in this case. A 10-fold cross-validation was used on the training data, and the testing set was evaluated 100 times, giving an AP, AR, and average F1 measure of 99.76%, 99.77%, and 99.76%, respectively. The confusion matrix is shown in Table 1; each class achieves an average accuracy of 100% except the stomach, liver, and bladder classes, which achieve average accuracies of 98.9%, 97.7%, and 98.9%, respectively.
The classification performance is compared with a single-modality organ classification method [43] and is summarized in Table 2. The proposed system performs better in classifying organs when applied to multimodal data. Although the system in [43] was trained on a different image collection, the high accuracy achieved by our proposed system demonstrates the efficacy of the method for the classification task.
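As a sketch, Eqs. 6-9 can be computed from an N × N confusion matrix as follows (NumPy assumed; the function name is illustrative):

import numpy as np

def macro_metrics(cm):
    """Macro-averaged AP, AR, accuracy and F1 from confusion matrix cm,
    where cm[i, j] counts class-i images predicted as class j."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - (tp + fp + fn)
    ap = np.mean(tp / (tp + fp))                       # Eq. 6
    ar = np.mean(tp / (tp + fn))                       # Eq. 7
    acc = np.mean((tp + tn) / (tp + fp + fn + tn))     # Eq. 8
    f1 = 2 * ap * ar / (ap + ar)                       # Eq. 9
    return ap, ar, acc, f1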
4.3 Retrieval Performance
The performance of the proposed framework for CBMIR has been tested using the most frequently used performance measures for CBIR systems, i.e., precision and recall. The mathematical expressions for precision and recall are
$Precision = \frac{\text{Number of relevant images retrieved}}{\text{Number of retrieved images}}$, (10)

$Recall = \frac{\text{Number of relevant images retrieved}}{\text{Total number of relevant images in the database}}$. (11)
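These per-query measures, together with the average precision whose mean over all queries gives the mAP reported below, can be sketched as follows (set-based inputs are an assumption):

import numpy as np

def precision_recall(retrieved, relevant):
    """Eqs. 10-11 for one query; retrieved and relevant are sets of ids."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

def average_precision(ranked_ids, relevant):
    """Mean of the precision values at each rank where a relevant image
    appears; averaging this over all queries yields the mAP."""
    hits, precisions = 0, []
    for rank, img_id in enumerate(ranked_ids, start=1):
        if img_id in relevant:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0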
Table 1. Confusion matrix of medical image classification with 24 classes using DCNN.
Table 2. Comparison of classification performance of the proposed DCNN with single modality classification
algorithm.
Method: DCNN trained on whole images (proposed) | Total images: 7200 | Training: 5040 | Testing: 2160 | Modalities: MR, CT, PET, PT, OPT | Classes: 24 | AP: 99.76 | AR: 99.77 | F1: 99.76
Feature representations from all three fully connected layers of the trained model have been used for the retrieval of medical images. These representations have been analyzed in terms of retrieval quality under both options, i.e., with and without the predicted class labels. The precision vs. recall plots for the feature representations extracted from FCL1, FCL2, and FCL3, with and without class prediction, are depicted in Fig. 7 and Fig. 8, respectively. They show that precision is higher for the feature representations of FCL1 than for those of FCL2 and FCL3. The improvement in precision when class predictions are used is also evident.
The retrieval performance has also been evaluated using the mean average precision with and without class predictions. Fig. 9 and Fig. 10 show the images retrieved for query images from the chest and renal classes, with and without class predictions, respectively. The retrieval results are shown in ranked order, with the most relevant image found after feature comparison presented first. The retrieved results demonstrate the interclass variance: when class predictions are not used, images from other classes are also retrieved, highlighted by red boxes in Fig. 9 and Fig. 10.
Fig. 9. Retrieval results for the chest class: (a) query image, (b) retrieved images using class prediction, and (c) retrieved images without using class prediction.
The system achieves a mean average precision of 0.53 without class predictions and 0.69 with class predictions. The retrieval results improve because the retrieved images then belong only to the class predicted by the classification framework.
4.4 Comparison
To evaluate our proposed deep learning based framework for medical image retrieval, a comparison is made with some recent systems used for this task. A direct comparison was not possible since, to the best of our knowledge, no standard medical dataset is available for benchmarking retrieval systems. Hence, two criteria have been used for the comparison: one is the classification accuracy, average precision, and average recall for classification (Table 2), and the other is the mean average precision (mAP) for retrieval, given in Table 3.
Fig. 10. Retrieval results for the renal class: (a) query image, (b) retrieved images using class prediction, and (c) retrieved images without using class prediction.
Although [48] achieves a higher value for mAP, that work addresses only a single modality, whereas our proposed system works on multimodal data.
Table 3. Comparison of the proposed CBMIR using deep learning with state of the art systems in terms of mean
average precision
Method: DCNN trained on whole images (proposed) | Total images: 7200 | Training: 5040 | Testing: 2160 | Modalities: MR, CT, PT, PET, OPT | Classes: 24 | mAP: 0.53 (without using class predictions)
5. Conclusion
This paper proposes a deep learning based framework for content-based medical image retrieval by training a deep convolutional neural network for the classification task. Two strategies have been proposed for retrieving medical images: the first obtains a prediction of the query image's class from the trained network and then searches for relevant images within that specific class; the second does not use the class information and therefore searches the whole database for relevant images. The proposed solution reduces the semantic gap by learning discriminative features directly from the images. The network was successfully trained on 24 classes of medical images with an average classification accuracy of 99.77%. The last three fully connected layers of the network were used to extract features for the retrieval task. The widely used metrics of precision and recall were used to test the performance of the proposed framework for medical image retrieval. The proposed system achieves a mean average precision of 0.69 on multimodal image data with class prediction. We intend to further improve the retrieval performance by using a larger dataset and to adapt the network for 3D volumetric applications by defining further classes that incorporate the different geometric views of a 3D volume.
References
[1] M. Mizotin, J. Benois-Pineau, M. Allard, and G. Catheline, "Feature-based brain MRI retrieval for Alzheimer disease
diagnosis," in 2012 19th IEEE International Conference on Image Processing, 2012, pp. 1241-1244.
[2] G. W. Jiji and P. S. J. D. Raj, “Content-based image retrieval in dermatology using intelligent technique,” IET Image
Processing, vol. 9, no. 4, pp. 306-317, 2015.
[3] M. Ponciano-Silva et al., "Does a CBIR system really impact decisions of physicians in a clinical environment?," in
Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems, 2013, pp. 41-46.
[4] G. Quellec, M. Lamard, G. Cazuguel, B. Cochener, and C. Roux, “Fast wavelet-based image characterization for highly
adaptive image retrieval,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1613-1623, 2012.
[5] M. M. Rahman, S. K. Antani, and G. R. Thoma, “A learning-based similarity fusion and filtering approach for biomedical
image retrieval using SVM classification and relevance feedback,” IEEE Transactions on Information Technology in
Biomedicine, vol. 15, no. 4, pp. 640-646, 2011.
[6] F. Zhang et al., “Dictionary pruning with visual word significance for medical image retrieval,” Neurocomputing, vol. 177,
pp. 75-88, 2016.
[7] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “A survey of content-based image retrieval with high-level semantics,” Pattern
recognition, vol. 40, no. 1, pp. 262-282, 2007.
[8] J. Wan et al., "Deep learning for content-based image retrieval: A comprehensive study," in Proceedings of the 22nd ACM
international conference on Multimedia, 2014, pp. 157-166.
[9] K. K. Kumar and T. V. Gopal, "A novel approach to self order feature reweighting in CBIR to reduce semantic gap using
Relevance Feedback," in Circuit, Power and Computing Technologies (ICCPCT), 2014 International Conference on,
2014, pp. 1437-1442.
[10] Y. Bengio, A. C. Courville, and P. Vincent, “Unsupervised feature learning and deep learning: A review and new
perspectives,” CoRR, abs/1206.5538, vol. 1, 2012.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in
Advances in neural information processing systems, 2012, pp. 1097-1105.
[12] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on
Computer Vision, 2014, pp. 818-833.
[13] A. Karpathy et al., "Large-scale video classification with convolutional neural networks," in Proceedings of the IEEE
conference on Computer Vision and Pattern Recognition, 2014, pp. 1725-1732.
[14] G. Wu, W. Lu, G. Gao, C. Zhao, and J. Liu, “Regional deep learning model for visual tracking,” Neurocomputing, vol.
175, pp. 310-323, 2016.
[15] G. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research
groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
[16] S. Zhou, Q. Chen, and X. Wang, “Active deep learning method for semi-supervised sentiment classification,”
Neurocomputing, vol. 120, pp. 536-546, 2013.
[17] A. Babenko and V. Lempitsky, "Aggregating local deep features for image retrieval," in Proceedings of the IEEE
International Conference on Computer Vision, 2015, pp. 1269-1277.
[18] K. Lin, H.-F. Yang, J.-H. Hsiao, and C.-S. Chen, "Deep learning of binary hash codes for fast image retrieval," in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 27-35.
[19] A. K. Jain and A. Vailaya, “Image retrieval using color and shape,” Pattern recognition, vol. 29, no. 8, pp. 1233-1244,
1996.
[20] B. S. Manjunath and W.-Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Transactions on pattern
analysis and machine intelligence, vol. 18, no. 8, pp. 837-842, 1996.
[21] D. G. Lowe, "Object recognition from local scale-invariant features," in Computer vision, 1999. The proceedings of the
seventh IEEE international conference on, 1999, pp. 1150-1157.
[22] H. Bay, T. Tuytelaars, and L. Van Gool, "Surf: Speeded up robust features," in European conference on computer vision,
2006, pp. 404-417.
[23] J. Yang, Y.-G. Jiang, A. G. Hauptmann, and C.-W. Ngo, "Evaluating bag-of-visual-words representations in scene
classification," in Proceedings of the international workshop on Workshop on multimedia information retrieval, 2007, pp.
197-206.
[24] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman, "Discovering objects and their location in images,"
in Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005, pp. 370-377.
[25] L. Wu and S. C. Hoi, “Enhancing bag-of-words models with semantics-preserving metric learning,” IEEE MultiMedia,
vol. 18, no. 1, pp. 24-37, 2011.
[26] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (SURF),” Computer vision and image
understanding, vol. 110, no. 3, pp. 346-359, 2008.
[27] X. Wangming, W. Jin, L. Xinhai, Z. Lei, and S. Gang, "Application of Image SIFT Features to the Context of CBIR," in
Computer Science and Software Engineering, 2008 International Conference on, 2008, pp. 552-555.
[28] Z. Mansoori and M. Jamzad, "Content based image retrieval using the knowledge of texture, color and binary tree
structure," in Electrical and Computer Engineering, 2009. CCECE'09. Canadian Conference on, 2009, pp. 999-1003.
[29] A. Alfanindya, N. Hashim, and C. Eswaran, "Content Based Image Retrieval and Classification using speeded-up robust
features (SURF) and grouped bag-of-visual-words (GBoVW)," in Technology, Informatics, Management, Engineering,
and Environment (TIME-E), 2013 International Conference on, 2013, pp. 77-82.
[30] L. Zhang, Y. Zhang, X. Gu, J. Tang, and Q. Tian, “Scalable similarity search with topology preserving hashing,” IEEE
Transactions on Image Processing, vol. 23, no. 7, pp. 3025-3039, 2014.
[31] M. Norouzi, D. J. Fleet, and R. R. Salakhutdinov, "Hamming distance metric learning," in Advances in neural information
processing systems, 2012, pp. 1061-1069.
[32] H. Jegou et al., “Aggregating local image descriptors into compact codes,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 34, no. 9, pp. 1704-1716, 2012.
[33] K. H. Hwang, H. Lee, and D. Choi, “Medical image retrieval: past and present,” Healthcare informatics research, vol. 18,
no. 1, pp. 3-9, 2012.
[34] L. Deng and D. Yu, “Deep Learning,” Signal Processing, vol. 7, pp. 3-4, 2014.
[35] A. G. Ivakhnenko and V. G. Lapa, Cybernetic Predicting Devices, CCM Information Corporation, 1965.
[36] R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung, “Digital selection and analogue
amplification coexist in a cortex-inspired silicon circuit,” Nature, vol. 405, no. 6789, pp. 947-951, 2000.
[37] X. Glorot, A. Bordes, and Y. Bengio, "Deep Sparse Rectifier Neural Networks," in Aistats, 2011, p. 275.
[38] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Aistats, 2010,
pp. 249-256.
[39] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings
of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[40] M. Anthimopoulos, S. Christodoulidis, L. Ebner, A. Christe, and S. Mougiakakou, “Lung Pattern Classification for
Interstitial Lung Diseases Using a Deep Convolutional Neural Network,” IEEE transactions on medical imaging, vol. 35,
no. 5, pp. 1207-1216, 2016.
[41] G. van Tulder and M. de Bruijne, “Combining Generative and Discriminative Representation Learning for Lung CT
Analysis With Convolutional Restricted Boltzmann Machines,” IEEE transactions on medical imaging, vol. 35, no. 5, pp.
1262-1272, 2016.
[42] P. Moeskops et al., “Automatic segmentation of MR brain images with a convolutional neural network,” IEEE transactions
on medical imaging, vol. 35, no. 5, pp. 1252-1261, 2016.
[43] Z. Yan et al., “Multi-Instance Deep Learning: Discover Discriminative Local Anatomies for Bodypart Recognition,” IEEE
transactions on medical imaging, vol. 35, no. 5, pp. 1332-1343, 2016.
[44] H.-C. Shin et al., “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset
characteristics and transfer learning,” IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1285-1298, 2016.
[45] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by
preventing co-adaptation of feature detectors,” arXiv preprint arXiv:1207.0580, 2012.
[46] E. Decencière et al., “Feedback on a publicly distributed image database: the Messidor database,” Image Analysis and
Stereology, vol. 33, no. 3, pp. 231-234, 2014.
[47] Knee dataset. Available: http://www.osirix-viewer.com/datasets/DATA/KNEE.zip, accessed 01-12-2016.
[48] Z. Camlica, H. Tizhoosh, and F. Khalvati, "Autoencoding the retrieval relevance of medical images," in Image Processing
Theory, Tools and Applications (IPTA), 2015 International Conference on, 2015, pp. 550-555.
[49] K. Seetharaman and S. Sathiamoorthy, “A unified learning framework for content based medical image retrieval using a
statistical model,” Journal of King Saud University-Computer and Information Sciences, vol. 28, no. 1, pp. 110-124, 2016.