Thesis

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/304023365
Information Theoretic Similarity Measures for Robust Image Matching

(Multimodal imaging - Infrared and visible light)
Thesis · May 2016

DOI: 10.13140/RG.2.1.3209.1769
CITATIONS READS
0 6,888
1 author:
Chaimae El Ghouch
KTH Royal Institute of Technology
1 PUBLICATION 0 CITATIONS
SEE PROFILE
All content following this page was uploaded by Chaimae El Ghouch on 17 June 2016.
The user has requested enhancement of the downloaded file.

DEGREE PROJECT IN COMPUTER ENGINEERING,
FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2016
Information Theoretic Similarity

Measures for Robust Image
Matching
Multimodal imaging - Infrared and visible light
JAMILA YUSUF ISSE
CHAIMAE EL GHOUCH
KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION
KTH Computer science
and Communication
Information Theoretic Similarity Measures for Robust Image Matching
Multimodal imaging Infrared and Visible light
Jamila Yusuf Isse and Chaimae El Ghouch
Degree Project in Computer Science, DD143X

Supervisor: Atsuto Maki
Examiner: Örjan Ekeberg
CSC, KTH 2016

Acronyms
CCRE Crosscumulative residual entropy

IR Infrared
MI Mutual information
SCV Sum of conditional variances
SCVD Sum of conditional variances of differences
SSD Sum of squared differences
1
Acknowledgements
Sincere gratitude to our supervisor Atsuto Maki, researcher and associate professor at the
Royal Institute of Technology (KTH) for the continuous support of our study and related
research. We thank him for his guidance, for his patience, motivation, and immense
knowledge that helped in all times of writing this thesis.
2
Abstract
This study aimed to investigate the applicability of three different information theoretic
similarity measures in image matching, mutual information (MI), crosscumulative residual
entropy (CCRE) and sum of conditional variances (SCV). An experiment was conducted to
assess the impact on the performances of the similarity measures when dealing with
multimodality, in this case in the context of infrared and visible light. This was achieved by
running simulations of four different scenarios using images taken in infrared and visible
light, and additionally with variations in amount of details to create different experimental
setups. Namely experimental setup A: unimodal data sets with more / less details and
experimental setup B: multimodal datasets with more / less details.
The result showed that the concept of multimodality provide a statistically significant effect
on the performances of all similarity measures. Observations were made that the similarity
measures performances also, when trying to match images with different amount of details,
differed from each other. This provided a basis for judgement on what measure to use as to
give as clear and sound results as possible depending on the variation of detail amount in the
data. With this study it was concluded that the similarity measure CCRE gave the most clear
and sound results in the context of multimodality concerning infrared and visible light for
both cases of more or less details. Even though the other similarity measures performed well
in some cases, CCRE would be to recommend as observed by this study.
Keywords : Image matching, multimodal imaging, similarity measures, MI, CCRE, SCV,
infrared, visible light.
3
Abstrakt
Denna studie syftade till att undersöka tillämpligheten av tre olika informationsteoretiska
likhetsmått vid matchning av bilder, mutual information (MI), cross cumulative residual
entropy (CCRE) och sum of conditional variances (SCV). Ett experiment genomfördes för att
bedöma hur de olika likhetsmåtten påverkades i kontexten av multimodalitet, i detta fall i
samband med infrarött och synligt ljus. Detta uppnåddes genom att köra simuleringar av fyra
olika scenarier med hjälp av bilder tagna i infrarött och synligt ljus, och dessutom med
variationer i mängden detaljer för att skapa olika experimentella uppsättningar. Nämligen
experimentuppsättning A: unimodala datamängder med mer / mindre detaljer och
experimentuppsättning B: multimodala datamängder med mer / mindre detaljer.
Resultatet visade att multimodalitet har en statistiskt signifikant påverkan på alla likhetsmått.
Observationer gjordes att likhetsmåttens beteenden, när man försöker matcha bilder med
olika mängd detaljer, skilde sig från varandra. Detta gav en grund för bedömning av vilken av
dessa likhetsmått som då kunde användas för att ge de mest tydliga och stabila resultaten som
möjligt beroende på variationen av mängden detaljer i datat. Med denna studie drogs
slutsatsen att likhetsmåttet CCRE gav mest de tydliga och stabila resultaten i samband med
multimodalitet gällande infrarött och synligt ljus för båda fallen av mer eller mindre detaljer.
Även om de andra likhetsmåtten också gav goda resultat i vissa fall, skulle CCRE vara att
rekommendera, som observerat i denna studie.
4
Table of contents
1. Introduction
...........................................................
6
1.1 Problem definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

7

1.2 Scope and constraints .................................................. 7
2. Background
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8
2.1 Image matching
2.1.1 Areabased methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.2 Featurebased methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Similarity measures
................................................... 9
2.2.1 Mutual information (MI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 Crosscumulative residual entropy (CCRE) . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.3 Sum of conditional variances (SCV) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Multimodality ....................................................... 11
2.4 Infrared Visible Imaging
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
12
2.4.1. Electromagnetic radiation (EM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4.1.1 Visible light . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.1.2 Infrared light . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
2.4.2 Optical remote sensing utilizing both Infrared and visible light in imaging . 13
3. Method
..............................................................
15
3.1 Data selection ....................................................... 15
.

3.1.1 Data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16
3.2 Experiment ......................................................... 17
3.2.1 Experimental setup A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2.1 Experimental setup B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4. Results
.............................................................
19
4.1 Experimental setup A ................................................ 19
4.1.1. Thermal camera more details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.1.2. Satellite image less details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Experimental setup B
................................................. 20
4.2.1. Thermal camera pair more details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2.2. Satellite pair less details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
21
5. Discussion
........................................................... 22
...............................................
5.1 Experiment evaluation 22
5.1.1 Experimental setup A Working with one modality. . . . . . . . . . . . . . . . . . . . .
22
5.1.1.1 Experimental setup A Recommendation . . . . . . . . . . . . . . . . . . . . .
23
5.1.2 Experimental setup B Working with two modalities. . . . . . . . . . . . . . . . . . . .
24
5.1.2.1 Experimental setup B Recommendation . . . . . . . . . . . . . . . . . . . . . 25
6. Conclusion
...........................................................
26
5
7. References
...........................................................
27
1. Introduction
Given an image, one challenge is to determine whether or not the image contains a specific
object or feature. This task can easily be solved by humans due to the capability of
recognizing objects through a process of structuring them into different categories, by making
a match based on their characteristics [1]. Despite the systems and technologies that are
developed today a problem remains because the matching requires reasoning from various
image attributes and extensive amounts of knowledge representation. Nevertheless, extensive
studies have been made in the area of image matching and registration to develop more
robust and accurate techniques. Because of this there are great progresses to the advantage of
many different fields.
A similarity measure can be regarded as a tool used to evaluate the spatial correspondence of
images and plays a fundamental role in image matching and registration [2]. There are
several measures of similarity and the usage of them differs since each similarity measure is
considered applicable or not depending on the data involved. When measures apply to images
with different modalities, information theoretic similarity measures are more applicable than
other standard similarity measures because of their strong roots in mathematics as well as
their ability to detect nonlinear changes in intensity. T
his concept was introduced by mutual
information which is a measurement of information that two random variables have in
common. Mutual information (MI) has given means to make similarity quantification with a
wider range of robustness, and by that giving means to the introduction of further
measurements such as crosscumulative residual entropy (CCRE) and sum of conditional
variances (SCV). Matching techniques have with this increased their robustness in more
challenged contexts that still remain but could have troubled the measurements with greater
impact before, such as transformations arisen from nonlinear brightness changes due to
multimodality.
When quantifying similarities between images there is a dependency in data where the
modality of the image has to be considered. There is a distinction in modality where images
can be of unimodal character or multimodal character, meaning dealing with either one
modality or with several modalities. The situation of multimodality is rather a constantly
common one when it comes to the use of similarity measures with information theoretic
approaches. With the possibility of comparing images with different modalities there are
advantages, such as being able to make comparisons from several perspectives. This could
give additional information that may be needed to further strengthen or validate conclusions.
As previously stated there exists a dependency in the data of images when measuring
similarities. These dependencies are caused in connection with the process of taking the
images where variations in intensity, angles and scale could be present. With this,
6
challenges occur in the stage of quantifying the similarities as these changes may not be
transformations of linear nature, much rather nonlinear nature, as employed in many cases
when dealing with multimodality [3].
Intensity changes are one of the most common challenging contexts, and matching images
with such differences is done regularly within many fields. Fields such as medical imaging,
satellite imaging and many more share this problem as they make use of imaging with
various types of lights. The infrared light is used in both of the mentioned fields, especially in
the latter one and images taken with such light is commonly compared with images of visible
light.
1.1. Problem definition
This report investigates image matching concerning nonlinear brightness changes, such that
arises in connection with multimodal imaging.The aim is to study three different similarity
measures in a comparative manner by using different sets of data including multimodal
images taken with infrared and visible light.
The investigation aims to answer the following questions: How do these similarity measures
perform depending on different data and how are their performances affected by multimodal
inputs in conjunction to the variations in data?
1.2. Scope and constraints
The focus of this report is on image matching concerning differences in image modalities in
conjunction with differences in data inputs. The interest is to test three different similarity
measures of information theoretic approaches to handle dealing with several modalities and
this is done by using images taken with infrared and visible light. The similarity measures in
question are mutual information, crosscumulative residual entropy andsum of conditional
variances .
7
2. Background
2.1. Image matching
Matching aims to establish the relationship between two images taken from different
viewpoints and/or by different sensors [4]. However, matching and registration are still very
challenging for modalities that capture different information, for instance visible and infrared
[5]. Although it may be difficult, using several modalities is a good way to extract more
information about a scene or targets in a scene. Image matching is applied in various fields
such as remote sensing and weather forecasting. When matching images, there are a lot of
techniques that can be used. In general, matching techniques may be classified as either
areabased or featurebased methods.
2.1.1. Areabased methods
Areabased methods are sometimes called correlationlike methods or template matching.

They are the one of the simplest of the matching algorithms. They use pixel intensity to
directly compute the similarity measure between images. This method makes use of a
template which is a crop from one image to perform matching with another image.
Areabased methods put emphasis rather on the feature matching step than on their detection.
No features are detected in these approaches so the first step of image registration is omitted.
Each image point to be matched is in the center of a small window of points in the first
image, the reference image, and this window is statistically compared with similarly sized
windows of points in the second image, the target image, of the stereo pair [6]. The
areabased methods are preferably applied when distinctive information is provided by grey
values rather than by local shapes and structures. This method establishes correspondences
between images by determining similarities in grey levels, and examples of areabased
similarity measures are mutual information and crosscorrelation [7].
2.1.2. Featurebased methods
Featurebased methods are typically applied when the local structural information is more
significant than the information carried by the image intensities. These methods allow to
register images of a completely different nature (like aerial photographs and maps) and can
handle complex image distortions [8]. The featurebased methods require the extraction of
basic image features, like blobs, corners, junctions and edges. Matching is then performed
between these features. Features are sometimes more stable with regards to reflectance
characteristics [9].
8
Consequently, the use of featurebased methods are usually employed in the images that
contain enough distinctive and easily detectable objects. This is usually the case of
applications in remote sensing and computer vision. The typical images contain a lot of
details (towns, rivers, roads and forests). On the other hand, areabased methods are
frequently used in the medical field because medical images are not so rich in such details.
Recently, registration methods using simultaneously both areabased and featurebased
approaches have started to appear [4]. However, a key issue in image matching is the choice
of similarity measure, which is a measure to quantify match between entities.
2.2. Similarity measures
Similarity measures are crucial when solving problems such as pattern recognition and
clustering or even classifications [10]. When selecting a similarity measure, some issues to
consider are, for instance, the modalities involved. When images are from the same modality,
similarity measures such as sum of squared differences Correlation ratio
(SSD) or (CR)
are
useful because images of the same type will have the same intensity on corresponding areas
[11]. Consequently, in order to deal with multimodal images, a similarity measure is required
to be robust enough to handle transformations such as nonlinear brightness changes caused
by differences in modalities [2]. As a result, information theoretic similarity measures are
more suitable as these measure the statistical relationship between pixel intensities and can
therefore easily detect nonlinear changes[12].
Interesting information theoretic similarity measures that are further explained in sections
Mutual information
2.2.12.2.4 are crosscumulative residual entropy
(MI), (CCRE), sum of
conditional variances(SCV).
2.2.1. Mutual information (MI)
The use of mutual information (MI) was introduced by Viola and Maes (1997) as a similarity
measure [15,16] and is a suitable similarity measure when dealing with multimodality. MI
relies on the concept of measuring the amount of the information a variable (image) contains
about another. MI also tend to be maximized when both images are in geometric alignment.
MI assumes a statistical relationship between images by analyzing the joint entropy of the
observed images which we denote as variables, deriving from a formal definition of the
entropy as the amount of uncertainty about a certain variable [13].
Given two images X and Y , the joint entropy H (X, Y ) and the individual entropy of any
variable H (X) can be computed as follows [14,15]:
H (X, Y ) = ∑ ∑ P (x, y) log P (x, y)

x∈X y∈y
9
H (X) = ∑ P (x) log P (x)
x∈X
Where P (variable) is the probabilistic distribution of the variable.
Consequently MI combines both the joint entropy of both variables and the individual
entropy of variables to result to the following [14]:
M I(X, Y ) = H(X) + H(Y ) − H(X, Y ) (1)
MI between two images can be maximized by maximizing the individual entropies and
minimizing the joint entropy. Although MI has been a useful tool in image registration, it has
a drawback which is that it does not take into account neighbourhood relationships. Also it
does not consider the spatial correspondence that exists among pixels [15].
2.2.2. Crosscumulative residual entropy (CCRE)
Crosscumulative residual entropy was introduced by Wang and Vemuri (2006) as an

information theoretic similarity measure. It relies on the concept of measuring the entropy by
using cumulative distributions and derives from the cumulative residual entropy (CRE). The
key strength of CCRE over MI is that it includes its significantly larger noise immunity and a
much larger convergence range over the field of transformations [16].
Given two images X , Y , CCRE is defined as the following:
C (X, Y ) = ξ(X) − E[ξ(X|Y )] (2)
Where ξ(X) = − ∫ F (λ) log F (λ) dλ and

R+
F (X) = P (|X| > λ)
in which ξ(.) is the cumulative residual entropy of a variable and F (.) the cumulative
residual distribution replacing the density function H (.) (see equation 1) that is used in the
MI because it is more used than the density function. Also it enables the preservation of the
principle that the logarithm of the probability of an event should represent the information
content in the event [16].
2.2.3. Sum of conditional variances (SCV)
In comparison to earlier proposed methods such as MI and CCRE, SCV was introduced in
order to achieve better performance in terms of computational complexity. SCV makes use of
joint distribution of image intensities in the same way as earlier information theoretic
similarity measures, but it generates it directly as a histogram. With this SCV makes use of
the statistical property of assumption, dealing with two images, that the adjacent intensity of a
10
group of clustered pixels in one image should be clustered in a similar way in the other
image. This approach showed that the method was rather robust against nonlinear brightness
changes and had a better performance having a lower computational complexity [2].
To calculate the sum of conditional variances two images are taken X, as the so called
reference image and Y, called the target image. A partition of the target image Y is made into
nb disjoint bins, Y (j) that correspond to intensity regions of the reference image X, X(j) ,
where j = 1, ..., nb . Then the matching value is obtained by summing the variances of the
intensities of each bin of the target image Y (j) ;
nb
2
SSCV (X, Y ) = ∑ E[(Y i − E(Y i)) | X i ∈ X (j)] (3)
j=1
Where i = 1, ..., N p , where N p is the total number of pixels, in Y i and X i which indicate the
pixel intensities of X and Y .
2.4. Multimodality
Multimodal imaging refers to the area of imaging within several modalities. Working in
different modalities gives an opportunity in image analysis to provide extensive information
in comparison to when working in one modality alone [17]. There are many fields of study in
which this is applied and thus it is a highly interdisciplinary field of research. Fields such as
cell and molecular biology, engineering and psychics, remote sensing, clinical neurology and
cardiology and many more.
Multiple image sources can be available for the same information: multispectral,
multitemporal, radar images can be acquired over a given scene. However, the combination
of sources can be of use to improve the classification of the observed materials.Within, for
instance, the clinical field there exist several imaging techniques to depict anatomical
structures as well as functional information of the human body. This yields further
improvements in understanding disease progression and origin, drug development,
neurological assessment, cancer screening and response monitoring [18]. Another area where
several modalities are used in conjunction with each other is the area of optical remote
sensing the cause for which is further explained in section 2.5.2.
The aim however is the same independent application of use, and that is to achieve spatial
colocalization of complementary information of structure and function [18].
Multimodality is widely used in various ways within many fields but they all have the same
purpose with this to make use of the advantages in every modality and with them all
together gain a better understanding as well as better accuracy in analysis.
11
2.5. Infrared Visible Imaging
The field of image registration is one that is utilized in a variety of ranges and its recent
advances have made that it is even more widely applicable in many areas of research such as
medical imaging, computer vision and remote sensing. However, there is yet a challenge
within image registration and that is the alignment of images taken of the same or similar
scene but by different modalities, such as images taken with infrared and visible light [5].
The concept of multimodality introduces a problem in this because of the different ways
information is conveyed by the modalities of the retrieved images. As for different types of
illumination, such as infrared and visible light, they come from different physical
phenomenon and thus visualize clear differences in an image. For instance, a pattern on an
object seen in the image of visible light may not be seen in the image of infrared light since
the pattern does not emit heat for it to be seen in an infrared image [5]. To understand these
differences further there will be an explanation of their properties, characteristics and
behaviour in the next following paragraphs.
2.5.1. Electromagnetic radiation (EM)
Electromagnetic radiation is a type of energy that takes many forms and two of them are
infrared and visible light. The EM consists of a massive range of wavelengths and
frequencies in which we have the two mentioned lights right next to each other in the span
(see figure 2.5.1) [19].
Figure 2.5.1: The electromagnetic spectrum where the infrared and visible light spectrums within it are seen as
well.
Source:
en.openei.org[27]
12
2.5.1.1. Visible light
With a frequency of 400 THz to 800 THz and a wavelength of 740 nm to 380 nm, visible
light is only a small part of the electromagnetic spectrum, and in fact the only light in the
spectrum that can be seen with human eyes. It is said that the most important characteristic of
visible light is that of color. Light at the lower end of the visible spectrum with a longer
wavelength is seen as red whereas light in the middle of the visible spectrum is seen as green
and at the upper end is seen as violet, as shown in figure 2.5.1 [20].
Visible light is used in imaging in many cases where the property of light scattering is
exploited. When light hits an object it is either absorbed by the object or it changes its
direction, it is the latter that is referred to as scattering [21]. Depending on the objects that the
light hits it scatters in different ways according to the object's material and emits radiation of
different wavelengths. This can be used to identify what the materials are and gives the
possibility to distinguish between them [22].
In the medical optical field, for instance, this property of light scattering is utilized when
visualizing soft tissues. It helps to distinguish between the tissues due to the many various
ways different types of tissues absorb and scatter light [23].
2.5.1.2. Infrared light
Infrared has a frequency of 3 GHz to 400 THz and a wavelength of 30 cm to 740 nm and this
is invisible to the human eye. This type of light can be felt as heat, as infrared radiation is
able to transfer heat. The infrared light can be used in a variety of ways and there is, within
its spectrum, a distinction between its wavelength where a shorter wavelength in this specific
spectrum is called nearinfrared
, a longer wavelength is called farinfrared, and in between
we have the intermediate wavelengths called midinfrared (see figure 2.5.1)
[24]
.
Within imaging the infrared light is used for its many good properties, one of them as
mentioned earlier, being the ability to transfer heat. It is used in what is referred to as thermal
imaging and as though infrared is considered “invisible” for the human eye the same does not
apply for instruments such as infrared cameras. This because they can sense infrared energy
in which they can visualize the infrared light emitting from warm objects such as bodies, and
this can be “seen”. Infrared is also used to monitor the earth by using satellites that sense the
emitted radiation of infrared from the planet. This also works to serve as aid to study changes
in land and sea temperatures [24].
2.5.2. Optical remote sensing utilizing both Infrared and visible light in imaging
Optical remote sensing being the science of obtaining information about different objects or
areas from a distance, such as from satellites, there is an acquisition of largely heterogenous
data of common geographical scenes as this can be done in many ways. This gives a
13
possibility to derive further information from different perspectives of one scene, leading to a
higher degree of accuracy as well as working to provide complementary information [25].
Within the optical remote sensing field of study both infrared and visible lights are used in
conjunction with each other when imaging. Both lights contribute to benefits where each
modality gives some exclusive details complementing each other. The infrared light might
give details on a certain important target in a region and the visible light to give details in that
region that might not appear in the infrared image. An example on application within the area
is fire monitoring, where certain satellites are utilized to do the monitoring. The monitoring is
done with the help of visible light to detect burn scars and smoke plumes from a fire and the
infrared light works to detect the actual hotspot, the region that the fire is and to see if there is
any active fire [26].
14
3. Method
As a start a literature study was performed and the research was based on information from
books and published reports in the field by professors within the area, as well as reliable
websites such as NASA’s. Scientific articles were collected to perform a qualitative content
analysis to gain as much reliable and relevant information as possible about the field. To limit
the number of hits and increase the relevance in the searches some keywords were used:
image matching, image registration, remote sensing, information theoretic similarity
measures, multimodal imaging, near/far/midinfrared, visible light and image fusion.
As the aim of the study was to examine factors that could affect performance, an experiment
was performed to put the different similarity measures to test. In interest to study these in a
more challenged context different experimental setups were conducted to give a view of the
performance for each measurement. The challenged context in question was that of
multimodality and the experiment was executed with different data sets including that.
In the subsections that follow there will be more detailed descriptions to the datasets used, the
experiment and its setups and how the simulations were done.
3.1. Data selection
When selecting data to use there were certain requirements that needed to be met. The data
should be of images that had been taken in infrared and visible light and the most important
factor was that they had to come in sets. The need was to have a pair of images taken in an
identical setting, capturing the exact same scene and preferably the same time to avoid any
further disturbances that might follow as a consequence. Other precautions that had to be
taken into consideration was that the images also had to be geometrically aligned and that
they were consistent with each other in terms of size and shape.
The final data sets chosen consisted of two pairs with variation in amount of details shown in
the images to create the different experimental setups and the cause for which this was
needed will be addressed and explained further in section 3.2.
15
3.1.1. Data sets
The chosen two datasets, with either less or more details are shown down below.
Satellite pair less details
Figure 3.1.1: Images taken with satellite in visible light (left, SatelliteV) and infrared (right, SatelliteI).
Source: gma.org [28]
Thermal camera pair more details
Figure 3.1.2: Images taken with thermal camera in visible light (left, CameraV) and infrared (right, CameraI).
Source:
mdpi.com [29]
16
3.2. Experiment
The experiments concerning both setups was done in the same manner. A random rectangular
region was selected in both images in each data set by a random displacement and direction.
For every change in distance between the regions in both images there was a maximum
distance that the different regions should stay within. For each offset, every step away in
distance, a number of trials were conducted in which the similarity value was calculated, and
to get an average value for every offset the mean was taken. This was done consequently
across all the similarity measures. The size of the region was set depending on the size of the
image in order for the experiment to work according to the different datasets, the offset was
however a fix number.
In the following subsections we will refer to the randomly selected regions in each image as
template and the distance between them as offset.
3.2.1. Experimental setup A
In this first setup the environment for the experiment consisted of two images, one that had
less details depicted and one that had more details depicted. In this setup the aim was to
match each image against itself to give a view of how well each similarity measure performs
in terms of the amount of distinguishable components. The images used for this was
SatelliteI and CameraV (see figures 3.1.1 and 3.1.2 in section 3.1) and in the table below
the settings for the simulation of setup A is shown.
Table 3.2.1: Settings for Experimental setup A.
This setup was done to give an indication to the significance of clarity in images when
performing image matching without multimodality in consideration and to set a base for
comparison with the next setup.
3.2.2. Experimental setup B
The environment for the second setup consisted of four images that were paired. Each image
in setup A was to be matched with another image of countermodality, i.e if the image was
17
taken in visible light then it will be matched against its correspondent image in infrared and
vice versa.
This conducted two pairs where the first pair was composed of SatelliteV and SatelliteI in
infrared and visible light with less details and the same for the second pair of CameraV and
CameraI (see figures 3.1.1 and 3.1.2 in section 3.1), but with more details. The table below
gives the settings for the simulations of this setup, which are the same as in setup A but now
with both pairs.
Table 3.2.2: Settings for Experimental setup B.
The aim of this setup was to examine how the introduction of multimodality matters when
matching images with different levels of clarity and how this affects the performance of each
similarity measure.
18
4. Results
The results of each simulation run is represented on both experimental setups of either single
images with more/less details matched against themselves and images of different modalities
with more/less details matched in pairs. The experiment was runned numerous times and the
plots chosen are representing one of many runs of each simulation. They show the distance,
steps in offset (xaxis) relatively the value of the similarity measure (yaxis).
In the position (x = 0, y = minimum value) the similarity value is the best in other words the
images are considered to be very similar, in best case identical. The further away the curve
goes, as the distance increase, the worse the similarity value gets, as seen in the yaxis. What
is defined as the minimal value in the yaxis depends on the similarity measure in use. MI and
CCRE give a better similarity value if the value in yaxis is as negative as possible, whereas
the minimal value SCV can get is 0.
4.1. Experimental setup A
4.1.1. Thermal camera more details
Figure 4.1.1:Result of running CameraV with more details using MI, CCRE and SCV.
Figure 4.1.1 show smooth and diminishing curves. The curves are steeply ascending in which
the similarity value increases at first and then it levels off as the offset increases. As seen the
plot of CCRE it shows a steeper increase in similarity value for larger offsets than for MI
where the values smoothly subsides. As for SCV, the values are also subsiding in a clearer
manner for greater offsets.
19
4.1.2. Satellite image less details
Figure 4.1.2: Result of running SatelliteI with less details using MI, CCRE and SCV.
The plots of figure 4.1.2 are less smoother than plots of figure 4.1.1 with relatively small ups
and downs in similarity value. The curves rise fairly quickly then begin to grow slower still
increasing in yaxis. The plot for MI shows a greater increase in similarity as the offset grows
than the plots for CCRE and SCV. The plots for CCRE and SCV are not as steep and rather
declining to increase in similarity value for the larger offsets.
4.2. Experimental setup B
4.2.1. Thermal camera pair more details
Figure 4.2.1 : Result of running images CameraV and CameraI with more details using MI, CCRE and SCV.
In figure 4.2.1 all the curves have random fluctuations. In the plot for MI and CCRE the
curves are approximated roughly exponential, although CCRE grows in a more clear manner.
In the plot of SCV, the curves fluctuations seem to be roughly constant in size ( ≈ 9000) over
the xaxis. Although the size of the fluctuations in the start may be slightly more than that at
later measurements.
20
4.2.2. Satellite pair less details
Figure 4.2.2 : Result of running images SatelliteV and SatelliteI with less details using MI, CCRE and SCV.
figure 4.2.1 the curves are smoother, however, the plot in figure 4.2.2

In rise more steadily
and consistently.The plots for MI and CCRE seem to have relatively small ups and downs
while SCV certainly shows a lot of upanddown fluctuations as the offset increases.
To interpret the results emphasis should be on the similarity values in the yaxis as well as the
increasing trend of the plots as they indicate how each similarity measure gives greater or less
values depending on data. This will be discussed in the next section.
21
5. Discussion
Image registration being the process of aligning and matching multiple images in the same
scene, it gives possibilities to track and compare differences in common images from
different perspectives. However, making this match between two images is not an easy task
due to the numerous factors that has to be taken into consideration. These factors set
limitations that the similarity measure techniques become strongly dependent on, limitations
in the form of being dependent on what data is given as input. In this thesis an investigation
was made on how these limitations came to affect the performances. An experiment was
conducted on this problem area to observe the behaviour of the information theoretic
similarity measures chosen to study.
In the following subsections there will be a discussion on the results evaluating the
experiment as well as an overall discussion on important points of the investigation.
5.1. Experiment evaluation
By setting up an experiment to answer the given thesis in section 1.2, interesting observations
were made. The two experimental setups were done with the purpose of giving information
of the amount of significance certain aspects in the input data had. Aspects such as the
amount of details each image contained and how that perhaps contributed to a worse or better
match (experimental setup A), and how the matching abilities would be further affected with
the introduction different modalities of the images in question (experimental setup B).
5.1.1. Experimental setup A Working with one modality
CameraV
CameraV (see CameraV figure 3.1.2) that was tested was an image taken with a thermal
camera in a grayscale displaying a scene with clearly distinguishable objects such as a house,
a fence, bushes and a road. This image contained many clear and distinct details easily seen
with the bare eye.
Matching this image with itself yielded the results seen in figure 4.1.1, and as seen the plots
show smooth curves for the two first measures MI and CCRE, and along with SCV their
curves seemed to also grow strictly for a starter to flatten out as the offset increased.
The interpretation of this is that as the distance between the compared templates of the both
images (two identical versions of CameraV) grows, it is natural that they deviate in
similarity as the two templates are no longer displaying similar parts. As the location of each
template changes, finally displaying different regions in both images, the similarity value
between them gets worse.
22
Up until an offset 10 pixels we can see that the similarity value increases a lot, which shows
that as the regions move 10 pixels away from each other the similarity value is greatly
affected. Later on from an offset of greater than 10 the curves for all measures gradually
subside from increasing and would highly likely converge in similarity value further away
which means that the value is as bad as it can get.
SatelliteI
SatelliteI (see SatelliteI in figure 3.1.1) that was tested was an image taken with a satellite,
meaning that the image was taken from a far distance displaying a part of the earth through
the clouds. This image does not contain clear and distinct details as in CameraV and what
could be said about it by looking at it with bare eyes is that mainly the clouds are seen. Then
there is a lot of unclear structures in a darker gray shade beneath the clouds which
presumably is the plot of earth displayed.
This image gave the results that could be seen in figure 4.1.2 and as seen the plots show less
smoothness. In comparison to the results of CameraV it is clear that there is a larger increase
to the curves as the offset increases.
By observing the similarity values in the yaxis of all the result plots for SatelliteI it is clear
that the value of similarity is a lot worse in comparison to the values of CameraV. We can
see that in this case the values are much greater for all offsets which indicates that there was
an difficulty in even performing a match for the most minimal offset.
5.1.1.1. Experimental setup A Recommendation
With other words, the results for SatelliteI in comparison to CameraV shows that there is a
greater difficulty in performing a match overall as the similarity values are much worse and
what could be said is that the amount of detail in the images clearly had impact on the
performance of the measures.
The observations in setup A showed that when image matching is done in an unimodal
context it is recommended to use CCRE for an easier match for images with more details.
The matching results of CCRE showed a much more stable and certain behaviour in
indicating if the images were similar or not for all offsets.
As for the case of matching images with less details in an unimodal context the similarity
measures show similar behaviours, however, MI is recommended to use. This is due to the
cause of the results for MI in a quite early stage already giving an indication on whether the
images matched are similar or not.
23
5.1.2. Experimental setup B Working with two modalities
CameraV and CameraI
CameraV that was taken in visible light was now compared to another version of itself taken
in infrared, CameraI. Being the image with more details it was a natural consequence that its
infrared version also contained as many details, and due to the characteristics of the infrared
light (see section 2.5) the details were differently highlighted and emphasized (see CameraV
and CameraI in figure 3.1.2).
Now working with more than one modality we can see that the plots for this type of matching
differs extremely from the previous ones. The plots show much apparent fluctuations in the
values though the similarity values in yaxis for each measure seem to stay within a short and
rather constant interval. There is a slight increase of the similarity value, but as the offset
grows it does not seem to get much worse.
To interpret this in comparison to experimental setup A we can see that there are clear
differences in the plots. When comparing CameraV to itself identically the results were clear
where the similarity value got worse relatively the offsets increase, but when comparing it to
its infrared version CameraI it shows that there is a much more uncertainty involved
although the values in yaxis is variating in a much less interval. This means that there is a
greater difficulty to determine if the images are similar or not when dealing with different
modalities. What could be said about the similarity value overall is that it is much worse for
all the measures than what they were in setup A as the value is greater, and with that worse,
in the yaxis for all plots in setup B in comparison.
In short, it is apparent that it showed to be much more challenging to match the images when
working with different modalities and that the value for the similarity for all cases was worse,
more uncertain and with that less reliable.
SatelliteV and SatelliteI
SatelliteV was now to be matched to itself in an version taken in visible light, SatelliteI (see
SatelliteV and SatelliteI in figure 3.1.1). This pair was the pair with less details and
previously in experimental setup A it was noted that there was a bit more difficult match to
make in comparison to the one with more details. What is seen in the result plots is that the
curves now, in comparison to the results in setup A, show to be increasing in a more linear
fashion, as the similarity values are becoming greater much more rapid. The values in the
yaxis seem to keep increasing and the trend continues to the maximum offset. To note is that
the similarity values for all the similarity measures also are greater for even the smallest
offset.
24
5.1.1.1. Experimental setup B Recommendation
Comparing the results of matching SatelliteV and SatelliteI that had less details to the
results of matching CameraV and CameraI with more details, it showed that the similarity
values variated in much greater intervals. This indicates that there is a lot more uncertainty
involved for the similarity measures when matching and comparing CameraV and CameraI
to determine if the images are similar or not.
The results showed that concerning a match between images with more details it is
recommended to use CCRE. The plots for all similarity measures show fluctuations in the
results indicating uncertainty in the matching process as the offset increases. However CCRE
is showing less uncertainty, and amongst the similarity measures.
As for the case of dealing with images with less details the results give that CCRE is
recommended to use. The results of CCRE showed to be more sound in indicating if the
images matched were similar or not for all offsets. Additionally the results show a curve that
grows with a linear behaviour which clearly argue that CCRE in a clear manner can
determine if the images matched are similar or not.
25
6. Conclusion
To draw a conclusion from the results of this study and discussion it appeared to show that
the performance of all similarity measures were affected by both variation in detail clarity and
differences of modality. The results clearly showed the great significance of having more
details, more distinguishable components within an image in both the cases of working with
one modality and several modalities. In the case of one modality it is better to have more
details as it is easier to make a match, but when there is several modalities involved it is better
to have less details as the difference between the images are not as apparent due to the
difference in modality itself.
Working with multimodality in conjunction to either more or less details, it showed to have a
greater impact of the performance for both of the cases of more or less details. To judge by
the observations made it showed that when matching images in a multimodal context there
are some recommendations on what similarity measures to use. Consequently, in order to
yield as clear results as possible since their performances differ. In the context of matching
images with more details or less details it is, as observed by this study, recommended to use
CCRE. The results for CCRE showed to be more sound in indicating if the images matched
were similar or not for all offsets for both cases, namely more or less details, in the context of
multimodality.
This is an early study on applicability of different information theoretic similarity measures

and thus far our experiment was limited to two datasets as there were restricting
requirements that had to be met (see section 3.1 data collection). However, future work will
investigate this area further and in a more comprehensive manner to generalize the results.
Further research on similarity measures could consider possible extensions of SCV such as
SCVD, sum of conditional variance of differences [3].
26
7. References
[1]
Biederman, Irving. "Recognitionbycomponents: a theory of human image
understanding." Psychological review 94.2 (1987): 115.
[2]
Rogelj, Peter, Stanislav Kovačič, and James C. Gee. "Point similarity measures for
nonrigid registration of multimodal data." Computer vision and image understanding 92.1
(2003): 112140.
[3]
Maki Atsuto, and Riccardo Gherardi. "Conditional variance of differences: a robust
similarity measure for matching and registration." Structural, Syntactic, and Statistical
Pattern Recognition . Springer Berlin Heidelberg, 2012. 657665.
[4]
Zitova, Barbara, and Jan Flusser. "Image registration methods: a survey." Image and
vision computing 21.11 (2003): 9771000.
[5]
Sonn, Socheat, GuillaumeAlexandre Bilodeau, and Philippe Galinier. "Fast and accurate
registration of visible and infrared videos." Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops . 2013.
[6]
Hannah, Marsha Jo. "Digital stereo image matching techniques." International Archives
of Photogrammetry and Remote Sensing 27.B3 (1988): 280293.
[7]
Joglekar, Jyoti, and Shirish S. Gedam. "Area based image matching methods—A survey."
Int. J. Emerg. Technol. Adv. Eng 2.1 (2012): 130136.
[8]
ElSamie, Fathi E. Abd, Mohiy M. Hadhoud, and Said E. ElKhamy. Image
Superresolution and Applications . CRC press, 2012.
[9] Barazzetti Luigi, "
Matching of Satellite, Aerial and Closerange Images "
[10] Cha, SungHyuk. "Comprehensive survey on distance/similarity measures between
probability density functions." City1.2 (2007): 1.
[11] Ulysses, J. N., and A. Conci. "Measuring similarity in medical registration." IWSSIP 17th
International Conference on Systems, Signals and Image Processing . 2010.
[12] Kalinić, Hrvoje, and Bart Bijnens. "A novel image similarity measure for image
registration." Image and Signal Processing and Analysis (ISPA), 2011 7th International
Symposium on . IEEE, 2011.
[13] Viola, Paul, and William M. Wells III. "Alignment by maximization of mutual
information." International journal of computer vision 24.2 (1997): 137154.
[14] Maes, Frederik, et al. "Multimodality image registration by maximization of mutual
information." Medical Imaging, IEEE Transactions on 16.2 (1997): 187198.
[15] Hasan, Mahmudul, et al. "Regisration of hyperspectral and trichromatic images via cross
cumulative residual entropy maximisation." Image Processing (ICIP), 2010 17th IEEE
International Conference on . IEEE, 2010.
[16] Wang, Fei, and Baba C. Vemuri. "Nonrigid multimodal image registration using
crosscumulative residual entropy." International journal of computer vision 74.2 (2007):
201215.
[17] Barillot, C., et al. "Data fusion in medical imaging: merging multimodal and multipatient
images, identification of structures and 3D display aspects." European journal of radiology
17.1 (1993): 2227.
27
[18]
James, Alex Pappachen, and Belur V. Dasarathy. "Medical image fusion: A survey of
Information Fusion
the state of the art." 19 (2014): 419.
[19] Science Mission Directorate. "Introduction to The Electromagnetic Spectrum"
Mission:Science . 2010. National Aeronautics and Space Administration. 10 May. 2016
http://missionscience.nasa.gov/ems/01_intro.html
[20] Science Mission Directorate. "Visible Light" Mission:Science. 2010. National
Aeronautics and Space Administration. 10 May. 2016
http://missionscience.nasa.gov/ems/09_visiblelight.html
[21]
Øgendal, Lars. "Light Scattering." (2013) ,
[22]
Shaw, Gary A., and Hsiaohua K. Burke. "Spectral imaging for remote sensing." Lincoln
Laboratory Journal 14.1 (2003): 328.
[23] National institute of biomedical imaging and bioengineering,Optical Imaging.
[24] Science Mission Directorate. "Infrared Waves" Mission:Science . 2010. National
Aeronautics and Space Administration. 10 May. 2016,
http://missionscience.nasa.gov/ems/07_infraredwaves.html
[25]
GomezChova, Luis, et al. "Multimodal classification of remote sensing images: a
review and future directions." Proceedings of the IEEE 103.9 (2015): 15601584.
[26] Global fire monitoring, NASA and NOAA Missions for Monitoring Global Fires
http://earthobservatory.nasa.gov/Features/GlobalFire/fire_5.php
[27] Infrared basics The electromagnetic spectrum,
http://www.protherm.com/infrared_basics.php
[28] Remote sensing visible and infrared imagery,
http://www.gma.org/surfing/sensing/imagery.html
[29]
Bai, Xiangzhi. "Infrared and Visual Image Fusion through Fuzzy Measure and
Alternating Operators." Sensors15.7 (201
5): 1714917167.
28
www.kth.se
View publication stats

Thesis

Uploaded by

Document Informationclick to expand document information

Document Informationclick to expand document information

Copyright:

Available Formats

Thesis

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Thesis

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Information Theoretic Similarity Measures for Robust Image Matching

Thesis · May 2016

The user has requested enhancement of the downloaded file.

Information Theoretic Similarity

JAMILA YUSUF ISSE

KTH ROYAL INSTITUTE OF TECHNOLOGY

Information Theoretic Similarity Measures for Robust Image Matching

Multimodal imaging ­ Infrared and Visible light

Jamila Yusuf Isse and Chaimae El Ghouch

Degree Project in Computer Science, DD143X

CSC, KTH 2016

CCRE Cross­cumulative residual entropy

1​.1 Problem definition​. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ​

1.1. Problem definition

1.2. Scope and constraints

2.1. Image matching

2.1.1. Area­based methods

Area­based methods are sometimes called correlation­like methods or template matching.

2.1.2. Feature­based methods

2.2. Similarity measures

2.2.1. Mutual information (MI)

H (X, Y ) = ∑ ∑ P (x, y) log P (x, y)

M I(X, Y ) = H(X) + H(Y ) − H(X, Y ) (1)

2.2.2. Cross­cumulative residual entropy (CCRE)

Cross­cumulative residual entropy was introduced by Wang and Vemuri (2006) as an

Given two images X , Y , CCRE is defined as the following:

C (X, Y ) = ξ(X) − E[ξ(X|Y )] (2)

Where ξ(X) = − ∫ F (λ) log F (λ) dλ and

2.2.3. Sum of conditional variances (SCV)

2.5.1. Electromagnetic radiation (EM)

2.5.1.2. Infrared light

3.1. Data selection

Satellite pair ­ less details

Thermal camera pair ­ more details

3.2.1. Experimental setup A

Table 3.2.1: Settings for Experimental setup A.

3.2.2. Experimental setup B

Table 3.2.2: Settings for Experimental setup B.

4.1. Experimental setup A

4.1.1. Thermal camera ­ more details

4.2. Experimental setup B

4.2.1. Thermal camera pair ­ more details

5.1. Experiment evaluation

5.1.1. Experimental setup A ­ Working with one modality

5.1.1.1. Experimental setup A ­ Recommendation

Camera­V and Camera­I

Satellite­V and Satellite­I

This is an early study on applicability of different information theoretic similarity measures

View publication stats

You might also like

Multimodal imaging Infrared and Visible light

CCRE Crosscumulative residual entropy

1.1 Problem definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2.1.1. Areabased methods

Areabased methods are sometimes called correlationlike methods or template matching.

2.1.2. Featurebased methods

2.2.2. Crosscumulative residual entropy (CCRE)

Crosscumulative residual entropy was introduced by Wang and Vemuri (2006) as an

Satellite pair less details

Thermal camera pair more details

4.1.1. Thermal camera more details

4.2.1. Thermal camera pair more details

5.1.1. Experimental setup A Working with one modality

5.1.1.1. Experimental setup A Recommendation

CameraV and CameraI

SatelliteV and SatelliteI