
CHAPTER 5

ADVANCED VISION SYSTEMS IN DETECTION AND ANALYSIS OF
CHARACTERISTIC FEATURES OF OBJECTS
Adam Wulkiewicz, Rafał Jachowicz, Sylwester Błaszczyk, Piotr Duch,


Maciej Łaski, Dominik Sankowski, and Piotr Ostalczyk
Institute of Applied Computer Science
Lodz University of Technology
90-924 Łódź, ul. Stefanowskiego 18/22


{awulkie, rjachowicz, sblaszc, pduch, mlaski, dsan}@kis.p.lodz.pl,
[email protected]

5.1 Introduction

Vision systems are among the most complicated and, at the same time, most essential technologies for building autonomous or semiautonomous vehicles. They are designed to eliminate or limit the need for human observation and the probability of mistakes. Among the many categories of vision systems, three main ones encompass all others: measurement, inspection and guidance systems. The purpose of a measurement system is to calculate the dimensions of an object by analyzing its digital image. Inspection systems check or examine a given area and detect any irregularities in it. Guidance systems, in turn, are designed to carry out commands based on the machine's perception system. A vision system dedicated to vehicles performing autonomous or semiautonomous tasks belongs to the last category: guidance systems.
Nowadays, image processing and analysis is a rapidly developing branch of science, which increasingly often finds its application in industry. More and more device control systems are based on information acquired by image data processing. Furthermore, such systems often treat image data as the only reliable source of information (Zhang et al., 2011; Liu et al., 2009; Arora and Banga, 2012; Choudekar et al., 2011). Because image data is often ambiguous, entrusting key decisions in any technological process to a system based on such data requires additional image processing and analysis support techniques. In addition, in cases where an image analysis system is not a decision-making one, but only an image data display mechanism, valuable image processing support techniques can still be applied. These are tasks in which the image processing and analysis system acts only as an informer, not a manager, e.g. surveillance and reconnaissance.
Advanced vision systems are increasingly complicated, and most of them are based on the detection, tracking and analysis of characteristic features of objects. Therefore, the best possible exposition of key object features is essential. This approach is especially significant wherever the analysis of information acquired from a single camera, or even from many cameras analyzed separately, is insufficient. A human being is not capable of infallibly integrating information from many different sources. Therefore, the fusion of images acquired from two or more different sources into a single image, in order to enhance the quality of analysis of an observed scene, has significant practical value and is a continuously developed branch of image processing and analysis. Algorithms of this type find their application in medicine (registering and combining magnetic resonance (MR), positron emission tomography (PET) and computed tomography (CT) images into composites to aid surgery (Derek et al., 1994)), in the military and surveillance (object detection and tracking (Snidaro et al., 2009; Zin et al., 2011)) and in industry (non-destructive evaluation techniques for inspecting parts (Blum and Liu, 2005)). Potential advantages of image fusion include:
(1) Information can be read more precisely, in a shorter time and at a lower cost.
(2) Image fusion allows distinguishing features that are impossible to perceive with
any individual sensor.
(3) Compact representation of information.
(4) Extended spatial and temporal coverage.
(5) Exposure of all the objects present on the scene by the integration of images
taken from the same viewpoint under different local settings.
For example, one can imagine a human hidden in a wooded area at night. Using a Night Vision camera, the observer is able to see details of the environment and the exact shapes of objects, but will hardly detect a camouflaged human figure. In turn, in the Thermo Vision camera image the silhouette of the hidden man will be clearly visible, but it will be hard to determine its location in space, because the temperature of the surrounding elements of the environment is nearly identical. In the integrated image, both features would be present: the environment details and the temperature of the objects.
However, not only object exposition support is important, but also the aid of a computer system wherever a human operator has to simultaneously observe the scene and accurately control the robot's actuators. In such cases, the help of the system is a significant enhancement. Such solutions are already applied in the automotive industry, e.g. semi-automatic parking systems.

Fig. 5.1 IBIS robot and its operator console (PIAP, Warsaw, www.antyterroryzm.com).

In Fig. 5.1 the pyrotechnic robot IBIS, constructed by PIAP in Warsaw, is presented. When operating such advanced devices, the operator is forced to maintain ceaseless concentration, especially when performing dangerous tasks such as the examination of explosives. There are only a few available solutions which offer advanced aid systems in such situations; moreover, a vast number of similar devices do not support even basic enhancement mechanisms such as inverse kinematics (Sun et al., 2012). Because the operator needs support during accurate mobile platform maneuvers, the authors of the Mobile Robot operating system from the Institute of Applied Computer Science developed a mechanism supporting the observation of suspicious objects by the robot's head.
In this chapter two operator work aid algorithms are presented. The algorithms use the robot's head as a perception system and are based only on image processing and analysis data. The algorithms are: fusion of image data acquired from different types of cameras (among others Thermo Vision and Night Vision) and mobile platform positioning according to objects indicated by the operator. The output of the first algorithm is the exposition of chosen features of objects located on the scene (often impossible to determine using only one camera), e.g. size, temperature, etc. The output of the second algorithm is the optimal position of the robot's head, determined by a combination of safety and the level of access to a suspicious object placed on the ground.

5.2 Image fusion algorithm

Fusion techniques for images acquired from different types of cameras can be classified into one of the following categories (Zhang, 2010; Al-Wassai et al., 2011) (Figs 5.2, 5.3 and 5.4):

(1) The pixel/data level – the integration of raw data acquired from multiple sources into one image which includes more information (presented in a synthetic way) than each of the input images analyzed separately. Another possibility is to expose differences between data collections acquired at different times.
(2) The feature level – on this level, the fusion of images acquired from different sources into one image, which can be used for later processing and analysis, is achieved by the extraction and combination of many types of features such as edges, corners, lines or textures.
(3) The decision level – this fusion combines the results from multiple algorithms into a final fused decision.

Fig. 5.2 Structure of image fusion at pixel level.


Fig. 5.3 Structure of image fusion at feature level.



Fig. 5.4 Structure of decision-level image fusion.

The fusion of images acquired from different cameras in order to enhance interesting features is a complex task. In a common approach one can find four major stages:

(1) The first stage of preprocessing – in this stage it is necessary to implement mechanisms which geometrically fit the images acquired from different cameras to each other. Those images usually differ from each other in type and resolution, so special attention should be paid to the fitting mechanism. The most important steps in this stage are: determination of the intersection of two or more input images, disparity calculation (disparity is a shift between two images caused by the non-identical locations of the cameras) and compensation of object deformation (caused by different perspectives). The technique which includes such steps is commonly called multi-sensor image registration.

(2) The second stage of preprocessing – in this stage a segmentation of the regions of interest in the acquired images is performed. In the case of infrared cameras, such regions can include attributes of objects which are not visible to normal optical sensors. L-band SAR (Synthetic Aperture Radar) can, for example, detect metal objects hidden under leaves, a tent or clothes, or objects which are painted.
(3) The third stage of preprocessing – the main functionality of this stage is the extraction of attributes or features of objects (or of regions of interest) and their valid description. This means that in the case of a Thermo Vision camera the proper temperature is assigned to the corresponding objects on the scene; in the case of SAR, information about metal detection is assigned, etc.
(4) Image integration – in the last stage, information about individual objects (extracted in the second and third stages of preprocessing) in the separate images is fused. Not only individual objects but also regions of interest can be analyzed and fused. The result of this stage is an output image which combines all component images with a clear and unequivocal enhancement of the characteristic features of individual objects or regions.

It may seem that the first three stages are not strictly connected with the issue of image fusion, but they are essential for performing a valid image data fusion process.

5.2.1 Image fusion algorithm description

The presented algorithm is depicted in the following block diagram (Fig. 5.5):

Fig. 5.5 Block diagram of the proposed algorithm.



5.2.2 Image intersection determination

There are two major types of differences between images acquired from different types of cameras. The first type is spatial mismatch, which occurs very often and is caused by using two (or more) cameras with different resolutions and/or objective focal lengths. Differences of this kind can be corrected by performing a proper spatial transformation. The second type of difference is connected with the occurrence of factors such as lighting changes, the use of different sensors, moving objects in the analyzed scene, etc. These differences, which remain after spatial matching, cannot be eliminated in the image registration stage.
The difference between the sizes of images acquired from different sources is caused by the use of different types of cameras. Consequently, there will be two (or more) different resolutions of images of the analyzed scene. Sometimes it is impossible to avoid this situation, because some types of cameras inherently have a lower resolution (e.g. Thermo Vision cameras) than others of similar quality but with a different sensor type (e.g. Night Vision or Day Light cameras). In addition, using different types of cameras (as in the presented approach) necessitates the use of objectives with significantly different parameters. Therefore, the calibration stage is very important.
Due to the relation between the viewing angles of the camera sets and the focal lengths of their objectives, it is necessary to preliminarily calculate the final size (in pixels) of the analyzed scene. This operation is essential because in the next steps it will be necessary to have the same objects in both images (when two cameras observe the scene). Additionally, to boost performance, elements which are not present in all images of the analyzed scene are omitted. The described issue is illustrated in Fig. 5.6:

Fig. 5.6 Difference between acquired scenes from cameras with different parameters.

As can be concluded from Fig. 5.6, the intersection of the two acquired scenes is the fragment presented in Fig. 5.7:

Fig. 5.7 Intersection of acquired scenes.

For the preliminary mathematical determination of the intersection, an estimated viewing angle of both cameras should be calculated based on knowledge of the objective focal lengths and the relative position of the cameras. Additionally, after the final viewing angle estimation, the region of analysis should be limited by ignoring those parts of the scene which are inaccessible to any of the cameras. The mentioned region is shown in Fig. 5.8:

Fig. 5.8 Camera set region of analysis.

If two cameras do not have the same viewing angles (which is the usual case), there will always be an area in one of the images which can be omitted during the analysis. It can be determined by ignoring pixels which are cut off by the border line of the viewing angle (Fig. 5.8). This line is set by drawing the smaller viewing angle at the camera with the greater viewing angle.
The individual viewing angles of the camera sets can be calculated by following the approach presented below (Fig. 5.9):

Fig. 5.9 Relation between the camera set viewing angle and its focal length.

In Fig. 5.9 the vertical viewing angle is deliberately omitted, because the cameras are set in such a way that they lie on a surface parallel to the ground, which eliminates a vertical perspective shift. In order to calculate the viewing angle, the dashed triangle should be analyzed (Figs 5.9 and 5.10):

Fig. 5.10 Horizontal surface of the viewing angle.

Based on the trigonometric relation between the viewing angle α, the focal length F and half of the scene width d/2, one can calculate:

α = arctg(d / (2F))    (5.1)

The calculated viewing angles of the camera sets are used to determine which pixels can be omitted during the analysis.
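
As a numerical illustration of Eq. (5.1) and of the limiting of the region of analysis shown in Fig. 5.8, the following minimal sketch computes the viewing angle of two cameras and the width, in pixels, of the band of the wider image that both cameras share. All sensor widths, focal lengths and the image width used here are hypothetical values, not parameters of the robot's cameras.

```cpp
#include <cmath>
#include <cstdio>
#include <algorithm>

// Viewing angle from Eq. (5.1): alpha = arctg(d / (2F)), where d is the scene
// (sensor) width and F the focal length, both expressed in the same unit.
static double viewingAngle(double widthMm, double focalMm) {
    return std::atan(widthMm / (2.0 * focalMm));  // radians
}

int main() {
    const double PI = std::acos(-1.0);

    // Hypothetical camera parameters (not taken from the chapter).
    const double thermoWidthMm = 8.0, thermoFocalMm = 19.0;  // narrower view
    const double nightWidthMm  = 8.0, nightFocalMm  = 8.0;   // wider view
    const int    nightImageWidthPx = 720;

    double aThermo = viewingAngle(thermoWidthMm, thermoFocalMm);
    double aNight  = viewingAngle(nightWidthMm,  nightFocalMm);

    // Drawing the smaller viewing angle at the wider camera (Fig. 5.8): the
    // fraction of the wider image seen by both cameras; pixels outside this
    // band can be omitted from the analysis.
    double sharedFraction = std::min(1.0, std::tan(aThermo) / std::tan(aNight));
    int    sharedWidthPx  = static_cast<int>(sharedFraction * nightImageWidthPx);

    std::printf("angles: %.1f deg (thermo), %.1f deg (night)\n",
                aThermo * 180.0 / PI, aNight * 180.0 / PI);
    std::printf("shared width in the wider image: %d px\n", sharedWidthPx);
    return 0;
}
```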

5.2.3 Image fusion on the feature level

The next issue connected with image integration is matching particular scene elements. The problematic effect here is disparity, which is heavily exploited in stereovision systems. Disparity is a shift between the same objects in separate images acquired from cameras located at different positions in a stereovision set. The relative shift between objects in the analyzed images is related to their distance from the stereovision set. For this reason, in the presented problem it is impossible to fuse images at the pixel level, because the algorithm is not supported by information about the scene depth (the distance to individual objects).
In the case of fusing images acquired from different types of cameras, as presented by the authors, an approach based on characteristic feature detection is used to determine the terms of image integration. Characteristic features are first detected in the image from the first camera and then searched for and matched to the same features in the image acquired from the second one. Those features are: contours of warm objects (acquired by a Thermo Vision camera) and contours of objects in rough lighting conditions (acquired by a Night Vision camera). Because one of the major tasks of image fusion is to preserve the characteristic features of the objects present in the analyzed images, methods based on feature-level integration yield subjectively better results than those based on the pixel level (Samadzadegan, 2004).
The main step of this image fusion stage is the decomposition of the analyzed scene into its characteristic features, e.g. edges. Next, the detected edges are connected into groups, and each group represents one object located on the scene. Matching objects in the image from one camera to their equivalents in the image from another camera is done by shape analysis based on comparing the extracted contours.
In the first step, on both images, the edges of all the objects are determined.
In order to perform such a task, one of the many edge detection methods can
be used, e.g. the Sobel Operator (Sobel and Feldman, 1973), the Prewitt
Operator, the Laplacian of Gaussian or the Canny Edge Detection algorithm
(Xin et al., 2012). Even more sophisticated methods of edge detection based
on a fractional order derivative can be used (Mathieu et al., 2003; Yang et al.,
2011; Gan and Yang, 2010). Due to the fact that the authors’ main target was
to develop an algorithm which will be implemented in the Mobile Robot, the
most important factor in choosing an algorithm was resistance to noise, which
is a common problem in mobile vision systems. The second important factor
was precision and detection quality. In addition, the authors wanted to choose
the fastest algorithm and they planned to place the main emphasis on its
optimization.
Therefore, the authors chose the Canny algorithm, which is very precise and resistant to noise. The complex computation requirements of its internal mechanisms were highly optimized to fit the needs of the presented project. To summarize the features of the chosen Canny algorithm, it is very important to: keep the error level low, ensure that none of the existing edges is omitted during detection, ensure that no false edges (which do not really exist) are detected, locate the detected edges properly at their real positions, and avoid multiple detections of one edge (each edge should be detected only once). Based on the above-mentioned criteria, the Canny algorithm first smoothes the analyzed image in order to remove noise, then finds the image gradient to highlight regions with high spatial derivatives. Finally, a threshold operation is used to keep only the strongest edges.
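
A minimal OpenCV sketch of this edge-detection step (Gaussian smoothing followed by the Canny detector) is given below; the kernel size and the hysteresis thresholds are illustrative and not the values tuned for the Mobile Robot.

```cpp
#include <opencv2/opencv.hpp>

// Edge map extraction as used in the feature-level fusion stage:
// smooth the image to suppress noise, then run the Canny detector,
// which keeps only the strongest, thin edges via hysteresis thresholding.
cv::Mat detectEdges(const cv::Mat& gray,
                    double lowThresh = 50.0, double highThresh = 150.0) {
    cv::Mat smoothed, edges;
    cv::GaussianBlur(gray, smoothed, cv::Size(5, 5), 1.5);  // noise removal
    cv::Canny(smoothed, edges, lowThresh, highThresh);      // gradient + hysteresis
    return edges;
}

int main() {
    // Illustrative input: a frame from the Thermo Vision or Night Vision camera.
    cv::Mat frame = cv::imread("frame.png", cv::IMREAD_GRAYSCALE);
    if (frame.empty()) return 1;
    cv::Mat edges = detectEdges(frame);
    cv::imwrite("edges.png", edges);
    return 0;
}
```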

5.2.4 Matching features detected on particular images


In an image, an edge refers to an occurrence of a rapid intensity transition. Edge detection algorithms can find these transitions, but usually the edge map which results from the mentioned algorithms consists of unconnected short fragments of real edges. In order to use such an edge map in high-level algorithms, which include the matching of objects extracted from two images, the edge fragments present in the provided map have to be grouped into contours.
In order to connect the edges found, two operations are performed on the
image: dilation and erosion. For the purpose of contour extraction, the algorithm
introduced by S. Suzuki and K. Abe (Suzuki and Abe, 1985) is used. To simplify
calculations, the authors decided to choose only the outline contours of all sets of
the edges. Due to that simplification, smaller contours that lie inside the larger
ones are omitted.
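
The edge-grouping step described above maps directly onto OpenCV primitives: cv::dilate and cv::erode close small gaps between edge fragments, and cv::findContours, which implements the Suzuki-Abe border-following algorithm, retrieves the contours; the RETR_EXTERNAL mode corresponds to keeping only the outline contours. The kernel size and iteration counts below are illustrative assumptions.

```cpp
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

// Group unconnected edge fragments into outer contours.
std::vector<std::vector<cv::Point>> extractOuterContours(const cv::Mat& edgeMap) {
    cv::Mat closed;
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::dilate(edgeMap, closed, kernel, cv::Point(-1, -1), /*iterations=*/2);  // bridge gaps
    cv::erode(closed, closed, kernel, cv::Point(-1, -1), /*iterations=*/2);    // restore width

    std::vector<std::vector<cv::Point>> contours;
    // findContours follows borders with the Suzuki-Abe algorithm;
    // RETR_EXTERNAL discards contours nested inside larger ones.
    cv::findContours(closed, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    return contours;
}

int main() {
    cv::Mat edges = cv::imread("edges.png", cv::IMREAD_GRAYSCALE);
    if (edges.empty()) return 1;
    auto contours = extractOuterContours(edges);
    std::printf("found %zu outer contours\n", contours.size());
    return 0;
}
```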
In the next stage, all the contours from the thermal picture are compared and matched with those from the high-sensitivity picture. The comparison process can be implemented using a simple mean square difference or more sophisticated methods, e.g. Pearson's correlation coefficient, which is widely used and optimized (Loğoğlu and Ateş, 2010). This method is described in detail in Section 7.2.2 (Eq. 7.2).
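
The Pearson-correlation comparison itself is covered in Section 7.2.2 and is not reproduced here; as one illustrative alternative for pairing contours by shape, the sketch below uses OpenCV's cv::matchShapes (a Hu-moment shape distance) with a simple greedy nearest-match strategy, both of which are assumptions of this example rather than the authors' method.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>
#include <limits>

// Pair every contour extracted from the thermal image with the most similar
// contour from the Night Vision image using a Hu-moment shape distance.
// Returns, for each thermal contour, the index of its best Night Vision match.
std::vector<int> matchContours(
        const std::vector<std::vector<cv::Point>>& thermalContours,
        const std::vector<std::vector<cv::Point>>& nightContours) {
    std::vector<int> matches(thermalContours.size(), -1);
    for (size_t i = 0; i < thermalContours.size(); ++i) {
        double bestDist = std::numeric_limits<double>::max();
        for (size_t j = 0; j < nightContours.size(); ++j) {
            // CONTOURS_MATCH_I1: smaller value means more similar shapes.
            double d = cv::matchShapes(thermalContours[i], nightContours[j],
                                       cv::CONTOURS_MATCH_I1, 0.0);
            if (d < bestDist) { bestDist = d; matches[i] = static_cast<int>(j); }
        }
    }
    return matches;
}
```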
The solution for determining the correlation between two images presented in Section 7.2.2 is also used in the camera set positioning algorithm. This calibration process is performed based on the relation between the camera set position and the position of the object lying underneath. By means of Pearson's correlation coefficient, successive fragments of the indicated object, acquired from the previous frames, are located. Then, the proper command is sent to the mobile platform so that the robot's head reaches the desired end position. The camera set positioning algorithm mentioned above is presented in Section 5.3.

5.2.5 Image fusion

In this stage a new image is created based on the integration of data from the images acquired from each of the available cameras. Based on the objects detected in the second stage and information about their correlation (similarity in both images), it is possible to create a result image which consists of the superimposed analyzed images. The final result of image fusion depends on the user's preferences, and it should be determined beforehand which features should be enhanced and marked, and how. The main advantage of the presented image fusion solution is its resistance to shift errors which may occur during the image integration process. This is achieved because each object in the scene is analyzed separately according to its own disparity. This process is described in more detail in the following section.
Because the authors place the main emphasis on Thermo Vision and Night Vision image integration, these two types of cameras are used in the presented solution. In the integration process the contours (calculated from edge positions) from a Thermo Vision image are matched to those from a Night Vision image. This priority allows one to preserve all the elements of the Night Vision image with special enhancement of objects whose temperature differs significantly from that of the environment.
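
A simplified sketch of this overlay step is given below. The MatchedObject structure is a hypothetical stand-in for the output of the contour-matching stage; each warm-object contour from the Thermo Vision image is shifted by its own offset (its individual disparity) and drawn in red on top of the Night Vision frame, which is assumed here to be a single-channel image.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Hypothetical result of the contour-matching step: a warm-object contour
// from the thermal image plus its offset in the Night Vision frame.
struct MatchedObject {
    std::vector<cv::Point> thermalContour;  // contour in thermal image coordinates
    cv::Point offsetInNightImage;           // per-object disparity compensation
};

// Overlay: preserve the whole Night Vision frame (assumed grayscale) and
// highlight objects whose temperature differs from the environment.
cv::Mat fuseImages(const cv::Mat& nightFrame,
                   const std::vector<MatchedObject>& objects) {
    cv::Mat fused;
    cv::cvtColor(nightFrame, fused, cv::COLOR_GRAY2BGR);

    std::vector<std::vector<cv::Point>> shiftedContours;
    for (const MatchedObject& obj : objects) {
        std::vector<cv::Point> shifted = obj.thermalContour;
        for (cv::Point& p : shifted) p += obj.offsetInNightImage;  // apply disparity
        shiftedContours.push_back(shifted);
    }
    // Mark the warm objects in red on top of the Night Vision content.
    cv::drawContours(fused, shiftedContours, /*contourIdx=*/-1,
                     cv::Scalar(0, 0, 255), /*thickness=*/2);
    return fused;
}
```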

5.3 Head (camera set) positioning algorithm

Due to the nature of the tasks for which the Mobile Robot made at the Institute of Applied Computer Science is dedicated, its main aid subsystem is a vision-based camera set positioning mechanism. The main functionality of this mechanism is to accurately drive towards a suspicious object lying on the ground in front of the robot without ramming or running over it. The need for such functionality results from the large distance between the robot and the operator during task performance. The only available source of visual information is the camera set installed in the robot's head. Its position above the suspicious object is vital and should be as close to optimal as possible. A situation in which such a vision aid subsystem is useful is presented in Fig. 5.11:

Fig. 5.11 Ground observation.

Nearly all inspection robots equipped with a camera installed on an arm manipulator are capable of performing close observation of an object while keeping a safe distance between this object and the front of the drive platform. Unfortunately, such task performance requires the presence of a highly qualified and experienced operator. The assumption of the positioning algorithm project is to use the image data acquired from the robot's head as the only source of orientation information in a user-friendly aid subsystem which does not require qualified personnel. Such ground observation mechanisms have already been researched, developed and put to use (Kawamata et al., 2002).
Besides the image sequence, the algorithm is supported only by the operator mark, i.e. a spot on the console screen pointed at by the operator in order to indicate a suspicious object. In Fig. 5.12 an example view from the robot's head observing the ground (inside a building) is presented:

Fig. 5.12 View from the robot’s head observing the ground.

Owing to the fact that the algorithm cannot assume that the operator will mark the location of a suspicious object sufficiently well, the nearest neighborhood of the pointed spot should be analyzed in order to find places of reference which will be used in the camera set positioning process. The authors of the presented algorithm decided to use template matching mechanisms to compare the analyzed image (frame) with fragments of the previous frames which include the mentioned places of reference. To compare such a pair of images, the authors used Pearson's correlation coefficient (Loğoğlu and Ateş, 2010).
It is assumed that the implemented functionality (camera set positioning) is performed in the forward-backward direction of the drive platform of the Mobile Robot. This means that, if a suspicious object is marked near the left or the right border of the console screen, rotation of the drive platform will not be necessary. Such an approach can be adopted because the robot's head is capable of rotating in the spherical system, so it will be able to face towards the marked object.
Owing to the limited moving direction of the mobile platform during the camera set positioning process, the algorithm can skip the deformation of objects in the analyzed scene caused by the different perspectives of the camera set in different positions of the platform. Those deformations would be much more intense if the platform could rotate during the positioning process. Nevertheless, to minimize incorrect detections of the searched objects, the authors decided that the searched template is a strip of pixels cut from the previous frame which includes the marked object. The width of this strip is the same as the width of the analyzed image. In the succeeding frames, the place where the correlation coefficient is the highest (between the cut-off strip of pixels from the previous frame and the current frame) is the place where the marked object is located (only in the top-bottom image dimension, because the algorithm does not analyze the left-right variation of the object location). In the case of the first frame, the strip of pixels to be matched in the next frames includes the spot marked by the operator. This kind of approach implies that not only the marked object is tracked but also any other objects that lie at the same vertical position. Due to that fact, the algorithm is often supported by more than one place of reference during its operation. In Fig. 5.13 a case of its standard use is presented. The operator marked a suspicious object with a slight error (the place where the operator touched the console screen is marked as a cross with two circles).

Fig. 5.13 Ground observation with suspicious object marking performed by the operator.

For the image acquired from a ground observation camera depicted in Fig. 5.13, the fragment which will be searched for in the next frame is a strip of pixels with a determined width (the same as the width of the image depicted in Fig. 5.13) and height. The above-mentioned strip of pixels is shown in Fig. 5.14 and its vertical zoom in Fig. 5.15. The height of the pixel strip depends on the computing power of the computer executing the implementation of the presented algorithm. Because the presented functionality is crucial from the viewpoint of the robot's safety, incorrect detections caused by slow computing cannot be allowed to happen. During the tests as well as in the target implementation, the authors decided to use a constant pixel strip height of 50 pixels. The proposed value is connected with the number of pixels necessary to fully display an object of a size similar to an antipersonnel mine in the image acquired from the cameras dedicated to the Mobile Robot project. In the research on such object sizes (the number of pixels needed to display them), the authors took into consideration the robot's head position, tilted over the test object at a height of about 50 cm.

Fig. 5.14 A fragment of the image (presented in Fig. 5.13) chosen to search for on the next frame.

Fig. 5.15 Vertical zoom of a fragment of the image (presented in Fig. 5.13) chosen to search for on
the next frame.

While comparing the images (the current frame and the strip of pixels cut from the previous frame), Pearson's correlation coefficient is calculated repeatedly for each vertical position in the current frame. The result of this comparison process is a vertical vector of normalized values of Pearson's correlation coefficient, and the index of the highest value indicates the vertical position of the place where the marked object most likely lies. This position can be calculated from the following equation:

imax = index(max([p1, p2, …, pn]))    (5.2)

where:
imax – the vector index referring to the position of the highest value of Pearson's correlation coefficient,
index() – the function that returns the index of the given element,
max() – the function that returns the maximum value of the given vector,
p1, p2, …, pn – the values of Pearson's correlation coefficient calculated by comparing the chosen strip of pixels cut from the previous frame with the corresponding strip of pixels of the current frame (of the same size); to calculate these values, Eq. 7.2 described in Section 7.2.2 is used,
n – the size of the result vector, which differs from the height of the current image. The difference is caused by the necessity of comparing the whole strip of pixels with the corresponding pixels of the current image; therefore, comparison at the edges of the current image (where the middle row of the pixel strip would lie on the edge pixels of the current image) is impossible, and the comparison is performed within a determined range of the current image pixel indexes. The height of the result vector can be calculated from the following equation:

n = hc - ht + 1    (5.3)

where:
hc – the current image height,
ht – the height of the pixel strip cut from the previous frame (in the tests and in the target implementation this value was set to 50 pixels, as discussed earlier).
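
A minimal sketch of this strip-matching step with OpenCV is given below. cv::matchTemplate with the TM_CCOEFF_NORMED measure computes a normalized correlation coefficient of the same form as Pearson's coefficient for every vertical position, the result vector has the height given by Eq. (5.3), and the index of its maximum corresponds to Eq. (5.2). The strip height of 50 pixels follows the chapter; the file names and the marked row are illustrative assumptions.

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cstdio>

// Find the vertical position in the current frame that best matches the
// pixel strip cut from the previous frame around the marked object.
int findStripPosition(const cv::Mat& currentFrame, const cv::Mat& pixelStrip) {
    // TM_CCOEFF_NORMED is a normalized correlation coefficient (Pearson-like).
    cv::Mat result;  // size: 1 x (hc - ht + 1), cf. Eq. (5.3)
    cv::matchTemplate(currentFrame, pixelStrip, result, cv::TM_CCOEFF_NORMED);

    double minVal, maxVal;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc);
    return maxLoc.y;  // imax of Eq. (5.2): row of the best match
}

int main() {
    cv::Mat prev = cv::imread("prev_frame.png", cv::IMREAD_GRAYSCALE);
    cv::Mat curr = cv::imread("curr_frame.png", cv::IMREAD_GRAYSCALE);
    const int stripHeight = 50;                       // as chosen by the authors
    if (prev.empty() || curr.empty() || prev.rows <= stripHeight) return 1;

    int markedRow = 240;                              // hypothetical operator mark
    int top = std::max(0, markedRow - stripHeight / 2);
    top = std::min(top, prev.rows - stripHeight);
    cv::Mat strip = prev(cv::Rect(0, top, prev.cols, stripHeight)).clone();

    int bestRow = findStripPosition(curr, strip);
    std::printf("marked object strip found at row %d of the current frame\n", bestRow);
    return 0;
}
```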

After conducting tests, the authors of the presented algorithm noticed a slight dislocation of the final indication of the marked object when the camera set reached its final position (above the target). This dislocation between the indication and the real position of the marked object is caused by a slight deformation of objects in the image. In turn, this deformation is caused by the change of perspective in the succeeding frames during the drive platform movement. Due to that deformation, the location of the highest value of Pearson's correlation coefficient was slightly shifted from the real position of the marked object. The change in perspective was obviously caused by the movement of the robot. Despite the fact that the final error was insignificantly small, the authors, due to the critical nature of the algorithm's working conditions, implemented additional security mechanisms. These security mechanisms are mainly based on control of the Region of Interest of the current frame. Reducing the region of analysis aims to eliminate false indications which could occur if the whole image were analyzed.
If the highest value of Pearson's correlation coefficient is located beyond the neighborhood of the marked object, the drive platform will move towards that location despite the fact that the object lies somewhere else. That kind of situation can be caused by the uniformity of the observed ground: the differences between the correlation coefficients calculated for each vertical position of the current frame are then minimal. What is more, when there is an object on the scene similar to the marked one, the indication in the succeeding frames can be drastically different from its real location. Therefore, the authors of the presented algorithm decided to reduce the region of analysis of the succeeding frame to the following range: from the current position of the suspicious object to the center of the frame. In the standard use case of the presented algorithm, the final position will never be located beyond this area. For the purpose of reducing the region of analysis, the authors used the Region of Interest (ROI) mechanism. During the algorithm's operation and the platform movement, the ROI is progressively reduced. The ROI reduction described above and the validity of its use are illustrated in Figs. 5.16, 5.17 and 5.18:

Fig. 5.16 Ground observation when more than one suspicious object lies on the scene. The operator's
mark is clearly visible.

Fig. 5.17 Introduction of the region of analysis reduction. In the image an area between the vertical
center of the image and the current position of the pointed object is marked with a bright color.

Fig. 5.18 Region of analysis reduction during the camera set positioning algorithm performance
(the drive platform heads towards the marked object).

The bright area marked in Figs 5.17 and 5.18 is reduced while the drive platform approaches the position in which the camera set is tilted exactly over the pointed object. The progressive reduction of the ROI illustrated above secures the algorithm against unexpected behavior, especially when any of the objects present on the scene is similar to the marked one. Such situations can take place especially when the ground is uniform and lacks characteristic features while the marked object is sufficiently well camouflaged. Obviously, the ROI reduction to the range from the current object position to the center of the current frame does not protect against false detections located inside the already reduced ROI. In the worst case, the algorithm will then finish its task earlier than the operator assumed, but the platform will not run over the suspicious object marked by the operator.
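
A simplified sketch of the ROI reduction described above is given below: the analyzed region of each new frame is limited to the band between the object's current vertical position and the vertical center of the image, and it shrinks as the platform approaches the target. The function and variable names are hypothetical.

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>

// Restrict the search for the pixel strip to the band between the current
// vertical position of the marked object and the vertical center of the frame.
cv::Rect reducedRegionOfAnalysis(const cv::Mat& frame,
                                 int objectRow, int stripHeight) {
    int centerRow = frame.rows / 2;
    int top    = std::min(objectRow, centerRow);
    int bottom = std::min(std::max(objectRow, centerRow) + stripHeight, frame.rows);
    // The band shrinks progressively as objectRow approaches centerRow,
    // excluding more and more of the image from the correlation search.
    return cv::Rect(0, top, frame.cols, bottom - top);
}
```

The strip matching from the earlier sketch would then be run on frame(roi) rather than on the whole frame, with the returned row corrected by roi.y.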
Besides the increased security against unexpected behavior provided by the ROI control, the reduction of the region of analysis offers one more major advantage: a performance speed increase caused by excluding significant parts of the image from the analysis. The increase is greater the closer the camera set is to its final position, which is exactly when performance speed matters most for accuracy.

5.4 Implementation

The solutions presented in this chapter were tested in the Mobile Robot built at the Institute of Applied Computer Science. Due to the necessity of increased efficiency and adaptation of the algorithms to real-time conditions, methods for computation time optimization were applied. The proposed algorithms were implemented on a computer with an Intel Atom Z530 (1.6 GHz) processor, which is installed in the Mobile Robot platform. The technology applied is C++ with the OpenCV library.

5.5 Results

During the experiments, the authors used two types of cameras: a Night Vision camera and a Thermo Vision camera. For the first algorithm (image fusion) both cameras were necessary. In turn, for the second algorithm (camera set positioning) only a Night Vision camera, or even the Day Light camera which is also installed in the Mobile Robot's head, was needed. For quality testing, both algorithms were implemented on a PC with an Intel(R) Core(TM)2 Quad (2.83 GHz) processor. In the case of the image fusion algorithm, the result of the enhancement of desired features (temperature) is presented in the following images: Figs. 5.19, 5.20, 5.21 and 5.22 (selected frames from the image sequence, displayed in real time).

Fig. 5.19 Image acquired from a Night Vision camera.



Fig. 5.20 Image acquired from a Thermo Vision camera.



Fig. 5.21 Edges detected from the picture presented in Fig 5.20.

Fig. 5.22 Fused image. Enhancement of areas with a higher temperature.

The results of the quality tests of the camera set positioning algorithm in laboratory working conditions were satisfactory, because no false detections or run-overs of the marked object were recorded.

The target implementation of the image fusion algorithm on the Intel Atom Z530 reached a performance speed of approximately 15 fps. In the case of the camera set positioning algorithm, the authors managed to reach the fluency of human eye processing, which is approximately 25 fps. The speed was higher when the marked object was close to the center of the image. The results of the quantitative tests of the implementations of both vision aid algorithms are shown in the following table (Table 5.1):

Table 5.1 Performance test results of the presented algorithms.

                                    Intel(R) Core(TM)2 Quad    Intel Atom Z530
                                    (2.83 GHz) [fps]           (1.6 GHz) [fps]
Image fusion algorithm              over 25                    15
Camera set positioning algorithm    ~50                        ~25

5.6 Conclusions

Considering the image fusion algorithm, objects with distinctive temperatures are marked in the result image and can be quickly located on the scene. As can be noticed, the algorithm is resistant to the disparity caused by different object distances from the camera set. In addition, due to the use of a high-sensitivity camera, the feature enhancement can be performed in rough lighting conditions.
The results of the presented algorithm demonstrate the immense possibilities of using such a solution in surveillance tasks. It can be a powerful enhancement of an observer's workstation in many fields of interest.
As far as the camera positioning algorithm is concerned, a significant enhancement of the operator console was obtained. With the safety mechanisms in place, the operator can easily and safely position the camera set in order to better observe a suspicious object.

5.7 Further research

The authors of the presented image fusion algorithm plan the following:

(1) to adapt the solution and its implementation to enhance more object features which can be detected by different types of cameras;
(2) to develop fusion from multiple (not only two) image sources;
(3) to use the numerous upgrades of detection and comparison techniques which are now available.
As for the camera positioning algorithm, the authors plan to implement not only one-direction (forward-backward) calibration but also rotation of the platform, in order to fully support object centering in the image.

References

Al-Wassai, F. A., Kalyankar, N. V. and Al-Zuky, A. A. (2011). Arithmetic and Frequency Filtering Methods of Pixel-Based Image Fusion Techniques, International Journal of Computer Science Issues (IJCSI), Vol. 8, Issue 3, p. 113.
Arora, M. and Banga, V. K. (2012). Intelligent Traffic Light Control System using Morphological Edge Detection and Fuzzy Logic, International Conference on Intelligent Computational Systems, Planetary Scientific Research Centre, Dubai.
Blum, R. S. and Liu, Z. (2005). Multi-sensor Image Fusion and Its Applications, special series on Signal Processing and Communications, Taylor & Francis, CRC Press.
Choudekar, P., Banerjee, S. and Muju, M. K. (2011). Real time traffic light control using image processing, Indian Journal of Computer Science and Engineering, Vol. 2, No. 1, pp. 6-10.
Derek, L. G. et al. (1994). Accurate Frameless Registration of MR and CT Images of the Head: Applications in Surgery and Radiotherapy Planning, Dept. of Neurology, United Medical and Dental Schools of Guy's and St. Thomas's Hospitals, London, U.K.
Gan, Z. and Yang, H. (2010). Texture Enhancement through Multiscale Mask Based on RL Fractional Differential, International Conference on Information Networking and Automation, Vol. 1, pp. 333-337.
Kawamata, S., Ito, N., Katahara, S. and Aoki, M. (2002). Precise position and speed detection from slit camera image of road surface marks, Intelligent Vehicle Symposium, Versailles, France.
Liu, X., Wang, Y., Liu, Y., Weng, D. and Hu, X. (2009). A remote control system based on real-time image processing, Fifth International Conference on Image and Graphics, IEEE Computer Society, Xi'an, Shaanxi, China.
Loğoğlu, K. B. and Ateş, T. K. (2010). Speeding-Up Pearson Correlation Coefficient Calculation on Graphical Processing Units, Signal Processing and Communications Applications Conference (SIU), Diyarbakir.
Mathieu, B., Melchior, P., Oustaloup, A. and Ceyral, C. (2003). Fractional differentiation for edge detection, Signal Processing, Vol. 83, No. 3, pp. 2421-2432.
Samadzadegan, F. (2004). Data Integration Related to Sensors, Data and Models, International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 35, pp. 569-574.
Snidaro, L., Visentini, I. and Foresti, G. L. (2009). Multi-sensor Multi-cue Fusion for Object Detection in Video Surveillance, AVSS '09: Proceedings of the 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, IEEE Computer Society, Washington, DC, USA, pp. 364-369.
Sobel, I. and Feldman, G. (1973). A 3x3 Isotropic Gradient Operator for Image Processing, in R. Duda and P. Hart (Eds.), Pattern Classification and Scene Analysis, pp. 271-272.
Sun, Z., He, D. and Zhang, W. J. (2012). A systematic approach to inverse kinematics of hybrid actuation robots, Advanced Intelligent Mechatronics (AIM), Kaohsiung.
Suzuki, S. and Abe, K. (1985). Topological Structural Analysis of Digitized Binary Images by Border Following, Computer Vision, Graphics, and Image Processing, 30, pp. 32-46.
Xin, G., Ke, C. and Xiaoguang, H. (2012). An improved Canny edge detection algorithm for color image, Industrial Informatics (INDIN), Beijing.
Yang, Z., Lang, F., Yu, X. and Zhang, Y. (2011). The Construction of Fractional Differential Gradient Operator, Journal of Computational Information Systems, 7:12, pp. 4328-4342.
Zhang, J. (2010). Multi-source remote sensing data fusion: status and trends, International Journal of Image and Data Fusion, Vol. 1, Issue 1, pp. 5-24.
Zhang, Y., Zhao, G. and Zhang, Y. (2011). Design of a remote image monitoring system based on GPRS, International Conference on Machine Learning and Computing, IACSIT Press, Singapore.
Zin, T. T., Takahashi, H., Toriu, T. and Hama, H. (2011). Fusion of Infrared and Visible Images for Robust Person Detection, in Image Fusion, Osamu Ukimura (Ed.), InTech.