AIRCRAFT DETECTION AND TRACKING USING UAV-MOUNTED VISION SYSTEM

A Thesis

by

Yan Zhang

December 2015
TABLE OF CONTENTS

LIST OF FIGURES
ABSTRACT
1 Introduction
1.1 Background of the thesis work
1.2 Literature review
2 Camera calibration
2.1 Pinhole model
2.2 Principle of camera calibration
2.3 Camera calibration implementation
3 Video stabilization
3.1 State estimation theory
3.2 Models
3.3 The Bayesian filter
3.4 The Kalman filter
3.4.1 Derivation of the Kalman filter
3.4.2 Algorithm of the Kalman filter
3.5 Particle filters
3.5.1 Derivation of the particle filter
3.5.2 Algorithm of particle filters
4 Implementation of the particle and Kalman filters for video stabilization
4.1 Camera model
4.2 Implementation of particle filter
4.3 Implementation of the Kalman filter
4.4 Testing results
4.4.1 Testing results for smooth linear motions
4.4.2 Testing results for random motions
5 Object detection algorithms
5.1 Edge detection
5.2 Morphological processing
5.3 Dynamic programming
5.4 Implementation of object detection algorithms
5.5 Results of object detection
5.6 Remarks about algorithm selection
6 Conclusion
REFERENCES
LIST OF FIGURES

1.1 Diagram of the vision-based sense and avoid system.
2.1 Pinhole camera model.
2.2 Projection coordinates.
2.3 Calibration process.
2.4 Calibration results from MATLAB.
3.1 State transition.
4.1 Frames from the unstable video.
4.2 Stabilized frames processed with Scheme A.
4.3 Stabilized frames processed with Scheme B.
4.4 Stabilized frames processed with Scheme C.
4.5 Comparison of x-axis translation estimations for linear translation change.
4.6 Comparison of y-axis translation estimations for linear translation change.
4.7 Comparison of rotation estimations for linear rotation change.
4.8 Frames from unstable video.
4.9 Stabilized frames processed with Scheme A.
4.10 Stabilized frames processed with Scheme B.
4.11 Stabilized frames processed with Scheme C.
4.12 Comparison of the x-axis translation estimations for random translation change.
4.13 Comparison of the y-axis translation estimations for random translation change.
4.14 Comparisons of the rotation estimations for random rotation change.
5.1 Object movement illustration.
5.2 Object detection results for a synthetic video with dark clouds.
5.3 Object detection results for a synthetic video with light clouds and added noise.
5.4 Object detection results for a recorded video with varying clouds.
5.5 Object detection results for a video without cloud clutters.
5.6 The SNR comparison.
ABSTRACT
For unmanned aerial vehicles (UAVs) to operate safely in the national airspace
where non-collaborating flying objects, such as general aviation (GA) aircraft without
automatic dependent surveillance-broadcast (ADS-B), exist, the UAVs’ capability of
“seeing” these objects is especially important. This “seeing”, or sensing, can be
implemented via various means, such as radar or lidar. Here we consider using only cameras mounted on UAVs, which have the advantages of light weight and low power consumption. For the visual system to work well, the camera-based sensing capability should be at a level equal to or exceeding that of human pilots.
This thesis deals with two basic issues/challenges of the camera-based sensing
of flying objects. The first one is the stabilization of the shaky videos taken on
the UAVs due to vibrations at different locations where the cameras are mounted.
In the thesis, we consider several algorithms, including Kalman filters and particle
filters, for stabilization. We provide detailed theoretical discussions of these filters
as well as their implementations. The second one is reliable detection and tracking
of aircraft using image processing algorithms. We combine morphological processing
and dynamic programming to accomplish good results under different situations. The
performance evaluation of different image processing algorithms is accomplished using
synthetic and recorded data.
1. Introduction
Unmanned aerial vehicles (UAVs) have great potential in various military and
civil applications. To safely operate UAVs in the national airspace where non-collaborating flying objects, such as general aviation (GA) aircraft without automatic dependent surveillance-broadcast (ADS-B), exist, the UAVs' capability of “seeing” these objects is especially important.

To promote the research of this “seeing” technology, the National Aeronautics and Space Administration (NASA) announced a sense and avoid competition for UAS. Embry-Riddle Aeronautical University (ERAU) decided to participate in this competition, and we started the thesis work to serve the ERAU team. The technology developed through such research will make integration of UAS into the National Airspace System (NAS) possible. One of the most difficult technical problems is to ensure safe separation with neighboring non-cooperative aircraft that do not broadcast ADS-B messages. Though the competition was cancelled later, the research on this topic continued due to its importance.
Many technologies including radar (Moses, 2013) and computer vision (Rozantsev,
2009) have been researched to improve the sense and avoid ability for UAVs. This
thesis focuses on a potentially feasible and affordable way to address the problem: a vision system. Such a system provides advantages including light weight and low power consumption compared to active sensors like radar and lidar (Zarandy, Zsedrovits, Nagy, Kiss, & Roska, 2011).

The diagram of a vision-based sense and avoid system for UAVs (Zarandy et al., 2011) is illustrated in Fig. 1.1. The system works as
follows. First, the images are captured using the camera block at given time inter-
vals. Then the captured images are passed to an image processing block. After that, once the aircraft has been detected and tracked, the aircraft's position information as referenced to each image is collected in the Data Acquisition block. The coordinate information about the detected aircraft is then derived by combining data from the Inertial Navigation System (INS), the Global Positioning System (GPS), and the local position of the detected aircraft relative to the image from the Image Processing block. The other
parts of the system are related to collision detection and avoidance control.
This thesis concentrates on the Image Processing block to acquire reliable detection and tracking of flying objects. Two basic issues are addressed. The first is that the image sequence captured from the camera is usually not stable due to the shaky platform of the UAV. To handle this problem, we consider several algorithms, including the Kalman and particle filters, for video stabilization. The second is reliable detection and tracking of aircraft using image processing algorithms; we combine morphological processing and dynamic programming to accomplish good results under different situations.

1.2 Literature review

We first review the literature on video stabilization. In (Fergus, Singh, Hertzmann, Roweis, & Freeman, 2006), the in-plane camera rotation is neglected and the camera motion is estimated from a single blurry image using a proposed blur kernel, under the assumption that the camera motion is uniformly distributed.
Lin et al. (Lin, Hong, & Yang, 2009) present a stabilization system using a modified proportional-integral (PI) controller to remove the shaking from the captured
videos while maintaining the panning motion of the camera. The motion compensa-
tion vector estimated in the paper is utilized to control the movement of the camera
platform through a PI controller. In (Matsushita, Ofek, Ge, Tang, & Shum, 2006),
the motion inpainting is implemented to enhance the quality of the stabilized image
sequences.
Particle filters (Mohammadi, Fathi, & Soryani, 2011) and Kalman filters (Song, Zhao, Jing, & Zhu, 2012) for video stabilization are particularly interesting as those algorithms are applied in a wide range of fields (Zhou, Chelleppa, & Moghaddam, 2004). Also, these algorithms have been developed over the years and proven to be effective.
Now we consider target detection. Zarandy et al. (Zarandy et al., 2011) present
a way to detect and calculate the position of known-size aircraft. In the paper,
FlightGear, a flight simulator, is used to produce the simulated aerial aircraft images.
These images are then transmitted through Simulink to Matlab where the image
processing algorithms are applied. The proposed algorithm is completed in two main
steps. In the first step, the entire image is handled as a whole and then the region
of interest is extracted and processed in the second step. However, this method is
only effective in detecting the intruder aircraft in daylight situations when the cloud
contrast is medium or small. In complex situations where the contrast of the clouds
is high and the image is cluttered with obstacles, the proposed algorithm is not able to detect the intruder aircraft reliably.
Jaron & Kucharczyk (Jaron & Kucharczyk, 2012) describe two detailed vision
system prototypes (a ground tracking and onboard detection and tracking system) to
solve the problem of positioning and detection with cameras only. An object identifi-
cation algorithm was developed and its position was estimated in the ground tracking
system. On the onboard system, a FAST (Features from Accelerated Segment Test)
feature detection and extraction algorithm is implemented for position estimation and
collision detection. This method, however, has not yet been implemented on hardware.
In (Shah, 2009), the author attempts to find the size and location of an obstacle
by cameras. The idea behind this is to use one camera mounted on a UAV to detect
obstacles by means of feature points. Then the UAV flies around it in a circular path
while capturing the feature points at the same time. The advantage of such an idea
is that only one camera is needed. However, disadvantages exist, such as the extra flight time and energy spent circling the obstacle.
In (Gaszczak, Breckon, & Han, 2001), the authors present an approach for au-
tomatic detection of vehicles based on cascade Haar classifiers with secondary infor-
mation in thermal images. The presented results show successful detection under
varying conditions with minimal false detections. However, the algorithm must be
improved to detect aircraft in the real world. The improvements can be as follows.
First, the intruding aircraft detection performance at a long distance should be im-
proved. Secondly, since the Haar classifier needs to be trained on hundreds of aircraft images with different angles of view, prior information about the airplane, like model and size, should be known.
Hajri (Hajri, 2012) provides the preliminary process for target detection and posi-
tion estimation. The article explores the computer vision detection algorithms. The
first algorithm uses edge detection and image smoothing processes to achieve a high detection rate, yet it exhibits high false alarm rates in highly cluttered image environments at the same time. The other approach is to use morphological filters and color-based detection. This method works effectively with prior information about the UAV color patch; however, it exhibits low detection rates in low lighting conditions.
Another approach (2009) starts with a morphological filter that looks for high contrast regions in the image that are likely to be aircraft. Next, a classifier that has been trained on positive and negative examples is used. Finally, it tracks the candidates over time to remove false detections. The results of the proposed algorithm are promising.
Carnie et al. (Carnie, Walker, & Corke, 2005) combine the operation of morpho-
logical filter and dynamic programming to detect small aircraft in images with poor
signal to noise ratios. The results demonstrate the ability to detect distant objects under such poor conditions.
2. Camera calibration
Although camera calibration is not the main focus of this thesis, it is an integral part
of the work for vision-based sense and avoid. Hence, we present the work that we
have performed on this topic here. In the later chapters, we will assume the camera(s)
is calibrated.
In this thesis, a pinhole model camera is used for all calculations regarding size,
model assumes that there exists an opaque wall with only one small hole in the center
allowing only one ray of light to pass at a time. The ray then is projected onto the
image plane which is at the same distance as the focal length of a camera from the
aperture wall, as shown in Fig. 2.1. The advantage of the pinhole model is that the
height of the object on the image plane is relative to only one parameter. As it can
be seen from Fig. 2.1, the relation between the height of the object and the height of its projection is

f / h = Z / S,
where f is the focal length of the camera, the distance between the image plane and
the pinhole, h is the height of the projected object on the image plane with regard to
the optical axis, Z is the distance between the object and the pinhole, and S is the
height of the object. The intersection of the optical axis with the image plane in Fig. 2.1 is called the principal point.
The above model provides an easy way to calculate the distance between the
camera and the object, given S and h which can be obtained via image processing.
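As a quick numerical illustration, the following MATLAB sketch applies the relation f/h = Z/S to estimate range; all values are hypothetical and serve only to show the computation.

    % Range estimation from the pinhole relation f/h = Z/S.
    % All numbers below are hypothetical illustration values.
    f = 0.008;       % focal length (m)
    S = 10;          % known physical height of the object (m)
    h = 0.0002;      % measured height of the projection on the image plane (m)
    Z = f * S / h    % estimated distance from the pinhole to the object (m)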
But in reality, the camera is far from the pinhole model. Ideally, the principal point should be placed exactly in the center of the image plane. However, this can never be achieved exactly in practice.
Image acquisition processes usually comprise one or more digital cameras, and
in reality, most cameras are not ideal pinhole models. Thus camera calibration, an
essential process for constructing real world models and interacting with real world
coordinates (Hruska, Lancaster, Harbour, & Cherry, 2005), is needed. Camera cali-
bration is used basically to find a number of internal and external parameters that characterize the camera. An illustration of the projection coordinates is given in Fig. 2.2. The reference point for the homogeneous coordinates is O, the same
as the pinhole point shown in Fig. 2.1. The image plane is mirrored to the right side
of the pinhole point since it does not change the projection point. (xw , yw , zw ) denotes
the world coordinates while the image coordinates are represented by (xi , yi , zi ). Wc is
the point that the optical axis intersects with the world plane. The center of the image
plane, P , is the principal point. And zi equals to f since the principal point is the
reference point. Fig. 2.2 assumes that the two coordinate vectors are homogeneous. Thus,

x_w / x_i = y_w / y_i = z_w / z_i,

which leads to x_i = f x_w / z_w and y_i = f y_w / z_w, where x_i and y_i are both distances from the image
center.
To transform from the object lengths to pixels, scaling factors kx and ky for x,
y directions are defined, respectively. The unit of the scaling factors is pixels per unit distance. Also, the coordinates of the principal point P on the x_pix and y_pix axes are defined
to be (x_0, y_0) in pixels. Therefore, the pixel coordinates for the same projection point are

x_p = x_0 + k_x f x_w / z_w,
y_p = y_0 + k_y f y_w / z_w.
Moreover, the image plane often is a parallelogram instead of a rectangle. Thus there is a skew factor s between the two pixel axes. The transformation from the world coordinates to the new pixel coordinates can be formulated as

[u]   [f k_x    s    x_0] [x_w]
[v] = [  0    f k_y  y_0] [y_w] ,    (2.1)
[w]   [  0      0     1 ] [z_w]
which leads to

x_p = u / w,
y_p = v / w.
The calibration matrix is defined as

    [f_x   s   x_0]
M = [ 0   f_y  y_0] ,    (2.2)
    [ 0    0    1 ]

where f_x = f k_x and f_y = f k_y are the focal lengths in the x and y directions with pixel units. Note that the elements in the calibration matrix are intrinsic parameters.
The extrinsic parameters that need to be handled in the calibration process are
the lens distortions. The major lens distortions are radial and tangential distortions.
Radial distortion is caused by the inappropriate shape of the lens and tangential
distortion depends on the accuracy of the aligning lens with the image plane. In this
thesis, it is assumed that five distortion coefficients are extracted since higher order
distortion lenses are rare (Tsai, 2003). The lens calibration vector is given as

d = [k_1  k_2  p_1  p_2  k_3]^T.

The variables k_1, k_2, and k_3 are the coefficients of radial distortion, and p_1, p_2 are the coefficients of tangential distortion.
Based on what has been introduced above, there are nine parameters (not includ-
ing k3 ) that need to be found through calibration process. Such a complicated process
is made easy with the MATLAB calibration toolbox. The toolbox processes many
pictures of different point of views from the same object and outputs the camera
parameters. The tested object should have a distinctive and repeatable pattern so
that it would be easy to locate and track on the image. Therefore, a chessboard is
often used in the calibration since it has a repeatable pattern with black and white
squares. Furthermore, the more pictures taken from the various views of the object,
the more accurate the calibration is. So for the calibration, many pictures, say, 30,
are taken from different views of the chessboard and then processed with the camera calibration toolbox.
Fig. 2.3a demonstrates a chessboard in the feature extraction window in the MAT-
LAB calibration toolbox and Fig. 2.3b shows the simulated process of taking different
pictures. Meanwhile, the calibration matrix and the four distortion parameters of the
camera are computed. An example of the lens distortion and calibration matrix for a
GigE camera (IDS, n.d.) is displayed in Fig. 2.4a and Fig. 2.4b. Note that the matrix
in Fig. 2.4b is the transpose of the calibration matrix M in Eq. (2.2). With the
calibrated parameters, the real world homogeneous position of a known-size object can be recovered from its image coordinates using the calibration and distortion matrices.
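To illustrate how the calibration matrix of Eq. (2.2) maps a camera-frame point to pixel coordinates, here is a minimal MATLAB sketch; the intrinsic values are invented placeholders, not the calibrated values of the GigE camera.

    % Projecting a camera-frame point to pixels with the calibration matrix.
    fx = 800; fy = 810;          % focal lengths in pixel units (hypothetical)
    s  = 0;                      % skew
    x0 = 320; y0 = 240;          % principal point in pixels (hypothetical)
    M  = [fx s x0; 0 fy y0; 0 0 1];   % calibration matrix of Eq. (2.2)
    Pw = [2; 1; 50];             % world point (xw, yw, zw) in the camera frame
    uvw = M * Pw;                % homogeneous pixel coordinates, Eq. (2.1)
    xp = uvw(1) / uvw(3);        % xp = u/w
    yp = uvw(2) / uvw(3);        % yp = v/w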
3. Video stabilization
A stable video is essential to achieve good target detection. The problem of stabilizing
video is one of the applications of state estimation. Therefore, before diving into the
details of video stabilization, let us discuss the theory and methods of state estimation.

3.1 State estimation theory

State estimation is widely applied in areas such as signal processing, computer vision, object tracking, etc. The evolution of those
dynamic systems is determined by the state of the systems, which are not directly
observable, and the inputs to the systems. The observed data, the measurement,
is related, to some extent, to the system state and can be leveraged to infer the
actual states (Haykin, 2009). Usually, the system is modeled as a discrete system
since it is not required to know the system data all the time. In the meanwhile, the
measurement at every time interval is available to the observer. The diagram of the state transition is shown in Fig. 3.1.
In Fig. 3.1, the system is represented by the hidden Markov model (HMM) and
the discrete state is denoted as xn , where n is the time step. The Markov model is a
random process that features the transition from one state to another. In the Markov
model, the current state is only dependent on the system’s previous state instead of
the entire past state sequence. The available measurement at time step n for the system is denoted as y_n.
3.2 Models
Two models are required to describe the state evolution and the measurement of
a dynamic system. Under the assumption that the current state xn evolves only from
xn = f (xn−1 ) + wn , (3.1)
where f is the system transition function which is also known as the evolution function
and wn is the system dynamic noise. Also, the measurement model is formulated as
y n = g(xn ) + v n , (3.2)
To estimate the system’s state, we need to use the above two models. In (Haykin,
The first case deals with the linear, Guassian models. In this case, the system
evolution and observation functions are both assumed to be linear. The dynamic
noise wn and observed noise v n are both additive, independent zero-mean Gaussian
processes. The Kalman filter, which will be discussed later, is often used to handle
this case.
The second case is the same as the first one, except that the system dynamic noise
wn and the observation noise v n are now assumed to be additive, independent non-
Gaussian processes. The tricky part for this case is the complicated non-Gaussian
processes. Thus a bank of Kalman filters may be used to solve the linear, non-Gaussian estimation problem.

The third case is the same as the first case, except that the system evolution and
observation functions are nonlinear. The nonlinear model issue can be addressed by
two different solutions called local approximation and global approximation, respec-
tively. The extended Kalman filter is an example of the local approximation, where
the localized estimates are assumed to be linear. For the global approximation, the posteriori density is approximated by a set of samples so that the approximation boils down to solving a mathematical problem. The particle filter, discussed later, is an example of the global approximation.
The fourth case deals with the nonlinear, non-Gaussian models. In this case,
the system evolution and observation functions are both nonlinear, and the system
dynamic and observation noises are not only non-Gaussian, but also may not be additive.

3.3 The Bayesian filter

The problem of the discussed models can be tackled by a recursive Bayesian filter, such as the Kalman filter or a particle filter (Orlande et al., n.d.). The recursive Bayesian filter is a process to estimate the probability density function (pdf) using the up-to-date measurements and the mathematical models. The filter is recursive because, instead of processing the whole batch of data, the recursive method makes use of the current measurement, previous system state, and system models to estimate the current state (Arulampalam, Maskell, Gordon, & Clapp, 2002). The Bayesian filter provides a general framework for sequential state
estimation.
As can be seen from Fig. 3.1, at each time step, there will be a hidden updated
internal state and a new observable measurement. A filter can be repeatedly applied
to solve for the posteriori pdf p(x_n | y_{0:n}) given all the observations, along with the
assumption of the initial pdf p(x0 |y 0 ) = p(x0 ). In general, two steps are involved in
the process and they are referred to as state prediction and update.
In the prediction step, the priori pdf of the state at time step n, p(x_n | y_{1:n-1}), is obtained via the Chapman-Kolmogorov method (Perreault, 2012). Note that p(x, y) is the simplified version of p([x^T y^T]^T) in this thesis.

p(x_n | y_{1:n-1}) = ∫ p(x_n, x_{n-1} | y_{1:n-1}) dx_{n-1}
                   = ∫ p(x_n | x_{n-1}, y_{1:n-1}) p(x_{n-1} | y_{1:n-1}) dx_{n-1}
                   = ∫ p(x_n | x_{n-1}) p(x_{n-1} | y_{1:n-1}) dx_{n-1}.
Note that we have used the Markovian property of the system in the above equations:
if xn is independent of y 1:n−1 , given xn−1 , then p(xn |xn−1 , y 1:n−1 ) = p(xn |xn−1 ). The
pdf of the state transition p(x_n | x_{n-1}) can be inferred from Eq. (3.1), and the filtering density p(x_{n-1} | y_{1:n-1}) is available from the previous iteration.
In the update step, the new measurement at time step n is used to calculate the posteriori pdf via Bayes' rule,

p(x_n | y_{1:n}) = p(y_n | x_n, y_{1:n-1}) p(x_n | y_{1:n-1}) / p(y_n | y_{1:n-1}),

where we have used p(y_n | x_n, y_{1:n-1}) = p(y_n | x_n) because Eq. (3.2) shows that the measurement depends only on the current state. Hence,

p(x_n | y_{1:n}) = p(y_n | x_n) p(x_n | y_{1:n-1}) / ∫ p(y_n | x_n) p(x_n | y_{1:n-1}) dx_n.
Thus, the priori density is modified using the current measurement to get the required
posteriori density.
The basic procedure of the Bayesian filter consists of the above two stages. Due to the noise variations in the system, it is difficult to solve for the exact posteriori probability density function, which is also addressed as the optimal Bayesian solution. However, the optimal Bayesian solution can be achieved by applying restrictions on the system model and noise. This is the situation where the state and measurement systems are assumed to be linear systems with zero mean Gaussian noises. One example from this category is the Kalman filter. However, in most cases where the linear, Gaussian model is not suitable for the system, another approach, such as the particle filter, that approximates the probability density function, is utilized. In the sequel, the Kalman and particle filters are studied and implemented to compare their applicability and performance.

3.4 The Kalman filter
The Kalman filter is a linear recursive algorithm generating least square error so-
lutions (Orlande et al., n.d.). The Kalman filter finds the best current state estimate
based on the current measurement, previous state estimate, and mathematical mod-
els using the least square optimization method, which produces more accurate results
than just one single observed measurement. Whenever a new measurement comes
in, the filter updates the new estimate so that the error estimation vector between
estimated states and the real states is minimized. The recursive manner and compu-
tational efficiency of Kalman filter make it useful in a system where the estimation
accuracy and time constraints are highly required. For this reason, the Kalman filter is widely used in real-time estimation applications.
The Kalman filter is based on the assumption of linear discrete state-space and Gaussian models, which update the state each time a new observation is added. The two models for the Kalman filter can be rewritten as below. The state-space transition model is

x_{n+1} = A_n x_n + w_n,

where A_n is the system state transition matrix at time step n, x_{n+1} and x_n are the system states at time n+1 and time n, respectively, and w_n is the system dynamic noise at time n, modeled as independent zero-mean additive Gaussian noise with covariance matrix Q_n. The observation model is

y_n = H_n x_n + v_n,

where H_n is the system observation matrix at time n, and v_n is the observation noise at time n, which is also modeled as independent zero-mean additive Gaussian. Like the system noise, the measurement noise covariance matrix, R_n, is assumed to be known.
3.4.1 Derivation of the Kalman filter

Since it is an implementation of the Bayesian filter, there are basically two steps involved in the Kalman filter. The first step is the prediction process, which takes the previously estimated states and then outputs the predicted current states based on the given system transition function. This is also called the priori estimation process. The second step is to update the prediction, the priori estimation, given the current observed measurements to get a more accurate estimation. The updated estimation is also called the posteriori estimation. Below we derive the two steps of the Kalman filter.

Denote the priori state estimate at time step n as x_n^-. The corresponding predicted measurement is ŷ_n = H_n x_n^-. The measurement error is defined as the difference between the observed and the predicted measurements. Given the real measurements at time step n, the priori state estimation can be updated to the posteriori estimation as the current priori estimation plus the weighted measurement error. The equation is illustrated below:

x_n^+ = x_n^- + K_n (y_n - H_n x_n^-),
where K_n is a weighting matrix which is also known as the Kalman gain. K_n indicates
how much the measurement error changes the estimation (Perreault, 2012).
During the Kalman filter implementation, two estimation errors are computed for the two stages. First, we calculate the priori estimation error, the difference between the actual state and the priori estimated state, which is represented as e_n^- = x_n - x_n^-. Second, we calculate the posteriori estimation error e_n^+ = x_n - x_n^+, whose covariance matrix is P_n^+ = E[e_n^+ e_n^+H]. The Kalman filter outputs the estimated state that produces the least square error with the actual state. In this case, K_n should be chosen to minimize P_n^+. To find such a K_n, we first formulate e_n^+ as

e_n^+ = x_n - x_n^+
      = x_n - (x_n^- + K_n (y_n - H_n x_n^-))
      = x_n - (I - K_n H_n) x_n^- - K_n y_n
      = (I - K_n H_n)(x_n - x_n^-) - K_n v_n
      = (I - K_n H_n) e_n^- - K_n v_n.
So we obtain

P_n^+ = E[e_n^+ e_n^+H]
      = E[((I - K_n H_n) e_n^- - K_n v_n)((I - K_n H_n) e_n^- - K_n v_n)^H]
      = E[(I - K_n H_n) e_n^- e_n^-H (I - K_n H_n)^H + K_n v_n v_n^H K_n^H
        - (I - K_n H_n) e_n^- v_n^H K_n^H - K_n v_n e_n^-H (I - K_n H_n)^H].

Define P_n^- = E[e_n^- e_n^-H] and R_n = E[v_n v_n^H], and observe that E[e_n^- v_n^H] = 0 and E[v_n e_n^-H] = 0. Therefore,

P_n^+ = (I - K_n H_n) P_n^- (I - K_n H_n)^H + K_n R_n K_n^H
      = K_n (H_n P_n^- H_n^H + R_n) K_n^H - K_n H_n P_n^- - P_n^- H_n^H K_n^H + P_n^-.

Define

A = H_n P_n^- H_n^H + R_n.    (3.3)

Completing the square gives

P_n^+ = (K_n - P_n^- H_n^H A^{-1}) A (K_n - P_n^- H_n^H A^{-1})^H
        - P_n^- H_n^H A^{-1} H_n P_n^- + P_n^-.    (3.4)
Since A is positive definite, P_n^+ is minimized when the first term of Eq. (3.4) is forced to be zero. So we have

K_n = P_n^- H_n^H A^{-1}
    = P_n^- H_n^H (H_n P_n^- H_n^H + R_n)^{-1},

which leads to

P_n^+ = (I - K_n H_n) P_n^-.

Note that both K_n and P_n^+ are expressed in terms of P_n^- since H_n and R_n are given. To compute P_n^-, note that

e_n^- = x_n - x_n^-
      = A_{n-1} e_{n-1}^+ + w_{n-1}.

Hence,

P_n^- = E[e_n^- e_n^-H]
      = E[(A_{n-1} e_{n-1}^+ + w_{n-1})(A_{n-1} e_{n-1}^+ + w_{n-1})^H]
      = A_{n-1} P_{n-1}^+ A_{n-1}^H + Q_{n-1},

where Q_{n-1} = E[w_{n-1} w_{n-1}^H] and we have used the facts that E[e_{n-1}^+ w_{n-1}^H] = 0 and E[w_{n-1} e_{n-1}^+H] = 0.
3.4.2 Algorithm of the Kalman filter

It is obvious from the above derivation that the Kalman filter can be applied recursively to each newly acquired measurement. Assume that at time step n, the following are given: the system observation matrix H_n, the system observation noise covariance matrix R_n, and the real measurement y_n. Also assumed given are the system transition matrix A_{n-1}, the system dynamic noise covariance matrix Q_{n-1}, and the posteriori estimate x_{n-1}^+ and error covariance matrix P_{n-1}^+ from the previous step. The prediction step computes

P_n^- = A_{n-1} P_{n-1}^+ A_{n-1}^H + Q_{n-1},
x_n^- = A_{n-1} x_{n-1}^+.
The update step then computes

K_n = P_n^- H_n^H (H_n P_n^- H_n^H + R_n)^{-1},
x_n^+ = x_n^- + K_n (y_n - H_n x_n^-),
P_n^+ = (I - K_n H_n) P_n^-.
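The two steps above translate directly into code. The following MATLAB function is a minimal sketch of one Kalman recursion, assuming the model matrices A, Q, H, and R are passed in; it is an illustration of the algorithm, not the exact implementation used later in the thesis.

    % One recursion of the Kalman filter (prediction + update).
    function [x_post, P_post] = kalman_step(x_prev, P_prev, y, A, Q, H, R)
        x_pri = A * x_prev;                      % x_n^- = A x_{n-1}^+
        P_pri = A * P_prev * A' + Q;             % P_n^- = A P_{n-1}^+ A' + Q
        K = P_pri * H' / (H * P_pri * H' + R);   % Kalman gain K_n
        x_post = x_pri + K * (y - H * x_pri);    % posteriori state estimate
        P_post = (eye(size(P_pri)) - K * H) * P_pri;   % posteriori covariance
    end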
3.5 Particle filters

A more general way to handle the nonlinear, non-Gaussian models is to employ particle filters. Particle filters are the best-known examples of the Monte Carlo method (Mackay, n.d.), a broad class of algorithms that repeatedly generate random samples to get the results. A particle filter is also known as the CONDENSATION algorithm, sequential importance sampling, and the sequential Monte Carlo approach (Doucet, Freitas, & Gordon, n.d.). A particle filter is a general and powerful method that can be applied in radar tracking, computer vision, and many other fields. Unlike the Kalman filter, particle filters do not require tight restrictions on system models.
The basic idea of the particle filter is to generate a set of random weighted samples
in order to estimate the posteriori probability density function. The joint posteriori
distribution of all the states is denoted as p(x1:n |y 1:n ), where x1:n represents all the
system states from the starting time up to current time step n. Correspondingly,
all the measurements of the system, denoted as y 1:n are available to use. However,
the actual sequential system states x1:n are hidden from the observer because of
the uncontrollable variables and noise in the system. Hence, it is very challenging
to obtain the real posteriori pdf p(x_{1:n} | y_{1:n}). What the particle filter does is to
draw samples from the so-called importance density function which is designated as
q(x1:n |y 1:n ) instead of the actual posteriori p(x1:n |y 1:n ). Then p(x1:n |y 1:n ) can be
approximated by the summation of the weighted samples. The weighted samples are
denoted by {x_{1:n}^i, w_n^i}, where {x_{1:n}^i, i = 1, ..., N_s} is a set of sampled points from q(x_{1:n} | y_{1:n}), and the matching weight for each sample is denoted by {w_n^i, i = 1, ..., N_s}. Usually, the weights are normalized such that Σ_{i=1}^{N_s} w_n^i = 1. Since the
samples are from the importance density function, the weight for the ith sample can
be calculated as
w_n^i = p(x_{1:n}^i | y_{1:n}) / q(x_{1:n}^i | y_{1:n}).    (3.5)

Then the posteriori distribution can be approximated as

p(x_{1:n} | y_{1:n}) ≈ Σ_{i=1}^{N_s} w_n^i δ(x_{1:n} - x_{1:n}^i),
where x_{1:n}^i are samples generated from the importance density function q(x_{1:n} | y_{1:n}). We will prove that the weights can be computed recursively: given the approximation of p(x_{1:n-1} | y_{1:n-1}) and the new measurement y_n at time step n, p(x_{1:n} | y_{1:n}) can be computed.
are generated from the importance density function. The importance density function can be factorized as

q(x_{1:n} | y_{1:n}) = q(x_n | x_{1:n-1}, y_{1:n}) q(x_{1:n-1} | y_{1:n-1}, y_n).

To simplify the above expression, the importance density function q(x_{1:n} | y_{1:n}) can be chosen such that q(x_{1:n-1} | y_{1:n-1}, y_n) = q(x_{1:n-1} | y_{1:n-1}), which means the current measurement has no effect on the system's previous states. Then

q(x_{1:n} | y_{1:n}) = q(x_n | x_{1:n-1}, y_{1:n}) q(x_{1:n-1} | y_{1:n-1}).    (3.6)
Therefore, it can be inferred that the updated samples x_{1:n}^i are generated by combining the existing samples x_{1:n-1}^i ∼ q(x_{1:n-1} | y_{1:n-1}) with the new samples x_n^i that are drawn from q(x_n | x_{1:n-1}, y_{1:n}).
The weights assigned to each particle should also be updated once the new samples are generated. Note that

p(x_{1:n} | y_{1:n}) = p(x_{1:n}, y_{1:n}) / p(y_{1:n}).

Thus

p(x_{1:n} | y_{1:n}) = p(y_n | x_{1:n}, y_{1:n-1}) p(x_{1:n} | y_{1:n-1}) / p(y_n | y_{1:n-1}).

Under the assumption that the system follows a Markovian model, we have p(x_n | x_{1:n-1}, y_{1:n-1}) = p(x_n | x_{n-1}). Furthermore, we have p(y_n | x_{1:n}, y_{1:n-1}) = p(y_n | x_n) since the current measurement is merely dependent on the current state. Therefore,

p(x_{1:n} | y_{1:n}) ∝ p(y_n | x_n) p(x_n | x_{n-1}) p(x_{1:n-1} | y_{1:n-1}).    (3.7)

By combining Eq. (3.5), Eq. (3.6), and Eq. (3.7), the weighting update can be expressed as

w_n^i ∝ w_{n-1}^i p(y_n | x_n^i) p(x_n^i | x_{n-1}^i) / q(x_n^i | x_{n-1}^i, y_n),    (3.8)

where the importance density has been further chosen as q(x_n | x_{1:n-1}, y_{1:n}) = q(x_n | x_{n-1}, y_n), which means the density function is now only dependent on the previous value of the system state and the current measurement. The assumption is consistent with the idea of a recursive filter since there is no need to store and compute the system's past states.
Since the filtering distribution p(x_n | y_{1:n}) is the marginal of the posteriori density p(x_{1:n} | y_{1:n}), it can be approximated as

p(x_n | y_{1:n}) ≈ Σ_{i=1}^{N_s} w_n^i δ(x_n - x_n^i).
3.5.2 Algorithm of particle filters

There are usually two steps implemented in particle filters when a new observation is obtained. The algorithm below is the basic form for a particle filter, which is also known as the sequential importance sampling (SIS) algorithm.

1. For i = 1, ..., N_s, draw samples x_n^i from the importance density function q(x_n | x_{n-1}^i, y_n).

2. Calculate w_n^i for each new sample using Eq. (3.8) and normalize it.
The steps are applied repeatedly to get a new estimation for each time step. However, a common problem of the above algorithm is the degeneracy phenomenon, which happens after several iterations of the particle filter, where a few samples have large weights while most of the samples are negligible (Arulampalam et al., 2002). The
degeneracy problem implies that we will waste the computations in updating the
weights of the negligible samples whose contribution to the posteriori pdf is almost
zero. One approach to address the degeneracy problem is to add a resampling step after the weight update (Arulampalam et al., 2002). We will see that the sequential importance sampling algorithm is suited for
stabilizing the video in the next chapter, where implementations and testing results
are discussed.
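As a concrete reference for the next chapter, below is a minimal MATLAB sketch of one SIS step. The transition function f_sys, the noise level sigma_w, and the likelihood handle are problem-specific assumptions, and the importance density is taken to be the transition prior, a common simplification of Eq. (3.8).

    % One sequential importance sampling step with the transition prior as q.
    function [x, w] = sis_step(x, w, y, f_sys, sigma_w, likelihood)
        Ns = numel(w);
        for i = 1:Ns
            % draw x_n^i from q = p(x_n | x_{n-1}^i)
            x(:, i) = f_sys(x(:, i)) + sigma_w .* randn(size(x(:, i)));
            % Eq. (3.8) reduces to w_n^i = w_{n-1}^i * p(y_n | x_n^i)
            w(i) = w(i) * likelihood(y, x(:, i));
        end
        w = w / sum(w);    % normalize so the weights sum to one
    end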
4. Implementation of the particle and Kalman filters for video stabilization
In this chapter, we consider the implementation of the particle and Kalman filters for
video stabilization. The camera model, the implemented algorithms, and the testing results are presented below.

4.1 Camera model
Due to the presence of the unexpected movement of the camera, the transforma-
tion between consecutive frames is related to the camera motion. The frames in an
unstable video suffer from the rotation and the translation of the camera. Assume a point in frame k has the image coordinates (x, y, f), where f is the focal length. Then at the next frame, frame k + 1, due to the camera motion, the point moves to (x', y', f). We assume that the distance from the camera to the scene is far enough so that we can ignore the change of the scale factor between the consecutive frames. Also, the rotation angle between the image plane and the z axis is small (J. Yang, Schonfeld, Chen, & Mohamed, 2006). Assume the rotation angle between the image planes of the two frames
is θ_k. Then the transformation between the corresponding image points can be formulated as

[x']   [cos(θ_k)  -sin(θ_k)] [x]   [T_xk]
[y'] = [sin(θ_k)   cos(θ_k)] [y] + [T_yk],

where T_xk and T_yk are the translations. The above equation can be rewritten as

[x']   [cos(θ_k)  -sin(θ_k)  T_xk] [x]
[y'] = [sin(θ_k)   cos(θ_k)  T_yk] [y].    (4.1)
[1 ]   [   0          0       1  ] [1]
As Eq. (4.1) indicates, we need to find θ_k, T_xk, and T_yk to obtain the transformation matrix
T k . The above three unknown variables can be grouped into a vector denoted as
xk = [θk Txk Tyk ]T . Hence, for video stabilization, the task is to find xk between
each frame pairs. The first frame in the video is considered to be stable. This means
that the transformation matrix for each frame should be referenced to the first frame,
which can be achieved with the multiplication of the transformation matrices. The
problem of solving for x_k for every frame is considered as a state estimation problem with the state vector x_k. Note that the nonlinear relationship between x_k and T_k is the main reason to use a particle filter for the estimation. Following Section 3.5, the filtering distribution is approximated by weighted samples as

p(x_k | y_{1:k}) = Σ_{i=0}^{N} w_k^i δ(x_k - x_k^i).    (4.2)
4.2 Implementation of particle filter

As Eq. (4.2) suggests, the estimation involves the generation of the samples and the calculation of their weights. We follow the algorithm proposed in (J. Yang et al., 2006) with slight modifications. Assume that at frame k, the particles are generated from an importance density function which is a Gaussian distribution with mean x̄_k and variance Σ_k. Thus, the equation for the particle generation at frame k is defined as

x_k^i ∼ N_G(x̄_k, Σ_k),  i = 1, ..., N.    (4.3)
Note that the mean x̄k is important for the approximations since it provides the
baseline estimation. The closer the mean vector is to the real state, the more accurate the approximation is.
In general, the mean vector can be obtained via feature detection algorithms
(Abdullah, Tahir, & Samad, 2012). The features of an image are usually corners and
edges (Manjunath, Shekhar, & Chellappa, n.d.). There are many feature detection techniques that utilize edges, corners (Harris & Stephens, 1998), and small blobs to locate feature points. Such techniques have been used in video stabilization, image registration, motion detection, and ob-
ject recognition (Tong, Kamata, & Ahrary, 2009). The feature detection method
employed in this thesis is the Speeded Up Robust Features (SURF) detector (Pinto
& Anurenjan, 2011). The SURF detector detects the points on an image that are
invariant to scale, rotation, and the change of illumination. Also, the SURF detector is computationally efficient.

So once we have the j matched feature points between two frames from the SURF detector, the transformation matrix T_k can be estimated from the matched pairs in the least squares sense; we refer to this least squares estimate as Eq. (4.4). It is obvious that we need at least three matched points to solve for T_k and then for x_k. This can be easily achieved with SURF. Hence, we have the mean value x̄_k computed from the estimated T_k.
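As a sketch of this least squares step, assume p1 and p2 are j-by-2 arrays of matched point coordinates in frames k − 1 and k (hypothetical variable names). Writing the similarity transform with a = cos(θ_k) and b = sin(θ_k) makes the problem linear:

    % Least squares estimate of the similarity transform from matched points.
    j = size(p1, 1);                 % number of matched feature points
    A = [p1(:,1) -p1(:,2) ones(j,1) zeros(j,1);
         p1(:,2)  p1(:,1) zeros(j,1) ones(j,1)];
    c = [p2(:,1); p2(:,2)];
    v = A \ c;                       % v = [a; b; Txk; Tyk]
    theta_k = atan2(v(2), v(1));     % rotation angle estimate
    T_k = [v(1) -v(2) v(3);          % transformation matrix of Eq. (4.1)
           v(2)  v(1) v(4);
           0     0    1];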
Now that the mean value x̄_k is available, the samples of the state x_k for frame k can be drawn from the importance density function N_G(x̄_k, Σ_k), where Σ_k is an adjustable covariance matrix. Next, the weight for each sample is assigned based on the similarity between the inversely transformed frame using the samples and the reference frame. In our case, the first frame serves as the reference: we inversely transform the current frame using each of the N samples, and then calculate how similar the inversely transformed frame is to the first frame for each sample. The particle that produces the most similar frame is assigned a heavier weight. Note that this method works only when the background remains approximately unchanged across frames.
The processes for measuring the similarity between two images utilize the methods
in (J. Yang et al., 2006). The first method is to calculate the Mean Square Error
(MSE) M_i^2 between two images. It is obvious that the smaller the MSE, the smaller the difference between the two images. Thus the likelihood of the two images is higher when the MSE is smaller, which can be approximated by the Gaussian distribution below:

P_MSE^i ∝ (1 / (√(2π) σ_M)) exp{ -M_i^2 / (2 σ_M^2) },    (4.5)

where σ_M is an adjustable standard deviation.
The second parameter is the correlation between the two images. The coefficient
of correlation Pi indicates the degree that the two images are linearly related (Kaur,
Kaur, & Gupta, 2012). The probability of the similarity between two images using the correlation coefficient is modeled as

P_corr^i ∝ (1 / (√(2π) σ_corr)) exp{ -(P_i - 1)^2 / (2 σ_corr^2) },    (4.6)
where the σcorr is the adjustable standard deviation, which is determined by experi-
ments.
After obtaining the two weights from Eq. (4.5) and Eq. (4.6), the normalized weight for sample i is

w_k^i = P_MSE^i P_corr^i / Σ_{i=1}^{N} (P_MSE^i P_corr^i).    (4.7)
Now that the samples and the weights are attained, the estimated state vector at frame k is given by

x̂_k = Σ_{i=1}^{N} w_k^i x_k^i.    (4.8)
Therefore, the output from the particle filter provides the estimated vector x̂k =
[θ̂k T̂xk T̂yk ] for the global movement between two successive frames.
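A compact MATLAB sketch of this weighting and estimation stage is given below. Here x_k is a 3-by-N array of particles, frame_1 and frame_k are grayscale frames of equal size, warp_back is a hypothetical helper that applies the inverse transform of a particle to the current frame, and sigma_M and sigma_corr correspond to σ_M and σ_corr.

    % Particle weighting by MSE and correlation, Eqs. (4.5)-(4.8).
    N = size(x_k, 2);
    w = zeros(1, N);
    for i = 1:N
        g = warp_back(frame_k, x_k(:, i));       % inversely transformed frame
        Mi2 = mean((double(g(:)) - double(frame_1(:))).^2);   % MSE
        Pi  = corr2(double(g), double(frame_1)); % correlation coefficient
        w(i) = exp(-Mi2 / (2*sigma_M^2)) * exp(-(Pi - 1)^2 / (2*sigma_corr^2));
    end
    w = w / sum(w);                              % normalization, Eq. (4.7)
    x_hat = x_k * w';                            % weighted estimate, Eq. (4.8)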
One more step is to compute the transformation matrix referenced to the stable frame. Since the first frame is regarded as stable, the accumulative transformation matrix for frame k is

H_k = Π_{i=1}^{k} T_i.    (4.9)
Note that the output from the particle filter gives us the estimation of the global camera motion, i.e., the motion with respect to the first frame. To maintain the intentional movement due to the movement of the UAV and the object motion, extra steps are required, as discussed below.

4.3 Implementation of the Kalman filter

As we only need to get rid of the unwanted movement, the intentional movement
on the video should not be removed. Thus, the intentional movement of the airplane
should be calculated and used to compensate the global movement (J. Yang et al.,
2006). A Kalman filter is utilized to estimate the intentional motion of the camera
since the intentional moving camera system can be modeled as a linear system. For the
Kalman filter, we need to find the two linear models, for the system state transition
model and the observation model. Assume the rotation angle and the translations along the x and y axes are independent variables. For the x-axis translation, the state transition model is

T_xk = T_{x,k-1} + v_{x,k-1},
v_xk = v_{x,k-1} + n_{vx,k-1},

where T_xk and T_{x,k-1} are the translations along the x-axis at frames k and k - 1, respectively, v_xk and v_{x,k-1} are the moving speeds along the x-axis at frames k and k - 1, respectively, and n_{vx,k-1} is the zero mean Gaussian noise with distribution n_{vx,k-1} ∼ N(0, σ_vx^2). For the observation model, the equation is simply

Z_xk = T_xk + m_xk,

where Z_xk is the measurement at frame k and m_xk is a Gaussian noise with zero mean and variance σ_mtx^2.
Similarly, translation Ty along the y-axis can be modeled in the same way as Tx .
For the rotation angle, the assumption is that there is no intentional angular velocity. Putting the components together, the state transition model is

[T_xk]   [1 1 0 0 0] [T_{x,k-1}]   [     0     ]
[v_xk]   [0 1 0 0 0] [v_{x,k-1}]   [n_{vx,k-1} ]
[T_yk] = [0 0 1 1 0] [T_{y,k-1}] + [     0     ],    (4.10)
[v_yk]   [0 0 0 1 0] [v_{y,k-1}]   [n_{vy,k-1} ]
[θ_k ]   [0 0 0 0 1] [θ_{k-1}  ]   [n_{θ,k-1}  ]
where θ_k is the rotation angle at frame k, and n_{vy,k-1} and n_{θ,k-1} are both zero mean Gaussian noises with variances σ_vy^2 and σ_θ^2, respectively. Accordingly, the observation
model is formulated as

[Z_xk]   [1 0 0] [T_xk]   [m_xk]
[Z_yk] = [0 1 0] [T_yk] + [m_yk],    (4.11)
[Z_θk]   [0 0 1] [θ_k ]   [m_θk]
where Z_xk, Z_yk, and Z_θk are the measurements of the translations along the x-axis and y-axis and of the rotation angle, respectively, and m_xk, m_yk, and m_θk are the zero mean Gaussian observation noises with variances σ_mx^2, σ_my^2, and σ_mθ^2, respectively.
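For reference, the model matrices of Eqs. (4.10) and (4.11) can be set up in MATLAB as below; the sigma variables are the tunable noise parameters and are left as placeholders.

    % Model matrices for the intentional-motion Kalman filter.
    A = [1 1 0 0 0;                 % state: [Tx; vx; Ty; vy; theta]
         0 1 0 0 0;
         0 0 1 1 0;
         0 0 0 1 0;
         0 0 0 0 1];                % Eq. (4.10)
    H = [1 0 0 0 0;                 % measurements: [Zx; Zy; Ztheta]
         0 0 1 0 0;
         0 0 0 0 1];                % Eq. (4.11) in 5-state form
    Q = diag([0, sigma_vx^2, 0, sigma_vy^2, sigma_theta^2]);  % process noise
    R = diag([sigma_mx^2, sigma_my^2, sigma_mtheta^2]);       % observation noise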
From the above two models of Eq. (4.10) and Eq. (4.11), the intentional motion vector can be obtained using the Kalman filter and is denoted as z_k = [T_xk T_yk θ_k]^T. The unintentional motion is then removed by applying the compensating transformation

[x']   [cos(θ̃_k)  -sin(θ̃_k)  T̃_xk] [x]
[y'] = [sin(θ̃_k)   cos(θ̃_k)  T̃_yk] [y],
[1 ]   [    0           0       1  ] [1]
where θ̃_k = θ̂_k - θ_k, T̃_xk = T̂_xk - T_xk, and T̃_yk = T̂_yk - T_yk are the unintentional motion estimates for the rotation angle and the translations along both axes. For notation simplicity, we denote the corresponding compensation matrix as T̃_k.

One more issue is the motion of the object in the video. In (J. Yang et al., 2006; Song et al., 2012), the object motion is
removed before the background motion estimation by detecting the motion speed that
is assumed to be faster than the background. However, in our case, we assume that
the airplane moves very slowly among successive frames since the target is very far
away from the camera. Moreover, the airplane appears to be very small on the image,
which has little to zero feature points. The airplane appears to be static compared
to the camera motion. Therefore, the motion of the object is not considered in this thesis.

The complete video stabilization procedure is summarized below; a condensed MATLAB sketch follows the list.
1. Read the video and load the consecutive frames: frame k and frame k − 1.
2. Detect, extract, and match feature points of two consecutive frames using SURF.

3. Compute the state vector x̄_k from T_k estimated using Eq. (4.4).

4. Run the particle filter:

(a) for i = 1:N, generate particles from the Gaussian importance density as in Eq. (4.3);

(b) for i = 1:N, assign the weights to each particle and calculate the normalized weights using Eq. (4.7);

(c) estimate the state vector using the weighted samples, as illustrated in Eq. (4.8).

5. Compute the accumulative transformation matrix using Eq. (4.9).

6. Put the accumulative matrix into the Kalman filter to estimate the intentional motion.

7. Apply the inverse transformation using the above T̃_k to the current frame to form the stabilized frame.
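The sketch below condenses steps 1–7 into a MATLAB loop, assuming the Computer Vision Toolbox is available. For brevity it aligns every frame directly to the first frame with estimateGeometricTransform in place of the particle refinement, and it omits the intentional-motion Kalman filter, so it shows the data flow rather than the full algorithm.

    % Condensed video stabilization loop (data-flow sketch only).
    v = VideoReader('shaky.avi');            % hypothetical input file
    ref = rgb2gray(readFrame(v));            % first frame, regarded as stable
    pref = detectSURFFeatures(ref);
    [fref, vpref] = extractFeatures(ref, pref);
    while hasFrame(v)
        cur = rgb2gray(readFrame(v));
        pcur = detectSURFFeatures(cur);
        [fcur, vpcur] = extractFeatures(cur, pcur);
        idx = matchFeatures(fref, fcur);     % match first frame to current
        tform = estimateGeometricTransform(vpcur(idx(:,2)), ...
                                           vpref(idx(:,1)), 'similarity');
        stab = imwarp(cur, tform, 'OutputView', imref2d(size(cur)));
        % stab approximates the current frame aligned to the first frame
    end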
4.4 Testing results

To evaluate the implemented algorithms, we generate shaky videos by applying rotations and translations to a known image so that the ground truth values of the rotation angle and the translations are known. The param-
eters for the particle filter are chosen as follows: the number of particles N = 30, Σ_k = [0.001 10 10], and σ_MSE = σ_corr = 0.5. For the Kalman filter, the initial states are all zero. The system noise parameters are σ_θ = 0.5 and σ_vx = σ_vy = 5. The observation noise parameters are σ_mθ = σ_mtx = σ_mty = 0.1. Moreover, the initial error covariance matrix is assumed to be equal to the system noise covariance matrix.
These values are controllable and subject to change for different cases.
4.4.1 Testing results for smooth linear motions

The first video comprises a series of images that are obtained with a linear increment in each of the three shaking parameters: the rotation angle, the translation along the x-axis, and the translation along the y-axis. We use three different schemes on this video: Scheme A, SURF only; Scheme B, SURF + a particle filter; and Scheme C, SURF + a particle filter + a Kalman filter. Scheme A estimates the transformation matrix using the matched points directly, and Scheme B estimates the transformation matrix using SURF first and then a particle filter to obtain more accurate results. Note that the outputs from the first two schemes are about global camera motion as shown in Eq. (4.9). In Scheme C, the Kalman filter is applied to estimate the intentional motion vector following SURF and particle filtering. Frames from the unstable and stabilized videos are given in Figs. 4.1 to 4.4. Note that for the result in Fig. 4.4, part of the translation is retained because it is estimated as intentional motion.
Figs. 4.5, 4.6, and 4.7 show the comparisons of the estimation results for each
of the three parameters of camera motion. As can be seen from Fig. 4.7, Scheme B
outperforms Scheme A, and since Scheme C considers linear motions as the intentional
movement, both the estimation results are close to zero, which means no unexpected
motion. However, we can see from Fig. 4.5 and Fig. 4.6 that Scheme A outperforms the other two schemes. This is due to the assumption that the linear translations are intentional motion and hence should not be fully removed.
Another video is produced with the rotation angle and the y-axis translation of each frame being random processes, to further evaluate the performance of the three schemes.
Figure 4.5. Comparison of x-axis translation estimations for linear translation change.
Figure 4.6. Comparison of y-axis translation estimations for linear translation change.
Figure 4.7. Comparison of rotation estimations for linear rotation change.
The motion along the x-axis keeps the same linear relationship as that in the first video. Frames from the unstable and stabilized videos are given in Figs. 4.8 to 4.11, and Figs. 4.12, 4.13, and 4.14 show the comparisons of the estimation results for the x-axis translation, the y-axis translation, and the rotation angle.

Figure 4.12. Comparison of the x-axis translation estimations for random translation change.

As seen from
Fig. 4.14, Scheme C outperforms both Scheme A and Scheme B, demonstrating the
effectiveness of the Kalman filter. Yet, as shown in Fig. 4.13 for y-axis translation
estimation, Scheme C performs the worst, still due to the estimation of intentional
change. We can see from Fig. 4.12 that Scheme B outperforms Scheme A.
Figure 4.13. Comparison of the y-axis translation estimations for random translation
change.
Figure 4.14. Comparisons of the rotation estimations for random rotation change.
5. Object detection algorithms

In this chapter, we discuss object detection algorithms in detail. Usually, the detection of aircraft is complicated by distractions such as heavy clouds and other obstacles on the ground. Therefore, the development of suitable algorithms is critical for successful aircraft detection among such distractions.
Some popular algorithms include edge detection (Bhadauria, Singh, & Kumar, 2013),
connected area extraction (Hajri, 2012), morphological filtering (Casasent & Ye, 1997;
Sang, Zhang, & Wang, n.d.), local adaptive threshold filtering (Zarandy et al., 2011),
and dynamic programming (M. Yang et al., 2002; Barniv, 1985). In the sequel of
this thesis, we discuss the development and implementations of several algorithms for
object detection.
5.1 Edge detection

Edge detection is one of the fundamental operations in computer vision. Edges are
significant local changes of intensity, which usually occur on the boundary between
different regions in an image. Therefore, edge detection extracts image features such
as corners, lines, and curves on the image. Generally, derivative operations are applied to find the edges. For a continuous image f(x, y), the partial derivatives are

f_x = ∂f/∂x = lim_{h→0} [f(x + h, y) - f(x, y)] / h,
f_y = ∂f/∂y = lim_{h→0} [f(x, y + h) - f(x, y)] / h.
By definition, the gradient is a vector with direction and magnitude. For a 2D discrete image, the gradient is denoted as ∇f = [f_x f_y]^T with

f_x = f(x + 1, y) - f(x, y),
f_y = f(x, y + 1) - f(x, y),
M(∇f) = sqrt(f_x^2 + f_y^2).    (5.1)
The operation of edge detection can be considered as the convolution of the image
with a mask, a filter. For example, the convolution masks defined in Eq. (5.1) are
[−1 1] in the x direction and [−1 1]T in the y direction, respectively. Then the
edges are determined by finding the local maximum or minimum points, which can be done by thresholding the gradient magnitude.
Another way to acquire the local maximum and minimum points is checking
whether the second derivative at the point is zero-crossing. The second derivative operator is

∇^2 f = ∂^2 f / ∂x^2 + ∂^2 f / ∂y^2,    (5.2)
which is also known as the Laplacian edge detector. For calculations using Eq. (5.2),
one of the popular discrete convolution kernels of the Laplacian edge detector is
obtained as
    [0  1  0]
M = [1 -4  1].
    [0  1  0]
Since the Laplacian operation is sensitive to noise due to the second-order deriva-
tives, the operation is often applied after a Gaussian filter which reduces noise. This combination is known as the Laplacian of Gaussian (LoG) operator.
There are four different edge detectors widely used: Roberts, Prewitt, Sobel, and
Canny edge detectors (Shrivakshan & Chandrasekar, 2012). In this thesis, the Sobel
edge detector is employed due to its computational efficiency. The Sobel edge detector
is an example of applying the first-order derivative to the image to obtain the edges.
The Sobel edge detector uses a pair of 3×3 convolution kernels which are formulated as follows:

      [-1  0  1]         [-1 -2 -1]
M_x = [-2  0  2],  M_y = [ 0  0  0].
      [-1  0  1]         [ 1  2  1]
Compared to the second derivative, the Sobel operator is less sensitive to unex-
pected noise.
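In MATLAB the Sobel operation reduces to two convolutions and a magnitude threshold, as in the sketch below; the threshold value is an illustrative assumption.

    % Sobel edge detection: convolve, take the gradient magnitude, threshold.
    Mx = [-1 0 1; -2 0 2; -1 0 1];
    My = [-1 -2 -1; 0 0 0; 1 2 1];
    gx = conv2(double(I), Mx, 'same');    % horizontal gradient
    gy = conv2(double(I), My, 'same');    % vertical gradient
    mag = sqrt(gx.^2 + gy.^2);            % gradient magnitude, Eq. (5.1)
    edges = mag > 0.5;                    % hypothetical threshold for I in [0,1]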
5.2 Morphological processing

Morphological processing is another fundamental tool in image processing (McCandless, 2003). It provides a way for extracting small, point-like targets (Carnie et al., 2005; Yusko, 2007), which is a good application for UAV sense and avoidance. In particular, morphological filters can preserve the small objects while removing the large cloud clutters on the image. Morphological operations only rely on the relative ordering of the pixel values instead of on their numerical values, so they are widely applied to process binary images.
Generally, the operation involves two primary operations which are known as dilation
and erosion. The basic element in these two morphological operations is a binary
region called a structure element, a small binary matrix whose shape is defined by
the pattern of ones and zeros. Unless specified otherwise, the center of the structure
element is the origin (Sonka, Hlaval, & Boyle, 2013). During morphology operation,
different values (1 or 0) will be assigned to the corresponding area under the structure element. The mathematical expressions for the dilation and erosion are defined (Sonka et al., 2013) as

F ⊕ SE = {f + se | f ∈ F, se ∈ SE},
F ⊖ SE = {p | p + se ∈ F for every se ∈ SE},

where F is a set with the elements represented by f, SE denotes the structure element whose members are expressed as se, and ⊕ and ⊖ stand for the dilation and erosion, respectively. The opening and closing of an image f by a structure element s at location (x, y) are defined (Carnie, Walker, & Corke, 2006) as follows. The opening is an erosion followed by a dilation,

f ∘ s = (f ⊖ s) ⊕ s,

while the closing is a dilation followed by an erosion,

f • s = (f ⊕ s) ⊖ s.
Usually, the opening operation smooths the contours of the object by eliminating
thin protrusions and breaking narrow bridges that are too small to accommodate the
structure element (Sonka et al., 2013). On the other hand, the closing operation tends
to smooth the entire section area by building up the links and filling small holes and
gaps.
From the above discussion, we know that the small bright areas are darkened
by the opening operation and the small dark areas are brightened by the closing
operation. As such, by subtracting the opened image from the original image, the
small positive objects are obtained. Similarly, the difference between the original
image and the closed image identifies the small negative objects that are darker than
the background (Maragos, 1987). Thus, the closed image minus the opened image provides the detections for both the positive and negative objects. Such a process is known as the close-minus-open (CMO) filter,

CMO(f, s) = (f • s) - (f ∘ s).
A variant, the minimum CMO, further suppresses the large cloud and ground clutter whose existence causes false detections. The minimum CMO utilizes two 1-D structure elements that are used for vertical and horizontal operations. During the two CMO operations, small objects are preserved if the sizes of the structure elements are bigger than the objects, while large clutter generally survives in at most one direction. The large clutters are therefore removed after calculating the minimum values out of the two CMO operations.
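A minimal MATLAB sketch of the minimum CMO using the Image Processing Toolbox is shown below; the 1-D element lengths follow the implementation described in Section 5.4 and are otherwise assumptions.

    % Minimum close-minus-open with horizontal and vertical line elements.
    se_h = strel('line', 10, 0);                   % horizontal 1-D element
    se_v = strel('line', 10, 90);                  % vertical 1-D element
    cmo_h = imclose(f, se_h) - imopen(f, se_h);    % CMO = closing - opening
    cmo_v = imclose(f, se_v) - imopen(f, se_v);
    f_min = min(cmo_h, cmo_v);                     % keeps small targets only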
5.3 Dynamic programming

While the minimum CMO operation can be used to detect small negative and
positive objects with high detection probability, the detection performance is often
affected by random pixel noise and poor signal to noise ratio. One optimal solution
for moving target detection is to utilize dynamic programming (DP) (Arnold, Shaw,
& Pasternack, 1993), a combination of detection and tracking, which returns the
target detection and tracking at the same time (Tonissen & Evans, 1995). The DP
algorithm has been proven to be efficient in detecting targets with low signal-to-noise
ratio and is robust to the camera jitter and random noise (Barniv, 1985). Instead
of detecting objects based on a single image, DP makes the decision of the presence
of the target after shifting and averaging multiple frames, which is suited for object
detection (Gonzalez & Woods, 2008), tracking (Tonissen & Evans, 1995), and even
For aircraft sense and avoid applications, the object movement between two frames
is less than 1 pixel, especially at far distances (Hobbs, 1991). Hence, we consider a
2D image, where the target position is represented by (i, j) and the 2D velocity of the
57
Assume at frame k, the object is at location (i, j) with the speed (u, v). Then at
frame k + 1, the object can end up at any location centered around (i, j) with the
range of 1 pixel, as shown in Fig. 5.1 (Carnie et al., 2006). The dark blue marks the
location of the object in frame k and the possible locations in frame k + 1 are colored
in light blue. The nine possible locations are grouped into four cases in terms of the
velocity u and v.
The steps for the dynamic programming are detailed in (M. Yang et al., 2002) and summarized below.

Initialization. For frame k = 0, set F_{u,v}(i, j, 0) = 0 for u ∈ {-1, 1} and v ∈ {-1, 1}, where F_{u,v}(i, j, 0) is the (i, j) pixel of frame 0 of the processed image in the (u, v) direction.
Recursion. At frame k + 1, the value of each pixel of the processed frame in each direction is the summation of the weighted value of the pixel of the input image at frame k + 1 and the maximum response of the possible transition states,

F_{u,v}(i, j, k+1) = I(i, j, k+1) + α max_{(p,q) ∈ Q(i,j,u,v)} F_{u,v}(p, q, k),

where I(i, j, k+1) is the input image, α indicates how much it should trust the previous frame (also called the memory factor) and ranges from 0 to 1, and Q(i, j, u, v) represents the pixel locations reachable in the (u, v) direction, as illustrated in Fig. 5.1.

Decision. Finally, the pixel value at the (i, j) of the processed image of frame k + 1 is taken as the maximum response over the four direction states.
The processed image with DP is usually converted into a binary image with a threshold for detection purposes. The target with a low signal-to-noise ratio is able to be detected since the dynamic programming raises the signal-to-noise ratio for dim moving targets. Note that there is usually clutter besides the target. The large area
clutter should be eliminated using the CMO operation before processing with DP.
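The recursion for one direction state (u, v) can be sketched in MATLAB as below. For simplicity the full 3-by-3 neighborhood is searched; in the actual algorithm Q(i, j, u, v) restricts it to the quadrant matching the direction, per Fig. 5.1.

    % One DP recursion step for a single (u, v) direction state.
    % F: processed image from frame k; I: input image at frame k+1.
    alpha = 0.8;                           % memory factor, assumed value
    [rows, cols] = size(I);
    Fnew = zeros(rows, cols);
    for i = 2:rows-1
        for j = 2:cols-1
            patch = F(i-1:i+1, j-1:j+1);   % transition window (full 3x3 here)
            Fnew(i, j) = I(i, j) + alpha * max(patch(:));
        end
    end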
5.4 Implementation of object detection algorithms

For the implementation, we consider three schemes: Scheme 1, the Sobel edge detector; Scheme 2, morphological processing plus the Sobel edge detector; and Scheme 3, morphological processing plus dynamic programming and the Sobel edge detector. The algorithm for Scheme 3 is
outlined below.
(a) First process f with CMO using a horizontal structure element to get the horizontal CMO image f_h.

(b) Then apply CMO with a vertical structure element to f to get the vertical CMO image f_v.

(c) Finally obtain the minimum CMO image f_m by finding the minimum pixel values between f_h and f_v.

The minimum CMO image is then processed with dynamic programming and the Sobel edge detector. One structure element of size 1 × 10 is used in the horizontal direction and another of size 10 × 1 in the vertical direction. To remove the remaining large
clutters, the maximum area size is set to be 300 pixels or 100 pixels depending on
the size of the aircraft. This means any connected area whose size is bigger than
300 or 100 pixels will be eliminated. We use four sets of videos to demonstrate the
the printed copy, all the binary images are displayed in a way that the background is
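As an illustration of the morphological stage of Scheme 3, here is a minimal OpenCV sketch; the function names, the default area threshold, and the uint8-input assumption are ours, not the thesis code.

    import cv2
    import numpy as np

    def min_cmo(f):
        """Minimum close-minus-open (CMO) of a gray-level image f, using a
        1 x 10 horizontal and a 10 x 1 vertical structure element."""
        se_h = cv2.getStructuringElement(cv2.MORPH_RECT, (10, 1))  # 1 x 10, horizontal
        se_v = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 10))  # 10 x 1, vertical

        def cmo(img, se):
            closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, se)
            opened = cv2.morphologyEx(img, cv2.MORPH_OPEN, se)
            return cv2.subtract(closed, opened)  # close minus open

        return np.minimum(cmo(f, se_h), cmo(f, se_v))  # pixel-wise minimum

    def remove_large_areas(binary, max_area=300):
        """Eliminate connected components larger than max_area pixels
        (300 or 100 depending on the size of the aircraft)."""
        n, labels, stats, _ = cv2.connectedComponentsWithStats(binary.astype(np.uint8))
        out = binary.copy()
        for k in range(1, n):  # label 0 is the background
            if stats[k, cv2.CC_STAT_AREA] > max_area:
                out[labels == k] = 0
        return out

The dynamic programming recursion sketched earlier would then run on the minimum CMO image before the Sobel edge detector is applied.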
Fig. 5.2a shows the first frame of a synthetic video. This video is generated so that
the background does not change while the object moves between consecutive frames.
The size of the target is designed to be 2 × 2 pixels, and the speed of the target
is constrained to within 1 pixel per frame to be consistent with the assumptions of the
dynamic programming. As shown in the figure, there are large dark clouds in the
sky and large buildings at the bottom. In addition, the target is very dim and the
contrast between the target and the background is not very sharp, making it difficult
to detect with the naked eye. Figs. 5.2b to 5.2d demonstrate the object detection
results using different schemes. We can see that Scheme 1 works well in terms of
object detection, but suffers from too many false detections. We can also see that
Figure 5.2. Object detection results for a synthetic video with dark clouds.
both Scheme 2 and Scheme 3 work very well, with fewer false detections. Note that
the advantage of DP is not obvious here since the video is not very noisy.
The image shown in Fig. 5.3a is a frame from a video to which zero-mean Gaussian
noise of variance 0.0002 has been added. This video is generated in the same way
as the one shown in Fig. 5.2a. Figs. 5.3b to 5.3d demonstrate the object detection
results using the different schemes. We can see that Scheme 1 still works well in terms
of object detection and suffers from fewer false detections due to the lighter cloud
conditions.
Figure 5.3. Object detection results for a synthetic video with light clouds and added
noise.
We can also see that Scheme 3 significantly outperforms Scheme 2 due to the power
of DP.
Fig. 5.4a shows the original image with varying clouds, other clutter, and a
relatively small object. The video was recorded on the ground by hand. We can see
from Figs. 5.4b to 5.4d that Scheme 3 significantly outperforms the other two schemes
in terms of reduced false alarms. This is again due to the effectiveness of DP.
Fig. 5.5a shows the original image recorded on the ground without too many
distractions. Although there are some lamp posts in the image, the sky is clear
Figure 5.4. Object detection results for a recorded video with varying clouds.
without heavy cloud clutter. Figs. 5.5b to 5.5d demonstrate the object detection
results using the different schemes. We can see that, due to the big difference between
the object and the clear-sky background, all three schemes detect the target well. The
SNR before and after DP is plotted in Fig. 5.6. Though the SNR is not significantly
improved by DP, even this small gain makes a big difference when detecting a dim
target.
The decision on which detection scheme to use should be made based on the specific
situation. We will use Scheme 1 when there is big contrast between the objects and
the background, Scheme 2 when there is large-area clutter, such as dark clouds, to
Figure 5.5. Object detection results for a video without cloud clutters.
Figure 5.6. SNR before and after DP (in dB) versus frame number.
be removed first, and Scheme 3 when the image is noisy. However, the DP is not
computationally efficient, since the algorithm searches every pixel of the image in a
recursive fashion.
6. Conclusion
This thesis has documented the following work for sense and avoid using cameras
mounted on a UAV:
1. Camera calibration. We have reviewed and implemented camera calibration,
since it is an important part of vision-based sense and avoid in terms of accuracy.
2. Camera stabilization. There are many different methods for camera stabiliza-
tion, which is still an active research topic. Here, we choose to address the
problem using matched feature points obtained using SURF. We have provided an easy-
to-follow derivation of both the Kalman filter and the particle filter. We also
implemented both filters, with the particle filter used for global motion estimation
and the Kalman filter for intentional motion estimation. Testing results are provided
to show the effectiveness of the approach.
3. Object detection. We have focused on the issue of small target detection, which
is essential for detecting aircraft at long range. We have considered three image
processing schemes to address the problem of small target detection: Scheme 1 uses
the Sobel edge detector alone; Scheme 2 first applies a morphological operation
called CMO on gray-level images and then a Sobel edge detector; and Scheme 3
inserts dynamic programming between the CMO operation and the Sobel edge
detector. We have evaluated the performance of these schemes, and concluded that
we can use Scheme 1 when there is big contrast between the objects and the
background, Scheme 2 when there are large clutters, such as dark clouds, to be
removed first, and Scheme 3 when the image is noisy.
In the future, the following aspects can be investigated to improve and evaluate the
proposed system:
• Algorithms that are robust to heavy clutter and environment variations will be
explored. One thought is that an aircraft can be detected based on its steady
moving speed between consecutive frames. Objects with random speeds can
therefore be classified as noise and outliers (Chen, Dang, Peng, & Bart Jr, 2009;
Abe, Zadrozny, & Langford, 2006), which can be removed in a further processing
step.
REFERENCES
Abdullah, L., Tahir, N., & Samad, M. (2012, 7). Video stabilization based on point feature
matching technique. Control and System Graduate Research Colloquium, 303-307.
Abe, N., Zadrozny, B., & Langford, J. (2006, 8). Outlier detection by active learning. Pro-
ceedings of the 12th ACM SIGKDD international conference on Knowledge discovery
and data mining, 504-509.
Arnold, J., Shaw, S., & Pasternack, H. (1993, 1). Efficient target tracking using dynamic
programming. IEEE transactions on Aerospace and Electronic Systems, 29 (1).
Arulampalam, M., Maskell, S., Gordon, N., & Clapp, T. (2002, 2). A tutorial on particle
filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on
Signal Processing, 50 (2).
Barniv, Y. (1985, 1). Dynamic programming solution for detecting dim moving targets.
IEEE Transactions on Aerospace and Electronic Systems.
Bhadauria, H., Singh, A., & Kumar, A. (2013, 6). Comparison between various edge
detection methods on satellite image. International Journal of Emerging Technology
and Advanced Engineering, 3.
Carnie, R., Walker, R., & Corke, P. (2005). Computer-vision based collision avoidance for
UAVs. Melbourne, Australia.
Carnie, R., Walker, R., & Corke, P. (2006, 5). Image processing algorithms for UAV “Sense
and Avoid”. Proceedings of the 2006 IEEE International Conference on Robotics and
Automation (ICRA 2006).
Casasent, D., & Ye, A. (1997, 1). Detection filters and algorithm fusion for ATR. IEEE
Transactions on Image Processing, 6 .
Chen, Y., Dang, X., Peng, H., & Bart Jr, H. (2009, 2). Outlier detection with the ker-
nelized spatial depth function. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 31 .
Development Projects INC. (2013, 9). NASA Centennial Challenge UAS-AOC rules
(Tech. Rep.). Retrieved from http://www.nasa.gov/directorates/spacetech/
centennial challenges/uas
Dey, D., Geyer, C., Singh, S., & Digioia, M. (2009, 7). Passive, long-range detection of
aircraft: Towards a field deployable sense and avoid system. Proceedings of Field &
Services Robotics.
Doucet, A., Freitas, N., & Gordon, N. (n.d.). An introduction to sequential Monte Carlo
methods. Retrieved from http://www.stats.ox.ac.uk/~doucet/doucet defreitas
gordon smcbookintro.pdf
Fergus, R., Singh, B., Hertzmann, A., Roweis, S., & Freeman, W. (2006). Removing camera
shake from a single photograph. ACM Trans. Graph, 25 , 787–794.
Gandhi, T., Yang, M., Kasturi, R., Coraor, L., & McCandless, J. (2003, 1). Detection
of obstacles in the flight path of an aircraft. IEEE Transactions on Aerospace and
Electronic Systems, 39 .
Gaszczak, A., Breckon, T., & Han, J. (2011, 1). Real-time people and vehicle detection from
UAV imagery. Proceedings of SPIE: Intelligent Robots and Computer Vision XXVIII:
Algorithms and Techniques.
Gonzalez, R., & Woods, R. (2008). Digital image processing (3rd ed.). Pearson Education.
Hajri, R. (2012, 6). UAV to UAV target detection and pose estimation. Retrieved from
http://www.dtic.mil/dtic/tr/fulltext/u2/a562740.pdf
Harris, C., & Stephens, M. (1988). A combined corner and edge detector. Proceedings
of the Fourth Alvey Vision Conference, 147-151.
Haykin, S. (2009). Neural networks and learning machines (3rd ed.). Pearson Education.
Hobbs, A. (1991, 4). Limitations of the see-and-avoid principle (Tech. Rep.). Retrieved
from https://www.atsb.gov.au/publications/1991/limit see avoid.aspx
Hruska, R., Lancaster, G., Harbour, J., & Cherry, S. (2005, 9). Small UAV-acquired,
high-resolution, georeferenced still imagery. conference: AUVSI Unmanned Systems
North America.
IDS. (n.d.). Retrieved from https://en.ids-imaging.com/store/produkte/kameras/
gige-kameras/show/all.html
Jaron, P., & Kucharczyk, M. (2012). Vision system prototype for UAV po-
sitioning and sparse obstacle detection. Retrieved from http://www.diva
-portal.se/smash/get/diva2:832010/FULLTEXT01.pdf;jsessionid=Iqf9smVDR
7DqAuk08Gd7xKFkKcBIJH3zhqMWZvt.diva2-search7-vm
Kaur, A., Kaur, L., & Gupta, S. (2012, 12). Image recognition using coefficient of correlation
and structural similarity index in uncontrolled environment. International Journal of
Computer Applications, 29.
Lee, B., Yan, J., & Zhuang, T. (2001). A dynamic programming based algorithm for
optimal edge detection in medical images. Medical Imaging and Augmented Reality,
2001. Proceedings. International Workshop on, 193-198.
Lin, C., Hong, C., & Yang, C. (2009, 3). Real-time digital image stabilization system
using modified proportional integrated controller. IEEE Transactions on Circuits
and Systems for Video Technology, 19 .
Mackay, D. (n.d.). Introduction to monte carlo methods. Retrieved from http://www
.inference.phy.cam.ac.uk/mackay/erice.pdf
Maini, R., & Himanshu, A. (2009, 2). Study and comparison of various image edge detection
techniques. International Journal of Image Processing, 3 .
Manjunath, B., Shekhar, C., & Chellappa, R. (n.d.). A new approach to image feature detec-
tion with applications. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/
download?doi=10.1.1.1.3625&rep=rep1&type=pdf
Maragos, P. (1987, 7). Tutorial on advances in morphological image processing and analysis.
Proceedings of SPIE 0707, Visual Communications and Image Processing.
Matsushita, Y., Ofek, E., Ge, W., Tang, X., & Shum, H. (2006, 7). Full-frame video
stabilization with motion inpainting. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 28 .
Mohammadi, M., Fathi, M., & Soryani, M. (2011, 6). A new decoder side video stabilization
using particle filter. Systems, Signals and Image Processing (IWSSIP), 2011 18th
International Conference on, 1-4.
Moses, A. (2013). Radar based collision avoidance for unmanned aircraft systems. Elec-
tronic Theses and Dissertations.
Orlande, H., Colaco, M., Dulikravich, G., Vlanna, F., daSilva, W., daFon-
seca, H., & Fudym, O. (n.d.). Kalman and particle filters. Re-
trieved from http://www.sft.asso.fr/Local/sft/dir/user-3775/documents/
actes/Metti5 School/Lectures&Tutorials-Texts/Text-T10-Orlande.pdf
Perreault, B. (2012, 4). Introduction to the Kalman filter and its derivation. Re-
trieved from https://www.academia.edu/1512888/Introduction to the Kalman
Filter and its Derivation
Pinto, B., & Anurenjan, P. (2011, 2). Video stabilization using speeded up robust features.
Communications and Signal Processing (ICCSP), 2011 International Conference on,
527-531.
Rozantsev, A. (2009, 5). Visual detection and tracking of flying objects in unmanned
aerial vehicle. Retrieved from http://wiki.epfl.ch/edicpublic/documents/
Candidacy%20exam/PR13Rozantsev.pdf
Sang, N., Zhang, T., & Wang, G. (n.d.). Gray scale morphology for small object detection.
Proc. SPIE 2759, Signal and Data Processing of Small Targets.
Shah, S. (2009, 8). Vision based 3D obstacle detection using a single camera
for ROBOTS/UAVs. Retrieved from https://smartech.gatech.edu/bitstream/
handle/1853/29741/shah syed i 200908 mast.pdf
Shrivakshan, G., & Chandrasekar, C. (2012, 9). A comparison of various edge detection
techniques used in image processing. International Journal of Computer Science
Issues, 9 .
Song, C., Zhao, H., Jing, W., & Zhu, H. (2012, 5). Robust video stabilization based
on particle filtering with weighted feature points. IEEE Transactions on Consumer
Electronics, 58 .
Sonka, M., Hlavac, V., & Boyle, R. (2013). Image processing, analysis, and machine vision
(4th ed.). Cengage Learning.
Tong, C., Kamata, S., & Ahrary, A. (2009, 11). 3D face recognition based on fast feature
detection and non-rigid iterative closest point. Intelligent Computing and Intelligent
Systems, 2009. ICIS 2009. IEEE International Conference on, 4, 509-512.
Tonissen, S., & Evans, R. (1995, 12). Target tracking using dynamic programming: Algo-
rithm and performance. Proceedings of the 34th IEEE Conference on Decision and
Control, 3, 2741-2746.
Tsai, R. (1987, 8). A versatile camera calibration technique for high-accuracy 3D machine
vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics
and Automation, 323-344.
Yang, J., Schonfeld, D., Chen, C., & Mohamed, M. (2006, 10). Online video stabilization
based on particle filters. Image Processing, 2006 IEEE International Conference on,
1545-1548.
Yang, M., Gandhi, T., Kasturi, R., Coraor, L., Camps, O., & McCandless, J. (2002).
Real-time implementation of obstacle detection algorithms on a Datacube MaxPCI
architecture. Real-Time Imaging.
Yusko, R. (2007, 3). Platform camera aircraft detection for approach evaluation and train-
ing. Retrieved from http://www.dtic.mil/dtic/tr/fulltext/u2/a467710.pdf
Zarandy, A., Zsedrovits, T., Nagy, Z., Kiss, A., & Roska, T. (2011, 5). Collision avoid-
ance for UAV using visual detection. Circuits and Systems (ISCAS), 2011 IEEE
International Symposium on, 2173-2176.
Zarandy, A., Zsedrovits, T., Nagy, Z., Kiss, A., & Roska, T. (2012, 8). Visual sense-
and-avoid system for UAVs. Cellular Nanoscale Networks and Their Applications
(CNNA), 2012 13th International Workshop on, 1-5.
Zhou, S., Chellappa, R., & Moghaddam, B. (2004, 12). Visual tracking and recognition
using appearance-adaptive models in particle filters. IEEE Transactions on Image
Processing.