Adaptive Algorithm To Identify Anomalies in Moving Objects Using Computer Vision
Abstract - This paper proposes an adaptive algorithm to identify anomalies in moving objects, such as pedestrians, cars, motorcyclists and cyclists. The anomalies detected by the algorithm are occlusions, an object partially entering or exiting the frame, alterations in the object velocities, and collisions between two or more objects in the frame. Frames are classified as anomalous or non-anomalous by finding an adaptive threshold that depends on the video sequence.

978-1-5090-2532-9/16/$31.00 © 2016 IEEE

I. INTRODUCTION

Nowadays, one of the main problems of video surveillance is the detection of anomalies while monitoring multiple cameras at the same time. To solve this problem, this project proposes and develops an adaptive algorithm that works in uncontrolled scenes and automatically detects a variety of anomalies, such as occlusions, drastic changes in the velocities of the objects, or two or more objects colliding in the same frame.

The remainder of this work is organized as follows. Section II describes related work. Section III summarizes the different techniques used in the algorithm, including background subtraction, apparent mass and kinetic energy estimation, recursive least squares and an unsupervised classifier. Finally, Section IV presents the experimental results on different video clips.

II. RELATED WORK

Different works have presented solutions for anomaly detection in surveillance videos. Some of these solutions are based on detection, segmentation and tracking [1-6]. Other projects [7-9] used temporal features, such as local velocity and local movement. Finally, in [10] and [11], two kinds of anomalies are detected through kinetic energy: when people exit the boundary of the frame or when the frame contains a crowded scene.

All the aforementioned methods use complex operations, such as object tracking with a high number of trajectories and people counting throughout the video, to name just a few, and they offer only a partial solution to the problem of detecting anomalies in video. Also, many of these projects run their tests in controlled settings, where a scene is arranged and labelled as anomalous and the algorithm then detects this behavior when a similar scene takes place.

III. SYSTEM OVERVIEW

A. Segmentation of Moving Objects

The correct segmentation of the moving objects in each frame is very important to obtain the relevant kinetic energy and area throughout the video. These two features allow the detection of the different changes and anomalies within that frame.

The foreground extraction was developed as follows:

The first step is the use of three binarization techniques: Gaussian Mixture Modelling (GMM) [12] (see Figure 1b), Morphological Reconstruction [13] (see Figure 1d) and Motion Detection (see Figure 1c).

The second step applies an AND operation to each possible pair of the three images obtained in the previous step (see Figures 1e-g).

In the third and final step, an OR operation over all the images obtained in the previous step yields a single binary image I (see Figure 1h).

B. Feature Extraction

In order to identify the anomalies in the video, two features are proposed: the apparent mass and the kinetic energy.

The apparent mass is proposed on the basis that the volume of an object is a physical, solid characteristic that should not change throughout the video. Drastic changes in this feature can only occur when there are occlusions or when an object partially enters or exits the frame.
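As an illustration of the three-step foreground combination of Section III-A (panels e-h of Figure 1), the following minimal Python sketch applies the pairwise AND and the final OR to three binary masks. The function name `combine_foreground_masks` and the toy 2x2 masks are hypothetical, and the three input masks are assumed to have already been produced by the GMM, motion detection and morphological reconstruction stages, which are not reproduced here.

```python
from itertools import combinations

import numpy as np


def combine_foreground_masks(masks):
    """Combine binary foreground masks: AND every pair, then OR the results."""
    masks = [np.asarray(m, dtype=bool) for m in masks]
    # Step 2: AND between each possible pair of masks.
    pairwise = [np.logical_and(a, b) for a, b in combinations(masks, 2)]
    # Step 3: OR over all pairwise results gives the single binary image I.
    return np.logical_or.reduce(pairwise)


# Toy 2x2 masks standing in for the three binarization outputs
# (values are made up for illustration only).
gmm = np.array([[1, 1], [0, 0]])
motion = np.array([[1, 1], [1, 0]])
morph = np.array([[0, 1], [1, 1]])
foreground = combine_foreground_masks([gmm, motion, morph])
```

With three input masks, this combination behaves as a per-pixel majority vote: a pixel is kept as foreground when at least two of the three methods mark it.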
Figure 1. (a) Original image. Foreground detection with: (b) Gaussian Mixture Modeling, (c) Motion detection, (d) Morphological reconstruction. The resulting images from the AND operation between: (e) GMM and motion detection, (f) GMM and morphological reconstruction and (g) Motion detection and morphological reconstruction. (h) Foreground detection resulting from the combination of the three methods.

Drastic changes in the kinetic energy make it possible to detect alterations in the velocities of the objects or collisions between two or more objects in the frame.

To calculate the first feature (apparent mass), it is necessary to estimate the volume of each moving object, correcting its area by means of the vanishing point. The vanishing point is found by detecting two parallel lines in the original RGB image. Then, the distance d_i is calculated from the centroid of each moving object i to the vanishing point.

With this distance, the apparent mass is obtained as:

\[ M(u, v) = \sum_{i=1}^{n} d_i I_i(u, v), \tag{1} \]

where M(u, v) is the apparent mass matrix, in which the background is equal to zero and the foreground has positive, nonzero values. The foreground values correspond to the distance d_i of each moving object i, n is the total number of moving objects in the frame, I_i is the (U, V) matrix of the binary image containing only the i-th moving object, and U and V are the height and width of the real image, respectively.

In order to estimate the second feature (kinetic energy), a pixel-by-pixel moving window P(i, j) is proposed to make a horizontal sweep of the frame. This window is created by selecting a neighbourhood of I x J pixels from the apparent mass matrix M(u, v). The neighbourhood is centered on a pixel of interest, where (i, j) are the pixel coordinates in the matrix.

According to [11], the foreground entropy H(u, v) is the dispersion of the foreground in the horizontal and vertical directions. Therefore, the new apparent mass matrix that uses this foreground entropy is defined as the Crowd Dispersion Index (CDI), and it is calculated as follows:

\[ CDI(u, v) = \frac{\left( \sum_{i=u-U/2}^{u+U/2} \sum_{j=v-U/2}^{v+U/2} P(i, j) \right)^{2}}{H(u, v)^{3}}, \tag{2} \]

\[ E_k(u, v) = CDI(u, v) \sum_{i=u-U/2}^{u+U/2} \sum_{j=v-U/2}^{v+U/2} v_{ij}^{2}, \tag{3} \]

where v_ij is the velocity obtained through the optical flow algorithm of Thomas Brox [14].

C. Recursive Least Squares Filters

Two recursive least squares (RLS) filters are proposed: one to estimate the total apparent mass and the other to estimate the total kinetic energy in frame k.

The RLS filter output is calculated as:

\[ \hat{y}(k) = w^{T}(k-1)\, g(k), \tag{4} \]

where \hat{y} is the estimate of either the total apparent mass \hat{M}_t or the total kinetic energy \hat{E}_{kt}, depending on the filter; w(k-1) is the vector of the respective filter coefficients (4 for the total apparent mass and 8 for the total kinetic energy); and g is the vector of buffered input samples at frame k, with g = M_t for the total apparent mass or g = E_kt for the total kinetic energy.

An instantaneous error is calculated for each filter to determine whether there are drastic changes in the total apparent mass and the total kinetic energy, respectively.

Since the proposed RLS filters track the real signal of the total apparent mass and the total kinetic energy throughout the video, the absolute error is assumed to be lognormal. Therefore, the histogram of the log error has a normal distribution when there are no anomalies.
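To make the prediction of Eq. (4) and the instantaneous error more concrete, the following Python sketch implements a textbook exponentially weighted RLS one-step predictor. It is only a sketch under stated assumptions, not the authors' implementation: the paper gives the filter output and the number of coefficients (4 and 8), while the forgetting factor, the initialization constant `delta`, the choice of filling g(k) with the previous samples, and the names `RLSFilter` and `step` are assumptions introduced here.

```python
import numpy as np


class RLSFilter:
    """Textbook exponentially weighted RLS one-step predictor (illustrative)."""

    def __init__(self, order, forgetting=0.99, delta=100.0):
        self.w = np.zeros(order)          # filter coefficients w(k-1)
        self.P = delta * np.eye(order)    # inverse correlation matrix (assumed init)
        self.lam = forgetting             # forgetting factor (assumed value)
        self.buffer = np.zeros(order)     # g(k): assumed to hold the previous samples

    def step(self, sample):
        """Predict the new sample, adapt the filter, return (y_hat, error)."""
        g = self.buffer.copy()
        y_hat = self.w @ g                # Eq. (4): y_hat(k) = w^T(k-1) g(k)
        err = sample - y_hat              # instantaneous (a priori) error
        Pg = self.P @ g
        gain = Pg / (self.lam + g @ Pg)   # RLS gain vector
        self.w = self.w + gain * err
        self.P = (self.P - np.outer(gain, Pg)) / self.lam
        # Shift the new sample into the buffer for the next frame.
        self.buffer = np.roll(self.buffer, 1)
        self.buffer[0] = sample
        return y_hat, err


# Synthetic per-frame signal: a steady value followed by one abrupt jump.
signal = [5.0] * 200 + [50.0]
rls = RLSFilter(order=4)  # 4 coefficients, as stated for the total apparent mass
errors = [rls.step(s)[1] for s in signal]
```

On this synthetic signal the error stays close to zero while the input is steady and becomes large at the jump, which is the kind of drastic change the instantaneous error is meant to expose.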
Two frame-by-frame moving windows are used to find a classification threshold that adapts to the different scenes in the video. One window is linked to the instantaneous error of the total apparent mass and the other one to the instantaneous error of the total kinetic energy. This classification method is unsupervised, since there is no previous training.

These moving windows select the data from each error, and a histogram of the log of the absolute error is then created for each of them.

A probability density function (PDF) that fits each histogram has to be found. Two PDFs are considered. First, when the scene has no anomalies, the output of the proposed RLS filter changes smoothly and the absolute error is lognormal, so the histogram is modelled as a Gaussian PDF. Second, when the scene has anomalies, the absolute error is no longer lognormal and the histogram is modelled as a Gaussian mixture PDF.

In other words, a Gaussian mixture distribution is used if the window contains changes in the scene (both anomalous and non-anomalous frames), and a single Gaussian distribution is used if the window contains only samples without any anomaly in the apparent mass or the kinetic energy. The term anomaly refers to a drastic change in the total apparent mass or in the total kinetic energy.

In order to know which of the two, the Gaussian PDF or the Gaussian mixture PDF, best fits the histogram, a Kullback–Leibler divergence is calculated for both of them.

Finally, if the Gaussian mixture PDF is the one that fits the histogram, the threshold thr is calculated as:

\[ thr = \sqrt{\gamma} + \frac{\mu_1 \sigma_2^{2} + \mu_2 \sigma_1^{2}}{\sigma_1^{2} + \sigma_2^{2}} \tag{5} \]

where

\[ \gamma = \sigma_1^{2} \sigma_2^{2} \left( \ln(a_1)(\sigma_1^{2} \sigma_2^{2}) - (\mu_1 - \mu_2)^{2} + (\sigma_1^{2} + \sigma_2^{2}) \ln(a_2) \right) \tag{6} \]

           Resolution   Frame rate (frames/s)
Video 1    240x352      30
Video 2    360x640      30
Video 3    360x528      30
Video 4    252x320      30

Table I. Specifications of each video used by the algorithm.

Figure 2 highlights drastic changes in the total kinetic energy or total apparent mass in red and minor changes in these features in yellow.

Figure 2. Processed images of Video 1 showing drastic changes in total kinetic energy or total apparent mass.

The samples behave throughout Video 1 as shown in Figure 3, where the blue samples are the frames with normal behavior, the green ones are the frames with a change in the total kinetic energy and the red ones are the frames with a change in the total apparent mass.

Table II shows the results obtained with the unsupervised classifier for each video.

           Non-anomaly   Anomaly   TPR (sensitivity)   SPC (specificity)   ACC (accuracy)
Video 1    945           235       81.66%              71.66%              73.19%
Video 2    621           249       80.77%              78.26%              78.66%
Video 3    1466          307       77.78%              90.76%              88.36%
Video 4    947           268       84.00%              81.82%              82.17%

Table II. Sensitivity, specificity and accuracy for the four videos.

In Eqs. (5) and (6), µ_1 and µ_2 are the means of the first and second Gaussian distributions, respectively, and σ_1 and σ_2 are the