
FAST ANOMALY DETECTION IN TRAFFIC SURVEILLANCE VIDEO BASED ON ROBUST SPARSE OPTICAL FLOW

Hanlin Tan, Yongping Zhai, Yu Liu and Maojun Zhang

College of Information System and Management, National University of Defense Technology, China

ABSTRACT

Fast abnormal event detection in video is important for intelligent video analysis. This paper proposes a fast anomaly detection algorithm based on sparse optical flow. We improve the efficiency of optical flow computation with a foreground mask and spatial sampling, and increase its robustness with good-feature (TK) point selection and forward-backward filtering. A foreground channel is also added to the feature vector to help detect static or low-speed objects. The algorithm is validated on real-life traffic surveillance video to prove its effectiveness. It is also evaluated on a benchmark dataset, where it achieves detection results comparable to state-of-the-art methods and outperforms them at pixel level when the false alarm rate is low. The strength of our algorithm is that it runs in real time on the benchmark dataset, hundreds of times faster than comparative methods.

Index Terms— Anomaly detection, abnormal event, traffic surveillance, optical flow
1. INTRODUCTION

Automatically locating abnormal events in traffic surveillance video is of vital importance to traffic administration as well as public safety. As is well known, it is hard to handle all scenes with one method, so we select traffic surveillance as the target type of scene for designing and testing our algorithm.

1.1. Definition of Anomaly
Anomaly is defined differently by different people. We take events with low probability as anomalies, since this converts an ambiguous concept into an operational one. The word "event" is still not operational: it carries different meanings in different scenes. In traffic surveillance, we take an event to be motion. Therefore, anomaly detection in this paper means detecting motion with low probability in traffic surveillance video. This definition ignores anomalies that do not involve motion, e.g. appearance anomalies. However, this is acceptable considering that our application background is traffic surveillance. The definition is consistent with human cognition in our application, and it guides what features to extract in Section 2.

1.2. Related Work

A group of anomaly detection algorithms [2, 3, 4, 1, 5, 6, 7, 8] performs (subsets of) the following five steps (a structural sketch of these steps appears at the end of this subsection):

a) Feature computation at pixel level;
b) Feature aggregation in space and/or time;
c) Transformation of the aggregated features to certain domains;
d) Building a model/classifier from the final features of the training video;
e) Comparison of the final features of the test video with the model.

Pixel-level features include foreground location [4, 7, 8], HOG (Histogram of Oriented Gradients) [1], HOF (Histogram of Optical Flow) [2] and MDT (Mixtures of Dynamic Textures) [3]. The most common aggregation is to sum up features over a spatial and temporal 3-D block, which helps make the feature more robust to noise. Other aggregation includes building custom models, such as the locality model used in [8]. Transformations used recently include sparse representation [1, 9]. Models and classifiers include sparse reconstruction cost [1, 6], maximum norm [7] and one-class SVM [5]. For the detection step, researchers usually have to set thresholds or tune parameters based on the model or classifier they adopt when comparing features extracted from training and test videos.

Another type of method is based on tracking [10]. This type of method is good at handling uncrowded scenes. However, tracking is unreliable in crowded scenes [3], and it is hard to obtain reliable detection results from unreliable trajectories. The algorithm proposed in this paper belongs to the first type and achieves remarkable detection results on real-life traffic surveillance in real time.
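To make this shared structure concrete, the following is a minimal, purely illustrative Python sketch of the five steps on toy data. Every function here is hypothetical (a trivial frame-difference feature, block-sum aggregation, an identity transformation, and a max model); it mirrors only the structure of the framework, not any particular cited method.

    import numpy as np

    def pixel_features(video):           # a) pixel-level features (toy: |frame diff|)
        return np.abs(np.diff(video.astype(np.float32), axis=0))

    def aggregate(feats, block=4):       # b) sum over spatial blocks
        t, h, w = feats.shape
        return feats[:, :h // block * block, :w // block * block] \
            .reshape(t, h // block, block, w // block, block).sum(axis=(2, 4))

    def train(video):                    # c) + d): identity transform, max model
        return aggregate(pixel_features(video)).max(axis=0)

    def detect(video, model, theta=1.0): # e) compare test features with the model
        return aggregate(pixel_features(video)) - model > theta

    normal = np.random.rand(50, 16, 16)              # toy "training video"
    flags = detect(np.random.rand(10, 16, 16), train(normal))
    print(flags.shape)                               # (9, 4, 4): per-frame, per-block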



1.3. Our Contributions

The contribution of this paper to anomaly detection lies in several aspects. First, it proposes a procedure for computing robust sparse optical flow, which makes feature extraction fast and reliable. Second, it targets real-life traffic surveillance and proposes a framework that is simple, robust, and hundreds of times faster than comparative methods. Third, it summarizes a common framework that state-of-the-art methods adopt. The source code of our algorithm is available at https://github.com/TomHeaven/AnomalyAnalysisWithOpticalFlow.
2. METHODOLOGY

This section first introduces how to compute robust sparse optical flow, and then elaborates how to extract features and utilize them to detect anomalies.

2.1. Robust Sparse Optical Flow
Figure 1 (a) illustrates the result of the optical flow, which is in accord with object motion. The computation of optical flow is mainly based on [11, 12, 13]. These techniques are integrated with modifications that improve both computation speed and robustness. The procedure is illustrated in Figure 1 (b). First, the foreground-masked input frame contains fewer pixels, all of them moving, which reduces the amount of computation as well as the chance of matching errors. Second, finding good features makes our optical flow more reliable [12]. Third, the LKT tracker [11] is the most commonly used and stable method for computing optical flow; it is applied only at good feature points [12], which is both robust and fast. At last, a forward-backward filter inspired by [13] removes the remaining unreliable matches and leaves us with robust optical flow: the flow is computed in both directions, and the distance between the origin of the forward flow and the destination of the backward flow is recorded. Then the worst 50% of the flow vectors are filtered out using a mean filter. Note that the original thesis [13] uses a median filter; however, we find that a mean filter performs better on traffic videos.
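As an illustration of these four steps, the following is a minimal sketch assuming OpenCV (the paper does not name its implementation, so the function name and parameters are illustrative rather than taken from the released code). The foreground mask could come from any background subtractor, e.g. cv2.createBackgroundSubtractorMOG2, and the mean-based cutoff reflects our reading of the "mean filter" remark above.

    import cv2
    import numpy as np

    def robust_sparse_flow(prev_gray, curr_gray, fg_mask):
        """Forward-backward filtered sparse optical flow (illustrative)."""
        # Good (TK) feature points, detected on foreground pixels only.
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                      qualityLevel=0.01, minDistance=4,
                                      mask=fg_mask)
        if pts is None:
            return np.empty((0, 2)), np.empty((0, 2))

        # LKT tracking, forward (prev -> curr) and backward (curr -> prev).
        fwd, st_f, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
        bwd, st_b, _ = cv2.calcOpticalFlowPyrLK(curr_gray, prev_gray, fwd, None)
        ok = (st_f.ravel() == 1) & (st_b.ravel() == 1)
        pts = pts[ok].reshape(-1, 2)
        fwd = fwd[ok].reshape(-1, 2)
        bwd = bwd[ok].reshape(-1, 2)
        if len(pts) == 0:
            return pts, fwd

        # Forward-backward error: origin of the forward flow vs.
        # destination of the backward flow.
        fb_err = np.linalg.norm(pts - bwd, axis=1)

        # Drop unreliable matches; the mean error serves as the cutoff
        # (our interpretation of replacing the median filter of [13]).
        keep = fb_err <= fb_err.mean()
        return pts[keep], fwd[keep]   # origins and destinations of reliable flow

Because points are detected only inside the foreground mask, and only at sampled grid locations in practice, very few points need to be tracked per frame; this is where the speed gain discussed in Section 2.4 comes from.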
2.2. Feature Extraction and Aggregation

Figure 1 (c) illustrates the procedure for feature extraction and aggregation. One of the low-level features is optical flow, which captures both the speed and the direction of every moving pixel. The optical flow is then projected onto a certain number of orientations to obtain the Histogram of Optical Flow (HOF) feature. The HOF feature is aggregated over a spatial block and a temporal period by summing it up in each orientation separately. ([1] called this feature Multi-scale HOF, MHOF.)

However, one important limitation of optical flow is that it cannot extract features from static or slow objects, even if the object is detected as foreground by motion detection algorithms. In fact, computation of optical flow requires the distance between corresponding pixels in the two input frames to be in a certain range. If the pixel distance is too small, the length of the optical flow is close to zero and cannot be computed. If the pixel distance is too large, the algorithm cannot find the correct corresponding pixel to compute the flow. To make up for this limitation, a new channel named foreground is added to the feature, inspired by [4]. For each pixel,

    foreground = 1 for a foreground pixel, and 0 for a background pixel    (1)

The aggregation step for this new channel is identical to that for the HOF feature. With this new channel, the feature performs better at detecting static and low-speed foreground objects than HOF alone.

To further improve robustness, a spatial Gaussian blur is performed on the aggregated feature (on each channel separately), which makes the feature smoother and more stable. The experimental results show that blurring the feature not only reduces false alarms but also increases the true positive detection rate.
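The following is a minimal sketch of the per-frame extraction and aggregation just described, with illustrative parameter choices of our own (8 orientation bins, 16-pixel blocks, a 3 x 3 Gaussian kernel). It consumes the origin/destination pairs produced by the sketch in Section 2.1 and a binary foreground mask; summing these per-frame features over the time window would complete the temporal aggregation.

    import cv2
    import numpy as np

    def block_features(origins, dests, fg_mask, block=16, n_bins=8):
        """HOF plus foreground channel, aggregated over spatial blocks."""
        h, w = fg_mask.shape
        gh, gw = h // block, w // block
        feat = np.zeros((gh, gw, n_bins + 1), np.float32)

        # HOF channels: each flow vector votes its magnitude into the
        # orientation bin of the block containing its origin.
        for (x, y), (u, v) in zip(origins, dests):
            bx, by = int(x) // block, int(y) // block
            if 0 <= bx < gw and 0 <= by < gh:
                dx, dy = u - x, v - y
                ang = np.arctan2(dy, dx) % (2 * np.pi)
                b = int(ang / (2 * np.pi) * n_bins) % n_bins
                feat[by, bx, b] += np.hypot(dx, dy)

        # Foreground channel (Eq. 1), aggregated identically (block sums).
        fg = (fg_mask > 0).astype(np.float32)[:gh * block, :gw * block]
        feat[..., n_bins] = fg.reshape(gh, block, gw, block).sum(axis=(1, 3))

        # Spatial Gaussian blur on each channel separately.
        for c in range(n_bins + 1):
            feat[..., c] = cv2.GaussianBlur(
                np.ascontiguousarray(feat[..., c]), (3, 3), 0)
        return feat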

2.3. Training and Detection

A model is trained on a normal video and used to detect anomalies in a test video. Figure 1 (d) and (e) illustrate one block of the proposed feature over some 5,000 normal frames extracted from a piece of real-life traffic surveillance video. The values of the feature channels vary dramatically, even though they are extracted from the normal video. A very simple detection criterion is the following:

a) Any feature value that appears in the training video is normal.
b) Any feature value that is significantly greater than the maximum feature value of the training video is considered abnormal.

Let the vector A(b, t) denote the aggregated feature of block b at frame t. The training process finds a maximum boundary B(b) for each feature channel:

    B(b) = max_t A(b, t)    (2)

where t enumerates all frames of the training sequences. Let the vector v(b, t) denote the aggregated feature extracted from the test video. Then, compute the distance vector D(b, t) as

    D(b, t) = v(b, t) - B(b)    (3)

Finally, whether a block is abnormal is decided by thresholding each channel of the distance vector:

    x(b, t) = 1 if D(b, t) > θ, and 0 otherwise    (4)

where θ is the predefined threshold vector.


Fig. 1. Optical flow computation and feature extraction. (a) illustrates the extracted sparse optical flow; the heads and lengths of the red arrows denote the direction and speed of the extracted flow. (b) demonstrates the four major steps for computing robust optical flow at high speed. (c) illustrates feature extraction and aggregation; the pixel feature consists of HOF with an additional foreground channel [4], and to further improve robustness a spatial Gaussian blur is performed on the aggregated feature (on each channel separately). (d) shows a scene image in which the circled red block marks one of the feature blocks. (e) plots six channels of that block's feature over time; the horizontal axis is the frame number and the vertical axis is the feature value.

2.4. Acceleration

The most time-consuming parts of our algorithm are the computation of optical flow and the aggregation of features. For optical flow, there are two methods of acceleration:

a) Foreground mask. Computed in the background area, optical flow is either meaningless or wrong. Therefore, the foreground is used as a mask, which increases both efficiency and robustness.
b) Spatial sampling. Sparse optical flow extracted at fixed grid pixels is sufficient for feature aggregation.

For aggregation, integral images are used to compute the sum over a rectangular area at constant time cost, as sketched below.
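The following is a minimal sketch of the integral-image (summed-area table) idea using cv2.integral (an assumed API choice; any summed-area table implementation works). After one O(HW) pass per frame and channel, any rectangle sum costs four lookups:

    import cv2
    import numpy as np

    def make_sat(channel):
        """One O(H*W) pass: cv2.integral returns an (H+1) x (W+1) table."""
        return cv2.integral(channel.astype(np.float32))

    def rect_sum(sat, x, y, w, h):
        """Sum of channel[y:y+h, x:x+w] in O(1): four table lookups."""
        return float(sat[y + h, x + w] - sat[y, x + w]
                     - sat[y + h, x] + sat[y, x])

    # Toy check against a direct sum.
    img = np.random.rand(64, 64).astype(np.float32)
    sat = make_sat(img)
    assert abs(rect_sum(sat, 8, 8, 16, 16) - img[8:24, 8:24].sum()) < 1e-3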
3. EXPERIMENTS

The method is evaluated on both real-life situations and a benchmark dataset. Parameter settings are as follows: the block size is 16 × 16 with a time window of 5 frames. The threshold vector differs from scene to scene; however, 0.1 · (number of block pixels) / (spatial sample distance) · (time window) typically produces an acceptable result. (For example, assuming a spatial sample distance of 4, this gives 0.1 · 256 / 4 · 5 = 32 for a 16 × 16 block.)

3.1. Real-life Situations

The proposed method is tested in numerous real-life situations to validate its effectiveness. Figure 2 illustrates the results. Given a few minutes of normal video for training, our algorithm accurately detects various anomalies, such as pedestrians crossing the road at wrong locations and motorcycles and trucks running in the wrong direction, across different scenes, day and night.

3.2. UCSD Ped1 Dataset

The algorithm is also tested on one of the most evaluated benchmarks, the UCSD Ped1 dataset [15]. This dataset provides training sequences containing only pedestrians and marks non-pedestrians as anomalies. Note that this ground truth is not fully consistent with our definition of anomaly, since this paper only takes low-probability motion patterns as anomalies.

The results are illustrated in Figure 3, where (a) and (b) are the frame-level and pixel-level ROC curves, respectively. It can be seen that our algorithm performs comparably with state-of-the-art methods at frame level and outperforms them at pixel level when the false positive rate is less than around 24%. This improvement matters because tolerating a high false alarm rate is not practical.


Fig. 2. Detection results on real-life surveillance video. Group (a) shows two detected anomalies in one scene: a pedestrian crossing the road at a wrong location and a car entering the side road at a wrong location. Group (b) shows two detected motorcycles running in the wrong direction in another scene. Group (c) shows two anomalies from different scenes: trucks detected running on the wrong side of the road at night, and a pedestrian crossing the road at a wrong location.

Fig. 3. Comparison of ROC curves on the UCSD Ped1 dataset. Here Adam refers to [2]; MDT refers to [3]; social force refers to [14]; sparse and sparse combination refer to [1] and [6], respectively. (a) is the frame-level ROC curve; note that a larger frame-level AUC is not necessarily better when the pixel-level AUC does not accord with it. (b) is the pixel-level comparison. (c) is a table of running time comparisons.

Figure 3 (c) shows the running time comparison. Our algorithm runs in real time on this benchmark dataset, which is much faster than existing algorithms [3, 1]. Algorithm [6] reaches an impressive speed by resizing frames to 30 × 30 and other small resolutions; however, this will not work if the abnormal objects in the original frames are small.

4. CONCLUSION

We propose an efficient algorithm for detecting anomalies in traffic surveillance video based on robust sparse optical flow. The sparse optical flow computation is improved in both robustness and efficiency. A foreground channel is added to the HOF feature to detect long-term static objects. The algorithm is validated on real-life traffic surveillance and a benchmark dataset to prove its effectiveness. It runs in real time and is hundreds of times faster than a number of comparative algorithms; the speed gain is achieved by a fast feature extraction design and a simple detection model.

This research was partially supported by the National Natural Science Foundation of China under Grants 61403403 and 61405252.


5. REFERENCES

[1] Y. Cong, J. Yuan, and J. Liu, "Sparse reconstruction cost for abnormal event detection," Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 3449-3456, 2011.

[2] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz, "Robust real-time unusual event detection using multiple fixed-location monitors," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, pp. 555-560, 2008.

[3] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, "Anomaly detection in crowded scenes," Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 1975-1981, 2010.

[4] P. Jodoin, V. Saligrama, and J. Konrad, "Behavior subtraction," Proceedings of SPIE, the International Society for Optical Engineering, vol. 6822, 2008.

[5] T. Wang, H. Snoussi, and F. Smach, "Detection of visual abnormal events via one-class SVM," Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV), pp. 113-119, 2012.

[6] C. Lu, J. Shi, and J. Jia, "Abnormal event detection at 150 fps in Matlab," Computer Vision (ICCV), IEEE International Conference on, pp. 2720-2727, 2013.

[7] P. Jodoin, V. Saligrama, and J. Konrad, "Behavior subtraction," Image Processing, IEEE Transactions on, pp. 4244-4255, 2012.

[8] V. Saligrama and Z. Chen, "Video anomaly detection based on local statistical aggregates," Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 2112-2119, 2012.

[9] T. Dean, M. A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik, "Fast, accurate detection of 100,000 object classes on a single machine," Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, 2013.

[10] A. Basharat, A. Gritai, and M. Shah, "Learning object motion patterns for anomaly detection and improved object detection," Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 1-8, 2008.

[11] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 674-679, 1981.

[12] J. Shi and C. Tomasi, "Good features to track," Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 593-600, 1994.

[13] Z. Kalal, "Tracking Learning Detection," PhD Thesis, University of Surrey, 2011.

[14] R. Mehran, A. Oyama, and M. Shah, "Abnormal crowd behavior detection using social force model," Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 935-942, 2009.

[15] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, "UCSD Ped dataset," http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm, 2013.


