Anomalous Motion Detection On Highway Using Deep Learning
Harpreet Singh, Emily M. Hand, Kostas Alexis

ABSTRACT

Research in visual anomaly detection draws much interest due to its applications in surveillance. Common datasets for evaluation are constructed using a stationary camera overlooking a region of interest. Previous research has shown promising results in detecting spatial as well as temporal anomalies in these settings. The advent of self-driving cars provides an opportunity to apply visual anomaly detection in a more dynamic application, yet no dataset exists for this type of environment. This paper presents a new anomaly detection dataset, the Highway Traffic Anomaly (HTA) dataset, for the problem of detecting anomalous traffic patterns from dash cam videos of vehicles on highways. We evaluate state-of-the-art deep learning anomaly detection models and propose novel variations to these methods. Our results show that state-of-the-art models built for settings with a stationary camera do not translate well to a more dynamic environment. The proposed variations to these state-of-the-art methods show promising results on the new HTA dataset.

Index Terms— anomaly detection, deep learning, one-class classification

1. INTRODUCTION

Anomaly detection is an unsupervised one-class classification problem with the goal of learning the normal state of data during training and then detecting aberrations without any provided labels. Many anomaly detection applications involve visual data, such as images or videos, and are motivated by interest in surveillance. Datasets commonly evaluated for visual anomaly detection are constructed with a stationary camera observing a region in which the background environment is relatively static while foreground objects such as pedestrians and vehicles are in motion. Anomalies are defined as appearance or motion deviations from normal data in the foreground objects, shown in Figure 1. [1] provides a comprehensive list of video anomaly detection datasets, but they all maintain the same characteristics. As autonomous robots become increasingly common, there is a need to perform anomaly detection while an agent is moving within an environment, yet no such dataset exists.

Fig. 1. The top row shows examples from the CUHK anomaly dataset, normal image data (left) and abnormal (right). The bottom row is an example from the UCSD anomaly dataset; the anomaly (right) is the golf cart.

This paper presents a new anomaly detection dataset, the Highway Traffic Anomaly (HTA) dataset, to evaluate anomaly detection methods with a moving agent. Specifically, the dataset consists of dash cam videos captured from vehicles driving on the highway. The goal is to learn the normal driving motion of traffic in the camera's field of view and then detect conditions in which other vehicles are moving abnormally. In this dataset, not only is the camera moving, but other vehicles and background features are also in motion relative to the camera. The dataset consists of five types of anomalies: speeding vehicle, speeding motorcycle, vehicle accident, close merging vehicle, and halted vehicle. Three state-of-the-art deep learning based anomaly detection models are evaluated, and two variations, specifically for the problem of detecting anomalous highway traffic motion, are proposed. Code for the HTA dataset and the evaluated models is available at [2].

2. RELATED WORK

2.1. Anomaly Detection Datasets

Frequently evaluated visual anomaly datasets consist of a static background, moving foreground objects, and a stationary camera. The UCSD Pedestrian dataset [3] consists of videos of pedestrians on a university campus. Normal data corresponds to pedestrians walking on pathways. Abnormal data consists of small vehicles or cyclists that are different in appearance and motion from usual pedestrian traffic. The CUHK Avenue dataset [4] also consists of videos of a crowded walkway, with anomalies defined as unusual behavior, such as throwing an object. Similarly, the UMN Unusual Crowd Activity dataset [5] consists of videos of unusual behavior in crowds. While not exhaustive, this list summarizes the typical visual anomaly datasets previously studied. To the best of our knowledge, there is no open-source autonomous driving dataset specifically for the task of anomaly detection.

Authorized licensed use limited to: IEEE Xplore. Downloaded on October 25,2020 at 11:12:29 UTC from IEEE Xplore. Restrictions apply.
Fig. 3. Examples of abnormal images from the training set. Top to bottom row: speeding motorcycle, halted vehicle, close merge, speeding vehicle. Vehicle accident not shown, but YouTube links will be provided.

Fig. 5. The Predictive Coding Network architecture is composed of modules containing four components: a recurrent layer, a prediction layer, an input layer, and an error layer.

Predictive models can learn sequential data such as videos. By training on sequences of N normal traffic frames, a look back of N, a predictive model will then predict the (N + 1)th frame. The Predictive Coding Network (PredNet) proposed in [18] showed promising results in predicting future frames in the KITTI dataset. PredNet, shown in Figure 5, is constructed with a series of modules that perform local predictions and then propagate only the errors between the predicted and actual frames to the subsequent layers. The evaluated model was constructed using the recommendations in [18]. Anomalies are detected using the pixel-level reconstruction error between the predicted (N + 1)th frame and the ground truth frame, employing an averaging sliding-window approach to compute the pixel-level difference.

A total of four models are evaluated with the HTA dataset: CGAN, FlowNet, PredNet N + 1, and PredNet N + 6. The performance of each model is measured using an AUC score.
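The averaging sliding-window comparison between a predicted frame and its ground truth can be sketched as below. This is a minimal sketch, not the authors' exact code: the window size, the use of squared error, and taking the maximum window mean as the frame score are assumptions, since the text does not give these parameters.

```python
import numpy as np

def anomaly_score(pred, gt, win=8):
    """Per-pixel squared reconstruction error between a predicted and a
    ground-truth frame, averaged over every win x win sliding window;
    the largest window mean is taken as the frame's anomaly score.
    (Window size and error metric are assumptions.)"""
    err = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2, axis=-1)
    # Integral image: every window sum in O(1) after one cumulative pass.
    ii = np.pad(err, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    sums = ii[win:, win:] - ii[:-win, win:] - ii[win:, :-win] + ii[:-win, :-win]
    return float(sums.max() / (win * win))
```

A frame would then be flagged as anomalous when its score exceeds a chosen threshold.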
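The multi-step prediction behind the PredNet N + 6 variant can be sketched as a closed loop in which each predicted frame is fed back into the look-back window. Here `model_step` is a hypothetical stand-in for a trained one-step predictor, not an API from the paper's code.

```python
def extrapolate(model_step, frames, n_future=4):
    """Closed-loop extrapolation: prime on the observed frames, then feed
    each prediction back as input so the model rolls n_future steps ahead.
    With a look back of two frames and n_future=4, the final prediction is
    the sixth frame, as in the PredNet N + 6 variant. `model_step(history)
    -> next_frame` is a hypothetical stand-in for the trained predictor."""
    history = list(frames)
    predictions = []
    for _ in range(n_future):
        nxt = model_step(history)
        predictions.append(nxt)
        history = history[1:] + [nxt]  # slide the look-back window forward
    return predictions
```

The last element of the returned list is the frame whose reconstruction error is tested for anomalous motion.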
Table 1. AUC scores for all the evaluated models. PredNet (N+6) performs best on most anomaly types.

                      CGAN   FlowNet  PredNet (N+1)  PredNet (N+6)
Speeding Vehicle      0.608  0.623    0.497          0.614
Accident              0.607  0.657    0.559          0.601
Speeding Motorcycle   0.580  0.593    0.581          0.828
Close Merge           0.422  0.531    0.619          0.643
Halted Vehicle        0.337  0.216    0.554          0.236

Fig. 6. The CGAN discriminator's output (right) is unable to identify the anomalous motion of the motorcycle.

Fig. 7. Results from PredNet: extrapolated frames of abnormal motion, the speeding motorcycle. Vehicles in normal motion are less blurry than the speeding motorcycle.

The thresholds in each experiment range from 0.002 to 1.0 in increments of 0.002. Five AUC scores are reported for each model, one for each type of anomalous motion (Table 1).

4.1. Conditional GAN

The CGAN is trained on pairs of RGB images. Input images are cropped from the bottom to remove the visible hood of the vehicle, as well as from the top to reduce the amount of sky/background in each frame. The final input size is 128 × 512 × 6. The model is trained for 40 epochs.

The first approach to detecting anomalies uses the discriminator's 2D one-channel output as a heat map for patches containing an anomaly. [9] shows promising results using this approach, but the discriminator trained on the HTA dataset does not produce the same results (Figure 6). The salient characteristic that may cause this discrepancy is that, unlike the UCSD Pedestrian dataset, the HTA dataset does not maintain a static background.

The second approach to detecting anomalies with the CGAN uses the reconstruction error of the generator's output. The AUC scores of the CGAN in Table 1 are near 0.5 for all anomaly types, meaning that the reconstruction error of the generator's output has minimal discriminative capability in classifying abnormal motion.

4.2. FlowNet

FlowNet is also trained on pairs of RGB images from the training dataset. The input RGB images for FlowNet are cropped from the bottom, making the input size 2 × 256 × 512 × 3. The model is trained for 100 epochs. Anomalies are detected in the test set using the same approach as for the CGAN: the reconstruction error between the predicted and ground truth optical flow. The AUC scores for FlowNet are only slightly better than the CGAN results (Table 1).

Both generative models estimate the optical flow of abnormal motion just as well as that of normal motion; the AUC scores of both models are near 0.5, indicating that they are unable to discern abnormal motion from normal motion. It seems that both models learn to estimate optical flow better than they learn distinguishing patterns in normal motion.

4.3. Predictive Coding Network

The PredNet model is trained on a look back of 10 consecutive RGB images, as suggested by the original work. As with the CGAN, each image is cropped from the bottom as well as the top, with a final input size of 64 × 256 × 3 × 10. The model was trained for 100 epochs. Anomalies are detected using the reconstruction error between the model's predicted future frame and the ground truth frame. Using this mechanism, AUC scores for PredNet N + 1 are provided in Table 1. They show that the close merge and speeding motorcycle anomalies perform relatively better than the others, and PredNet N + 1 achieves the highest AUC score for the halted vehicle anomaly, albeit still near 0.5.

By setting a look back of two, we extrapolate four frames into the future, testing the sixth frame for anomalous motion: PredNet N + 6. Figure 7 shows a sample output and ground truth frame. Anomalies are detected in the same manner as with PredNet N + 1. AUC scores for PredNet N + 6 are shown in Table 1 and show a significant improvement in detecting the speeding motorcycle anomaly. PredNet N + 6 achieves the highest or close to the highest AUC score for each anomaly type except halted vehicle.

5. CONCLUSION

This paper presents a new anomaly detection dataset, the Highway Traffic Anomaly (HTA) dataset. It differs from existing anomaly detection datasets in many ways that make it more challenging. To the best of our knowledge, this is the first anomaly detection dataset for autonomous driving. Four state-of-the-art deep learning models were evaluated, with a proposed heuristic to improve the reconstruction error for anomaly detection tailored to the HTA dataset. The results indicate that state-of-the-art models do not perform well on the HTA dataset. Our proposed variation of the PredNet model, predicting the sixth future frame, shows promising results on the speeding motorcycle anomaly and performs relatively better across all anomaly types.
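The evaluation protocol, sweeping detection thresholds from 0.002 to 1.0 in increments of 0.002, can be sketched as below. The ">=" convention for flagging anomalies and the trapezoidal integration of the resulting (FPR, TPR) points are assumptions; the paper does not spell out these details.

```python
import numpy as np

def auc_from_sweep(scores_normal, scores_abnormal):
    """ROC AUC from an explicit threshold sweep (0.002 to 1.0 in steps of
    0.002). A frame whose score is >= the threshold is flagged anomalous;
    the (FPR, TPR) points are integrated with the trapezoidal rule."""
    scores_normal = np.asarray(scores_normal, dtype=np.float64)
    scores_abnormal = np.asarray(scores_abnormal, dtype=np.float64)
    fpr, tpr = [0.0], [0.0]
    for t in np.arange(1.0, 0.002 - 1e-9, -0.002):  # descending thresholds
        fpr.append(float(np.mean(scores_normal >= t)))   # false positive rate
        tpr.append(float(np.mean(scores_abnormal >= t)))  # true positive rate
    fpr.append(1.0)
    tpr.append(1.0)
    f, r = np.asarray(fpr), np.asarray(tpr)
    # Trapezoidal rule over the monotone (FPR, TPR) curve.
    return float(np.sum(np.diff(f) * (r[1:] + r[:-1]) / 2.0))
```

Sweeping thresholds in descending order keeps both rates non-decreasing, so the curve can be integrated directly.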
6. REFERENCES

[1] N. Patil and Prabir Kumar Biswas, "A survey of video datasets for anomaly detection in automated surveillance," 2016 Sixth International Symposium on Embedded Computing and System Design (ISED), pp. 43–48, 2016.

[2] Harpreet Singh, "Highway traffic anomaly repository," https://github.com/harpreets652/highway-traffic-anomaly, 2019.

[3] Vijay Mahadevan, Weixin Li, Viral Bhalodia, and Nuno Vasconcelos, "Anomaly detection in crowded scenes," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010, pp. 1975–1981.

[4] Cewu Lu, Jianping Shi, and Jiaya Jia, "Abnormal event detection at 150 fps in matlab," in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2720–2727.

[5] Nikos Papanikolopoulos, "Detection of unusual crowd activity," Unusual crowd activity dataset.

[6] David M. J. Tax and Robert P. W. Duin, "Support vector data description," Machine Learning, vol. 54, no. 1, pp. 45–66, 2004.

[7] B. Kiran, Dilip Thomas, and Ranjith Parakkal, "An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos," Journal of Imaging, vol. 4, no. 2, pp. 36, 2018.

[12] Yong Shean Chong and Yong Haur Tay, "Abnormal event detection in videos using spatiotemporal autoencoder," in International Symposium on Neural Networks. Springer, 2017, pp. 189–196.

[13] Weixin Luo, Wen Liu, and Shenghua Gao, "Remembering history with convolutional lstm for anomaly detection," in 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2017, pp. 439–444.

[14] Fisher Yu, Wenqi Xian, Yingying Chen, Fangchen Liu, Mike Liao, Vashisht Madhavan, and Trevor Darrell, "Bdd100k: A diverse driving video database with scalable annotation tooling," arXiv preprint arXiv:1805.04687, 2018.

[15] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal of Software Tools, 2000.

[16] B. Kiran, Dilip Thomas, and Ranjith Parakkal, "An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos," Journal of Imaging, vol. 4, no. 2, pp. 36, 2018.

[17] Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, and Thomas Brox, "Flownet: Learning optical flow with convolutional networks," arXiv preprint arXiv:1504.06852, 2015.

[18] William Lotter, Gabriel Kreiman, and David Cox, "Deep predictive coding networks for video prediction and unsupervised learning," arXiv preprint arXiv:1605.08104, 2016.