
ANOMALOUS MOTION DETECTION ON HIGHWAY USING DEEP LEARNING

Harpreet Singh, Emily M. Hand, Kostas Alexis

University of Nevada, Reno, USA


Department of Computer Science and Engineering

ABSTRACT
Research in visual anomaly detection draws much interest due to its applications in surveillance. Common datasets for evaluation are constructed using a stationary camera overlooking a region of interest. Previous research has shown promising results in detecting spatial as well as temporal anomalies in these settings. The advent of self-driving cars provides an opportunity to apply visual anomaly detection in a more dynamic application, yet no dataset exists for this type of environment. This paper presents a new anomaly detection dataset, the Highway Traffic Anomaly (HTA) dataset, for the problem of detecting anomalous traffic patterns from dash cam videos of vehicles on highways. We evaluate state-of-the-art deep learning anomaly detection models and propose novel variations to these methods. Our results show that state-of-the-art models built for settings with a stationary camera do not translate well to a more dynamic environment. The proposed variations to these state-of-the-art methods show promising results on the new HTA dataset.

Index Terms— anomaly detection, deep learning, one-class classification

Fig. 1. The top row shows examples from the CUHK anomaly dataset: normal image data (left) and abnormal (right). The bottom row is an example from the UCSD anomaly dataset; the anomaly (right) is the golf cart.

1. INTRODUCTION

Anomaly detection is an unsupervised one-class classification problem with the goal of learning the normal state of data during training and then detecting aberrations without any provided labels. Many anomaly detection applications involve visual data, such as images or videos, and are motivated by interest in surveillance. Datasets commonly evaluated for visual anomaly detection are constructed with a stationary camera observing a region in which the background environment is relatively static while foreground objects such as pedestrians and vehicles are in motion. Anomalies are defined as appearance or motion deviations from normal data in the foreground objects, as shown in Figure 1. [1] provides a comprehensive list of video anomaly detection datasets, but they all maintain the same characteristics. As autonomous robots become increasingly common, there is a need to perform anomaly detection while an agent is moving within an environment, yet no such dataset exists.

This paper presents a new anomaly detection dataset, the Highway Traffic Anomaly (HTA) dataset, to evaluate anomaly detection methods with a moving agent. Specifically, the dataset consists of dash cam videos captured from vehicles driving on the highway. The goal is to learn the normal driving motion of traffic in the camera's field of view and then detect conditions in which other vehicles are moving abnormally. In this dataset, not only is the camera moving, but other vehicles and background features are also in motion relative to the camera. The dataset contains five types of anomalies: speeding vehicle, speeding motorcycle, vehicle accident, close merging vehicle, and halted vehicle. Three state-of-the-art deep learning based anomaly detection models are evaluated, and two variations, designed specifically for the problem of detecting anomalous highway traffic motion, are proposed. Code for the HTA dataset and the evaluated models is available at [2].

2. RELATED WORK

2.1. Anomaly Detection Datasets

Frequently evaluated visual anomaly datasets consist of a static background, moving foreground objects, and a stationary camera. The UCSD Pedestrian dataset [3] consists of videos of pedestrians on a university campus. Normal data corresponds to pedestrians walking on pathways. Abnormal data consists of small vehicles or cyclists that are different in appearance and motion from usual pedestrian traffic. The CUHK Avenue dataset [4] also consists of videos of a
Authorized licensed use limited to: IEEE Xplore. Downloaded on October 25,2020 at 11:12:29 UTC from IEEE Xplore. Restrictions apply.
crowded walkway, with anomalies defined as unusual behavior, such as throwing an object. Similarly, the UMN Unusual Crowd Activity dataset [5] consists of videos of unusual behavior in crowds. While not exhaustive, this list summarizes the typical visual anomaly datasets previously studied. To the best of our knowledge, there is no open-source autonomous driving dataset specifically for the task of anomaly detection.

2.2. Anomaly Detection Methods


Initial work in anomaly detection relied on hand-engineered features (e.g., HOG) to create meaningful representations of normal image data. The features were then used to fit a statistical model such as an SVDD [6]. Deep learning models have recently shown promising results in anomaly detection. The deep learning models evaluated in this study can be grouped using the taxonomy proposed in [7]: generative and predictive learning models.

Generative models attempt to estimate the probability distribution of the training data (i.e., the normal motion data). After training, test videos are evaluated by some form of reconstruction error. [8] proposes a method that uses adversarial autoencoders for anomaly detection, relying on the generator's ability to model only the normal data distribution. More recently, a conditional generative adversarial network (CGAN) was used by [9]. The authors train two independent CGANs to learn an image-to-flow and a flow-to-image transformation. We evaluate this model on the HTA dataset, providing more details in Section 3. Predicting optical flow with CNNs for visual understanding has been shown to be effective, such as in [10] for action classification.

Predictive networks seek to model sequential data. This approach is effective in learning temporal patterns in videos: given a sequence of N video frames, predict the (N+1)th frame. Anomaly detection is performed by first training only with normal data to accurately predict images from normal motion sequences. After training, predicting a future frame of an abnormal sequence will result in a larger error. Various architectures have been proposed [11, 12, 13]; they all share the same anomaly detection mechanism.

3. METHOD

3.1. Highway Traffic Anomaly Dataset

The HTA dataset was curated from the Berkeley DeepDrive dataset [14], which consists of 100k high-resolution (1280 × 720, 30 FPS) dash cam videos collected from cars in New York and the Bay Area. We sifted through the entire dataset, selecting only highway driving videos. From that subset, videos in visually degraded conditions were removed. In summary, the HTA dataset consists of highway videos: during clear lighting conditions; in clear, partly cloudy, or overcast weather conditions; minimally occluded by large vehicles; and containing at least some traffic in motion. Due to the imperfect nature of the data collection process, the HTA dataset is not without noise. For instance, bumps and cracks on the highway cause transient shaking in the videos. These characteristics make the HTA dataset more challenging and realistic.

Fig. 2. Examples of normal images from the training set.

Normal driving conditions in this dataset are defined by the motion of vehicles that does not perturb the motion of the dash cam vehicle, or in which the motion of other vehicles stays relatively self-similar. The training set consists of 286 videos of normal traffic conditions, a total of 322,202 video frames, with an average duration of 40 seconds. Figure 2 shows a sample of video frames from the training set.

The test set contains a total of 103 videos: 78 normal traffic videos not in the training set and 25 abnormal traffic videos. Abnormal traffic videos consist of five types of anomalous motion: speeding vehicle (4 videos), close merge (13 videos), halted vehicle (1 video), vehicle accident (5 videos), and speeding motorcycle (2 videos). Each case represents a situation in which a human driver would exercise caution. Vehicle accident anomalies were downloaded from YouTube. Example image sequences of each abnormality are shown in Figure 3. Abnormal motion is manually annotated at the frame level, since only short sequences contain abnormal motion. A frame was labelled anomalous if the motion from the previous frame to the current one was part of an anomalous motion. There are a total of 1531 frames labeled as anomalous, making up 6% of the abnormal test set.

3.2. Generative Models

Since an anomaly is defined as irregular motion, generative models can learn to predict dense optical flow to model normal motion. For training, ground-truth optical flow is computed using OpenCV's [15] dense optical flow implementation. The first generative model evaluated is the conditional GAN (CGAN) proposed in [9], shown in Figure 4. The CGAN is trained to predict the optical flow between a pair of sequential frames. The generator's input is two RGB images concatenated depth-wise, and it predicts the corresponding optical flow. The discriminator classifies patches in the input as real or fake, producing a 2D one-channel output.
The discriminator's output lies in the range [0, 1], with 1 denoting real and 0 denoting fake.

[9] proposes using the discriminator's output to detect anomalies, since it learns to distinguish normal motion. The discriminator's output can be interpreted as a heat map for patches containing abnormal motion. A pair of images is classified as anomalous if there exists an element in the discriminator's output below a threshold. The second approach to detecting abnormal motion uses the pixel-level difference between the generator's predicted optical flow and the ground-truth optical flow. The difference is then averaged using a sliding window. A frame is labeled as anomalous if there exists an error above a threshold in either the x or y component.

Fig. 3. Examples of abnormal images from the test set. Top to bottom row: speeding motorcycle, halted vehicle, close merge, speeding vehicle. Vehicle accident not shown, but YouTube links will be provided.

Fig. 4. The CGAN's generator learns an image-to-optical-flow transformation. The discriminator processes the input concatenated with either the ground truth or the generator's output, and classifies it as real or fake.

Due to the high capacity of GANs, they may estimate anomalous data as well as normal data [16]. In order to evaluate the effectiveness of the CGAN, we also evaluate the state-of-the-art deep learning dense optical flow prediction model FlowNet [17]. Unlike the CGAN, FlowNet is not trained adversarially.

3.3. Predictive Model

Predictive models can learn sequential data such as videos. By training on sequences of N normal traffic frames, a look back of N, a predictive model will then predict the (N+1)th frame. The Predictive Coding Network (PredNet) proposed in [18] showed promising results in predicting future frames in the KITTI dataset. PredNet, shown in Figure 5, is constructed as a series of modules that make local predictions and then propagate only the errors between the predicted and actual frames to the subsequent layers. The evaluated model was constructed using the recommendations in [18]. Anomalies are detected using the pixel-level reconstruction error between the predicted (N+1)th frame and the ground-truth frame, employing the averaging sliding window approach to compute the pixel-level difference.

Fig. 5. The Predictive Coding Network architecture is composed of modules containing four components: a recurrent layer, a prediction layer, an input layer, and an error layer.

3.4. Proposed Variations

The first modification we propose concerns the detection mechanism that relies on the reconstruction error. The pixel-level reconstruction error averaged with a sliding window can be improved by using a variable window size. Since vehicles in the center of the image tend to be further from the camera, anomalous motion near the center of the image will displace a small number of pixels. In order to account for depth, we propose to use a smaller 3×3 averaging window near the center and a larger 9×9 averaging window around the edges.

Another characteristic specific to highway anomaly detection is that the motion of vehicles is relative due to ego-motion; vehicles moving at speeds similar to the camera will seem to displace very few pixels, if any. When using the PredNet model to predict future frames, rather than just one frame into the future, the model can predict the 6th future frame instead. [18] tests PredNet's ability to predict frames further into the future by using PredNet's first predicted frame as input for the next prediction. Predictions five frames further into the future are shown to maintain reasonable accuracy. We evaluated this method using N = 1, 2, 3, ... and found that N = 6 gave the best results.

4. EXPERIMENTS

A total of four models are evaluated with the HTA dataset: CGAN, FlowNet, PredNet N+1, and PredNet N+6. The performance of each model is measured using an AUC score.
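The variable-window averaging heuristic from Section 3.4 can be sketched as follows. This is a NumPy illustration of our reading of the text, not the authors' released code; the boundary of the "center" region (`center_frac`) is an assumed parameter the paper does not specify:

```python
import numpy as np

def averaged_error(err, center_frac=0.5):
    """Smooth a pixel-level error map with a 3x3 window near the image
    center (distant vehicles) and a 9x9 window near the edges."""
    h, w = err.shape
    pad = 4  # half-width of the larger 9x9 window
    padded = np.pad(err, pad, mode="edge")
    out = np.empty((h, w), dtype=float)
    cy0, cy1 = int(h * (1 - center_frac) / 2), int(h * (1 + center_frac) / 2)
    cx0, cx1 = int(w * (1 - center_frac) / 2), int(w * (1 + center_frac) / 2)
    for y in range(h):
        for x in range(w):
            # k = half-width: 1 -> 3x3 in the center region, 4 -> 9x9 elsewhere
            k = 1 if (cy0 <= y < cy1 and cx0 <= x < cx1) else 4
            win = padded[y + pad - k:y + pad + k + 1,
                         x + pad - k:x + pad + k + 1]
            out[y, x] = win.mean()
    return out

def is_anomalous(err_x, err_y, threshold):
    """Flag a frame if either flow component's averaged error exceeds
    the threshold, per the detection rule in Section 3.2."""
    return (averaged_error(err_x).max() > threshold or
            averaged_error(err_y).max() > threshold)
```

The small center window keeps an isolated error spike from a distant vehicle from being diluted, while the larger edge window suppresses noise from nearby fast-moving background.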



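As a sketch of how a frame-level AUC can be computed from the threshold sweep the experiments describe (0.002 to 1.0 in increments of 0.002), the following is an illustrative implementation, not the authors' evaluation code:

```python
import numpy as np

def auc_from_threshold_sweep(scores, labels):
    """AUC via an ROC curve built by sweeping a decision threshold
    from 0.002 to 1.0 in increments of 0.002.

    scores: per-frame anomaly scores in [0, 1]; labels: 1 = anomalous frame.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = (labels == 1).sum()
    neg = (labels == 0).sum()
    tpr, fpr = [0.0], [0.0]  # curve start at the strictest threshold
    for t in np.arange(1.0, 0.002 - 1e-9, -0.002):  # strict -> lenient
        pred = scores >= t  # frames flagged anomalous at this threshold
        tpr.append((pred & (labels == 1)).sum() / pos)
        fpr.append((pred & (labels == 0)).sum() / neg)
    tpr.append(1.0)
    fpr.append(1.0)
    tpr, fpr = np.array(tpr), np.array(fpr)
    # Trapezoidal integration over the monotone FPR axis.
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))

# Perfect separation: anomalous frames get strictly higher scores.
print(auc_from_threshold_sweep([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]))  # -> 1.0
```

A score of 0.5 corresponds to chance-level ranking, which is the baseline against which the tables below should be read.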
The thresholds in each experiment range from 0.002 to 1.0 in increments of 0.002. Five AUC scores are reported for each model, one for each type of anomalous motion (Table 1).

Table 1. AUC scores for all evaluated models. PredNet (N+6) achieves the highest or near-highest score for every anomaly type except halted vehicle.

                        CGAN    FlowNet   PredNet (N+1)   PredNet (N+6)
  Speeding vehicle      0.608   0.623     0.497           0.614
  Accident              0.607   0.657     0.559           0.601
  Speeding motorcycle   0.580   0.593     0.581           0.828
  Close merge           0.422   0.531     0.619           0.643
  Halted vehicle        0.337   0.216     0.554           0.236

4.1. Conditional GAN

The CGAN is trained on pairs of RGB images. Input images are cropped from the bottom to remove the visible hood of the vehicle, as well as from the top to reduce the amount of sky/background in each frame. The final input size is 128 × 512 × 6. The model is trained for 40 epochs.

The first approach to detecting anomalies uses the discriminator's 2D one-channel output as a heat map for patches containing an anomaly. [9] shows promising results using this approach, but the discriminator trained on the HTA dataset does not produce the same results (Figure 6). The salient characteristic that may cause this discrepancy is that, unlike the UCSD Pedestrian dataset, the HTA dataset does not maintain a static background.

Fig. 6. The CGAN discriminator's output (right) is unable to identify the anomalous motion of the motorcycle.

The second approach to detecting anomalies with the CGAN is to use the reconstruction error from the generator's output. The AUC scores of the CGAN in Table 1 are near 0.5 for all anomaly types, meaning that the reconstruction error from the generator's output has minimal discriminative capability in classifying abnormal motion.

4.2. FlowNet

FlowNet is also trained on pairs of RGB images from the training dataset. The input RGB images for FlowNet are cropped from the bottom, making the input size 2 × 256 × 512 × 3. The model is trained for 100 epochs. Anomalies are detected in the test set using the same approach as for the CGAN: the reconstruction error between the predicted and ground-truth optical flow. The AUC scores for FlowNet are only slightly better than the CGAN results (Table 1).

Both generative models estimate the optical flow of abnormal motion just as well as that of normal motion; AUC scores of both models are near 0.5, indicating that they are unable to discern abnormal motion from normal motion. It seems that both models learn to estimate optical flow rather than to distinguish patterns in normal motion.

4.3. Predictive Coding Network

The PredNet model is trained on a look back of 10 consecutive RGB images, as suggested by the original work. As with the CGAN, each image is cropped from the bottom as well as the top, with a final input size of 64 × 256 × 3 × 10. The model was trained for 100 epochs. Anomalies are detected using the reconstruction error between the model's predicted future frame and the ground-truth frame. Using this mechanism, AUC scores for PredNet N+1 are provided in Table 1. The scores show that PredNet N+1 performs relatively better on the close merge and speeding motorcycle anomalies than on the others, and it achieves the highest AUC score for the halted vehicle anomaly, albeit still near 0.5.

By setting a look back of two, we extrapolate four frames into the future, testing the sixth frame for anomalous motion: PredNet N+6. Figure 7 shows a sample output and ground-truth frame. Anomalies are detected in the same manner as with PredNet N+1. AUC scores for PredNet N+6 are shown in Table 1 and show a significant improvement in detecting the speeding motorcycle anomaly. PredNet N+6 achieves the highest or close to the highest AUC score for each anomaly type except halted vehicle.

Fig. 7. Results from PredNet for the extrapolated frames of abnormal motion (the speeding motorcycle). Vehicles in normal motion are less blurry than the speeding motorcycle.

5. CONCLUSION

This paper presents a new anomaly detection dataset, the Highway Traffic Anomaly (HTA) dataset. It differs from existing anomaly detection datasets in ways that make it more challenging. To the best of our knowledge, it is the first anomaly detection dataset for autonomous driving. Four state-of-the-art deep learning models were evaluated, along with a proposed heuristic to improve the reconstruction error for anomaly detection tailored to the HTA dataset. The results indicate that state-of-the-art models do not perform well on the HTA dataset. Our proposed variation of the PredNet model, predicting the sixth future frame, shows promising results on the speeding motorcycle anomaly and comparatively strong results on all anomaly types except halted vehicle.
6. REFERENCES

[1] N. Patil and Prabir Kumar Biswas, "A survey of video datasets for anomaly detection in automated surveillance," in 2016 Sixth International Symposium on Embedded Computing and System Design (ISED), pp. 43–48, 2016.

[2] Harpreet Singh, "Highway traffic anomaly repository," https://github.com/harpreets652/highway-traffic-anomaly, 2019.

[3] Vijay Mahadevan, Weixin Li, Viral Bhalodia, and Nuno Vasconcelos, "Anomaly detection in crowded scenes," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010, pp. 1975–1981.

[4] Cewu Lu, Jianping Shi, and Jiaya Jia, "Abnormal event detection at 150 fps in Matlab," in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2720–2727.

[5] Nikos Papanikolopoulos, "Detection of unusual crowd activity," Unusual crowd activity dataset.

[6] David M. J. Tax and Robert P. W. Duin, "Support vector data description," Machine Learning, vol. 54, no. 1, pp. 45–66, 2004.

[7] B. Kiran, Dilip Thomas, and Ranjith Parakkal, "An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos," Journal of Imaging, vol. 4, no. 2, p. 36, 2018.

[8] Asimenia Dimokranitou, Adversarial Autoencoders for Anomalous Event Detection in Images, Ph.D. thesis, 2017.

[9] Mahdyar Ravanbakhsh, Enver Sangineto, Moin Nabi, and Nicu Sebe, "Training adversarial discriminators for cross-channel abnormal event detection in crowds," in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2019, pp. 1896–1904.

[10] Pierre-Etienne Martin, Jenny Benois-Pineau, Renaud Péteri, and Julien Morlier, "Optimal choice of motion estimation methods for fine-grained action classification with 3D convolutional networks," in 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019, pp. 554–558.

[11] Jefferson Ryan Medel and Andreas Savakis, "Anomaly detection in video using predictive convolutional long short-term memory networks," arXiv preprint arXiv:1612.00390, 2016.

[12] Yong Shean Chong and Yong Haur Tay, "Abnormal event detection in videos using spatiotemporal autoencoder," in International Symposium on Neural Networks. Springer, 2017, pp. 189–196.

[13] Weixin Luo, Wen Liu, and Shenghua Gao, "Remembering history with convolutional LSTM for anomaly detection," in 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2017, pp. 439–444.

[14] Fisher Yu, Wenqi Xian, Yingying Chen, Fangchen Liu, Mike Liao, Vashisht Madhavan, and Trevor Darrell, "BDD100K: A diverse driving video database with scalable annotation tooling," arXiv preprint arXiv:1805.04687, 2018.

[15] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal of Software Tools, 2000.

[16] B. Kiran, Dilip Thomas, and Ranjith Parakkal, "An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos," Journal of Imaging, vol. 4, no. 2, p. 36, 2018.

[17] Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, and Thomas Brox, "FlowNet: Learning optical flow with convolutional networks," arXiv preprint arXiv:1504.06852, 2015.

[18] William Lotter, Gabriel Kreiman, and David Cox, "Deep predictive coding networks for video prediction and unsupervised learning," arXiv preprint arXiv:1605.08104, 2016.
