Video Anomaly Detection For Smart Surveillance: Related Concepts
Definition
Anomalies in videos are broadly defined as events or activities that are unusual and signify irregular behavior. The goal of anomaly detection is to temporally or spatially localize anomaly events in video sequences. Temporal localization (i.e. indicating the start and end frames of the anomaly event in a video) is referred to as frame-level detection. Spatial localization, which is more challenging, means identifying the pixels within each anomaly frame that correspond to the anomaly event. This setting is usually referred to as pixel-level detection.
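As a concrete illustration of frame-level detection, the following minimal Python sketch converts per-frame anomaly scores into temporally localized segments. The function name and threshold are our own illustrative choices, not taken from any cited work.

```python
import numpy as np

def scores_to_segments(scores, threshold=0.5):
    """Convert per-frame anomaly scores into (start, end) frame segments.

    `scores` holds one anomaly score per frame; any contiguous run of
    frames whose score exceeds `threshold` is reported as one temporal
    anomaly segment.
    """
    flags = np.asarray(scores) > threshold
    segments, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                          # a segment opens here
        elif not flagged and start is not None:
            segments.append((start, i - 1))    # the segment closes
            start = None
    if start is not None:                      # segment runs to the last frame
        segments.append((start, len(flags) - 1))
    return segments

# e.g. scores_to_segments([0.1, 0.2, 0.9, 0.8, 0.3, 0.7]) -> [(2, 3), (5, 5)]
```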
Background
In modern intelligent video surveillance systems, automatic anomaly detection through computer vision analytics plays a pivotal role: it significantly increases monitoring efficiency and reduces the burden of live monitoring. Video anomaly detection has been studied for a long time, yet the problem is far from solved (as witnessed by the low accuracy on the UCF-Crime [22] dataset) due to the difficulty of modeling anomaly events and the scarcity of anomaly data. Identifying anomaly events requires understanding complex visual patterns, and some patterns, e.g. arson, burglary, and shoplifting, can only be detected when long-term temporal relationships and causal reasoning are learned by the model.
Early works mainly follow the setting of general anomaly detection, which may be better described as novelty detection, where all novel events are considered anomalous [13]. This problem is typically formulated as unsupervised learning, where models are trained with only normal video frames and validated with both normal and anomaly frames. A popular idea is to find a set of bases to represent normal frames and, at inference time, identify frames with high reconstruction loss or error as anomalous, e.g. via sparse coding [5,15] or autoencoders [9]. However, due to limitations of data and computation, these approaches [13,12,5,15,23] were conducted on small-scale datasets with relatively simple scenarios, which is not satisfactory for real-world surveillance applications.
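The reconstruction-based idea can be sketched in a few lines: learn a linear basis from features of normal frames (PCA is used here as a simple stand-in for the sparse dictionaries of [5,15]) and flag test frames whose reconstruction error is high. All names below are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_normal_basis(normal_feats, n_components=32):
    """Learn a low-dimensional basis from features of normal frames only."""
    pca = PCA(n_components=n_components)
    pca.fit(normal_feats)          # normal_feats: (num_frames, feat_dim)
    return pca

def anomaly_scores(pca, test_feats):
    """Per-frame reconstruction error; a high error suggests an anomaly."""
    recon = pca.inverse_transform(pca.transform(test_feats))
    return np.linalg.norm(test_feats - recon, axis=1)

# Frames whose score exceeds a threshold chosen on validation data
# would be reported as anomalous.
```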
While it is theoretically pleasing to consider all novel events as anomalies, this setting has drawbacks for practical surveillance applications. Taking the campus scenario [13,14] as an example, riding a bike is novel (i.e. considered an anomaly) since the model only sees people walking [13]; however, it should not generally be considered an anomaly for security purposes. Since some anomalous activities in real-world applications have clear definitions, e.g. different criminal events that follow specific patterns, recent works [22,28] have started to leverage supervision for real-world anomaly detection. UCF-Crime [22] is currently the largest anomaly detection dataset with realistic anomalies, containing thousands of anomaly and normal videos. Its training set contains both anomaly and normal videos with video-level annotation as weak supervision, and frame-level annotation is provided for the validation set. Detection performance has been significantly improved with weakly supervised methods [22,28].
There is also a line of research that focuses on specific anomaly detection tasks where only one type of anomaly is considered, e.g. traffic accidents on highways. Since the camera poses, foreground patterns, and backgrounds are highly similar and stable, geometric prior knowledge and physics principles can be employed in manually designed detection pipelines. Several representative works [21,3] rely on object detection to identify anomaly events.
Representative Approaches
Based on the experimental setting of the training data, video anomaly detection methods can be broadly classified into three categories: unsupervised, weakly supervised, and supervised. We provide a brief overview of recent approaches for each category.
Unsupervised Methods
Since real-world anomaly events happen with low probability, it is hard to capture all types of anomalies. Normal videos, however, are easy to obtain from social media and public surveillance; unsupervised methods are thus motivated to detect anomaly events with only normal videos in the training set. Although unsupervised methods cannot yet achieve satisfactory performance in complex real-world scenarios, they are believed to generalize better to unseen anomaly patterns.
Classic Machine Learning: Early unsupervised methods mainly adopt classic machine learning techniques with hand-crafted features as well as probabilistic models. Kim et al. [12] propose to first extract optical flow features and find typical patterns with a mixture of probabilistic PCA (Principal Component Analysis) models; a space-time MRF (Markov Random Field) is then constructed to model the relationship between spatio-temporal local regions of a video for Bayesian inference. Inspired by studies of crowd behavior such as the social force model, Mehran et al. [18] estimate the interaction forces in a crowd to better model normal crowd behavior; normal and anomaly frames are then classified with BoW (Bag of Words) and LDA (Latent Dirichlet Allocation). Li et al. [13] introduce a mixture of DT (Dynamic Textures) model for temporal normalcy, and discriminant saliency detection is utilized to measure spatial normalcy. Ullah et al. [23] first extract corner features and refine them with interaction flow; a random forest is then trained to classify normal and anomaly frames. Cong et al. [5] introduce sparse coding for anomaly detection, and Lu et al. [15] further propose an efficient sparse combination learning framework that achieves a speed of 150 frames per second (fps).
Deep Learning: Thanks to deep learning techniques, recent works are able to take advantage of large-scale datasets and powerful computational resources. Following the unsupervised anomaly detection setting, a number of works [9,16,17,7] have been proposed based on deep AEs (autoencoders). Hasan et al. [9] propose to learn both motion features and discriminative regular patterns with an FCN (Fully Convolutional Network) based AE; the regularity score is computed from the reconstruction error of the AE model. To better model the temporal relationships within a video, [16] combines an FCN and LSTM (Long Short-Term Memory) into a ConvLSTM-AE, which further improves the performance of the AE framework. [17] explores the combination of sparse coding and RNNs (Recurrent Neural Networks): a temporally-coherent sparse coding framework is proposed to introduce the temporal information of video into the sparse coding formulation. [7] proposes a memory-augmented AE that memorizes prototypical normal patterns for anomaly detection; an attention-based sparse addressing scheme is designed to access the memory and reconstruct future frames. For all of the aforementioned AE-based methods, anomaly events are determined based on the reconstruction error. In contrast, [11] formulates the problem as multi-class classification by applying k-means clustering and one-versus-all SVMs (Support Vector Machines).
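A minimal PyTorch sketch of the AE-based recipe shared by these works follows, assuming a deliberately tiny convolutional autoencoder over stacked frames (this is not the exact architecture of any cited paper). The regularity score follows the inverted, min-max-normalized reconstruction error of [9].

```python
import torch
import torch.nn as nn

class TinyConvAE(nn.Module):
    """A deliberately small convolutional autoencoder over stacked frames."""
    def __init__(self, in_frames=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_frames, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, in_frames, 4, stride=2, padding=1),
        )

    def forward(self, x):
        # x: (batch, in_frames, H, W) with H, W divisible by 4
        return self.decoder(self.encoder(x))

def regularity_scores(model, clips):
    """Per-clip regularity: 1 - normalized reconstruction error, as in [9]."""
    model.eval()
    with torch.no_grad():
        errs = torch.stack([((model(c) - c) ** 2).mean() for c in clips])
    errs = (errs - errs.min()) / (errs.max() - errs.min() + 1e-8)
    return 1.0 - errs                      # a low score suggests an anomaly
```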
Weakly Supervised Methods
With the increasing amount of video data on social media platforms such as YouTube (https://www.youtube.com/), it is possible to access and annotate a large number of anomaly videos [22]. For certain application scenarios where the anomalous activities are well defined, performance can be significantly improved by introducing supervision information. Recent works [22,28] follow the weakly supervised setting where only video-level annotation is available for training: the training videos are labeled as normal or anomalous, but the temporal location of the anomaly event within each anomalous video is unknown (i.e. weak supervision).
Sultani et al. [22] and He et al. [10] formulate the weakly supervised problem as MIL (Multiple Instance Learning): every frame of a normal video should be normal, while an anomalous video contains at least one anomaly frame. [10] proposes a graph-based MIL framework with anchor dictionary learning; all experiments are conducted on the UCSD [13] dataset under a weakly supervised setting. [22] proposes a deep learning based method along with a large-scale dataset of realistic crime-related anomalies and surveillance videos, namely UCF-Crime [22]. A C3D framework is used to extract spatio-temporal features and generate anomaly scores. To distinguish normal and anomaly frames under this weak supervision, the loss function forces the highest score in an anomalous video to be higher than the highest score in a normal video. With the parameters of the C3D model frozen, [22] outperforms previous works by a large margin on the UCF-Crime dataset.
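The MIL ranking objective of [22] can be sketched as below: the hinge term implements the ranking constraint above, and the smoothness and sparsity terms are the paper's auxiliary regularizers in simplified form. The λ weights shown are illustrative defaults, not guaranteed to match the published configuration.

```python
import torch

def mil_ranking_loss(scores_anom, scores_norm,
                     lambda_smooth=8e-5, lambda_sparse=8e-5):
    """Simplified MIL ranking loss in the spirit of Sultani et al. [22].

    scores_anom / scores_norm: 1-D tensors of per-segment anomaly scores
    for one anomalous and one normal video (a positive and a negative bag).
    """
    # Hinge term: the highest-scoring segment of the anomalous video
    # should outrank the highest-scoring segment of the normal video.
    hinge = torch.clamp(1.0 - scores_anom.max() + scores_norm.max(), min=0.0)
    # Temporal smoothness: adjacent segment scores should vary gradually.
    smooth = ((scores_anom[1:] - scores_anom[:-1]) ** 2).sum()
    # Sparsity: anomalies should occupy only a few segments.
    sparse = scores_anom.sum()
    return hinge + lambda_smooth * smooth + lambda_sparse * sparse
```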
Instead of improving the MIL technique, Zhong et al. [28] treat weakly supervised learning as a noisy-label learning problem, where the labels of some frames in anomaly videos are wrong. They train a GCN (Graph Convolutional Network) based cleaner to refine the noisy labels so that the classification network can be trained end-to-end with frame-level labels.
Supervised Methods
For certain scenarios where the backgrounds and objects are well defined, e.g. roads and cars for highway traffic accident detection, recent works [3,24] typically rely on frame-level annotated training videos (i.e. the temporal annotations of the anomalies in the training videos are available, the supervised setting). A popular solution is to leverage geometric prior knowledge and object detection with additional supervision from other public datasets.
[21] first applies Faster R-CNN to detect vehicles; an attention-based LSTM module is then applied to learn an accident score. Recent works [3,24] on the AI City Challenge (https://www.aicitychallenge.org/) are given frame-level accident annotations on the training set. Apart from applying object detection, [3] models the background and space using semantic segmentation, and geometric priors are leveraged via perspective detection; vehicle dynamics are then represented by a spatial-temporal matrix.
Anomaly events are identified based on the IoU (Intersection over Union) of different objects while applying an NMS (Non-Maximum Suppression) procedure. [24] utilizes YOLOv3 (You Only Look Once) as the object detector and specifically improves the framework for small-object scenarios; multi-object tracking is then introduced to generate the trajectories of anomalous vehicles, and the accident start time is estimated with a curve-fitting algorithm.
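As a hedged illustration of this kind of IoU-based reasoning (not the exact procedure of [3] or [24]), the sketch below flags a tracked vehicle as a stalled-vehicle candidate when its bounding box keeps a high IoU with its own previous position over many consecutive frames. Both thresholds are illustrative and would be tuned on validation data.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def stalled_candidate(track, iou_thresh=0.9, min_frames=120):
    """Flag a track (list of per-frame boxes) whose box barely moves.

    At 30 fps, min_frames=120 corresponds to a vehicle that has been
    nearly stationary for about four seconds.
    """
    still = 0
    for prev, cur in zip(track, track[1:]):
        still = still + 1 if iou(prev, cur) >= iou_thresh else 0
        if still >= min_frames:
            return True
    return False
```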
Datasets
In this section, we briefly review the popular datasets for video anomaly
detection. An overview of all listed datasets is provided in Table 1.
UCSD: The UCSD dataset contains two subsets, denoted Ped1 and Ped2. They are captured with different camera poses at two spots on the UCSD campus where most pedestrians walk. The training set (34 clips for Ped1 and 16 clips for Ped2) contains only normal frames, and the test set (36 clips for Ped1 and 12 clips for Ped2) consists of both normal and anomaly frames. Frame-level annotation is provided for all test clips, and 10 of them have pixel-level ground truth. The UCSD dataset considers walking pedestrians as the normal pattern, so non-pedestrian entities such as bikers and skaters are defined as anomaly instances. Dataset link: http://www.svcl.ucsd.edu/projects/anomaly/dataset.html
Subway: The Subway [2] dataset contains two subsets, i.e. Subway Entrance and Subway Exit, each consisting of a single long surveillance video recorded in a subway station. They were originally proposed for real-time detection of unusual events in crowded subway scenes, e.g. moving in the wrong direction or no payment. Dataset link: http://vision.eecs.yorku.ca/research/anomalous-behaviour-data/
Avenue: The Avenue [15] dataset contains 15 videos, each about 2 minutes long, for a total of 35,240 frames; 8,478 frames from 4 videos are used as the training set. Typical unusual events include running and throwing objects. Dataset link: http://www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html
UMN: The UMN [1] (University of Minnesota) dataset consists of five videos
captured from different angles. The normal pattern is defined as walking and the
main anomaly activity is running. Dataset link: http://mha.cs.umn.edu/
DAD: DAD [4] (Dashcam Accident Dataset) is proposed specifically for accident detection. The normal pattern is vehicles moving in traffic, and anomaly events include different traffic accidents, e.g. car hits car or motorbike hits motorbike. The DAD dataset consists of 678 videos from six cities; 58 videos are used for training. From the remaining 620 videos, 620 clips with accidents are sampled as positive clips and 1,130 normal clips are sampled as negative clips. These are then randomly split into two subsets: 455 positive and 829 negative clips for training, and 165 positive and 301 negative clips for testing. Dataset link: https://aliensunmin.github.io/project/dashcam/
CADP: CADP [21] (Car Accident Detection and Prediction) focuses on car accidents captured by CCTV (Closed-Circuit Television) cameras. All 1,416 videos of CADP contain traffic accidents, and 205 of them have both temporal and spatial annotations. CADP contains videos captured with various camera types and qualities and under various weather conditions, and the anomaly events are realistic for real-world applications. Dataset link: https://ankitshah009.github.io/accident_forecasting_traffic_camera
A3D: A3D [26] consists of 1,500 on-road abnormal-event video clips from dashboard cameras. Each video contains an abnormal traffic event whose start and end times are annotated by human annotators. The dataset totals 128,175 frames (individual clips range from 23 to 208 frames) at 10 frames per second, and the events are clustered into 18 types of traffic accidents. Dataset link: https://github.com/MoonBlvd/tad-IROS2019
DADA: DADA [6] is a traffic accident dataset collected for driver attention prediction in accident scenarios. It contains 658,476 available frames in 2,000 videos at a resolution of 1584×660. The videos are divided into 54 categories, such as "hitting" and "out of control", based on the participants of the accidents (e.g. pedestrians, vehicles, cyclists). The spatial crash objects and the temporal windows of the accidents are annotated. Dataset link: https://github.com/JWFangit/LOTVS-DADA
DoTA: DoTA [25] (Detection of Traffic Anomaly) is a recent traffic anomaly detection dataset containing 4,677 videos with temporal, spatial, and categorical annotations. Its objective is to introduce a when-where-what pipeline that detects, localizes, and recognizes anomalous events in egocentric videos. The video clips are collected from YouTube channels and cover diverse dashcam accident videos from different countries under different weather and lighting conditions. Dataset link: https://github.com/MoonBlvd/Detection-of-Traffic-Anomaly
Iowa DOT Traffic: The Iowa DOT (Department of Transportation) Traffic dataset [19] consists of 200 videos, each approximately 15 minutes long, recorded at 30 fps and 800×410 resolution. The training and testing sets each contain 100 videos. As the official dataset for Track 3 of the 2018 AI City Challenge [19], it does not provide annotations for the testing set. The main anomaly patterns are car crashes and stalled vehicles. Dataset link: https://www.aicitychallenge.org/2018-ai-city-challenge/
ShanghaiTech: The ShanghaiTech [14] dataset is collected at ShanghaiTech University and covers 13 scenes with complex lighting conditions and camera viewpoints. It consists of 437 videos with an average of 726 frames each. The training set consists of 330 normal videos, and the testing set contains 107 videos with 130 anomalies. Anomaly events include unusual patterns on campus such as bikers or cars. Dataset link: https://svip-lab.github.io/dataset/campus_dataset.html
UCF Crime: UCF Crime [22] consists of 1,900 untrimmed videos covering 13 real-world anomaly events: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. 950 of them are normal videos, and each of the remaining videos contains at least one anomaly event. The training set contains 800 normal and 810 anomalous videos; the remaining 150 normal and 140 anomalous videos are temporally annotated for validation. Both training and testing sets cover all 13 anomaly events, and some videos contain multiple anomaly categories, e.g. robbery along with fighting, burglary with vandalism, or arrest with shooting. All videos are realistic for real-world surveillance applications. Furthermore, UCF Crime covers different lighting conditions, image resolutions, and camera poses in complex scenarios, and is thus very challenging. Dataset link: https://www.crcv.ucf.edu/projects/real-world/
Street Scene: The Street Scene [20] dataset focuses on single-scene anomaly detection. It consists of 46 training videos and 35 testing videos taken from a static USB camera looking down on a two-lane street with bike lanes and pedestrian sidewalks. There are a total of 203,257 color video frames (56,847 for training and 146,410 for testing) at 1280×720 resolution, extracted from the original videos at 15 frames per second. The dataset presents 17 types of anomalous events/activities, such as jaywalking, loitering, and illegally parked cars. Dataset link: https://www.merl.com/demos/video-anomaly-detection
Benchmarks
In this section, we introduce popular evaluation metrics and summarize existing results on six popular benchmark datasets: UCSD Ped2 [13], Avenue [15], UMN [1], ShanghaiTech [14], UCF Crime [22], and Iowa DOT Traffic [19].
The frame-level evaluation criterion uses frame-level ground-truth annotations to determine which detected frames are true positives (i.e. true anomaly frames) and which are false positives, yielding frame-level true positive and false positive rates. Pixel-level evaluation additionally requires the algorithm to take into account the spatial locations of anomalous objects within frames: a detection is considered correct if it covers at least 40% of the anomaly pixels in the ground truth [13]. Pixel-level evaluation can be conducted only if pixel-level annotations are available for the testing videos.
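A minimal sketch of this 40% coverage rule, assuming binary NumPy masks for the prediction and the ground truth (the function name is our own):

```python
import numpy as np

def pixel_level_true_positive(pred_mask, gt_mask, min_coverage=0.4):
    """A detected frame counts as a pixel-level true positive if the
    predicted mask covers at least `min_coverage` of the ground-truth
    anomaly pixels, following the 40% rule of [13]."""
    gt_pixels = gt_mask.sum()
    if gt_pixels == 0:          # no annotated anomaly in this frame
        return False
    covered = np.logical_and(pred_mask, gt_mask).sum()
    return covered / gt_pixels >= min_coverage
```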
As shown in Tables 2 and 3, the frame-level AUC (Area Under the Curve) of the ROC (Receiver Operating Characteristic) curve is widely used as the evaluation metric for temporal localization of anomaly events. Since anomaly detection can be considered a binary classification of each frame, the ROC curve is generated by applying different thresholds to the per-frame anomaly scores and calculating the TPR (True Positive Rate) and FPR (False Positive Rate).
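Given per-frame binary labels and anomaly scores, the frame-level AUC can be computed directly, e.g. with scikit-learn, which sweeps the thresholds internally; the toy values below are illustrative:

```python
from sklearn.metrics import roc_auc_score

# labels: 1 for ground-truth anomaly frames, 0 for normal frames;
# scores: the model's per-frame anomaly scores (higher = more anomalous).
labels = [0, 0, 1, 1, 1, 0, 0]
scores = [0.1, 0.3, 0.8, 0.7, 0.9, 0.4, 0.2]
print(roc_auc_score(labels, scores))  # frame-level AUC of the ROC curve
```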
Table 3. AUC (%) of existing works on UCSD, ShanghaiTech, and UCF Crime under the weakly supervised setting.
For traffic accident detection under the supervised setting, the F1 score, the RMSE (Root Mean Square Error) of the anomaly start time, and the S3 score are used as evaluation metrics. The F1 score is defined as

$$F_1 = \frac{2TP}{2TP + FP + FN}, \qquad (1)$$

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively. The S3 score combines detection accuracy with timing accuracy as $S_3 = F_1 \times (1 - \mathrm{NRMSE})$, where NRMSE denotes the RMSE of the predicted anomaly start times normalized into $[0, 1]$.
Future Directions
– Although different learning frameworks have been adopted, the learned representations are still not satisfactory for distinguishing complex anomalous activities. Promising directions toward better representations include stronger 3D feature extractors, attention mechanisms, and causal reasoning (identifying the cause of an anomaly event, e.g. speeding that leads to an accident).
– Early works mainly focus on the unsupervised setting, while recent works have shown the potential of improving performance by leveraging supervision information for certain scenarios. It would be promising to explore better settings for practical applications, e.g. a better trade-off between generalization ability (unsupervised setting) and performance (weakly supervised setting).
– Anomaly detection systems may be acceptable when operating in public spaces where there is no expectation of privacy. However, what if the technology needs to be applied to non-public spaces where there is a stronger expectation of privacy? It is worth exploring effective ways to de-identify training videos and to train anomaly models with de-identified data.
– Current anomaly detection approaches and systems act as an alerting mechanism. How do we explain the AI decisions and convey them effectively to stakeholders, e.g. law enforcement, attorneys, media, local residents, and the broader community? We expect new techniques to close the gap between high-performing and interpretable AI models.
References
[1] Unusual crowd activity dataset of University of Minnesota:
http://mha.cs.umn.edu/Movies/Crowd-Activity-All.avi.
[2] Amit Adam, Ehud Rivlin, Ilan Shimshoni, and Daviv Reinitz. Robust real-time unusual event detection using multiple fixed-location monitors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(3):555–560, 2008.
[3] Shuai Bai, Zhiqun He, Yu Lei, Wei Wu, Chengkai Zhu, Ming Sun, and
Junjie Yan. Traffic anomaly detection via perspective map based on spatial-
temporal information matrix. In Proc. CVPR Workshops, 2019.
[4] Fu-Hsiang Chan, Yu-Ting Chen, Yu Xiang, and Min Sun. Anticipating
accidents in dashcam videos. In Asian Conference on Computer Vision,
pages 136–153. Springer, 2016.
[5] Yang Cong, Junsong Yuan, and Ji Liu. Sparse reconstruction cost for ab-
normal event detection. In CVPR 2011, pages 3449–3456. IEEE, 2011.
[6] Jianwu Fang, Dingxin Yan, Jiahuan Qiao, and Jianru Xue. DADA: A large-scale benchmark and model for driver attention prediction in accidental scenarios. arXiv preprint arXiv:1912.12148, 2019.
[7] Dong Gong, Lingqiao Liu, Vuong Le, Budhaditya Saha, Moussa Reda Man-
sour, Svetha Venkatesh, and Anton van den Hengel. Memorizing normality
to detect anomaly: Memory-augmented deep autoencoder for unsupervised
anomaly detection. In Proceedings of the IEEE International Conference
on Computer Vision, pages 1705–1714, 2019.
[8] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[9] Mahmudul Hasan, Jonghyun Choi, Jan Neumann, Amit K Roy-Chowdhury, and Larry S Davis. Learning temporal regularity in video sequences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 733–742, 2016.
[10] Chengkun He, Jie Shao, and Jiayu Sun. An anomaly-introduced learning
method for abnormal event detection. Multimedia Tools and Applications,
77(22):29573–29588, 2018.
[11] Radu Tudor Ionescu, Fahad Shahbaz Khan, Mariana-Iuliana Georgescu,
and Ling Shao. Object-centric auto-encoders and dummy anomalies for
abnormal event detection in video. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages 7842–7851, 2019.
[12] Jaechul Kim and Kristen Grauman. Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 2921–2928. IEEE, 2009.
[13] Weixin Li, Vijay Mahadevan, and Nuno Vasconcelos. Anomaly detection and localization in crowded scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1):18–32, 2013.
[14] W. Liu, W. Luo, D. Lian, and S. Gao. Future frame prediction for anomaly detection – a new baseline. In 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[15] Cewu Lu, Jianping Shi, and Jiaya Jia. Abnormal event detection at 150 fps in MATLAB. In Proceedings of the IEEE International Conference on Computer Vision, pages 2720–2727, 2013.
[16] Weixin Luo, Wen Liu, and Shenghua Gao. Remembering history with convolutional LSTM for anomaly detection. In 2017 IEEE International Conference on Multimedia and Expo (ICME), pages 439–444. IEEE, 2017.
[17] Weixin Luo, Wen Liu, and Shenghua Gao. A revisit of sparse coding based anomaly detection in stacked RNN framework. In Proceedings of the IEEE International Conference on Computer Vision, pages 341–349, 2017.
[18] Ramin Mehran, Alexis Oyama, and Mubarak Shah. Abnormal crowd be-
havior detection using social force model. In 2009 IEEE Conference on
Computer Vision and Pattern Recognition, pages 935–942. IEEE, 2009.
[19] Milind Naphade, Zheng Tang, Ming-Ching Chang, David C Anastasiu, Anuj Sharma, Rama Chellappa, Shuo Wang, Pranamesh Chakraborty, Tingting Huang, Jenq-Neng Hwang, et al. The 2019 AI City Challenge. In CVPR Workshops, 2019.
[20] Bharathkumar Ramachandra and Michael Jones. Street scene: A new
dataset and evaluation protocol for video anomaly detection. In The IEEE
Winter Conference on Applications of Computer Vision, pages 2569–2578,
2020.
[21] Ankit Shah, Jean Baptiste Lamare, Tuan Nguyen Anh, and Alexander Hauptmann. CADP: A novel dataset for CCTV traffic camera based accident analysis. arXiv preprint arXiv:1809.05782, 2018. The first three authors share first authorship.
[22] Waqas Sultani, Chen Chen, and Mubarak Shah. Real-world anomaly de-
tection in surveillance videos. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pages 6479–6488, 2018.
[23] Habib Ullah, Mohib Ullah, and Nicola Conci. Dominant motion analysis in
regular and irregular crowd scenes. In International Workshop on Human
Behavior Understanding, pages 62–72. Springer, 2014.
[24] Gaoang Wang, Xinyu Yuan, Aotian Zhang, Hung-Min Hsu, and Jenq-
Neng Hwang. Anomaly candidate identification and starting time esti-
mation of vehicles from traffic videos. In AI City Challenge Workshop,
IEEE/CVF Computer Vision and Pattern Recognition (CVPR) Confer-
ence, Long Beach, California, 2019.
[25] Yu Yao, Xizi Wang, Mingze Xu, Zelin Pu, Ella Atkins, and David Crandall. When, where, and what? A new dataset for anomaly detection in driving videos. arXiv preprint arXiv:2004.03044, 2020.
[26] Yu Yao, Mingze Xu, Yuchen Wang, David J Crandall, and Ella M Atkins.
Unsupervised traffic accident detection in first-person videos. arXiv preprint
arXiv:1903.00618, 2019.
[27] Muchao Ye, Xiaojiang Peng, Weihao Gan, Wei Wu, and Yu Qiao. AnoPCN: Video anomaly detection via deep predictive coding network. In Proceedings of the 27th ACM International Conference on Multimedia, pages 1805–1813, 2019.
[28] Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas H Li, and
Ge Li. Graph convolutional label noise cleaner: Train a plug-and-play action
classifier for anomaly detection. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pages 1237–1246, 2019.