Anomaly_detection
In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty
detection) is generally understood to be the identification of rare items, events or observations which
deviate significantly from the majority of the data and do not conform to a well-defined notion of normal
behavior.[1] Such examples may arouse suspicions of being generated by a different mechanism,[2] or
appear inconsistent with the remainder of that set of data.[3]
Anomaly detection finds application in many domains including cybersecurity, medicine, machine vision,
statistics, neuroscience, law enforcement and financial fraud, to name only a few. Anomalies were initially
searched for so that they could be rejected or omitted from the data to aid statistical analysis, for
example when computing the mean or standard deviation. They were also removed to improve predictions from
models such as linear regression, and more recently their removal has aided the performance of machine
learning algorithms. However, in many applications the anomalies themselves are of interest and are the
most sought-after observations in the entire data set, which need to be identified and separated from
noise or irrelevant outliers.
Three broad categories of anomaly detection techniques exist.[1] Supervised anomaly detection
techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a
classifier. However, this approach is rarely used in anomaly detection due to the general unavailability of
labelled data and the inherent unbalanced nature of the classes. Semi-supervised anomaly detection
techniques assume that some portion of the data is labelled. This may be any combination of the normal
or anomalous data, but more often than not, the techniques construct a model representing normal
behavior from a given normal training data set, and then test the likelihood that a test instance was
generated by the model. Unsupervised anomaly detection techniques assume the data is unlabelled and
are by far the most commonly used because of their wider applicability.
Definition
Many attempts have been made in the statistical and computer science communities to define an anomaly.
The most prevalent ones include the following, and can be categorised into three groups: those that are
ambiguous, those that are specific to a method with pre-defined thresholds usually chosen empirically,
and those that are formally defined:
Ill defined
An outlier is an observation which deviates so much from the other observations as to
arouse suspicions that it was generated by a different mechanism.[2]
Anomalies are instances or collections of data that occur very rarely in the data set and
whose features differ significantly from most of the data.
An outlier is an observation (or subset of observations) which appears to be inconsistent
with the remainder of that set of data.[3]
An anomaly is a point or collection of points that is relatively distant from other points in
multi-dimensional space of features.
Anomalies are patterns in data that do not conform to a well-defined notion of normal
behaviour.[1]
Specific
Let T be observations from a univariate Gaussian distribution and O a point from T. Then the
z-score for O is greater than a pre-selected threshold if and only if O is an outlier.
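The z-score rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample readings and the threshold of 2.5 are hypothetical, the population standard deviation is used, and the absolute value of the z-score is compared with the threshold so that unusually low values are flagged as well.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the z-score of each value relative to the sample mean and std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # All values identical: nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def z_score_outliers(values, threshold=3.0):
    """Indices whose absolute z-score exceeds the pre-selected threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]

# Hypothetical sensor readings with one obviously unusual value at index 9.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9, 10.0, 25.0]
flagged = z_score_outliers(readings, threshold=2.5)
```

Because the extreme value inflates both the mean and the standard deviation, its own z-score stays just below 3 in this ten-point sample, which is one reason the threshold is usually chosen empirically.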
History
Intrusion detection
The concept of intrusion detection, a critical component of anomaly detection, has evolved significantly
over time. Initially, it was a manual process where system administrators would monitor for unusual
activities, such as a vacationing user's account being accessed or unexpected printer activity. This
approach was not scalable and was soon superseded by the analysis of audit logs and system logs for
signs of malicious behavior.[4]
By the late 1970s and early 1980s, the analysis of these logs was primarily used retrospectively to
investigate incidents, as the volume of data made it impractical for real-time monitoring. The
affordability of digital storage eventually led to audit logs being analyzed online, with specialized
programs being developed to sift through the data. These programs, however, were typically run during
off-peak hours due to their computational intensity.[4]
The 1990s brought the advent of real-time intrusion detection systems capable of analyzing audit data as
it was generated, allowing for immediate detection of and response to attacks. This marked a significant
shift towards proactive intrusion detection.[4]
As the field has continued to develop, the focus has shifted to creating solutions that can be efficiently
implemented across large and complex network environments, adapting to the ever-growing variety of
security threats and the dynamic nature of modern computing infrastructures.[4]
Applications
Anomaly detection is applicable in a very large number and variety of domains, and is an important
subarea of unsupervised machine learning. As such it has applications in cyber-security, intrusion
detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks,
detecting ecosystem disturbances, defect detection in images using machine vision, medical diagnosis and
law enforcement.[5]
Intrusion detection
Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.[6]
Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done
with soft computing and inductive learning.[7] Types of features proposed by 1999 included profiles of
users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means,
variances, covariances, and standard deviations.[8] The counterpart of anomaly detection in intrusion
detection is misuse detection.
Preprocessing
Preprocessing data to remove anomalies can be an important step in data analysis, and is done for a
number of reasons. Statistics such as the mean and standard deviation are more accurate after the removal
of anomalies, and the visualisation of data can also be improved. In supervised learning, removing the
anomalous data from the dataset often results in a statistically significant increase in accuracy.[11][12]
Video surveillance
Anomaly detection has become increasingly vital in video surveillance to enhance security and
safety.[13][14] With the advent of deep learning technologies, methods using Convolutional Neural
Networks (CNNs) and Simple Recurrent Units (SRUs) have shown significant promise in identifying
unusual activities or behaviors in video data.[13] These models can process and analyze extensive video
feeds in real-time, recognizing patterns that deviate from the norm, which may indicate potential security
threats or safety violations.[13] An important aspect of video surveillance is the development of scalable
real-time frameworks.[15][16] Such pipelines are required for processing multiple video streams with low
computational resources.
IT infrastructure
In IT infrastructure management, anomaly detection is crucial for ensuring the smooth operation and
reliability of services.[17] Techniques like the IT Infrastructure Library (ITIL) and monitoring frameworks
are employed to track and manage system performance and user experience.[17] Detecting anomalies can
help identify and pre-empt potential performance degradations or system failures, thus maintaining
productivity and business process effectiveness.[17]
IoT systems
Anomaly detection is critical for the security and efficiency of Internet of Things (IoT) systems.[18] It
helps in identifying system failures and security breaches in complex networks of IoT devices.[18] The
methods must manage real-time data, diverse device types, and scale effectively. Garg et al.[19] have
introduced a multi-stage anomaly detection framework that improves upon traditional methods by
incorporating spatial clustering, density-based clustering, and locality-sensitive hashing. This tailored
approach is designed to better handle the vast and varied nature of IoT data, thereby enhancing security
and operational reliability in smart infrastructure and industrial IoT systems.[19]
Petroleum industry
Anomaly detection is crucial in the petroleum industry for monitoring critical machinery.[20] Martí et al.
used a novel segmentation algorithm to analyze sensor data for real-time anomaly detection.[20] This
approach helps promptly identify and address any irregularities in sensor readings, ensuring the reliability
and safety of petroleum operations.[20]
Methods
Many anomaly detection techniques have been proposed in the literature.[1][22] The performance of methods
usually depends on the data sets. For example, some may be suited to detecting local outliers, while others
are suited to global ones, and methods have little systematic advantage over one another when compared
across many data sets.[23][24] Almost all algorithms also require the setting of non-intuitive parameters
that are critical for performance and usually unknown before application. Some of the popular techniques are mentioned
below and are broken down into categories:
Statistical
Parameter-free
Also referred to as frequency-based or counting-based, the simplest non-parametric anomaly detection
method is to build a histogram from the training data or from a set of known normal instances. A test
point that does not fall in any of the histogram bins is marked as anomalous; alternatively, test data is
assigned an anomaly score based on the height of the bin it falls in.[1] The bin size is key to the
effectiveness of this technique but must be determined by the implementer.
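A minimal sketch of the histogram approach for one-dimensional data follows. The fixed bin width, the function names and the scoring formula (one minus the bin height relative to the tallest bin) are illustrative assumptions rather than a prescribed method.

```python
from collections import Counter

def fit_histogram(train, bin_width):
    """Count how many training values fall into each fixed-width bin."""
    return Counter(int(x // bin_width) for x in train)

def histogram_score(x, counts, bin_width):
    """Anomaly score in [0, 1]: 1.0 for an empty bin, 0.0 for the tallest bin."""
    tallest = max(counts.values())
    height = counts.get(int(x // bin_width), 0)
    return 1.0 - height / tallest

# Hypothetical set of known normal instances.
train = [1.0, 1.2, 1.4, 2.1, 2.3, 2.2, 2.8, 3.1]
counts = fit_histogram(train, bin_width=1.0)

common = histogram_score(2.5, counts, bin_width=1.0)   # falls in the tallest bin
rare = histogram_score(3.5, counts, bin_width=1.0)     # falls in a sparse bin
unseen = histogram_score(7.0, counts, bin_width=1.0)   # falls in no bin at all
```

Changing `bin_width` changes which points share a bin, which is why the choice of bin size matters so much in practice.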
A more sophisticated technique uses kernel functions to approximate the distribution of the normal data.
Instances in low probability areas of the distribution are then considered anomalies.[25]
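The kernel-based variant can be sketched as a one-dimensional Gaussian kernel density estimate. The bandwidth of 0.3, the density threshold of 0.05 and the sample data are hypothetical values chosen only for illustration; in practice both parameters have to be tuned.

```python
import math

def fit_gaussian_kde(train, bandwidth):
    """Return a function estimating the density of `train` with Gaussian kernels."""
    norm = 1.0 / (len(train) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per training point, centred on that point.
        return norm * sum(
            math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in train
        )

    return density

# Hypothetical sample of normal data.
normal = [4.8, 5.0, 5.1, 4.9, 5.2, 5.0, 4.7, 5.3]
density = fit_gaussian_kde(normal, bandwidth=0.3)

def is_anomalous(x, density_threshold=0.05):
    """Flag points that lie in a low-density region of the estimate."""
    return density(x) < density_threshold
```

A value such as 5.0 sits in a dense region and is kept, whereas a value such as 8.0 lies far from every training point and falls below the threshold.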
Parametric-based
Z-score
Tukey's range test
Grubbs's test
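As an illustration of the parametric rules listed above, the following sketch implements the interquartile-range rule commonly known as Tukey's fences, which is assumed here to be what the Tukey entry refers to. The multiplier `k=1.5` is the conventional default, and the "inclusive" quantile method is one of several reasonable choices.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample: quartiles are 4 and 8, so the fences are -2 and 14.
sample = [2, 3, 4, 5, 6, 7, 8, 9, 50]
outliers = tukey_outliers(sample)
```

Because the fences are built from quartiles rather than from the mean and standard deviation, a single extreme value has little influence on where they fall.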
Density
Density-based techniques (k-nearest neighbor,[26][27][28] local outlier factor,[29] isolation
forests,[30][31] and many more variations of this concept[32])
Subspace-based (SOD),[33] correlation-based (COP)[34] and tensor-based[35] outlier
detection for high-dimensional data[36]
One-class support vector machines[37] (OCSVM, SVDD)
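The k-nearest-neighbor idea behind many of the density-based techniques listed above can be sketched as follows: each point is scored by the distance to its k-th nearest neighbor, so points in sparse regions receive high scores. This brute-force version is an illustrative assumption (quadratic in the number of points, Euclidean distance, hypothetical data) and not the algorithm of any particular cited paper.

```python
import math

def knn_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical data: a tight square of four points plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = knn_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

Methods such as the local outlier factor refine this idea by comparing a point's neighborhood density with that of its neighbors instead of using the raw distance.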
Neural networks
Replicator neural networks,[38] autoencoders, variational autoencoders,[39] long short-term
memory neural networks[40]
Bayesian networks[38]
Hidden Markov models (HMMs)[38]
Minimum Covariance Determinant[41][42]
Deep Learning[13]
Convolutional Neural Networks (CNNs): CNNs have shown exceptional performance
in the unsupervised learning domain for anomaly detection, especially in image and
video data analysis.[13] Their ability to automatically learn spatial hierarchies of features,
from low- to high-level patterns, makes them particularly suited for
detecting visual anomalies. For instance, CNNs can be trained on image datasets to
identify atypical patterns indicative of defects or out-of-norm conditions in industrial
quality control scenarios.[43]
Simple Recurrent Units (SRUs): In time-series data, SRUs, a type of recurrent neural
network, have been effectively used for anomaly detection by capturing temporal
dependencies and sequence anomalies.[13] Unlike traditional RNNs, SRUs are designed
to be faster and more parallelizable, offering a better fit for real-time anomaly detection
in complex systems such as dynamic financial markets or predictive maintenance in
machinery, where identifying temporal irregularities promptly is crucial.[44]
Foundation models: Large-scale foundation models, which have been used successfully on
many downstream tasks, have also been adapted for use in anomaly detection and
segmentation. Methods utilizing pretrained foundation models include using the alignment
of image and text embeddings (CLIP, etc.) for anomaly localization,[45] while others use
the inpainting ability of generative image models for reconstruction-error-based
anomaly detection.[46]
Cluster-based
Clustering: Cluster analysis-based outlier detection[47][48]
Deviations from association rules and frequent itemsets
Fuzzy logic-based outlier detection
Ensembles
Ensemble techniques, using feature bagging,[49][50] score normalization[51][52] and different
sources of diversity[53][54]
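The role of score normalization in an ensemble can be sketched as follows. Detectors often produce scores on very different scales, so the scores are rescaled before being combined. Min-max scaling and simple averaging are assumptions made for this illustration, and feature bagging itself is not shown.

```python
def min_max_normalize(scores):
    """Rescale one detector's scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A detector that scores everything equally carries no information.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def combine_scores(score_lists):
    """Average the normalized scores of several detectors, point by point."""
    normalized = [min_max_normalize(s) for s in score_lists]
    return [sum(column) / len(column) for column in zip(*normalized)]

# Hypothetical outputs of two detectors for the same four points.
detector_a = [0.1, 0.2, 0.15, 0.9]   # scores roughly in [0, 1]
detector_b = [10, 12, 11, 40]        # scores on a much larger scale
combined = combine_scores([detector_a, detector_b])
```

Without the normalization step, the detector with the larger numeric range would dominate the average regardless of how reliable it is.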
Others
Histogram-based Outlier Score (HBOS) uses value histograms and assumes feature independence for fast
predictions.[55]
The Subspace Outlier Degree (SOD)[33] identifies attributes where a sample is normal, and
attributes in which the sample deviates from the expected.
Correlation Outlier Probabilities (COP)[34] compute an error vector of how a sample point
deviates from an expected location, which can be interpreted as a counterfactual
explanation: the sample would be normal if it were moved to that location.
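The feature-independence assumption behind the Histogram-based Outlier Score can be sketched as follows: one histogram is built per feature, and the per-feature scores are summed. The equal-width bins, the number of bins and the use of the logarithm of the ratio between the tallest bin and the point's own bin are simplifying assumptions for this illustration and may differ from the published method.

```python
import math
from collections import Counter

def hbos_scores(rows, n_bins=5):
    """Sum, over features, the log of (tallest bin height / own bin height)."""
    scores = [0.0] * len(rows)
    for f in range(len(rows[0])):
        column = [row[f] for row in rows]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
        # Clamp the maximum value into the last bin.
        bins = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = Counter(bins)
        tallest = max(counts.values())
        for i, b in enumerate(bins):
            scores[i] += math.log(tallest / counts[b])
    return scores

# Hypothetical two-feature data set; the last row is unusual in both features.
rows = [(1.0, 10.0), (1.1, 10.5), (0.9, 9.8), (1.2, 10.2), (1.0, 10.1), (5.0, 30.0)]
scores = hbos_scores(rows)
```

Each feature is processed on its own, which keeps the computation fast but means that anomalies visible only in the combination of features are missed.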
Software
ELKI is an open-source Java data mining toolkit that contains several anomaly detection
algorithms, as well as index acceleration for them.
PyOD is an open-source Python library developed specifically for anomaly detection.[56]
scikit-learn is an open-source Python library that contains some algorithms for unsupervised
anomaly detection.
Wolfram Mathematica provides functionality for unsupervised anomaly detection across
multiple data types.[57]
Datasets
Anomaly detection benchmark data repository (http://www.dbs.ifi.lmu.de/research/outlier-evaluation/)
with carefully chosen data sets of the Ludwig-Maximilians-Universität München;
Mirror (http://lapad-web.icmc.usp.br/repositories/outlier-evaluation/) Archived
(https://web.archive.org/web/20220331072353/http://lapad-web.icmc.usp.br/repositories/outlier-evaluation/)
2022-03-31 at the Wayback Machine at University of São Paulo.
ODDS (http://odds.cs.stonybrook.edu/) – ODDS: A large collection of publicly available
outlier detection datasets with ground truth in different domains.
Unsupervised Anomaly Detection Benchmark
(https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OPQMVF)
at Harvard Dataverse: Datasets for Unsupervised Anomaly Detection with ground truth.
KMASH Data Repository (https://researchdata.edu.au/kmash-repository-outlier-detection/1733742/)
at Research Data Australia having more than 12,000 anomaly detection datasets
with ground truth.
See also
Change detection
Statistical process control
Novelty detection
Hierarchical temporal memory
References
1. Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey". ACM
Computing Surveys. 41 (3): 1–58. doi:10.1145/1541880.1541882 (https://doi.org/10.1145%2
F1541880.1541882). S2CID 207172599 (https://api.semanticscholar.org/CorpusID:2071725
99).
2. Hawkins, Douglas M. (1980). Identification of Outliers. Springer. ISBN 978-0-412-21900-9.
OCLC 6912274 (https://search.worldcat.org/oclc/6912274).
3. Barnett, Vic; Lewis, Toby (1978). Outliers in statistical data. Wiley. ISBN 978-0-471-99599-9.
OCLC 1150938591 (https://search.worldcat.org/oclc/1150938591).
4. Kemmerer, R.A.; Vigna, G. (April 2002). "Intrusion detection: a brief history and overview" (h
ttps://dx.doi.org/10.1109/mc.2002.1012428). Computer. 35 (4): supl27 – supl30.
doi:10.1109/mc.2002.1012428 (https://doi.org/10.1109%2Fmc.2002.1012428). ISSN 0018-
9162 (https://search.worldcat.org/issn/0018-9162).
5. Aggarwal, Charu (2017). Outlier Analysis. Springer Publishing Company, Incorporated.
ISBN 978-3319475776.
6. Denning, D. E. (1987). "An Intrusion-Detection Model" (http://apps.dtic.mil/dtic/tr/fulltext/u2/a
484998.pdf) (PDF). IEEE Transactions on Software Engineering. SE-13 (2): 222–232.
CiteSeerX 10.1.1.102.5127 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.
5127). doi:10.1109/TSE.1987.232894 (https://doi.org/10.1109%2FTSE.1987.232894).
S2CID 10028835 (https://api.semanticscholar.org/CorpusID:10028835). Archived (https://we
b.archive.org/web/20150622044937/http://www.dtic.mil/dtic/tr/fulltext/u2/a484998.pdf) (PDF)
from the original on June 22, 2015.
7. Teng, H. S.; Chen, K.; Lu, S. C. (1990). "Adaptive real-time anomaly detection using
inductively generated sequential patterns". Proceedings. 1990 IEEE Computer Society
Symposium on Research in Security and Privacy (http://www.cs.unc.edu/~jeffay/courses/nid
sS05/ai/Teng-AdaptiveRTAnomaly-SnP90.pdf) (PDF). pp. 278–284.
doi:10.1109/RISP.1990.63857 (https://doi.org/10.1109%2FRISP.1990.63857). ISBN 978-0-
8186-2060-7. S2CID 35632142 (https://api.semanticscholar.org/CorpusID:35632142).
8. Jones, Anita K.; Sielken, Robert S. (2000). "Computer System Intrusion Detection: A
Survey" (https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=349277a67468e
7f6a5bfc487ab125887c6925229). Computer Science Technical Report. Department of
Computer Science, University of Virginia: 1–25.
9. Stojanović, Branka; Božić, Josip; Hofer-Schmitz, Katharina; Nahrgang, Kai; Weber, Andreas;
Badii, Atta; Sundaram, Maheshkumar; Jordan, Elliot; Runevic, Joel (January 2021). "Follow
the Trail: Machine Learning for Fraud Detection in Fintech Applications" (https://www.ncbi.nl
m.nih.gov/pmc/articles/PMC7956727). Sensors. 21 (5): 1594.
Bibcode:2021Senso..21.1594S (https://ui.adsabs.harvard.edu/abs/2021Senso..21.1594S).
doi:10.3390/s21051594 (https://doi.org/10.3390%2Fs21051594). ISSN 1424-8220 (https://s
earch.worldcat.org/issn/1424-8220). PMC 7956727 (https://www.ncbi.nlm.nih.gov/pmc/articl
es/PMC7956727). PMID 33668773 (https://pubmed.ncbi.nlm.nih.gov/33668773).
10. Ahmed, Mohiuddin; Mahmood, Abdun Naser; Islam, Md. Rafiqul (February 2016). "A survey
of anomaly detection techniques in financial domain" (https://dx.doi.org/10.1016/j.future.201
5.01.001). Future Generation Computer Systems. 55: 278–288.
doi:10.1016/j.future.2015.01.001 (https://doi.org/10.1016%2Fj.future.2015.01.001).
ISSN 0167-739X (https://search.worldcat.org/issn/0167-739X). S2CID 204982937 (https://a
pi.semanticscholar.org/CorpusID:204982937).
11. Tomek, Ivan (1976). "An Experiment with the Edited Nearest-Neighbor Rule". IEEE
Transactions on Systems, Man, and Cybernetics. 6 (6): 448–452.
doi:10.1109/TSMC.1976.4309523 (https://doi.org/10.1109%2FTSMC.1976.4309523).
12. Smith, M. R.; Martinez, T. (2011). "Improving classification accuracy by identifying and
removing instances that should be misclassified" (http://axon.cs.byu.edu/papers/smith.ijcnn2
011.pdf) (PDF). The 2011 International Joint Conference on Neural Networks. p. 2690.
CiteSeerX 10.1.1.221.1371 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.
1371). doi:10.1109/IJCNN.2011.6033571 (https://doi.org/10.1109%2FIJCNN.2011.603357
1). ISBN 978-1-4244-9635-8. S2CID 5809822 (https://api.semanticscholar.org/CorpusID:58
09822).
13. Qasim, Maryam; Verdu, Elena (2023-06-01). "Video anomaly detection system using deep
convolutional and recurrent models" (https://doi.org/10.1016%2Fj.rineng.2023.101026).
Results in Engineering. 18: 101026. doi:10.1016/j.rineng.2023.101026 (https://doi.org/10.10
16%2Fj.rineng.2023.101026). ISSN 2590-1230 (https://search.worldcat.org/issn/2590-123
0). S2CID 257728239 (https://api.semanticscholar.org/CorpusID:257728239).
14. Zhang, Tan; Chowdhery, Aakanksha; Bahl, Paramvir (Victor); Jamieson, Kyle; Banerjee,
Suman (2015-09-07). "The Design and Implementation of a Wireless Video Surveillance
System" (https://doi.org/10.1145/2789168.2790123). Proceedings of the 21st Annual
International Conference on Mobile Computing and Networking (https://discovery.ucl.ac.uk/i
d/eprint/1506446/). MobiCom '15. New York, NY, USA: Association for Computing
Machinery. pp. 426–438. doi:10.1145/2789168.2790123 (https://doi.org/10.1145%2F278916
8.2790123). ISBN 978-1-4503-3619-2. S2CID 12310150 (https://api.semanticscholar.org/Co
rpusID:12310150).
15. Park, Chaewon; Cho, MyeongAh; Lee, Minhyeok; Lee, Sangyoun (2022). "FastAno: Fast
Anomaly Detection via Spatio-temporal Patch Transformation" (https://ieeexplore.ieee.org/d
ocument/9706649). 2022 IEEE/CVF Winter Conference on Applications of Computer Vision
(WACV). IEEE. pp. 1908–1918. doi:10.1109/WACV51458.2022.00197 (https://doi.org/10.11
09%2FWACV51458.2022.00197). ISBN 978-1-6654-0915-5.
16. Ristea, Nicolae-Cătălin; Croitoru, Florinel-Alin; Ionescu, Radu Tudor; Popescu, Marius;
Khan, Fahad Shahbaz; Shah, Mubarak (2024-06-16). "Self-Distilled Masked Auto-Encoders
are Efficient Video Anomaly Detectors" (https://ieeexplore.ieee.org/document/10655393).
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.
pp. 15984–15995. arXiv:2306.12041 (https://arxiv.org/abs/2306.12041).
doi:10.1109/CVPR52733.2024.01513 (https://doi.org/10.1109%2FCVPR52733.2024.0151
3). ISBN 979-8-3503-5300-6.
17. Gow, Richard; Rabhi, Fethi A.; Venugopal, Srikumar (2018). "Anomaly Detection in Complex
Real World Application Systems" (https://ieeexplore.ieee.org/document/8101009). IEEE
Transactions on Network and Service Management. 15: 83–96.
doi:10.1109/TNSM.2017.2771403 (https://doi.org/10.1109%2FTNSM.2017.2771403).
hdl:1959.4/unsworks_73660 (https://hdl.handle.net/1959.4%2Funsworks_73660).
S2CID 3883483 (https://api.semanticscholar.org/CorpusID:3883483). Retrieved 2023-11-08.
18. Chatterjee, Ayan; Ahmed, Bestoun S. (August 2022). "IoT anomaly detection methods and
applications: A survey" (https://doi.org/10.1016%2Fj.iot.2022.100568). Internet of Things.
19: 100568. arXiv:2207.09092 (https://arxiv.org/abs/2207.09092).
doi:10.1016/j.iot.2022.100568 (https://doi.org/10.1016%2Fj.iot.2022.100568). ISSN 2542-
6605 (https://search.worldcat.org/issn/2542-6605). S2CID 250644468 (https://api.semantics
cholar.org/CorpusID:250644468).
19. Garg, Sahil; Kaur, Kuljeet; Batra, Shalini; Kaddoum, Georges; Kumar, Neeraj; Boukerche,
Azzedine (2020-03-01). "A multi-stage anomaly detection scheme for augmenting the
security in IoT-enabled applications" (https://www.sciencedirect.com/science/article/pii/S016
7739X19319703). Future Generation Computer Systems. 104: 105–118.
doi:10.1016/j.future.2019.09.038 (https://doi.org/10.1016%2Fj.future.2019.09.038).
ISSN 0167-739X (https://search.worldcat.org/issn/0167-739X). S2CID 204077191 (https://a
pi.semanticscholar.org/CorpusID:204077191).
20. Martí, Luis; Sanchez-Pi, Nayat; Molina, José Manuel; Garcia, Ana Cristina Bicharra
(February 2015). "Anomaly Detection Based on Sensor Data in Petroleum Industry
Applications" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4367333). Sensors. 15 (2):
2774–2797. Bibcode:2015Senso..15.2774M (https://ui.adsabs.harvard.edu/abs/2015Senso..
15.2774M). doi:10.3390/s150202774 (https://doi.org/10.3390%2Fs150202774). ISSN 1424-
8220 (https://search.worldcat.org/issn/1424-8220). PMC 4367333 (https://www.ncbi.nlm.nih.
gov/pmc/articles/PMC4367333). PMID 25633599 (https://pubmed.ncbi.nlm.nih.gov/2563359
9).
21. Aljameel, Sumayh S.; Alomari, Dorieh M.; Alismail, Shatha; Khawaher, Fatimah; Alkhudhair,
Aljawharah A.; Aljubran, Fatimah; Alzannan, Razan M. (August 2022). "An Anomaly
Detection Model for Oil and Gas Pipelines Using Machine Learning" (https://doi.org/10.339
0%2Fcomputation10080138). Computation. 10 (8): 138. doi:10.3390/computation10080138
(https://doi.org/10.3390%2Fcomputation10080138). ISSN 2079-3197 (https://search.worldc
at.org/issn/2079-3197).
22. Zimek, Arthur; Filzmoser, Peter (2018). "There and back again: Outlier detection between
statistical reasoning and data mining algorithms" (https://web.archive.org/web/20211114121
638/https://findresearcher.sdu.dk:8443/ws/files/153197807/There_and_Back_Again.pdf)
(PDF). Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 8 (6):
e1280. doi:10.1002/widm.1280 (https://doi.org/10.1002%2Fwidm.1280). ISSN 1942-4787 (h
ttps://search.worldcat.org/issn/1942-4787). S2CID 53305944 (https://api.semanticscholar.or
g/CorpusID:53305944). Archived from the original (https://findresearcher.sdu.dk:8443/ws/file
s/153197807/There_and_Back_Again.pdf) (PDF) on 2021-11-14. Retrieved 2019-12-09.
23. Campos, Guilherme O.; Zimek, Arthur; Sander, Jörg; Campello, Ricardo J. G. B.;
Micenková, Barbora; Schubert, Erich; Assent, Ira; Houle, Michael E. (2016). "On the
evaluation of unsupervised outlier detection: measures, datasets, and an empirical study".
Data Mining and Knowledge Discovery. 30 (4): 891. doi:10.1007/s10618-015-0444-8 (http
s://doi.org/10.1007%2Fs10618-015-0444-8). ISSN 1384-5810 (https://search.worldcat.org/is
sn/1384-5810). S2CID 1952214 (https://api.semanticscholar.org/CorpusID:1952214).
24. Anomaly detection benchmark data repository (http://www.dbs.ifi.lmu.de/research/outlier-ev
aluation/) of the Ludwig-Maximilians-Universität München; Mirror (http://lapad-web.icmc.usp.
br/repositories/outlier-evaluation/) Archived (https://web.archive.org/web/20220331072353/h
ttp://lapad-web.icmc.usp.br/repositories/outlier-evaluation/) 2022-03-31 at the Wayback
Machine at University of São Paulo.
25. Chandola, Varun; Banerjee, Arindam; Kumar, Vipin (2009-07-30). "Anomaly detection: A
survey" (https://dl.acm.org/doi/10.1145/1541880.1541882). ACM Comput. Surv. 41 (3):
15:1–15:58. doi:10.1145/1541880.1541882 (https://doi.org/10.1145%2F1541880.1541882).
ISSN 0360-0300 (https://search.worldcat.org/issn/0360-0300).
26. Knorr, E. M.; Ng, R. T.; Tucakov, V. (2000). "Distance-based outliers: Algorithms and
applications". The VLDB Journal the International Journal on Very Large Data Bases. 8 (3–
4): 237–253. CiteSeerX 10.1.1.43.1842 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi
=10.1.1.43.1842). doi:10.1007/s007780050006 (https://doi.org/10.1007%2Fs00778005000
6). S2CID 11707259 (https://api.semanticscholar.org/CorpusID:11707259).
27. Ramaswamy, S.; Rastogi, R.; Shim, K. (2000). Efficient algorithms for mining outliers from
large data sets. Proceedings of the 2000 ACM SIGMOD international conference on
Management of data – SIGMOD '00. p. 427. doi:10.1145/342009.335437 (https://doi.org/10.
1145%2F342009.335437). ISBN 1-58113-217-4.
28. Angiulli, F.; Pizzuti, C. (2002). Fast Outlier Detection in High Dimensional Spaces. Principles
of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. Vol. 2431.
p. 15. doi:10.1007/3-540-45681-3_2 (https://doi.org/10.1007%2F3-540-45681-3_2).
ISBN 978-3-540-44037-6.
29. Breunig, M. M.; Kriegel, H.-P.; Ng, R. T.; Sander, J. (2000). LOF: Identifying Density-based
Local Outliers (http://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf) (PDF). Proceedings
of the 2000 ACM SIGMOD International Conference on Management of Data. SIGMOD.
pp. 93–104. doi:10.1145/335191.335388 (https://doi.org/10.1145%2F335191.335388).
ISBN 1-58113-217-4.
30. Liu, Fei Tony; Ting, Kai Ming; Zhou, Zhi-Hua (December 2008). "Isolation Forest". 2008
Eighth IEEE International Conference on Data Mining (https://www.computer.org/csdl/proce
edings/icdm/2008/3502/00/3502a413-abs.html). pp. 413–422. doi:10.1109/ICDM.2008.17 (h
ttps://doi.org/10.1109%2FICDM.2008.17). ISBN 9780769535029. S2CID 6505449 (https://a
pi.semanticscholar.org/CorpusID:6505449).
31. Liu, Fei Tony; Ting, Kai Ming; Zhou, Zhi-Hua (March 2012). "Isolation-Based Anomaly
Detection" (https://www.researchgate.net/publication/239761771). ACM Transactions on
Knowledge Discovery from Data. 6 (1): 1–39. doi:10.1145/2133360.2133363 (https://doi.org/
10.1145%2F2133360.2133363). S2CID 207193045 (https://api.semanticscholar.org/CorpusI
D:207193045).
32. Schubert, E.; Zimek, A.; Kriegel, H. -P. (2012). "Local outlier detection reconsidered: A
generalized view on locality with applications to spatial, video, and network outlier
detection". Data Mining and Knowledge Discovery. 28: 190–237. doi:10.1007/s10618-012-
0300-z (https://doi.org/10.1007%2Fs10618-012-0300-z). S2CID 19036098 (https://api.sema
nticscholar.org/CorpusID:19036098).
33. Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2009). Outlier Detection in Axis-Parallel
Subspaces of High Dimensional Data. Advances in Knowledge Discovery and Data Mining.
Lecture Notes in Computer Science. Vol. 5476. p. 831. doi:10.1007/978-3-642-01307-2_86
(https://doi.org/10.1007%2F978-3-642-01307-2_86). ISBN 978-3-642-01306-5.
34. Kriegel, H. P.; Kroger, P.; Schubert, E.; Zimek, A. (2012). Outlier Detection in Arbitrarily
Oriented Subspaces. 2012 IEEE 12th International Conference on Data Mining. p. 379.
doi:10.1109/ICDM.2012.21 (https://doi.org/10.1109%2FICDM.2012.21). ISBN 978-1-4673-
4649-8.
35. Fanaee-T, H.; Gama, J. (2016). "Tensor-based anomaly detection: An interdisciplinary
survey" (http://repositorio.inesctec.pt/handle/123456789/5381). Knowledge-Based Systems.
98: 130–147. doi:10.1016/j.knosys.2016.01.027 (https://doi.org/10.1016%2Fj.knosys.2016.0
1.027). S2CID 16368060 (https://api.semanticscholar.org/CorpusID:16368060).
36. Zimek, A.; Schubert, E.; Kriegel, H.-P. (2012). "A survey on unsupervised outlier detection in
high-dimensional numerical data". Statistical Analysis and Data Mining. 5 (5): 363–387.
doi:10.1002/sam.11161 (https://doi.org/10.1002%2Fsam.11161). S2CID 6724536 (https://ap
i.semanticscholar.org/CorpusID:6724536).
37. Schölkopf, B.; Platt, J. C.; Shawe-Taylor, J.; Smola, A. J.; Williamson, R. C. (2001).
"Estimating the Support of a High-Dimensional Distribution". Neural Computation. 13 (7):
1443–71. CiteSeerX 10.1.1.4.4106 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.
1.1.4.4106). doi:10.1162/089976601750264965 (https://doi.org/10.1162%2F089976601750
264965). PMID 11440593 (https://pubmed.ncbi.nlm.nih.gov/11440593). S2CID 2110475 (htt
ps://api.semanticscholar.org/CorpusID:2110475).
38. Hawkins, Simon; He, Hongxing; Williams, Graham; Baxter, Rohan (2002). "Outlier Detection
Using Replicator Neural Networks". Data Warehousing and Knowledge Discovery. Lecture
Notes in Computer Science. Vol. 2454. pp. 170–180.
CiteSeerX 10.1.1.12.3366 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.3366).
doi:10.1007/3-540-46145-0_17 (https://doi.org/10.1007%2F3-540-46145-0_17).
ISBN 978-3-540-44123-6. S2CID 6436930 (https://api.semanticscholar.org/CorpusID:6436930).
39. An, J.; Cho, S. (2015). "Variational autoencoder based anomaly detection using
reconstruction probability" (http://dm.snu.ac.kr/static/docs/TR/SNUDM-TR-2015-03.pdf)
(PDF). Special Lecture on IE. 2 (1): 1–18. SNUDM-TR-2015-03.
40. Malhotra, Pankaj; Vig, Lovekesh; Shroff, Gautam; Agarwal, Puneet (22–24 April 2015).
Long Short Term Memory Networks for Anomaly Detection in Time Series
(https://www.researchgate.net/publication/304782562). ESANN 2015: 23rd European Symposium on Artificial
Neural Networks, Computational Intelligence and Machine Learning. pp. 89–94.
ISBN 978-2-87587-015-5.
41. Hubert, Mia; Debruyne, Michiel; Rousseeuw, Peter J. (2018). "Minimum covariance
determinant and extensions" (https://doi.org/10.1002%2Fwics.1421). WIREs Computational
Statistics. 10 (3). arXiv:1709.07045 (https://arxiv.org/abs/1709.07045).
doi:10.1002/wics.1421 (https://doi.org/10.1002%2Fwics.1421).
ISSN 1939-5108 (https://search.worldcat.org/issn/1939-5108).
S2CID 67227041 (https://api.semanticscholar.org/CorpusID:67227041).
42. Hubert, Mia; Debruyne, Michiel (2010). "Minimum covariance determinant"
(https://onlinelibrary.wiley.com/doi/abs/10.1002/wics.61). WIREs Computational Statistics. 2 (1): 36–43.
doi:10.1002/wics.61 (https://doi.org/10.1002%2Fwics.61).
ISSN 1939-0068 (https://search.worldcat.org/issn/1939-0068).
S2CID 123086172 (https://api.semanticscholar.org/CorpusID:123086172).
43. Alzubaidi, Laith; Zhang, Jinglan; Humaidi, Amjad J.; Al-Dujaili, Ayad; Duan, Ye; Al-Shamma,
Omran; Santamaría, J.; Fadhel, Mohammed A.; Al-Amidie, Muthana; Farhan, Laith
(2021-03-31). "Review of deep learning: concepts, CNN architectures, challenges, applications,
future directions" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8010506). Journal of Big
Data. 8 (1): 53. doi:10.1186/s40537-021-00444-8 (https://doi.org/10.1186%2Fs40537-021-00444-8).
ISSN 2196-1115 (https://search.worldcat.org/issn/2196-1115).
PMC 8010506 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8010506).
PMID 33816053 (https://pubmed.ncbi.nlm.nih.gov/33816053).
44. Belay, Mohammed Ayalew; Blakseth, Sindre Stenen; Rasheed, Adil; Salvo Rossi, Pierluigi
(January 2023). "Unsupervised Anomaly Detection for IoT-Based Multivariate Time Series:
Existing Solutions, Performance Analysis and Future Directions"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10007300). Sensors. 23 (5): 2844.
Bibcode:2023Senso..23.2844B (https://ui.adsabs.harvard.edu/abs/2023Senso..23.2844B).
doi:10.3390/s23052844 (https://doi.org/10.3390%2Fs23052844).
ISSN 1424-8220 (https://search.worldcat.org/issn/1424-8220).
PMC 10007300 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10007300).
PMID 36905048 (https://pubmed.ncbi.nlm.nih.gov/36905048).
45. Jeong, Jongheon; Zou, Yang; Kim, Taewan; Zhang, Dongqing; Ravichandran, Avinash;
Dabeer, Onkar (June 2023). "WinCLIP: Zero-/Few-Shot Anomaly Classification and
Segmentation" (https://doi.org/10.1109/cvpr52729.2023.01878). 2023 IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 19606–19616.
arXiv:2303.14814 (https://arxiv.org/abs/2303.14814).
doi:10.1109/cvpr52729.2023.01878 (https://doi.org/10.1109%2Fcvpr52729.2023.01878).
ISBN 979-8-3503-0129-8.
46. Liu, Zhenzhen; Zhou, Jin Peng; Weinberger, Kilian Q. (2024-05-09). "Leveraging diffusion
models for unsupervised out-of-distribution detection on image manifold"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11112019). Frontiers in Artificial Intelligence. 7.
doi:10.3389/frai.2024.1255566 (https://doi.org/10.3389%2Ffrai.2024.1255566).
ISSN 2624-8212 (https://search.worldcat.org/issn/2624-8212).
PMC 11112019 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11112019).
PMID 38783869 (https://pubmed.ncbi.nlm.nih.gov/38783869).
47. He, Z.; Xu, X.; Deng, S. (2003). "Discovering cluster-based local outliers". Pattern
Recognition Letters. 24 (9–10): 1641–1650.
Bibcode:2003PaReL..24.1641H (https://ui.adsabs.harvard.edu/abs/2003PaReL..24.1641H).
CiteSeerX 10.1.1.20.4242 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.4242).
doi:10.1016/S0167-8655(03)00003-5 (https://doi.org/10.1016%2FS0167-8655%2803%2900003-5).
48. Campello, R. J. G. B.; Moulavi, D.; Zimek, A.; Sander, J. (2015). "Hierarchical Density
Estimates for Data Clustering, Visualization, and Outlier Detection". ACM Transactions on
Knowledge Discovery from Data. 10 (1): 5:1–51. doi:10.1145/2733381 (https://doi.org/10.1145%2F2733381).
S2CID 2887636 (https://api.semanticscholar.org/CorpusID:2887636).
49. Lazarevic, A.; Kumar, V. (2005). "Feature bagging for outlier detection". Proceedings of the
eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.
pp. 157–166. CiteSeerX 10.1.1.399.425 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.399.425).
doi:10.1145/1081870.1081891 (https://doi.org/10.1145%2F1081870.1081891).
ISBN 978-1-59593-135-1. S2CID 2054204 (https://api.semanticscholar.org/CorpusID:2054204).
50. Nguyen, H. V.; Ang, H. H.; Gopalkrishnan, V. (2010). Mining Outliers with Ensemble of
Heterogeneous Detectors on Random Subspaces. Database Systems for Advanced
Applications. Lecture Notes in Computer Science. Vol. 5981. p. 368.
doi:10.1007/978-3-642-12026-8_29 (https://doi.org/10.1007%2F978-3-642-12026-8_29). ISBN 978-3-642-12025-1.
51. Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2011). Interpreting and Unifying Outlier
Scores. Proceedings of the 2011 SIAM International Conference on Data Mining. pp. 13–24.
CiteSeerX 10.1.1.232.2719 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.232.2719).
doi:10.1137/1.9781611972818.2 (https://doi.org/10.1137%2F1.9781611972818.2).
ISBN 978-0-89871-992-5.
52. Schubert, E.; Wojdanowski, R.; Zimek, A.; Kriegel, H. P. (2012). On Evaluation of Outlier
Rankings and Outlier Scores. Proceedings of the 2012 SIAM International Conference on
Data Mining. pp. 1047–1058.
doi:10.1137/1.9781611972825.90 (https://doi.org/10.1137%2F1.9781611972825.90). ISBN 978-1-61197-232-0.
53. Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). "Ensembles for unsupervised outlier
detection". ACM SIGKDD Explorations Newsletter. 15: 11–22.
doi:10.1145/2594473.2594476 (https://doi.org/10.1145%2F2594473.2594476).
S2CID 8065347 (https://api.semanticscholar.org/CorpusID:8065347).
54. Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). Data perturbation for outlier detection
ensembles. Proceedings of the 26th International Conference on Scientific and Statistical
Database Management – SSDBM '14. p. 1.
doi:10.1145/2618243.2618257 (https://doi.org/10.1145%2F2618243.2618257). ISBN 978-1-4503-2722-0.
55. Goldstein, Markus; Dengel, Andreas (2012). "Histogram-based Outlier Score (HBOS): A fast
Unsupervised Anomaly Detection Algorithm"
(https://www.goldiges.de/publications/HBOS-KI-2012.pdf) (PDF). Personal page of Markus Goldstein.
(Poster only at KI 2012 conference, not in proceedings)
56. Zhao, Yue; Nasrullah, Zain; Li, Zheng (2019). "Pyod: A python toolbox for scalable outlier
detection" (https://www.jmlr.org/papers/volume20/19-011/19-011.pdf) (PDF). Journal of
Machine Learning Research. 20. arXiv:1901.01588 (https://arxiv.org/abs/1901.01588).
57. "FindAnomalies" (https://reference.wolfram.com/language/ref/FindAnomalies.html).
Mathematica documentation.