Machine Learning for Anomaly Detection in Particle Physics
Reviews in Physics
journal homepage: www.elsevier.com/locate/revip
Keywords: Anomaly detection, Outlier detection, Particle physics, Quantum machine learning, Model-independent

The detection of out-of-distribution data points is a common task in particle physics. It is used for monitoring complex particle detectors or for identifying rare and unexpected events that may be indicative of new phenomena or physics beyond the Standard Model. Recent advances in Machine Learning for anomaly detection have encouraged the utilization of such techniques on particle physics problems. This review article provides an overview of the state-of-the-art techniques for anomaly detection in particle physics using machine learning. We discuss the challenges associated with anomaly detection in large and complex data sets, such as those produced by high-energy particle colliders, and highlight some of the successful applications of anomaly detection in particle physics experiments.
1. Introduction
Anomaly detection plays an important role in various scientific disciplines, aiding in the discovery of rare and unusual events
that deviate significantly from the norm. In the context of high energy physics (HEP), two primary types of anomaly detection are
used: outlier detection and finding over-densities. Outlier detection focuses on the identification of unusual or unexpected events
that stand out from the norm. These outliers are typically found in the tails of distributions, representing rare occurrences. In HEP,
the search for outliers becomes particularly crucial as it unveils exceptional phenomena or anomalies that hold valuable insights into
the fundamental nature of particle interactions. Detecting over-densities in data involves a slightly different approach to anomaly
detection. This type of anomaly detection can be seen as analogous to the traditional bump hunt, where one looks for localized excesses in data points compared to the expected distribution. These over-densities, or resonances, may indicate the presence of
new particles or unexpected physical processes. Traditional rule-based approaches for anomaly detection can be limited by the
complexity and variability of the data, making it challenging to define rules that cover all possible scenarios. As a result, machine
learning (ML) techniques have gained popularity as a more flexible and powerful approach for performing anomaly detection [1].
ML-based anomaly detection methods have especially been gaining popularity in particle physics as a way of extracting potential
new physics signals in a model-agnostic way, by rephrasing the problem as an out-of-distribution detection task [2]. In this context,
model-agnostic refers to assuming no, or at least minimal, prior information regarding the physical model describing the new-physics
phenomena. A typical search for new physics signatures involves looking for a specific signal and maximizing the analysis sensitivity
for that single model. This analysis is not useful to investigate other new physics models. In an anomaly detection driven search,
however, the aim is to be model-agnostic and only look for deviations from the background. Such a search is less sensitive to any given model than a dedicated search optimized for that specific signal, yet it enables the simultaneous search for multiple new physics scenarios. An additional advantage of
anomaly detection is that it allows algorithms to undergo direct training on unlabeled data. This has generated substantial interest
in the physics community, resulting in several community challenges on anomaly detection driven searches for new physics [2–5].
∗ Corresponding author.
E-mail addresses: [email protected] (V. Belis), [email protected] (P. Odagiu), [email protected] (T.K. Aarrestad).
https://doi.org/10.1016/j.revip.2024.100091
Received 31 October 2023; Received in revised form 19 December 2023; Accepted 16 January 2024
Available online 18 January 2024
2405-4283/© 2024 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
This review paper aims to provide an overview of the various machine learning techniques used for anomaly detection in particle
physics, including their strengths and limitations. We will focus on its usage in high energy particle physics, but several of the
techniques discussed generalize to neutrino physics, astro-particle physics, and gravitational wave detection. Our main focus will be
on the usage of anomaly detection as a means of discovering new physics, but we will also discuss its usage in system monitoring.
A more exhaustive summary of outlier detection can be found in Refs. [6,7]. This review is organized as follows. First, we will give
a brief introduction to training methodologies in Section 2. This is followed by an overview of how anomaly detection is used
for model-independent searches in high energy physics experiments in Section 3, including recent results. In Section 4 we discuss
anomaly detection for triggering, and in Section 5 we briefly discuss use cases in detector monitoring. Finally, in Section 6, we discuss
anomaly detection in the emerging field of quantum machine learning.
Particle physics data is unique in that it inherently cannot be labeled in the same way that, for instance, images can.
Fundamentally, every data sample can be a signal process, a background process or a quantum mechanical superposition of the
two. Consequently, typical deep learning setups, in which a loss function defined in terms of predicted versus true labels is minimized over some dataset containing such true labels, are impossible. To account for this, highly accurate simulations
of physical processes exist. This simulated data acts as a labeled surrogate of the real processes measured in particle physics
experiments.
For the training of deep neural networks in a supervised manner in particle physics, simulated labeled data must be used. To utilize
the vast amount of unlabeled data, and overcome difficulties related to differences between simulation and data, other strategies
such as weakly- or semi-supervised training paradigms have been developed. In semi-supervised learning, a small amount of labeled
data is combined with a large amount of unlabeled data at training time. In weakly-supervised learning, noisy or imprecise sources
are used to label the training data. One can also employ self-supervised methods. Here, one takes advantage of the underlying
structure in the data to obtain supervisory signals from the data itself. This is for instance the case for an autoencoder, a model that is
trained to compress and decompress the input, and uses the difference between the original and the reconstructed input to compute
a reconstruction error. The original input then serves as the label that the reconstructed input is trying to target. Self-supervised and
unsupervised methods have been gaining significant popularity in particle physics, not only due to the large amount of unlabeled data available, but also as a way of reducing the high degree of model dependence introduced by using simulated data to define
search strategies. These training paradigms are increasingly applied to natural language processing and computer vision, aimed at
harnessing the growing reservoir of unlabeled data accessible on the internet. This surge has resulted in the emergence of innovative
and effective approaches, which can be adapted and employed within the physical sciences. In the following, we will mainly focus on
self-, semi- and weakly- supervised methods although supervised approaches to anomaly detection in particle physics also exist [8].
Searching for physics beyond the Standard Model is one of the most important aspects of the physics program at the Large
Hadron Collider (LHC). Since the start of proton–proton collisions at the LHC in 2011, the ATLAS [9] and CMS [10] Collaborations
have derived stringent bounds on a range of new physics signatures, pushing the allowed mass range for many postulated new
particles far into the TeV scale. While it is possible that these particles have yet to be observed because they are too heavy to be
produced at the LHC, or have too small a cross section to be detected with the current data size, it could also be that new particles are
kinematically accessible and produced at observable rates, but our current methods of detection prevent their discovery.
Searches for new physics processes at particle colliders are usually performed as blind searches. Such searches proceed by defining
a region of interest in the parameter space, using simulated data of the signal and the Standard Model background processes in
order to enhance the data purity. The data is only examined at the very end, where it is tested for the presence of signal through a
simultaneous fit of the signal and background probability distributions, hoping to extract a non-zero signal component.
Hundreds of such searches have been performed for hundreds of different potential new particles, but thus far none have been
discovered. Despite this, there are still regions of the data that have not yet been probed for the presence of a signal. This has led
to an increased interest in more model-agnostic search strategies. Model-independent searches are nothing new in high energy particle
physics, and strategies relying less on a signal hypothesis have been devised and utilized [11–25]. These mainly take advantage of
Monte Carlo simulation, and use this to compare distributions in the observed data to simulation across several observables and
many histogram bins. The drawback of this methodology is that one needs to rely on accurate simulation, and also that, due to the
vast size of the parameter space being searched, an observation that appears statistically significant could potentially be the result
of a statistical fluctuation.
In the following, we discuss machine learning techniques which mitigate some of these challenges and have the potential to
improve and extend model-independent searches.
In order to train the most powerful ML-based classifier to discriminate signal from background, one would ideally train a network
in a supervised manner with labeled data. This relies on a signal hypothesis that is chosen a priori. An early attempt at discriminating background from ‘‘everything else’’ in order to obtain some degree of model independence was demonstrated in Ref. [8]. Targeting
searches for new physics in hadronic final states, a classifier was trained to discriminate QCD jets from various potential signal jets
using Monte Carlo simulation. The disadvantage of such an approach is the dependence on signal simulation and on the choice of which signals are included in the training. Although simulated particle physics data is highly accurate over several orders of magnitude in length scale, simulation is known not to fully reproduce collider data, and this disagreement affects the tagging performance.
Using weakly- or self-supervised (see Section 3.2.1) methods, algorithms can be trained directly on the data itself which has the
added benefit of not having to derive transfer factors when training on synthetic data and testing on real data.
A natural anomaly score for an over-density search is the likelihood ratio between the data and the background-only hypothesis,

𝑅(𝑥) = 𝑝data(𝑥)∕𝑝bg(𝑥). (1)

If the data contain a signal component with probability density 𝑝sig(𝑥), 𝑅(𝑥) would be monotonically related to the signal-to-background LR for any signal present in the data. A strategy for learning a good approximation of the likelihood ratio 𝑅(𝑥) is to train a classifier between data from a signal-enriched region and samples drawn from a (fully data-driven) background model. If the background model is accurate and the classifier is well-trained, this approaches the likelihood ratio 𝑅(𝑥) by the Neyman–Pearson Lemma [26].
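The monotonic relation above follows from writing the signal-region data as a mixture of signal and background. A short sketch of the standard argument, with 𝑓sig denoting the (unknown) signal fraction, a symbol introduced here only for illustration:

```latex
p_{\mathrm{data}}(x) = f_{\mathrm{sig}}\, p_{\mathrm{sig}}(x) + (1 - f_{\mathrm{sig}})\, p_{\mathrm{bg}}(x)
\;\;\Rightarrow\;\;
R(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{bg}}(x)}
     = f_{\mathrm{sig}}\,\frac{p_{\mathrm{sig}}(x)}{p_{\mathrm{bg}}(x)} + (1 - f_{\mathrm{sig}}).
```

Since 𝑓sig is a constant, 𝑅(𝑥) is a monotonically increasing function of the signal-to-background likelihood ratio 𝑝sig(𝑥)∕𝑝bg(𝑥), so a classifier that approximates 𝑅(𝑥) ranks events in the same order as the optimal signal-versus-background classifier.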
Hence, the aim is to test whether the signal region data contains a combination of signal and background data. In the event
that there are signal events present in the signal region, the classifier can differentiate between the signal region data and the
background template. The true signal events are expected to have higher classification scores than the true background data. A cut
on this classifier score can then be used to enhance the significance of signal events, making it a useful anomaly detection metric.
In Ref. [27] a method referred to as Learning from Label Proportions [28] was utilized to discriminate between quarks and
gluons using impure data samples. Despite not having access to the per-instance labels, the class proportions could be derived using
domain knowledge. A supervised task was then defined using the class proportions themselves as the target, although operating the
algorithm at a per-instance level. This concept has been extended in the Classification WithOut Labels (CWola) [29] framework.
In this setup, the class proportions themselves do not need to be known, and it is enough to have two datasets at hand with an
unequal fraction of signal instances in each set. A standard classifier can then be trained to discriminate between the two mixed
datasets, and this can be shown to be the optimal classifier to discriminate between signal and background instances. The larger the
difference in signal fraction between the two datasets, the better the classifier becomes. The challenge is being able to design such mixed
datasets, especially for a model-independent setup.
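As a minimal illustration of the CWoLa idea, the sketch below trains an off-the-shelf classifier using only the sample assignment as the label; the placeholder arrays and the choice of classifier are assumptions of the example and are not taken from Ref. [29].

```python
# CWoLa-style training sketch: the labels are the mixed-sample assignments,
# not true signal/background labels.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)
x_mixed1 = rng.normal(0.1, 1.0, size=(5000, 4))   # placeholder: signal-enriched sample
x_mixed2 = rng.normal(0.0, 1.0, size=(5000, 4))   # placeholder: background-enriched sample

X = np.concatenate([x_mixed1, x_mixed2])
y = np.concatenate([np.ones(len(x_mixed1)), np.zeros(len(x_mixed2))])

clf = HistGradientBoostingClassifier(max_iter=200).fit(X, y)

# As long as the two samples have different signal fractions, the classifier
# output is (asymptotically) also an optimal signal-versus-background score.
scores = clf.predict_proba(X)[:, 1]
```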
The CWola strategy has been demonstrated and deployed for various model-independent search setups. In Ref. [30], the authors
introduce the CWola bumphunt. In this setup, one looks for new, generic heavy particles that appear as a resonance at some mass in the dijet invariant mass spectrum. Starting from the weak assumption that this is a localized, narrow resonance, two mixed
samples are created in the following way: The region in the dijet invariant mass close to the particle mass is defined as the signal-
enriched mixed sample, and the regions next to it are defined as background-enriched regions. This is illustrated in Fig. 1. In this
way, the dijet invariant mass sideband regions serve as the background samples; these can serve as a good model for the background
if the input features are statistically independent from the dijet invariant mass. If there is a signal present in the signal-like region
around the particle mass, the classifier learns to identify it, while in the absence of a signal the classifier will likely learn random
noise as there would be no difference between the two groups of events. It is crucial that the features being used for classification
are not correlated with the dijet invariant mass. Otherwise, the classifier will be able to differentiate background events in the
signal region from those in the adjacent dijet invariant mass regions used as the background-enriched mixed sample. Background
events within the signal region will then be classified as signal-like, which can introduce artificial sculpting of the dijet invariant
mass distribution. Note that the above strategy only works for narrow resonances: if there is a significant amount of signal in both
datasets, as would be the case for a broad resonance, the classification performance is reduced. This method was used to analyze
data collected by the ATLAS experiment in the search for generic new heavy resonances decaying into jets in Ref. [25], a first of
its kind using weak supervision for model-agnostic searches. The power of this analysis can be seen in Fig. 2. This plot shows 95%
confidence level upper limits on the cross section for a wide range of different signal models. The results are compared to those
of a generic dijet search and to dedicated searches, when these exist. This demonstrates that, by utilizing powerful anomaly detection techniques like CWola, one can very efficiently search for a wide range of potential new physics scenarios using a single method.
Fig. 1. Weakly supervised density estimation techniques like CWola, CATHODE and Tag’N’Train take advantage of the fact that the signal (blue) can be localized
in a specific region of phase space (SR). One can then use data sidebands (SB) to either estimate the background distribution (red) in the signal region, or to
train a weakly supervised classifier (CWoLa) between SR and SB.
Source: Figure from Ref. [31].
Fig. 2. 95% confidence level upper limits on the cross section for a wide range of different signal models using the CWola bumphunt method [25].
This methodology can also be applied in setups other than a dijet bumphunt. In Ref. [32], model-agnostic learning using the
CWola method is harnessed in order to improve the sensitivity of searches for new physics models with anomalous jet dynamics
and a mono-jet signature. Focusing on cases where a heavy new particle decays into two jets which hadronize partially in the dark
sector (making them semi-visible jets), and where one of the jets becomes completely invisible and the other partially visible, anomaly
detection is utilized to detect the semi-visible anomalous jet. The degree of visibility of this jet can vary, making it difficult to train a
supervised algorithm for each visibility fraction. Rather, CWola is deployed to train a generic anomalous jet identification classifier.
The dominant background for a mono jet search is the electroweak production of vector bosons and jets, where the vector boson
further decays to neutrinos, 𝑍(𝜈𝜈)+jets. The experimental signature is missing transverse energy and a jet, mimicking the signal
signature. Taking advantage of the fact that the vector boson can also decay visibly into two leptons, and that in these cases the jet remains the same, a background-enriched control CWola sample can be defined using 𝑍(𝓁𝓁)+jets events. None of the signal should
be present in events with a di-lepton and jet signature. A model-independent anomalous jet tagger is then trained, in a supervised manner, to discriminate between jets coming from a (𝓁𝓁)+jet and a (𝜈𝜈)+jet sample. If the monojet signature is present, CWola guarantees that
the best algorithm trained to distinguish between these two regions, is also the best algorithm to discriminate between a normal
SM jet from the V+jet background, and a semi-visible anomalous jet. This illustrates how generally CWola can be used. The only
requirement is that one is able to define regions of the data depleted and enriched in signal, and that the signal and background
events are statistically identical in the two regions. In terms of model independence, some degree of signal assumption is needed in
order to define appropriate mixed samples.
Methods can also be used to bootstrap CWola and further improve the classification performance. In [33], a powerful and model-
independent anomalous jet tagger is defined starting from the CWola hunting methodology, but defining the mixed samples for
training differently. Targeting signals where both jets in the event are anomalous, the key idea is that for a resonance decaying to
a pair of anomalous jets, one can use an initial self-supervised or supervised classifier (like an autoencoder, see Section 3.2.1) to tag an event
as signal-like or background-like using one jet and then use that information to construct samples for training a classifier using
the other jet with weak supervision. By using an autoencoder as an initial classifier, one can group events into a signal-like and
background-like sample based on the anomaly score on one of the jets (assuming that if the one jet is anomalous, the other must
be too). A classifier can then be trained for the other jet that has not been tagged, where the mixed samples are defined based on
the anomaly score of the tagged jet.
Another weakly-supervised method for over-density detection is ANODE [34]. In ANODE, conditional neural density estimation
is used in order to interpolate probability densities from a data sideband into the data signal region. This interpolated density is compared to the probability density of the actual data observed in the signal region and used to construct a likelihood ratio as in Eq. (1). This implies having to learn both the interpolated likelihood of the background in the signal region and the
likelihood for data in the signal region (see Fig. 1). An improvement on this method is CATHODE (Classifying Anomalies Through
Outer Density Estimation) [31]. In CATHODE, rather than directly constructing the likelihood ratio, one instead samples events from
the trained background estimator after it is interpolated into the signal region. This avoids having to learn the likelihood of data in
the signal region. Then, a classifier is trained to discriminate data in the signal region from the data samples from the interpolated
density estimator. This algorithm was first demonstrated for searches for heavy particles decaying into two jets. CATHODE proceeds
by first training a conditional normalizing flow [35] on the dijet invariant mass sidebands and then interpolating this into the signal
region; samples from this flow are used as the background model and should correctly take into account any correlations between
the input features and the dijet invariant mass.
Normalizing flows are a type of generative model that learn to transform a simple probability distribution (usually a standard
Gaussian distribution) into a more complex distribution that resembles the target distribution of the data. This is achieved by defining
a sequence of invertible transformations that map samples from the simple distribution to samples from the target distribution. The
resulting model can be used to generate new data samples, perform density estimation, and compute likelihoods. Invertibility is
important, as it ensures that the transformation has a well-defined inverse, which is needed for density estimation and likelihood
computation. The key challenge in designing normalizing flows is to ensure that the resulting distribution is both complex enough
to capture the target distribution and easy to work with, in the sense that likelihood computation and sampling are efficient.
Recent work has focused on designing more expressive and flexible transformations, such as coupling layers, which allow for the
transformation to depend on only a subset of the input variables [36]. CATHODE utilizes such a normalizing flow to estimate the
background density, conditioned on the dijet invariant mass. The density can then be interpolated into the dijet invariant mass
signal region, while accounting for all correlations between the input features. Finally, a classifier is trained to distinguish between
the artificially generated background samples from the normalizing flow (trained in data sidebands) and actual samples from the
data signal region, yielding an estimate of the likelihood ratio as an anomaly metric (following the CWola paradigm).
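A condensed sketch of this workflow is shown below, using the open-source nflows package as one possible implementation of the conditional normalizing flow; the variable names, network sizes, training loop and placeholder data are illustrative assumptions and not the reference CATHODE code.

```python
# CATHODE-style sketch: fit a conditional density p(x | m_jj) on sidebands,
# sample a background template in the signal region, then train a classifier.
import torch
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms.autoregressive import MaskedAffineAutoregressiveTransform

n_features = 4  # e.g. a few substructure observables (assumption)
transform = CompositeTransform([
    MaskedAffineAutoregressiveTransform(features=n_features,
                                        hidden_features=64,
                                        context_features=1)  # condition on m_jj
    for _ in range(5)
])
flow = Flow(transform, StandardNormal([n_features]))
optimizer = torch.optim.Adam(flow.parameters(), lr=1e-3)

# Placeholder sideband data: features x_sb and the conditioning mass m_sb.
x_sb = torch.randn(10000, n_features)
m_sb = torch.rand(10000, 1)

for epoch in range(50):
    optimizer.zero_grad()
    loss = -flow.log_prob(inputs=x_sb, context=m_sb).mean()  # maximum likelihood
    loss.backward()
    optimizer.step()

# Sample a background template at masses drawn in the signal region, then train
# any standard classifier between the template and the signal-region data.
m_sr = torch.rand(5000, 1)
with torch.no_grad():
    template = flow.sample(1, context=m_sr).squeeze(1)
```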
A similar method is CURTAINS [37]. This method also takes advantage of a conditioned invertible neural network to learn
the distribution of background events in a sideband and then use that to transform datapoints to those of the target distribution
in the signal region. CURTAINs uses an optimal transport loss to train the network to minimize the distance between the model
output and the target data. The goal is to approximate the optimal transport function between two points in feature space when
moving along the resonant spectrum. As a result, instead of generating new samples to create a background template, CURTAINs
transforms the data in the side-bands to equivalent data points with a mass in the signal region. This approach eliminates the
need to match data encodings to an intermediate prior distribution, as is the case for CATHODE, which can lead to mismodelling
of underlying correlations between the observables in the data if the trained posterior is not in perfect agreement with the prior
distribution. Additionally, CURTAINs can also be employed to transform side-band data into validation regions, rather than simply
constructing the background template in the signal region, making the algorithm easier to validate and test. Once the CURTAINs
density estimation algorithm has been trained, a similar approach as in CATHODE is taken. Specifically, the transformed data (from
sideband to signal region) is assumed to represent a sample of pure background events, while the signal region data represents a
mixture of signal and background. A CWola classifier is trained to discriminate between the two datasets based on this assumption.
In Refs. [38,39], an improvement of the CURTAINs technique is introduced, where maximum likelihood estimation is used
instead of an optimal transport loss. This improves the fidelity of the transformed data and is significantly faster and easier to train.
More recently, diffusion models [40], emerging as potent tools for high-dimensional density estimation, have been explored both
for overdensity estimation [41] and for outlier detection [42].
There are also weakly supervised methods that take advantage of simulation in the training of density estimators. In Simulation Assisted Likelihood-free Anomaly Detection (SALAD), a function that reweights simulation to match data in the data sidebands is trained. This (parametrized) reweighting function is then interpolated into the signal region. Finally, a classifier is trained to discriminate between the reweighted simulation and the data, yielding the likelihood ratio. Another simulation-assisted technique is Flow-enhanced
transportation for anomaly detection (FETA) [43], a mixture of SALAD and CURTAINS. A normalizing flow is trained in the sideband
to map MC simulation to data. This learned flow is then applied to simulation in the signal region to obtain an approximation of
the background.
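The classifier-based reweighting at the heart of such simulation-assisted approaches can be sketched as follows; the arrays and the simple weight formula are illustrative assumptions, and the actual SALAD and FETA implementations differ in detail.

```python
# Sketch of classifier-based reweighting: learn w(x) ~ p_data(x)/p_MC(x) in the
# sidebands, then apply the weights to simulation in the signal region.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)
x_data_sb = rng.normal(0.1, 1.0, size=(20000, 5))  # placeholder sideband data
x_mc_sb   = rng.normal(0.0, 1.0, size=(20000, 5))  # placeholder sideband simulation
x_mc_sr   = rng.normal(0.0, 1.0, size=(5000, 5))   # placeholder signal-region simulation

X = np.concatenate([x_data_sb, x_mc_sb])
y = np.concatenate([np.ones(len(x_data_sb)), np.zeros(len(x_mc_sb))])
clf = HistGradientBoostingClassifier().fit(X, y)

def reweight(x):
    """Estimate w(x) = p_data(x) / p_MC(x) from the classifier score."""
    s = np.clip(clf.predict_proba(x)[:, 1], 1e-6, 1 - 1e-6)
    return s / (1.0 - s)

# Weights learned in the sidebands are applied to simulation in the signal
# region to build the data-driven background model.
w_sr = reweight(x_mc_sr)
```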
There are caveats when deploying weakly supervised methods. Asymptotically, a weakly supervised classifier will converge to
the performance of a fully supervised one. In practice, however, performance typically degrades with smaller sample sizes available for
training and lower fractions of signal events in the data sample. However, one can still obtain signal versus background classifiers
with reasonable performance even with signal fractions well below 1%. Biases in the background model can also lead to degraded
performance; if the classifier is able to distinguish between the background events in the two samples it will learn to encode that
difference instead of learning to identify signal events.
The above methods focus on detecting new physics as overdensities in very specific regions of the kinematic phase space; this
paradigm is similar to a traditional bump hunt, often performed in HEP searches for novel particles. However, new physics signatures
are equally likely to manifest themselves as unexpected events in the tails of distributions. This type of event may be identified using
out-of-distribution detection algorithms. The prime example of such an algorithm is the auto-encoder [44–46], which is especially
popular in high-energy physics applications [47–55].
An auto-encoder consists of an encoder, which compresses the input into a lower-dimensional latent representation, and a decoder, which reconstructs the input from this representation. The model is trained by minimizing a reconstruction loss, conventionally the mean squared error (MSE),

MSE = ‖𝑥 − 𝐷𝜃(𝑧)‖² , (3)

where 𝑥 is the input data, 𝑧 is the latent space data, and 𝜃 are the weights of the decoder 𝐷𝜃. This reconstruction loss is propagated through both the decoder and the encoder. Thus, the latent space and the reconstructed data evolve simultaneously.
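A minimal PyTorch sketch of such an auto-encoder and its reconstruction loss is given below; the layer sizes, input dimension and placeholder data are assumptions for illustration.

```python
# Auto-encoder sketch: compress the input to a low-dimensional latent space and
# reconstruct it; the MSE reconstruction loss trains encoder and decoder jointly.
import torch
from torch import nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features=16, latent_dim=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, latent_dim),           # bottleneck: latent variables z
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(),
            nn.Linear(32, n_features),           # reconstruction of the input
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 16)                         # placeholder batch of events

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), x)       # reconstruction loss, cf. Eq. (3)
loss.backward()                                  # gradients reach encoder and decoder
optimizer.step()
```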
The extent to which the auto-encoder latent space follows a statistical distribution is referred to as the latent space regularity. The
latent space of the standard auto-encoder does not follow any particular distribution. The regularity of the standard AE depends
on the input features, the dimension of the latent space, and the encoder architecture. The encoder will shape the latent space such that it facilitates the reconstruction task, i.e., minimizing the MSE loss from Eq. (3). In contrast, the Variational Auto-
encoder (VAE) [57] is an extension of the conventional auto-encoder described above, which models the latent representation to
approximate a given probability distribution. This is typically a Gaussian distribution, described by a mean and a variance; however,
many alternatives exist [58–62], and the choice of latent space distribution ultimately depends on the task.
The main idea of variational inference is to define a parametrized family of distributions and to search within it for the best
approximation of the chosen prior distribution. The ‘‘best approximation’’ is defined as the element of the aforementioned family of
distributions that minimizes a pre-defined function that measures the dissimilarity between the trial approximation and the prior.
The function that is most commonly employed for this task is the Kullback–Leibler [63] (KL) divergence, defined as
DKL(𝜇⃗, 𝜎⃗) = − (1∕2) ∑𝑖 ( log(𝜎𝑖²) − 𝜎𝑖² − 𝜇𝑖² + 1 ) , (4)

for the specific case of comparing a parametrized Gaussian distribution N(𝜇⃗, 𝜎⃗) with the standard normal N(0, 1). A broader discussion on the KL
divergence is found in Ref. [64]. Note that the KL divergence is a somewhat unstable dissimilarity metric. Hence, more robust
alternatives exist, such as the Wasserstein distance, which led to the creation of an AE architecture with the same name [65].
Variations on the Wasserstein AE have also been applied in a high-energy physics context [66,67].
The VAE loss consists of two components: the reconstruction loss, conventionally the MSE, and the KL divergence term. The
latter encourages the VAE to produce a latent space that follows a well-defined prior distribution, regularizing the model. Thus, the
VAE loss can be written schematically as

𝐿VAE = (1 − 𝛽) MSE + 𝛽 DKL , (5)

where MSE labels the reconstruction loss, DKL is the KL regularization term, and 𝛽 ∈ [0, 1] is a hyperparameter that balances the effect of the two loss components.
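As a concrete illustration, the schematic loss above can be written as a short function; the 𝛽-weighting below follows the convention of Eq. (5) and is an assumption of this sketch rather than a unique definition.

```python
# VAE loss sketch: reconstruction term plus the KL regularization of Eq. (4).
import torch

def vae_loss(x, x_reco, mu, log_var, beta=0.5):
    mse = torch.mean((x - x_reco) ** 2)
    # KL divergence between N(mu, sigma) and the standard normal, per event
    kl = -0.5 * torch.mean(torch.sum(1 + log_var - mu**2 - log_var.exp(), dim=1))
    return (1 - beta) * mse + beta * kl
```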
The weakly supervised methods from the previous sections aim to learn the likelihood ratio and thus can identify anomalies. In
contrast, self-supervised models only learn the probability density of the background. Hence, an event may be labeled as anomalous
if its probability to be associated with the learned latent distribution is very low. Additionally, the learned distribution exists in a
lower dimensional embedded space. This stops the model from memorizing the input and is a form of lossy compression. Therefore,
Fig. 3. An auto-encoder is trained to encode the input to a lower dimensional embedded space, and then decode it again in order to reconstruct the original
input (top). The difference between the input and the output can be used as an anomaly score (bottom).
Fig. 4. The output, or anomaly score, of an auto-encoder trained on data collected by the ATLAS Experiment and evaluated on data (black) and a range of
benchmark BSM signals [73].
the model is generally capable of reconstructing events it is frequently exposed to during its training, but it fails at reconstructing
events that are rare in the training set. The difference between the input data and its reconstructed counterpart may then be used to
define an anomaly score: a high MSE is expected for anomalous data and a low MSE is expected for typical events. An illustration
of this paradigm is shown in Fig. 3. There exist several studies in HEP where AEs and VAEs are used for detecting new physics as
outliers in the data [62,68–72]. For example, this type of workflow was used to search for new physics in the two-body invariant
mass spectrum of two jets or a jet and a lepton with the ATLAS Experiment in Ref. [73]. Therein, a selection on an auto-encoder
output is used to suppress the background and define signal regions with a high signal-to-background ratio. The auto-encoder output
for data and for a range of potential new physics signatures is shown in Fig. 4.
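Continuing the auto-encoder sketch above, the per-event anomaly score and a selection at a fixed background efficiency can be computed as follows; the quantile used to define the selection is an illustrative choice.

```python
# Per-event reconstruction error as anomaly score, and a cut defining a
# signal-enriched region (here: the 0.1% most anomalous events).
import torch

x_data = torch.randn(100_000, 16)                 # placeholder event features
with torch.no_grad():
    x_reco = model(x_data)                        # 'model' from the AE sketch above
    score = torch.mean((x_data - x_reco) ** 2, dim=1)

threshold = torch.quantile(score, 0.999)
selected = x_data[score > threshold]
```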
As mentioned in the beginning of this section, autoencoders are efficient for event-by-event outlier detection and are not expected
to perform well in finding overdensities. This makes them complementary to the weakly supervised methods. Furthermore, an
additional problem that auto-encoders have is discussed in Ref. [74]. In the aforementioned work it is demonstrated that the
connection between large MSE and anomalies is not completely clear: for data sets with a nontrivial topology, there will always
be points that are wrongly classified as anomalous. Conventionally, this can be mitigated by using VAEs and classifying anomalous events using the regularized latent space. An alternative method of circumventing this issue is based on the so-called normalized
Fig. 5. Dijet mass distribution for the cluster with the highest signal-over-background ratio (right) and for the cluster closest to it (left). The background
component in the signal region can be modeled from the closest cluster [81].
AE [75], which is located at the boundary between self-supervised and unsupervised learning. This newer type of AE architecture
uses energy-based models as an alternative to the likelihood ratio or the MSE. Thus, the normalized AE avoids classifying genuinely
complex albeit standard events as anomalous. For more details on this last kind of AE and its possible application to HEP, see
Ref. [76]. As mentioned earlier, diffusion models are also being explored as an alternative method to perform density estimation,
similar to variational autoencoders, utilizing the learned density as a permutation-invariant anomaly detection score [42].
For true model-independent searches, no assumption should be made about the alternative model. In Ref. [80], the alternative
hypothesis is parametrized as a mixture of the background model and a number of additional Gaussians. Another method is the
New Physics Learning Machine (NPLM) [87]. Here, the alternative hypothesis is parametrized by the network itself: given a
dataset and a reference sample (like Monte Carlo simulation or data from a data sideband), a neural network is constructed such
that it parametrizes the alternative model as small perturbations away from the reference. When this model is trained, it learns
Fig. 6. A schematic of the New Physics Learning Machine. Given a data sample and a reference sample, the neural network itself parametrizes the alternative
hypothesis. The output of the algorithm is the ratio between the best-fit data distribution and the reference distribution, and a test statistic variable t [87].
Fig. 7. P-values as a function of injected signal cross sections for different anomaly detection methods (solid lines) for two different signals: X→YY’ with
𝑀𝑋 = 3 TeV, 𝑀𝑌 = 170 GeV and 𝑀𝑌 ′ = 170 GeV (left), and W’ →tB’ with 𝑀𝑊 ′ = 3 TeV and 𝑀𝐵′ = 400 GeV (right). Several reference methods (dashed
lines) are also added: an inclusive dijet search (black), a traditional two-prong-targeted event selection (brown), and a traditional three-prong-targeted selection
(beige) [88].
the maximum likelihood fit to the data by construction, since its loss incorporates the log likelihood of the data. Its output is the
ratio between the best-fit data distribution and the reference distribution, which is used as a test statistic to select data that displays
a high level of discrepancy with the reference model. This ratio measures the disagreement between the reference model and the
data and can be used for hypothesis testing. An overview of the NPLM design is shown in Fig. 6. A drawback of this method is the difficulty of defining the reference sample. For example, the reference sample can be taken from Monte Carlo simulation, with the caveat that
this might be a less than optimal approximation to nature. Alternatively, the reference sample can be taken from a data sideband.
However, in this case the difficulty is to find a region that is signal free, but still statistically identical to the data signal region.
Integrating NPLM with techniques such as CURTAINs or CATHODE offers a potential method for creating the reference sample.
This entails training a conditional density estimator with data from signal-free sidebands, enabling effective extrapolation into the
signal region. For the technique to be effective and avoid generating false positives, it is crucial that the density estimation maintains
a high degree of accuracy throughout the entire spectrum of the variable of interest. One challenge arises when integrating the full power of NPLM, which is capable of identifying overdensities across multiple dimensions simultaneously, with conditional density
estimation. This integration demands conditioning on multiple variables at the same time, adding a layer of difficulty to the process.
A challenge in utilizing anomaly detection for discovering new physics lies in the inherent difficulty of optimizing these
algorithms when the nature of the signal remains unknown a priori. Moreover, the sensitivity of various anomaly detection methods
can vary considerably depending on the type of signal, as demonstrated in Fig. 7 and elaborated in Ref. [88]. The best one can hope
to do is to monitor the performance on a wide variety of different potential signals.
The ultimate limiting factor for many searches for new physics is the event selection system of particle detectors. Tens of terabytes
of data per second are produced from proton–proton collisions occurring every 25 ns, an extreme data rate that cannot be read out
and stored. The rate is reduced by a real-time, two-stage event filter processing system – the trigger – which decides whether each
collision event should be kept for further analysis or be discarded. The first stage, the Level-1 trigger, is completely hardware-based, running on around one thousand large field-programmable gate arrays (FPGAs) on custom boards. The data is buffered close to the detector while the processing occurs, with a maximum latency of 𝒪(1) μs to make the trigger decision. High selection accuracy in
the trigger is crucial in order to keep only the most interesting events while keeping the output bandwidth low, reducing the event
rate from 40 MHz to 100 kHz. The data accepted by the Level-1 trigger are read out from the detector and sent to the second
software-based event filtering system, the High Level Trigger (HLT). Here the data rate is further reduced from 100 kHz to 1 kHz.
This processing is done on commercially available CPUs and, recently, also on GPU accelerators [89]. The latency requirement at
the HLT is 𝒪(100) ms.
Recently, it has been proposed to look for Beyond Standard Model physics signatures in a model-agnostic way at the trigger level,
both at the HLT [90] and at the Level-1 [91] stage. The existing selection algorithms within the trigger system currently prioritize
collisions that generate high-energy outgoing particles. However, these algorithms have reduced sensitivity to e.g. signatures
involving a high multiplicity of low-momentum particles. To address this, outlier detection techniques have garnered attention for
their potential to enhance acceptance rates for events that are challenging to capture using conventional algorithms. The challenge
when designing such models is to adhere to the strict latency, resource and throughput constraints of the trigger.
Algorithms targeting the completely hardware-based Level-1 system are especially difficult to design. Deploying ML algorithms
on FPGAs presents a significant challenge due to the specialized engineering expertise required. Unlike traditional software
implementations, which can be executed on general-purpose processors, FPGAs demand a deep understanding of hardware design
and optimization. The process of mapping complex mathematical operations, like those found in neural networks, onto FPGA circuits
is intricate and requires careful consideration of factors such as data flow, parallelism, and memory access patterns. This becomes
especially important in the Level-1 trigger, where the maximum latency per algorithm can be as low as 50 ns and only a few
percent of the FPGA resources can be allocated to one specific algorithm. Recently, this process has been made easier through
the introduction of software libraries that perform an automatic translation of ML models into highly parallel FPGA firmware,
hls4ml [92] and Finn [93]. These libraries are interfaced to libraries that perform quantization-aware training, a method for reducing
the numerical precision of weights and activations in a neural network during training, hence reducing their memory footprint.
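A toy example of this workflow is sketched below, combining QKeras for quantization-aware training with hls4ml for the FPGA translation; the layer sizes, bit widths, input dimension and FPGA part number are placeholder assumptions, not the configuration used in the cited works.

```python
# Quantization-aware Keras model translated to FPGA firmware with hls4ml.
import hls4ml
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from qkeras import QDense, QActivation, quantized_bits

inputs = Input(shape=(57,))                       # e.g. flattened list of trigger objects
x = QDense(16, kernel_quantizer=quantized_bits(8, 0, alpha=1),
           bias_quantizer=quantized_bits(8, 0, alpha=1))(inputs)
x = QActivation("quantized_relu(8)")(x)
outputs = QDense(3, kernel_quantizer=quantized_bits(8, 0, alpha=1),
                 bias_quantizer=quantized_bits(8, 0, alpha=1))(x)
model = Model(inputs, outputs)

# Generate a per-layer configuration and convert to HLS firmware.
config = hls4ml.utils.config_from_keras_model(model, granularity="name")
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, part="xcvu9p-flga2104-2-e")
hls_model.compile()                               # builds the C++ emulation of the firmware
```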
Utilizing these tools, Ref. [91] demonstrates that real-time anomaly detection using a variational auto-encoder architecture is
feasible within 100 ns and using only a fraction of the FPGA resources. This is made possible through quantization-aware training,
and clever architecture choices. For instance, rather than using the mean-squared error between the input and the output of the
autoencoder as anomaly score, only the KL divergence term entering the VAE loss is used. The benefit of using only the encoder and the KL term is that one can avoid performing Gaussian sampling on the hardware, saving resources and latency by not having to evaluate the decoder; in addition, there is no need to buffer the input data for the computation of the MSE. This demonstration
of the capability to perform real-time anomaly detection on FPGAs has generated attention within the community and initiated a
challenge similar to those found on Kaggle, the ADC 2021 challenge [84], as well as a dataset [4] for benchmarking such algorithms.
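The encoder-only anomaly score described above can be written compactly; the placeholder arrays below stand in for the encoder outputs (mean and log-variance of the latent Gaussian), which in a real deployment come from the trained encoder.

```python
# KL-only anomaly score: no sampling and no decoder evaluation is needed,
# which is what makes the approach attractive for low-latency FPGA inference.
import numpy as np

def kl_anomaly_score(mu, log_var):
    """Per-event KL divergence with respect to the standard normal, cf. Eq. (4)."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)

mu = np.random.normal(size=(256, 3))        # placeholder encoder means
log_var = np.random.normal(size=(256, 3))   # placeholder encoder log-variances
scores = kl_anomaly_score(mu, log_var)
```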
Recently, an outlier detection algorithm similar to the one described in Ref. [91] has been deployed into a copy of the CMS
Experiment Global Trigger (GT) board [94]. This copy of the GT system receives exactly the same input as the CMS GT, but cannot
trigger a full read-out of the CMS detector, making it an excellent test-bench for algorithms targeting the main system. The anomaly
detection algorithm, referred to as AXOL1TL, has been trained on unbiased data collected by the CMS experiment and shown to
improve the signal efficiency for a range of different BSM signals by up to 46%, without significantly increasing the background
rate. Taking as input a subset of the information available in the CMS Level-1 Trigger (the four-momentum of 10 jets, 4
muons, 4 electrons/photons and missing energy), and returning an output anomaly score for each event, AXOL1TL operates at an
extremely low latency of 50 ns and uses less than 1% of the FPGA resources. This is made possible through aggressive quantization
and parallelization of the autoencoder. Fig. 8 shows an event display of the highest anomaly score event selected by AXOL1TL after
analyzing CMS data collected in 2023. The event does not pass any other L1 trigger algorithm and is characterized by a very high
number of low to medium momentum jets.
When using outlier detection as a way of discovering new physics phenomena, the detection of outliers is not sufficient. A full
statistical framework for hypothesis testing is also necessary in order to claim discovery. This requires a background estimate with
which the observed data in the signal region can be compared, and this can either be based on simulated data or it can be
fully data-driven. This is also true when it comes to analyzing data collected by an anomaly trigger. From the discussion above
on density estimation, the background estimate was an important part of the anomaly detection method itself, for instance in the
CWoLa bumphunt the estimate was taken from the regions adjacent to the signal region, and in CATHODE it was sampled from the ML-generated density estimate itself. For autoencoder-based anomaly detection, the background estimate is not provided by the model
itself. It only offers a method for achieving high signal sensitivity. In Ref. [95], a statistical method for detecting non-resonant
anomalies using auto-encoders is introduced. In this approach, multiple autoencoders are trained with the aim of maximizing their
independence from one another. This is achieved by utilizing the distance de-correlation (DisCo) method [96,97], where a regularizer
term based on the DisCo measure of statistical dependence is included in the training. Events are classified as anomalous if their
reconstruction quality is poor across all autoencoders. Instances classified as anomalous by one autoencoder, but not by all, provide
the necessary context for estimating the Standard Model background in a manner that is not model dependent. This estimation is
Fig. 8. An event selected by an autoencoder-based anomaly detection hardware triggering algorithm in the CMS Experiment.
Source: Figure from Ref. [94].
carried out using the ABCD method. The ABCD method is commonly used for data-driven background estimation in particle physics.
It consists of designing four data regions, A, B, C, and D, based on orthogonal selections on two independent variables, 𝜏1 and 𝜏2 , and
then transferring the background prediction from the three signal-free regions into the signal region. In Ref. [95], the two variables
are the anomaly scores of the two autoencoders, which are now statistically independent due to the inclusion of the DisCo term at
training time. In order to use this to design an anomaly detection method that can run in the trigger system, the authors propose
to preserve all events falling within the signal-sensitive region as defined by the two autoencoders. Additionally, a random subset
of events in the other three regions would be conserved for subsequent offline background estimation.
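A small numerical sketch of the resulting ABCD estimate is given below; the thresholds, yields and independence of the two scores are illustrative assumptions.

```python
# ABCD background estimate from two (assumed independent) anomaly scores.
import numpy as np

rng = np.random.default_rng(0)
s1, s2 = rng.random(100_000), rng.random(100_000)   # placeholder anomaly scores
t1, t2 = 0.99, 0.99                                  # score thresholds

A = np.sum((s1 > t1) & (s2 > t2))     # signal-sensitive region
B = np.sum((s1 > t1) & (s2 <= t2))
C = np.sum((s1 <= t1) & (s2 > t2))
D = np.sum((s1 <= t1) & (s2 <= t2))

# If the scores are independent for background, the background yield in A is
# predicted from the three signal-depleted regions.
background_in_A = B * C / D
```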
Trigger algorithms typically focus on achieving a specific signal efficiency. However, in anomaly detection, which aims to be
sensitive to a broad spectrum of new signals, there is not a definite signal efficiency to target. Instead, these algorithms are calibrated
for a certain false positive rate (FPR), which is directly related to a predetermined trigger rate. This rate generally falls within the
range of around 𝒪(10–100) Hz, necessitating an FPR near ∼10⁻⁶. During the tuning process, the signal efficiency across various
potential new physics scenarios is tracked for guidance, along with the loss on the self-supervisory label (measured as the Mean
Squared Error between input and output). Yet, developing improved metrics for refining outlier detection methods, particularly
those that include sensitivity to novel, unidentified signals, remains a challenging and unresolved task.
Anomaly detection can also be used for automatic data quality control in particle detectors.
Data quality control involves measuring physical properties of proton collision products, and attempting to detect irregular
behavior within subdetectors. This can for instance be cases where a portion of the subdetector becomes unresponsive. When such
anomalies occur, their impact is manifest in the properties that are measured or reconstructed from the data.
To maintain high data quality, data quality monitoring (DQM) traditionally relies on some predefined set of statistics and rules
that define the expected or normal values for these statistics. An important feature of these metrics is their capacity to flag anomalies
when significant deviations from the expected distributions are detected. The creation and maintenance of such statistics require
deep understanding of the detector and the potential anomalies it may encounter.
As an alternative, ML based solutions have been proposed to automate this process [98–100]. In Ref. [99], a thorough comparison
of supervised, semi-supervised and self-supervised approaches for the detection of potential detector failure is introduced. For a
classifier-based algorithm, the need for a large amount of training data is overcome by using a small number of trainable parameters in the model. To improve the anomaly detection capabilities of autoencoders used for DQM, a mix of labeled negative and unlabeled samples is used, so-called semi-supervised novelty detection. Autoencoder-based anomaly detection has
been integrated into the CMS Experiment online DQM infrastructure [100]. For the reasons discussed in Ref. [74], autoencoders are
not necessarily the best architecture for such tasks, as the outliers one is attempting to identify might be a simpler subset of the
training data and will hence not be classified as outliers. Again, here variational autoencoders and classification in the latent space
can help.
6.1. Background
In recent years, Quantum Machine Learning (QML) has emerged as a new paradigm for data processing at the intersection of
machine learning and quantum information processing. Quantum computing has the potential to address real-world challenges
that are difficult or even intractable for classical computers [101–103]. Such problems include prime number factoring [104], a
problem at the basis of classical modern-day encryption, search in unstructured databases [105], solving systems of equations [106],
and simulations of quantum systems, enabling first-principle computation of chemical properties in atomic, molecular, and nuclear
systems [107–109].1
Initially, applications of quantum computing in Machine Learning (ML) focused on investigating speedups in computationally
expensive subroutines of learning algorithms, such as optimization and matrix inversion [110–112]. Through this scope, replacing
classical subroutines with quantum algorithms provides provable speedups in terms of the runtime complexity of the ML training.
Nevertheless, such proofs frequently require large and fault-tolerant quantum hardware, namely quantum computers with error correction schemes that are able to arbitrarily suppress the inherent logical error rates [113]. Such devices do not exist yet. Currently
available quantum computers are noisy and have a limited number of qubits with short decoherence times [114]. Hence, the size of
the quantum circuits and the number of operations that can be carried out at present are limited. Quantum algorithms with too
many operations for the device at hand can be rendered useless, or at least equivalent to a classical computation, by the inherent
hardware noise.
Lately, studies have also investigated the potential of quantum computing to fundamentally enhance the learning model itself [115–118]. QML models have been shown to generalize well from few training samples [119], to provide advantages over classical
algorithms for specific types of learning problems [120–124], and are able to identify patterns in data that cannot be recognized
efficiently with classical methods [125]. Mirroring classical models for classification tasks, QML algorithms can be coarsely grouped
into two categories: kernel-based methods and variational learning approaches [126].
In the former category, a main example is the Quantum Support Vector Machine (QSVM), where a classical Support Vector
Machine (SVM) [127] is equipped with a quantum kernel. The values of the kernel are evaluated on a quantum computer through
measurements [118,128]. During training, (Q)SVMs find the hyperplane that separates different classes of data, maximizing the
margin between them. These models can create non-linear decision boundaries in the data input space when equipped with kernel
functions [127]. The kernels are constructed by feature maps that transform the data into a higher dimensional feature space, in
which the classes can be more effectively separated by a linear decision boundary. In the case of a quantum kernel, the data is
mapped to the Hilbert space spanned by the qubit states. The dimensionality of this space grows exponentially with the number
of qubits, and hence, such models are difficult to simulate classically. After constructing the quantum kernel matrix from the
measurements, the loss function of the model is minimized on a classical device using quadratic programming techniques. In particle
physics, SVMs can be used for supervised classification tasks [129–131], although their use is not as prevalent as deep learning
approaches or ensemble models such as boosted decision trees. Additionally, kernel machines have been extended to an unsupervised
setting [132], where the training data is assumed to contain mostly background events and an upper bound on the expected anomaly
contamination is set using a hyperparameter.
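Once the kernel matrix has been evaluated, the remaining optimization is entirely classical; the sketch below uses scikit-learn with a precomputed kernel, where the classical RBF kernel merely stands in for a quantum kernel and the ν value is an illustrative choice.

```python
# Classical optimization step of a (quantum) kernel machine with a precomputed
# kernel matrix, in both a supervised and a one-class (unsupervised) setting.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC, OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))               # placeholder (reduced) features
y_train = rng.integers(0, 2, size=200)            # placeholder labels

K_train = rbf_kernel(X_train, X_train)            # stand-in for a measured quantum kernel

svc = SVC(kernel="precomputed").fit(K_train, y_train)          # supervised QSVM-style fit

# Unsupervised variant: nu upper-bounds the expected anomaly fraction in training.
ocsvm = OneClassSVM(kernel="precomputed", nu=0.01).fit(K_train)
```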
The latter category encompasses parametrized quantum circuits, also referred to as Quantum Neural Networks (QNN) or
variational quantum circuits. These circuits are composed of gates whose parameters can be tuned iteratively to minimize a loss
function using classical gradient-based learning techniques [133,134]. The output of the circuit is the expectation value of an operator, which is estimated from measurements on a quantum computer. This approach allows for the training of quantum circuits to perform specific
tasks, such as classification and generative modeling. Specific architectures of QNNs have been shown to be universal function approximators [135] and to be expressible as a Fourier series expansion [136]. Contrary to (quantum) kernel machines,
the loss function landscape of QNNs is non-convex, which can lead to trainability issues similar to the ones of the vanishing gradient
problem in classical neural networks [137–139]. Nevertheless, the authors of Ref. [140] argue that under certain conditions, kernel-
based models can also manifest similar problems in training. Additionally, some works provide a unified view of QML [141], while
others claim that the kernel-based learning is more natural for QML [142].
In most current applications, one can treat the quantum computer as a specialized processing unit that is part of the overall
computation. The hybrid algorithms discussed above aim to leverage the different strengths of classical and quantum processing
units while mitigating their corresponding weaknesses.
Studies have assessed the potential of quantum computing and variational algorithms for simulations of lattice field theories [143–145], as an alternative to Monte Carlo (MC) techniques [146–149], and for parton showering simulations [150,151].
Research along these lines is motivated by the question of whether quantum algorithms can provide a natural platform for simulating
fundamental physics [152]. An additional motive is the prospect of quantum computers providing a more favorable computational
complexity than currently available classical methods. Furthermore, QML models have been developed for solving reconstruction
problems in the context of collider experiments [153–157].
In terms of classification tasks in a model-dependent setting, quantum computing was first considered in Ref. [158]. Therein, the
training of a classifier for 𝐻 → 𝛾𝛾 events was mapped to a quantum annealing task. Since then, studies have mainly focused on the
design and implementation of supervised QML algorithms based on different QSVM and QNN architectures that are able to classify High
Energy Physics (HEP) events by discriminating the signal distribution from the background distribution [159–167]. Such quantum
models are often developed and assessed via computationally expensive quantum simulations on classical processors using a limited number of qubits, typically up to 20. In these simulations, the algorithms can be investigated in an ideal noiseless environment.
After the architecture has been chosen and its hyperparameters have converged to values that lead to good performance on the
learning task at hand, the QML algorithms are tested by running experiments on real hardware via cloud-based platforms.
The developed quantum models are typically benchmarked against classical models of similar complexity that are trained on the same small data sets. So far, the number of training data points is of the order of 10² to 10⁴. HEP datasets are frequently high-dimensional, with the number of features exceeding the number of presently available qubits, posing a challenge for direct input
and processing by data encoding circuits on current quantum devices. To address this challenge, a set of reduced features is typically
used as an input to the QML models. This representation of reduced dimensionality is obtained by manual selection of physical
variables [159,160], Principal Component Analysis (PCA) [166–169], or autoencoders [161,170]. In the case of autoencoders, the
compression of HEP events can be regarded as more representative since these models can, at least approximately, retain non-linear
correlations of the input features in their latent space. Such higher order correlations are removed by definition in the case of PCA,
and are potentially lost in manual feature selection or feature extraction based on univariate discrimination metrics [161].
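A minimal sketch of such a preprocessing step is shown below, here using PCA from scikit-learn to compress stand-in event features to one dimension per qubit and rescale them for an angle-encoding circuit. The feature and qubit counts are illustrative, and an autoencoder could replace the PCA step when non-linear correlations should be retained.

```python
# Hedged sketch: compressing high-dimensional event features to a number
# of dimensions matching the available qubits, here with PCA from
# scikit-learn. Dimensions and data are illustrative stand-ins only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

n_events, n_features, n_qubits = 1000, 60, 8

X = np.random.randn(n_events, n_features)         # stand-in for HEP event features
Z = PCA(n_components=n_qubits).fit_transform(X)    # one component per qubit

# Rescale to a range suitable for angle-encoding circuits.
Z = MinMaxScaler(feature_range=(0, np.pi)).fit_transform(Z)
```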
So far, in most studies regarding supervised models, the performance of the quantum algorithms is competitive and matches the
performance of their classical counterparts. Numerical evidence suggests that QML algorithms might outperform classical models
when the training datasets are small [159,162,166,171]. However, such a property has not yet been proven in general (see Fig. 9).
Recently, different strategies were proposed for new-physics searches at the Large Hadron Collider (LHC) using QML in
the context of anomaly detection [168,170,172–176]. In Ref. [172], Gaussian Boson Sampling (GBS) is used to create a lower
dimensional representation of Beyond Standard Model (BSM) events where the Higgs boson decays into two pseudoscalar particles.
GBS is classically difficult to simulate and can be implemented using continuous variable devices such as photonic quantum
computers. This procedure is combined with a quantum version of the K-means clustering algorithm, Q-means, to detect anomalies.
K-means is a method that aims to partition an unlabeled dataset into K clusters in the feature space. Each cluster center, also
called a centroid, is defined as the mean of the datapoints that belong to that cluster. Each datapoint is assigned to the nearest
centroid, according to some distance measure, typically the Euclidean metric, which serves as the loss function of the algorithm.
The cluster assignment and the coordinates of the centroids are iteratively updated, according to the loss minimization procedure,
until the datapoints have converged to specific stable clusters. In the case of Q-means, the datapoints are embedded in the quantum
Hilbert space, where the distance calculation is performed according to the chosen quantum circuit. Additionally, depending on the design of the quantum model, the minimization of the loss can be accomplished with a quantum or a classical algorithm. K-means has been applied in HEP to jet clustering [177–179], while Q-means and its variants have also been applied to the unsupervised detection of
new-physics events [170,172].
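For reference, a minimal classical K-means implementation following the description above is sketched below. In a Q-means variant, the distance estimation (and possibly the loss minimization) would be delegated to a quantum routine, which is not reproduced here.

```python
# Minimal classical K-means: assign each point to its nearest centroid
# (Euclidean distance), then update each centroid as the mean of its
# assigned points, iterating until convergence. Illustrative sketch only.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every centroid, then nearest assignment.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update each centroid as the mean of the points assigned to it.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```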
A quantum autoencoder (QAE) is considered in Ref. [173] using four physical variables as an input. The authors demonstrate,
both in quantum simulation and hardware, that the QAE is a promising approach for BSM scenarios involving a resonant heavy
Higgs decaying to a pair of top quarks, and a Standard Model (SM) Higgs decaying to two dark matter particles. The authors of
Ref. [176] employ QAEs for the detection of long-lived particles and adapt the proposed methods for execution on real quantum
hardware. Additionally, different QNN architectures have been investigated in Ref. [174], using low-dimensional (simulated) datasets with Higgs-like scalar particle signatures as anomalies in a semi-supervised setting. The authors do not identify any region where the tested QML models present an advantage in performance or in the required size of the training dataset.
Ref. [168] proposes a simulation-assisted new-physics search where supervised quantum and classical SVMs are trained on
a dataset that contains SM processes as background and an artificial set of anomalous events obtained from the SM-distributed
features as signal. Specifically, the authors generate the distributions of the signal samples by a so-called scrambling process,
in which the feature distributions of the background are smeared with a normal distribution while preserving energy and momentum conservation. Furthermore, it is demonstrated that the considered models are able to generalize to real signals such as Higgs and Graviton production events.
Fig. 9. A typical classical-quantum pipeline. The input data is compressed using dimensionality reduction in the form of an autoencoder, and is then passed to an unsupervised quantum kernel machine and clustering algorithms [170].
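A minimal sketch of a scrambling-style construction in the spirit of Ref. [168] is given below: background features are smeared with Gaussian noise to obtain an artificial "signal" sample. The smearing width is an illustrative choice, and the energy and momentum conservation constraint imposed in Ref. [168] is not reproduced here.

```python
# Hedged sketch of a "scrambling"-style construction: artificial anomalous
# events are obtained by smearing background features with Gaussian noise.
# The smearing width is illustrative; the energy and momentum conservation
# constraint of Ref. [168] is not reproduced in this sketch.
import numpy as np

rng = np.random.default_rng(42)

def scramble(background, rel_width=0.1):
    """Smear each feature with a normal distribution scaled to its spread."""
    scale = rel_width * background.std(axis=0, keepdims=True)
    return background + rng.normal(0.0, 1.0, size=background.shape) * scale

background = rng.normal(size=(5000, 8))   # stand-in for SM feature vectors
artificial_signal = scramble(background)  # used as the "signal" class
```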
In Ref. [175], a quantum Generative Adversarial Network is designed to extract an anomaly score for each data input. The
authors benchmark the proposed model and verify its efficacy on data sets where Higgs boson production and Graviton production are treated as anomalies, respectively. Additionally, generative modeling in the context of Hamiltonian learning has been
investigated for semi-leptonic top and dijet event production [180]. The anomaly score in Ref. [180] is constructed using the different
properties of the time evolution of quantum states that represent background and signal data.
A new-physics search in dijet topologies is addressed in Ref. [170], where an unsupervised quantum kernel machine and quantum
clustering methods are designed to define a metric of typicality for QCD jets (cf. Fig. 9). The dijet events are described by 600
features – 100 particle constituents per jet and three features per particle – and the examined anomalies include two different
Graviton scenarios and the production of a BSM scalar boson. The authors develop a convolutional autoencoder to produce a compressed representation of the HEP events, addressing the challenge of directly processing realistic high-dimensional data on current quantum devices. Consequently, the quantum anomaly detection algorithms use the latent representation generated by the encoder as input and are trained on QCD background events. For the proposed kernel-based anomaly
detection model, this work identifies an advantage in performance of the quantum model over its classical counterpart.
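As a schematic of unsupervised, kernel-based anomaly detection on a latent representation, the sketch below trains a one-class SVM on background-only latent vectors and scores unseen events. A classical RBF kernel from scikit-learn stands in for the quantum kernel of Ref. [170], and all dimensions and data are illustrative stand-ins.

```python
# Hedged sketch of unsupervised kernel-based anomaly detection on a latent
# representation: a one-class SVM is trained on background events only and
# assigns an anomaly score to new events. A classical RBF kernel stands in
# for the quantum kernel; dimensions and data are illustrative only.
import numpy as np
from sklearn.svm import OneClassSVM

latent_dim = 8
background_latent = np.random.randn(2000, latent_dim)  # encoder output, background only
test_latent = np.random.randn(200, latent_dim)         # unseen events to score

model = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale")
model.fit(background_latent)

# Larger scores indicate events less compatible with the background.
anomaly_score = -model.decision_function(test_latent)
```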
In HEP applications so far, the quantum models are not designed to explicitly manifest an inductive bias towards the structure of
the chosen (simulated) particle physics datasets. In the aforementioned studies, the model architectures, i.e., quantum circuits used
for the implementation of QNNs and feature maps for the kernel methods, are constructed following ansätze in the QML literature
that have desired properties such as expressiveness and hardware efficiency.
Many QML algorithms have been inspired by classical model architectures, such as autoencoders [181], convolutional neural
networks [182], equivariant models [183], and graph neural networks [184]. Despite drawing inspiration from classical models,
these quantum counterparts may exhibit distinct properties and inductive biases [121,185]. The studies presented in Sections 6.3
and 6.4 compare the performance of their proposed models to their classical counterparts for the task at hand. However, beyond
promising results in specific problems and datasets, identifying precisely in which applications QML models could provide consistent
benefits, such as enhanced model performance or computational speed-ups, remains an open question and an active area
of research. Furthermore, due to limitations in current hardware, the behavior of QML models in the regime that is comparable
to current state-of-the-art deep learning models, i.e., having millions or even billions of training samples and model parameters, is
unknown.
In general, the exploration of QML strategies for HEP data is, at least partly, motivated by the question of whether quantum
models can exploit correlations and information existing in particle physics datasets leading to advantages in performance. It is
important to note that, so far, no studies have used quantum models for supervised or unsupervised classification on real data from
HEP experiments.
The data measured by the detectors and stored for the analysis of HEP experiments is classical. However, a quantum field
theory framework is essential to predict and properly explain the outcome of such experiments. Furthermore, remnants of the
initial quantum mechanical process – the particle interaction – are still present in the data. Specifically, spin correlations between particles [186], entanglement between particles produced in proton collisions [187–190], and violations of Bell inequalities [191–193] have been measured in LHC data. Measuring these first-principle quantities highlights that data from particle
physics experiments cannot be described by classical local hidden-variable theories. In conclusion, the topics discussed above
represent an active field of research and hold promise for classical and quantum data analysis algorithms that can enhance our
ability to probe for new physics.
Anomaly detection techniques have become an integral part of modern particle physics research, and are being utilized on
multiple fronts: for new physics searches, triggering and event selection, model-independent methods, detector fault and data quality
monitoring, and in quantum machine learning. Automated data quality monitoring using ML-based anomaly detection methods is
now being utilized in experiments, saving significant person power. This will be important as the LHC is upgraded to the HL-LHC,
collecting an order of magnitude more data that will need to be validated. In the absence of any signs of new physics at particle
colliders, the development of methodologies for model-independent searches and unbiased event filtering systems for the collection
and subsequent analysis of particle collider data is crucial. To this end, multiple open data challenges related to anomaly detection
for new physics discovery have been created [5,84,194]. This has led to a substantial increase in novel methods that are being
incorporated into particle physics experiments. There are still conceptual challenges related to model-independent methods, as
discussed in this review article, and more research is needed to ensure these methods are sound and consistent. For example, a
central problem is validating the performance of anomaly detection algorithms developed for new physics searches. The current
form of validation for this type of algorithm consists of measuring the signal efficiency on a few selected benchmark simulated
signal samples. This is not ideal since good performance on a simulated anomaly sample is not equivalent to good performance in
a realistic setting: perhaps a true new physics sample remains elusive to an algorithm that performs extremely well on a selection
of simulated new physics hypotheses. This is not the usual context for conventional anomaly detection applications where a human
expert can produce an anomaly data set by explicitly labeling anomalous samples, e.g., in identifying financial fraud [195]. The
development of a rigorous and objective metric that measures the performance of anomaly detection algorithms for new physics
searches remains an open problem. Alongside this main challenge, the robustness of such algorithms against anomalies produced
by detector effects, and the minimal correlation of the anomaly metric with a given data feature [86], represent important issues in
anomaly detection for physics that have not yet been overcome. In summary, anomaly detection in particle physics is a rapidly
growing field of research, with advancements continuously being made on the algorithmic, computational, and conceptual side.
Several challenges remain to be addressed, ensuring that the field will continue to grow in the coming years.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
P.O. and T.Å. are supported by the Swiss National Science Foundation Grant No. PZ00P2_201594. V.B. is supported by an ETH
Research Grant (Grant No. ETH C-04 21-2).
References
[1] M.A. Pimentel, D.A. Clifton, L. Clifton, L. Tarassenko, A review of novelty detection, Signal Process. 99 (2014) 215–249.
[2] G. Kasieczka, et al., The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics, Rep. Progr. Phys. 84 (12) (2021).
[3] G. Kasieczka, B. Nachman, D. Shih, New methods and datasets for group anomaly detection from fundamental physics, in: Conference on Knowledge
Discovery and Data Mining, 2021, arXiv:2107.02821.
[4] E. Govorkova, E. Puljak, T. Aarrestad, M. Pierini, K.A. Woźniak, J. Ngadiuba, LHC physics dataset for unsupervised New Physics detection at 40 MHz,
Sci. Data 9 (1) (2022) 118.
[5] T. Aarrestad, M. van Beekveld, M. Bona, A. Boveia, S. Caron, J. Davies, A.D. Simone, C. Doglioni, J.M. Duarte, A. Farbin, H. Gupta, L. Hendriks, L.
Heinrich, J. Howarth, P. Jawahar, A. Jueid, J. Lastow, A. Leinweber, J. Mamuzic, E. Merényi, A. Morandini, P. Moskvitina, C. Nellist, J. Ngadiuba, B.
Ostdiek, M. Pierini, B. Ravina, R.R. de Austri, S. Sekmen, M. Touranakou, M. Vaškevičiūte, R. Vilalta, J.R. Vlimant, R. Verheyen, M. White, E. Wulff, E.
Wallin, K.A. Wozniak, Z. Zhang, The dark machines anomaly score challenge: Benchmark data and model independent event classification for the large
hadron collider, SciPost Phys. 12 (2022) 043, http://dx.doi.org/10.21468/SciPostPhys.12.1.043, URL https://scipost.org/10.21468/SciPostPhys.12.1.043.
[6] V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: A survey, ACM Comput. Surv. (ISSN: 0360-0300) 41 (3) (2009) http://dx.doi.org/10.1145/
1541880.1541882.
[7] C.C. Aggarwal, Outlier Analysis, second ed., Springer Publishing Company, ISBN: 3319475770, 2016, Incorporated.
[8] J.A. Aguilar-Saavedra, J. Collins, R.K. Mishra, A generic anti-QCD jet tagger, J. High Energy Phys. 2017 (11) (2017) 163.
[9] ATLAS Collaboration, The ATLAS experiment at the CERN large hadron collider, JINST 3 (08) (2008) S08003, http://dx.doi.org/10.1088/1748-
0221/3/08/S08003.
[10] CMS Collaboration, The CMS Experiment at the CERN LHC, JINST 3 (2008) S08004, http://dx.doi.org/10.1088/1748-0221/3/08/S08004.
[11] B. Abbott, et al., D0 Collaboration, Search for new physics in e𝜇X data at DØ using SLEUTH: A quasi-model-independent search strategy for new physics,
Phys. Rev. D 62 (2000) 092004, http://dx.doi.org/10.1103/PhysRevD.62.092004, arXiv:hep-ex/0006011.
[12] F.D. Aaron, et al., H1 Collaboration, A general search for new phenomena at HERA, Phys. Lett. B 674 (2009) 257–268, http://dx.doi.org/10.1016/j.
physletb.2009.03.034, arXiv:0901.0507.
[13] A. Aktas, et al., H1 Collaboration, A General search for new phenomena in ep scattering at HERA, Phys. Lett. B 602 (2004) 14–30, http://dx.doi.org/
10.1016/j.physletb.2004.09.057, arXiv:hep-ex/0408044.
[14] K. Cranmer, Searching for New Physics: Contributions to LEP and the LHC (Ph.D. thesis), Wisconsin U., Madison, 2005, URL http://cds.cern.ch/record/
823591. Presented on 01 Nov 2005.
[15] T. Aaltonen, et al., CDF Collaboration, Model-independent and quasi-model-independent search for new physics at CDF, Phys. Rev. D 78 (2008)
http://dx.doi.org/10.1103/PhysRevD.78.012002, arXiv:0712.1311.
[16] T. Aaltonen, et al., CDF Collaboration, Model-independent global search for new high-𝑝𝑇 physics at CDF, 2007, http://dx.doi.org/10.2172/922303, arXiv.
arXiv:0712.2534.
[17] T. Aaltonen, et al., CDF Collaboration, Global search for new physics with 2.0 fb−1 at CDF, Phys. Rev. D 79 (2009) http://dx.doi.org/10.1103/PhysRevD.
79.011101, arXiv:0809.3781.
[18] CMS Collaboration, MUSiC, a Model Unspecific Search for New Physics, in pp Collisions at √s = 8 TeV, Tech. Rep., CERN, Geneva, 2017, URL http://cds.cern.ch/record/2256653.
[19] CMS Collaboration, Model Unspecific Search for New Physics in pp Collisions at √s = 7 TeV, Tech. Rep., CERN, Geneva, 2011, URL http://cds.cern.ch/record/1360173.
[20] CMS Collaboration, MUSiC, A Model Unspecific Search for New Physics, in pp Collisions at √s = 13 TeV, Tech. Rep., CERN, Geneva, 2020, URL https://cds.cern.ch/record/2718811.
[21] CMS Collaboration, MUSiC: a model-unspecific search for new physics in proton–proton collisions at √s = 13 TeV, Eur. Phys. J. C 81 (7) (2021) 629, http://dx.doi.org/10.1140/epjc/s10052-021-09236-z, arXiv:2010.02984.
[22] ATLAS Collaboration, A strategy for a general search for new phenomena using data-derived signal regions and its application within the ATLAS
experiment, Eur. Phys. J. C 79 (2) (2019) 120, http://dx.doi.org/10.1140/epjc/s10052-019-6540-y, arXiv:1807.07447.
[23] ATLAS Collaboration, A General Search for New Phenomena with the ATLAS Detector in pp Collisions at √s = 8 TeV, Tech. Rep., CERN, Geneva, 2014, URL https://cds.cern.ch/record/1666536.
[24] ATLAS Collaboration, A General Search for New Phenomena with the ATLAS Detector in pp Collisions at √s = 7 TeV, Tech. Rep., CERN, Geneva, 2012, URL https://cds.cern.ch/record/1472686.
[25] ATLAS Collaboration, Dijet resonance search with weak supervision using √s = 13 TeV pp collisions in the ATLAS detector, Phys. Rev. Lett. 125 (13) (2020) 131801, http://dx.doi.org/10.1103/PhysRevLett.125.131801, arXiv:2005.02983.
[26] J. Neyman, E.S. Pearson, IX. On the problem of the most efficient tests of statistical hypotheses, Philos. Trans. R. Soc. Lond. Ser. A Contain. Pap. Math.
Phys. Char. 231 (694–706) (1933) 289–337.
[27] L.M. Dery, B. Nachman, F. Rubbo, A. Schwartzman, Weakly supervised classification in high energy physics, J. High Energy Phys. 2017 (5) (2017) 145,
arXiv:1702.00414.
[28] G. Patrini, R. Nock, P. Rivera, T. Caetano, (Almost) no label no cry, in: Advances in Neural Information Processing Systems, 2014, pp. 190–198.
[29] E.M. Metodiev, B. Nachman, J. Thaler, Classification without labels: learning from mixed samples in high energy physics, J. High Energy Phys. 2017
(10) (2017) 174, http://dx.doi.org/10.1007/JHEP10(2017)174,
[30] J.H. Collins, K. Howe, B. Nachman, Extending the search for new resonances with machine learning, Phys. Rev. D 99 (2019) 014038, http://dx.doi.org/
10.1103/PhysRevD.99.014038, URL https://link.aps.org/doi/10.1103/PhysRevD.99.014038.
[31] A. Hallin, J. Isaacson, G. Kasieczka, C. Krause, B. Nachman, T. Quadfasel, M. Schlaffer, D. Shih, M. Sommerhalder, Classifying anomalies through outer
density estimation, Phys. Rev. D 106 (2022) 055006, http://dx.doi.org/10.1103/PhysRevD.106.055006, URL https://link.aps.org/doi/10.1103/PhysRevD.
106.055006.
[32] T. Finke, M. Krämer, M. Lipp, A. Mück, Boosting mono-jet searches with model-agnostic machine learning, J. High Energy Phys. 2022 (8) (2022) 15,
http://dx.doi.org/10.1007/JHEP08(2022)015,
[33] O. Amram, C.M. Suarez, Tag N’ Train: a technique to train improved classifiers on unlabeled data, J. High Energy Phys. 01 (2021) 153, http:
//dx.doi.org/10.1007/JHEP01(2021)153, arXiv:2002.12376.
[34] B. Nachman, D. Shih, Anomaly detection with density estimation, 2020, arXiv:2001.04990.
[35] D.J. Rezende, S. Mohamed, Variational inference with normalizing flows, 2016, arXiv:1505.05770.
[36] L. Dinh, J. Sohl-Dickstein, S. Bengio, Density estimation using real NVP, 2017, arXiv:1605.08803.
[37] J.A. Raine, S. Klein, D. Sengupta, T. Golling, CURTAINs for your sliding window: Constructing unobserved regions by transforming adjacent intervals,
Front. Big Data 6 (2023) 899345, http://dx.doi.org/10.3389/fdata.2023.899345, arXiv:2203.09470.
[38] S. Klein, J.A. Raine, T. Golling, Flows for flows: Training normalizing flows between arbitrary distributions with maximum likelihood estimation, 2022,
arXiv:2211.02487.
[39] D. Sengupta, S. Klein, J.A. Raine, T. Golling, CURTAINs flows for flows: Constructing unobserved regions with maximum likelihood estimation, 2023,
arXiv:2305.04646.
[40] J. Sohl-Dickstein, E.A. Weiss, N. Maheswaranathan, S. Ganguli, Deep unsupervised learning using nonequilibrium thermodynamics, in: Proceedings of the
32nd International Conference on International Conference on Machine Learning, Vol. 37, ICML ’15, JMLR.org, 2015, pp. 2256–2265.
[41] D. Sengupta, M. Leigh, J.A. Raine, S. Klein, T. Golling, Improving new physics searches with diffusion models for event observables and jet constituents,
2023, arXiv:2312.10130.
[42] V. Mikuni, B. Nachman, High-dimensional and permutation invariant anomaly detection, 2023, arXiv:2306.03933.
[43] T. Golling, S. Klein, R. Mastandrea, B. Nachman, Flow-enhanced transportation for anomaly detection, Phys. Rev. D 107 (2023) 096025, http:
//dx.doi.org/10.1103/PhysRevD.107.096025, URL https://link.aps.org/doi/10.1103/PhysRevD.107.096025.
[44] Y. LeCun, Modèles connexionnistes de l'apprentissage (connectionist learning models), Ph.D. thesis, Université P. et M. Curie (Paris 6), 1987.
[45] D.H. Ballard, Modular learning in neural networks, in: Proceedings of the Sixth National Conference on Artificial Intelligence-Volume 1, 1987, pp.
279–284.
[46] G.E. Hinton, R. Zemel, Autoencoders, minimum description length and Helmholtz free energy, in: Advances in Neural Information Processing Systems,
Vol. 6, 1993.
[47] A. Radovic, M. Williams, D. Rousseau, M. Kagan, D. Bonacorsi, A. Himmel, A. Aurisano, K. Terao, T. Wongjirad, Machine learning at the energy and
intensity frontiers of particle physics, Nature 560 (7716) (2018) 41–48, http://dx.doi.org/10.1038/s41586-018-0361-2.
[48] K. Albertsson, et al., Machine learning in high energy physics community white paper, 2019, arXiv:1807.02876.
[49] P. Jawahar, T. Aarrestad, N. Chernyavskaya, M. Pierini, K.A. Wozniak, J. Ngadiuba, J. Duarte, S. Tsan, Improving variational autoencoders for new
physics detection at the LHC with normalizing flows, Front. Big Data 5 (2022) 803685, http://dx.doi.org/10.3389/fdata.2022.803685, arXiv:2110.08508.
[50] S. Tsan, R. Kansal, A. Aportela, D. Diaz, J. Duarte, S. Krishna, F. Mokhtar, J.-R. Vlimant, M. Pierini, Particle graph autoencoders and differentiable,
learned energy mover’s distance, 2021, arXiv:2111.12849.
[51] T. Finke, M. Krämer, A. Morandini, A. Mück, I. Oleksiyuk, Autoencoders for unsupervised anomaly detection in high energy physics, J. High Energy
Phys. 2021 (6) (2021) http://dx.doi.org/10.1007/jhep06(2021)161.
[52] P. Laguarta, et al., Detection of anomalies amongst LIGO’s glitch populations with autoencoders, 2023, arXiv:2310.03453.
[53] L. Vaslin, V. Barra, J. Donini, GAN-AE : An anomaly detection algorithm for New Physics search in LHC data, 2023, arXiv:2305.15179.
[54] L. Anzalone, S.S. Chhibra, B. Maier, N. Chernyavskaya, M. Pierini, Triggering dark showers with conditional dual auto-encoders, 2023, arXiv:2306.12955.
[55] V. Böhm, A.G. Kim, S. Juneau, Fast and efficient identification of anomalous galaxy spectra with neural density estimation, Mon. Not. R. Astron. Soc.
526 (2) (2023) 3072–3087, http://dx.doi.org/10.1093/mnras/stad2773, arXiv:2308.00752.
[56] R. Balestriero, et al., A cookbook of self-supervised learning, 2023, arXiv:2304.12210.
[57] D.P. Kingma, M. Welling, Auto-encoding variational Bayes, 2014, arXiv:1312.6114.
[58] W. Joo, W. Lee, S. Park, I.-C. Moon, Dirichlet variational autoencoder, 2019, arXiv:1901.02739.
[59] G. Patrini, R. van den Berg, P. Forré, M. Carioni, S. Bhargav, M. Welling, T. Genewein, F. Nielsen, Sinkhorn AutoEncoders, 2019, arXiv:1810.01118.
[60] O. Cerri, T.Q. Nguyen, M. Pierini, M. Spiropulu, J.-R. Vlimant, Variational autoencoders for new physics mining at the Large Hadron Collider, J. High
Energy Phys. 2019 (5) (2019) http://dx.doi.org/10.1007/jhep05(2019)036.
[61] B. Dillon, T. Plehn, C. Sauer, P. Sorrenson, Better latent spaces for better autoencoders, SciPost Phys. 11 (3) (2021) http://dx.doi.org/10.21468/scipostphys.
11.3.061.
[62] T. Cheng, J.-F. Arguin, J. Leissner-Martin, J. Pilette, T. Golling, Variational autoencoders for anomalous jet tagging, Phys. Rev. D 107 (1) (2023)
http://dx.doi.org/10.1103/physrevd.107.016002.
[63] J.M. Joyce, Kullback-Leibler divergence, in: International Encyclopedia of Statistical Science, Springer, Berlin, Heidelberg, ISBN: 978-3-642-04898-2, 2011,
pp. 720–722, http://dx.doi.org/10.1007/978-3-642-04898-2_327.
[64] J. Paisley, D. Blei, M. Jordan, Variational Bayesian inference with stochastic search, 2012, arXiv:1206.6430.
[65] I. Tolstikhin, O. Bousquet, S. Gelly, B. Schoelkopf, Wasserstein auto-encoders, 2019, arXiv:1711.01558.
[66] P.T. Komiske, E.M. Metodiev, J. Thaler, Metric space of collider events, Phys. Rev. Lett. 123 (4) (2019) http://dx.doi.org/10.1103/physrevlett.123.041801.
[67] P.T. Komiske, E.M. Metodiev, J. Thaler, The hidden geometry of particle collisions, J. High Energy Phys. 2020 (7) (2020) http://dx.doi.org/10.1007/
jhep07(2020)006.
[68] M. Farina, Y. Nakai, D. Shih, Searching for new physics with deep autoencoders, Phys. Rev. D 101 (7) (2020) 075021, http://dx.doi.org/10.1103/
PhysRevD.101.075021, arXiv:1808.08992.
[69] T. Heimel, G. Kasieczka, T. Plehn, J.M. Thompson, QCD or What? SciPost Phys. 6 (3) (2019) 030, http://dx.doi.org/10.21468/SciPostPhys.6.3.030,
arXiv:1808.08979.
[70] A. Blance, M. Spannowsky, P. Waite, Adversarially-trained autoencoders for robust unsupervised new physics searches, J. High Energy Phys. 10 (2019)
047, http://dx.doi.org/10.1007/JHEP10(2019)047, arXiv:1905.10384.
[71] J. Hajer, Y.-Y. Li, T. Liu, H. Wang, Novelty detection meets collider physics, Phys. Rev. D 101 (7) (2020) 076015, http://dx.doi.org/10.1103/PhysRevD.
101.076015, arXiv:1807.10261.
[72] T.S. Roy, A.H. Vijay, A robust anomaly finder based on autoencoders, 2019, arXiv:1903.02032.
[73] G. Aad, et al., ATLAS Collaboration, Search for new phenomena in two-body invariant mass distributions using unsupervised machine learning for anomaly detection at √s = 13 TeV with the ATLAS detector, 2023, arXiv:2307.01612.
[74] J. Batson, C.G. Haaf, Y. Kahn, D.A. Roberts, Topological obstructions to autoencoding, J. High Energy Phys. 2021 (4) (2021) 280.
[75] S. Yoon, Y.-K. Noh, F.C. Park, Autoencoding under normalization constraints, 2023, arXiv:2105.05735.
[76] B.M. Dillon, L. Favaro, T. Plehn, P. Sorrenson, M. Krämer, A normalized autoencoder for LHC triggers, 2023, arXiv:2206.14225.
[77] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational
Learning Theory, 1992, pp. 144–152.
[78] F.T. Liu, K.M. Ting, Z.-H. Zhou, Isolation forest, in: 2008 Eighth IEEE International Conference on Data Mining, 2008, pp. 413–422, http://dx.doi.org/
10.1109/ICDM.2008.17.
[79] M. van Beekveld, S. Caron, L. Hendriks, P. Jackson, A. Leinweber, S. Otten, R. Patrick, R. Ruiz De Austri, M. Santoni, M. White, Combining outlier analysis
algorithms to identify new physics at the LHC, J. High Energy Phys. 09 (2021) 024, http://dx.doi.org/10.1007/JHEP09(2021)024, arXiv:2010.07940.
[80] M. Kuusela, T. Vatanen, E. Malmi, T. Raiko, T. Aaltonen, Y. Nagai, Semi-supervised anomaly detection – towards model-independent searches of new
physics, J. Phys. Conf. Ser. 368 (1) (2012) 012032.
[81] V. Mikuni, F. Canelli, Unsupervised clustering for collider physics, Phys. Rev. D 103 (2021) 092007, http://dx.doi.org/10.1103/PhysRevD.103.092007,
URL https://link.aps.org/doi/10.1103/PhysRevD.103.092007.
[82] V. Mikuni, F. Canelli, ABCNet: an attention-based method for particle tagging, Eur. Phys. J. Plus 135 (6) (2020) 463.
[83] S. Roche, Q. Bayer, B. Carlson, W. Ouligian, P. Serhiayenka, J. Stelzer, T.M. Hong, Nanosecond anomaly detection with decision trees for high energy
physics and real-time application to exotic Higgs decays, 2023, arXiv:2304.03836.
[84] The ADC 2021 challenge, 2023, https://mpp-hep.github.io/ADC2021/. (Accessed 30 September 2023).
[85] A. Halilovic, Anomaly detection for the CERN large hadron collider injection magnets, 2018, Leuven U. URL https://cds.cern.ch/record/2665985. Presented
28 Jun 2018.
[86] T. Golling, T. Nobe, D. Proios, J.A. Raine, D. Sengupta, S. Voloshynovskiy, J.-F. Arguin, J.L. Martin, J. Pilette, D.B. Gupta, A. Farbin, The mass-ive issue:
Anomaly detection in jet physics, 2023, arXiv:2303.14134.
[87] R.T. D’Agnolo, A. Wulzer, Learning new physics from a machine, Phys. Rev. D 99 (2019) 015014, http://dx.doi.org/10.1103/PhysRevD.99.015014, URL
https://link.aps.org/doi/10.1103/PhysRevD.99.015014.
[88] P. Harris, W.P. Mccormack, S.E. Park, T. Quadfasel, M. Sommerhalder, L.J. Moureaux, G. Kasieczka, O. Amram, P. Maksimovic, B. Maier, M. Pierini,
K.A. Wozniak, T.K. Aarrestad, J. Ngadiuba, I. Zoi, S.K. Bright-Thonney, D. Shih, A. Bal, Machine Learning Techniques for Model-Independent Searches
in Dijet Final States, Tech. Rep., CERN, Geneva, 2023, URL https://cds.cern.ch/record/2881089.
[89] A. Bocci, on behalf of the CMS Collaboration, CMS high level trigger performance comparison on CPUs and GPUs, J. Phys. Conf. Ser. 2438 (1) (2023)
012016.
[90] O. Cerri, T.Q. Nguyen, M. Pierini, M. Spiropulu, J.-R. Vlimant, Variational autoencoders for new physics mining at the Large Hadron Collider, J. High
Energy Phys. 2019 (5) (2019) 36, http://dx.doi.org/10.1007/JHEP05(2019)036,
[91] E. Govorkova, E. Puljak, T. Aarrestad, T. James, V. Loncar, M. Pierini, A.A. Pol, N. Ghielmetti, M. Graczyk, S. Summers, J. Ngadiuba, T.Q. Nguyen,
J. Duarte, Z. Wu, Autoencoders on field-programmable gate arrays for real-time, unsupervised new physics detection at 40 MHz at the Large Hadron
Collider, Nat. Mach. Intell. 4 (2) (2022) 154–161.
[92] J. Duarte, et al., Fast inference of deep neural networks in FPGAs for particle physics, JINST 13 (07) (2018) P07027, http://dx.doi.org/10.1088/1748-
0221/13/07/P07027, arXiv:1804.06913.
[93] M. Blott, T.B. Preußer, N.J. Fraser, G. Gambardella, K. O’brien, Y. Umuroglu, M. Leeser, K. Vissers, FINN-R: An end-to-end deep-learning framework for fast
exploration of quantized neural networks, ACM Trans. Reconfigurable Technol. Syst. (ISSN: 1936-7406) 11 (3) (2018) http://dx.doi.org/10.1145/3242897.
[94] CMS, Anomaly detection in the CMS global trigger test crate for run 3, 2023, URL https://cds.cern.ch/record/2876546.
[95] V. Mikuni, B. Nachman, D. Shih, Online-compatible unsupervised nonresonant anomaly detection, Phys. Rev. D 105 (2022) http://dx.doi.org/10.1103/
PhysRevD.105.055006, URL https://link.aps.org/doi/10.1103/PhysRevD.105.055006.
[96] G. Kasieczka, D. Shih, Robust jet classifiers through distance correlation, Phys. Rev. Lett. 125 (2020) http://dx.doi.org/10.1103/PhysRevLett.125.122001,
URL https://link.aps.org/doi/10.1103/PhysRevLett.125.122001.
[97] G. Kasieczka, B. Nachman, M.D. Schwartz, D. Shih, Automating the ABCD method with machine learning, Phys. Rev. D 103 (2021) http://dx.doi.org/
10.1103/PhysRevD.103.035021, URL https://link.aps.org/doi/10.1103/PhysRevD.103.035021.
[98] M. Borisyak, F. Ratnikov, D. Derkach, A. Ustyuzhanin, Towards automation of data quality system for CERN CMS experiment, J. Phys. Conf. Ser. 898
(9) (2017) 092041.
[99] A.A. Pol, Machine Learning Anomaly Detection Applications to Compact Muon Solenoid Data Quality Monitoring, U. Paris-Saclay, 2020, URL https:
//cds.cern.ch/record/2790963. Presented on 08 Jun 2020.
[100] A.A. Pol, G. Cerminara, C. Germain, M. Pierini, A. Seth, Detector monitoring with artificial neural networks at the CMS experiment at the CERN Large
Hadron Collider, 2018, arXiv:1808.00911.
[101] F. Arute, K. Arya, R. Babbush, D. Bacon, J.C. Bardin, R. Barends, R. Biswas, S. Boixo, F.G.S.L. Brandao, D.A. Buell, et al., Quantum supremacy
using a programmable superconducting processor, Nature 574 (7779) (2019) 505–510, http://dx.doi.org/10.1038/s41586-019-1666-5, (ISSN: 0028-0836,
1476-4687). URL http://www.nature.com/articles/s41586-019-1666-5.
[102] H.-S. Zhong, H. Wang, Y.-H. Deng, M.-C. Chen, L.-C. Peng, Y.-H. Luo, J. Qin, D. Wu, X. Ding, Y. Hu, et al., Quantum computational advantage using
photons, Science 370 (6523) (2020) 1460–1463, http://dx.doi.org/10.1126/science.abe8770, (ISSN: 0036-8075, 1095-9203). URL https://www.science.
org/doi/10.1126/science.abe8770.
[103] L.S. Madsen, F. Laudenbach, M.F. Askarani, F. Rortais, T. Vincent, J.F.F. Bulmer, F.M. Miatto, L. Neuhaus, L.G. Helt, M.J. Collins, et al., Quantum
computational advantage with a programmable photonic processor, Nature 606 (7912) (2022) 75–81, http://dx.doi.org/10.1038/s41586-022-04725-x,
(ISSN :0028-0836, 1476-4687). URL https://www.nature.com/articles/s41586-022-04725-x.
[104] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput. 26 (5) (1997) 1484–1509,
http://dx.doi.org/10.1137/S0097539795293172.
[105] L.K. Grover, A fast quantum mechanical algorithm for database search, 1996, arXiv:quant-ph/9605043.
[106] A.W. Harrow, A. Hassidim, S. Lloyd, Quantum algorithm for linear systems of equations, Phys. Rev. Lett. 103 (15) (2009) http://dx.doi.org/10.1103/
PhysRevLett.103.150502.
[107] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J.M. Chow, J.M. Gambetta, Hardware-efficient variational quantum eigensolver for small
molecules and quantum magnets, Nature 549 (7671) (2017) 242–246, http://dx.doi.org/10.1038/nature23879, 1476-4687.
[108] P.K. Barkoutsos, J.F. Gonthier, I. Sokolov, N. Moll, G. Salis, A. Fuhrer, M. Ganzhorn, D.J. Egger, M. Troyer, A. Mezzacapo, S. Filipp, I. Tavernelli,
Quantum algorithms for electronic structure calculations: Particle-hole Hamiltonian and optimized wave-function expansions, Phys. Rev. A 98 (2018)
http://dx.doi.org/10.1103/PhysRevA.98.022322, URL https://link.aps.org/doi/10.1103/PhysRevA.98.022322.
[109] O. Kiss, M. Grossi, P. Lougovski, F. Sanchez, S. Vallecorsa, T. Papenbrock, Quantum computing of the ⁶Li nucleus via ordered unitary coupled clusters,
Phys. Rev. C 106 (2022) http://dx.doi.org/10.1103/PhysRevC.106.034325, URL https://link.aps.org/doi/10.1103/PhysRevC.106.034325.
[110] S. Lloyd, M. Mohseni, P. Rebentrost, Quantum principal component analysis, Nat. Phys. (ISSN: 1745-2481) 10 (99) (2014) 631–633, http://dx.doi.org/
10.1038/nphys3029.
[111] N. Wiebe, D. Braun, S. Lloyd, Quantum algorithm for data fitting, Phys. Rev. Lett. 109 (5) (2012) http://dx.doi.org/10.1103/PhysRevLett.109.050505.
[112] P. Rebentrost, M. Mohseni, S. Lloyd, Quantum support vector machine for big data classification, Phys. Rev. Lett. 113 (13) (2014) http://dx.doi.org/10.
1103/PhysRevLett.113.130503.
[113] R. Acharya, I. Aleiner, et al., Suppressing quantum errors by scaling a surface code logical qubit, Nature (ISSN: 1476-4687) 614 (7949) (2023) 676–681,
http://dx.doi.org/10.1038/s41586-022-05434-1.
[114] J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum (ISSN: 2521-327X) 2 (2018) 79, http://dx.doi.org/10.22331/q-2018-08-06-79,
URL https://quantum-journal.org/papers/q-2018-08-06-79/.
[115] J. Biamonte, et al., Quantum machine learning, Nature (ISSN: 1476-4687) 549 (7671) (2017) 195–202, http://dx.doi.org/10.1038/nature23474.
[116] M. Schuld, N. Killoran, Quantum machine learning in feature Hilbert spaces, Phys. Rev. Lett. 122 (2018) http://dx.doi.org/10.1103/physrevlett.122.
040504, URL https://arxiv.org/abs/1803.07128v1.
[117] M. Schuld, F. Petruccione, Supervised Learning with Quantum Computers, Springer International Publishing, 2018, http://dx.doi.org/10.1007/978-3-319-
96424-9.
[118] V. Havlíček, A. Córcoles, K. Temme, et al., Supervised learning with quantum-enhanced feature spaces, Nature 567 (2019) 209–212, http://dx.doi.org/
10.1038/s41586-019-0980-2.
[119] M. Caro, et al., Generalization in quantum machine learning from few training data, Nature Commun. (ISSN: 2041-1723) 13 (1) (2022) 4919,
http://dx.doi.org/10.1038/s41467-022-32550-3, URL https://www.nature.com/articles/s41467-022-32550-3. Number: 1 Publisher: Nature Publishing
Group.
[120] Y. Liu, et al., A rigorous and robust quantum speed-up in supervised machine learning, Nat. Phys. 17 (9) (2021) 1013–1017.
[121] J. Kübler, S. Buchholz, B. Schölkopf, The inductive bias of quantum kernels, in: M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, J.W. Vaughan (Eds.),
Advances in Neural Information Processing Systems, Vol. 34, Curran Associates, Inc, 2021, pp. 12661–12673, URL https://proceedings.neurips.cc/paper/
2021/file/69adc1e107f7f7d035d7baf04342e1ca-Paper.pdf.
[122] H. Huang, et al., Power of data in quantum machine learning, Nature Commun. (ISSN: 2041-1723) 12 (1) (2021) 2631, http://dx.doi.org/10.1038/s41467-
021-22539-9.
[123] N. Pirnay, R. Sweke, J. Eisert, J.-P. Seifert, A super-polynomial quantum-classical separation for density modelling, 2022, http://dx.doi.org/10.48550/
ARXIV.2210.14936, arXiv. arXiv:2210.14936.
[124] T. Muser, E. Zapusek, V. Belis, F. Reiter, Provable advantages of kernel-based quantum learners and quantum preprocessing based on Grover’s algorithm,
2023, arXiv:2309.14406.
[125] H. Huang, et al., Quantum advantage in learning from experiments, Science 376 (6598) (2022) 1182–1186, http://dx.doi.org/10.1126/science.abn7293,
URL https://www.science.org/doi/abs/10.1126/science.abn7293.
[126] M. Cerezo, G. Verdon, H.-Y. Huang, L. Cincio, P.J. Coles, Challenges and opportunities in quantum machine learning, Nat. Comput. Sci. (ISSN: 2662-8457)
2 (99) (2022) 567–576, http://dx.doi.org/10.1038/s43588-022-00311-3.
[127] B.E. Boser, et al., A training algorithm for optimal margin classifiers, in: Proceedings of the fifth annual workshop on Computational learning theory,
1992.
[128] M. Schuld, N. Killoran, Quantum machine learning in feature Hilbert spaces, Phys. Rev. Lett. (ISSN: 1079-7114) 122 (4) (2019) http://dx.doi.org/10.
1103/physrevlett.122.040504.
[129] A. Vaiciulis, Support vector machines in analysis of top quark production, Nucl. Instrum. Methods Phys. Res. A (ISSN: 0168-9002) 502 (2) (2003)
492–494, http://dx.doi.org/10.1016/S0168-9002(03)00479-0, URL https://www.sciencedirect.com/science/article/pii/S0168900203004790. Proceedings
of the VIII International Workshop on Advanced Computing and Analysis Techniques in Physics Research.
[130] F. Sforza, V. Lippi, Support vector machine classification on a biased training set: Multi-jet background rejection at hadron colliders, Nucl. Instrum.
Methods Phys. Res. A (ISSN: 0168-9002) 722 (2013) 11–19, http://dx.doi.org/10.1016/j.nima.2013.04.046, URL https://www.sciencedirect.com/science/
article/pii/S0168900213004671.
[131] M. Sahin, D. Krücker, I.-A. Melzer-Pellmann, Performance and optimization of support vector machines in high-energy physics classification problems, Nucl.
Instrum. Methods Phys. Res. A (ISSN: 0168-9002) 838 (2016) 137–146, http://dx.doi.org/10.1016/j.nima.2016.09.017, URL https://www.sciencedirect.
com/science/article/pii/S016890021630941X.
[132] B. Schölkopf, R.C. Williamson, A. Smola, J. Shawe-Taylor, J. Platt, Support vector method for novelty detection, in: S. Solla, et al. (Eds.), Advances in
Neural Information Processing Systems, Vol. 12, MIT Press, 1999.
[133] M. Benedetti, E. Lloyd, S. Sack, M. Fiorentini, Parameterized quantum circuits as machine learning models, Quantum Sci. Technol. (ISSN: 2058-9565) 4
(4) (2019) http://dx.doi.org/10.1088/2058-9565/ab4eb5.
[134] K. Mitarai, M. Negoro, M. Kitagawa, K. Fujii, Quantum circuit learning, Phys. Rev. A 98 (3) (2018) http://dx.doi.org/10.1103/PhysRevA.98.032309.
[135] A. Pérez-Salinas, A. Cervera-Lierta, E. Gil-Fuster, J.I. Latorre, Data re-uploading for a universal quantum classifier, Quantum (ISSN: 2521-327X) 4 (2020)
226, http://dx.doi.org/10.22331/q-2020-02-06-226.
[136] M. Schuld, R. Sweke, J.J. Meyer, Effect of data encoding on the expressive power of variational quantum-machine-learning models, Phys. Rev. A 103
(3) (2021) 032430, http://dx.doi.org/10.1103/PhysRevA.103.032430.
[137] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, P.J. Coles, Cost function dependent barren plateaus in shallow parametrized quantum circuits, Nature Commun.
(ISSN: 2041-1723) 12 (11) (2021) 1791, http://dx.doi.org/10.1038/s41467-021-21728-w.
[138] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, P.J. Coles, Noise-induced barren plateaus in variational quantum algorithms, Nature
Commun. (ISSN: 2041-1723) 12 (11) (2021) 6961, http://dx.doi.org/10.1038/s41467-021-27045-6.
[139] Z. Holmes, K. Sharma, M. Cerezo, P.J. Coles, Connecting ansatz expressibility to gradient magnitudes and barren plateaus, PRX Quantum 3 (1) (2022)
http://dx.doi.org/10.1103/PRXQuantum.3.010313.
[140] S. Thanasilp, S. Wang, M. Cerezo, Z. Holmes, Exponential concentration and untrainability in quantum kernel methods, 2022, arXiv:2208.11060.
[141] S. Jerbi, L.J. Fiderer, H. Poulsen Nautrup, J.M. Kübler, H.J. Briegel, V. Dunjko, Quantum machine learning beyond kernel methods, Nature Commun.
(ISSN: 2041-1723) 14 (11) (2023) 517, http://dx.doi.org/10.1038/s41467-023-36159-y.
[142] M. Schuld, Supervised quantum machine learning models are kernel methods, 2021, http://dx.doi.org/10.48550/arXiv.2101.11020, arXiv. URL http:
//arxiv.org/abs/2101.11020.
[143] Y.Y. Atas, J. Zhang, R. Lewis, A. Jahanpour, J.F. Haase, C.A. Muschik, SU(2) hadrons on a quantum computer via a variational approach, Nature Commun.
(ISSN: 2041-1723) 12 (1) (2021) 6499, http://dx.doi.org/10.1038/s41467-021-26825-4.
[144] J. Mildenberger, W. Mruczkiewicz, J.C. Halimeh, Z. Jiang, P. Hauke, Probing confinement in a Z2 lattice gauge theory on a quantum computer, 2022,
arXiv:2203.08905.
[145] L. Funcke, T. Hartung, K. Jansen, S. Kühn, Review on quantum computing for lattice field theory, 2023, arXiv:2302.00467.
[146] O. Kiss, M. Grossi, E. Kajomovitz, S. Vallecorsa, Conditional Born machine for Monte Carlo event generation, Phys. Rev. A 106 (2) (2022) http:
//dx.doi.org/10.1103/PhysRevA.106.022612, (ISSN: 2469-9926, 2469-9934).
[147] A. Delgado, K.E. Hamilton, Unsupervised quantum circuit learning in high energy physics, Phys. Rev. D 106 (9) (2022) http://dx.doi.org/10.1103/
PhysRevD.106.096006, (ISSN: 2470-0010, 2470-0029).
[148] S.Y. Chang, S. Herbert, S. Vallecorsa, E.F. Combarro, R. Duncan, Dual-parameterized quantum circuit GAN model in high energy physics, EPJ Web Conf.
(ISSN: 2100-014X) 251 (2021) http://dx.doi.org/10.1051/epjconf/202125103050.
[149] C. Bravo-Prieto, J. Baglio, M. Cè, A. Francis, D.M. Grabowska, S. Carrazza, Style-based quantum generative adversarial networks for Monte Carlo events,
Quantum (ISSN: 2521-327X) 6 (2022) 777, http://dx.doi.org/10.22331/q-2022-08-17-777.
[150] B. Nachman, D. Provasoli, W.A. de Jong, C.W. Bauer, Quantum algorithm for high energy physics simulations, Phys. Rev. Lett. 126 (2021) http:
//dx.doi.org/10.1103/PhysRevLett.126.062001, URL https://link.aps.org/doi/10.1103/PhysRevLett.126.062001.
[151] K. Bepari, S. Malik, M. Spannowsky, S. Williams, Quantum walk approach to simulating parton showers, Phys. Rev. D 106 (2022) http://dx.doi.org/10.
1103/PhysRevD.106.056002, URL https://link.aps.org/doi/10.1103/PhysRevD.106.056002.
[152] A.D. Meglio, K. Jansen, I. Tavernelli, C. Alexandrou, S. Arunachalam, C.W. Bauer, K. Borras, S. Carrazza, A. Crippa, V. Croft, R. de Putter, A. Delgado,
V. Dunjko, D.J. Egger, E. Fernandez-Combarro, E. Fuchs, L. Funcke, D. Gonzalez-Cuadra, M. Grossi, J.C. Halimeh, Z. Holmes, S. Kuhn, D. Lacroix, R.
Lewis, D. Lucchesi, M.L. Martinez, F. Meloni, A. Mezzacapo, S. Montangero, L. Nagano, V. Radescu, E.R. Ortega, A. Roggero, J. Schuhmacher, J. Seixas,
P. Silvi, P. Spentzouris, F. Tacchino, K. Temme, K. Terashi, J. Tura, C. Tuysuz, S. Vallecorsa, U.-J. Wiese, S. Yoo, J. Zhang, Quantum computing for
high-energy physics: State of the art and challenges. Summary of the QC4HEP working group, 2023, arXiv:2307.03236.
[153] C. Tüysüz, et al., Particle track reconstruction with quantum algorithms, EPJ Web Conf. (ISSN: 2100-014X) 245 (2020) 09013, http://dx.doi.org/10.
1051/epjconf/202024509013.
[154] M. Grossi, J. Novak, B. Kerševan, D. Rebuzzi, Comparing traditional and deep-learning techniques of kinematic reconstruction for polarization discrimination in vector boson scattering, Eur. Phys. J. C (ISSN: 1434-6052) 80 (12) (2020) 1144, http://dx.doi.org/10.1140/epjc/s10052-020-08713-1.
[155] D. Magano, et al., Quantum speedup for track reconstruction in particle accelerators, Phys. Rev. D 105 (7) (2022) http://dx.doi.org/10.1103/PhysRevD.
105.076012, URL https://link.aps.org/doi/10.1103/PhysRevD.105.076012. Publisher: American Physical Society.
[156] J.M. de Lejarza, et al., Quantum clustering and jet reconstruction at the LHC, Phys. Rev. D 106 (2022) http://dx.doi.org/10.1103/PhysRevD.106.036021,
URL https://link.aps.org/doi/10.1103/PhysRevD.106.036021.
[157] P. Duckett, et al., Reconstructing charged particle track segments with a quantum-enhanced support vector machine, 2022, http://dx.doi.org/10.48550/
ARXIV.2212.07279, arXiv. arXiv:2212.07279.
[158] A. Mott, et al., Solving a Higgs optimization problem with quantum annealing for machine learning, Nature 550 (7676) (2017) 375–379, http:
//dx.doi.org/10.1038/nature24047.
[159] K. Terashi, et al., Event classification with quantum machine learning in high-energy physics, Comput. Softw. Big Sci. 5 (1) (2021) 2, http://dx.doi.org/
10.1007/s41781-020-00047-7, arXiv:2002.09935.
[160] A. Blance, M. Spannowsky, Quantum machine learning for particle physics using a variational quantum classifier, 2020, http://dx.doi.org/10.1007/
JHEP02(2021)212, arXiv. arXiv:2010.07335.
[161] V. Belis, S. Gonzalez-Castillo, C. Reissel, et al., Higgs analysis with quantum classifiers, EPJ Web Conf. 251 (2021) http://dx.doi.org/10.1051/epjconf/
202125103070.
[162] W. Guan, et al., Quantum machine learning in high energy physics, Mach. Learn.: Sci. Technol. 2 (1) (2021) http://dx.doi.org/10.1088/2632-2153/abc17d.
[163] J. Heredge, C. Hill, L. Hollenberg, M. Sevior, Quantum support vector machines for continuum suppression in B meson decays, Comput. Softw. Big Sci.
(ISSN: 2510-2044) 5 (1) (2021) 27, http://dx.doi.org/10.1007/s41781-021-00075-x.
[164] S.Y.-C. Chen, T.-C. Wei, C. Zhang, H. Yu, S. Yoo, Quantum convolutional neural networks for high energy physics data analysis, 2020, http:
//dx.doi.org/10.48550/arXiv.2012.12177, arXiv. URL http://arxiv.org/abs/2012.12177.
[165] S.Y.-C. Chen, T.-C. Wei, C. Zhang, H. Yu, S. Yoo, Hybrid quantum-classical graph convolutional network, 2021, http://dx.doi.org/10.48550/arXiv.2101.
06189, arXiv. arXiv:2101.06189. URL http://arxiv.org/abs/2101.06189.
[166] S. Wu, et al., Application of quantum machine learning using the quantum kernel algorithm on high energy physics analysis at the LHC, Phys. Rev. Res.
3 (2021) http://dx.doi.org/10.1103/PhysRevResearch.3.033221, URL https://link.aps.org/doi/10.1103/PhysRevResearch.3.033221.
[167] S.L. Wu, et al., Application of quantum machine learning using the quantum variational classifier method to high energy physics analysis at the LHC on
IBM quantum computer simulator and hardware with 10 qubits, J. Phys. G: Nucl. Part. Phys. 48 (12) (2021) http://dx.doi.org/10.1088/1361-6471/ac1391.
[168] J. Schuhmacher, L. Boggia, V. Belis, et al., Unravelling physics beyond the standard model with classical and quantum anomaly detection, Mach. Learn.
Sci. Tech. 4 (4) (2023) 045031, http://dx.doi.org/10.1088/2632-2153/ad07f7, arXiv:2301.10787.
[169] M.C. Peixoto, N.F. Castro, M.C. Romão, M.G.J. Oliveira, I. Ochoa, Fitting a collider in a quantum computer: Tackling the challenges of quantum machine
learning for big datasets, 2023, http://dx.doi.org/10.48550/arXiv.2211.03233, arXiv. arXiv:2211.03233. URL http://arxiv.org/abs/2211.03233.
[170] K.A. Woźniak, V. Belis, E. Puljak, et al., Quantum anomaly detection in the latent space of proton collision events at the LHC, 2023, http://dx.doi.org/
10.48550/ARXIV.2301.10780, arXiv. arXiv:2301.10780.
[171] A. Gianelle, P. Koppenburg, D. Lucchesi, D. Nicotra, E. Rodrigues, L. Sestini, J. de Vries, D. Zuliani, Quantum Machine Learning for b-jet charge
identification, J. High Energy Phys. (ISSN: 1029-8479) 2022 (8) (2022) 14, http://dx.doi.org/10.1007/JHEP08(2022)014.
[172] A. Blance, M. Spannowsky, Unsupervised event classification with graphs on classical and photonic quantum computers, J. High Energy Phys. (ISSN:
1029-8479) 2021 (8) (2021) 170, http://dx.doi.org/10.1007/JHEP08(2021)170.
[173] V. Ngairangbam, et al., Anomaly detection in high-energy physics using a quantum autoencoder, Phys. Rev. D 105 (2022) 095004, http://dx.doi.org/10.
1103/PhysRevD.105.095004, URL https://link.aps.org/doi/10.1103/PhysRevD.105.095004.
[174] S. Alvi, C.W. Bauer, B. Nachman, Quantum anomaly detection for collider physics, J. High Energy Phys. (ISSN: 1029-8479) 2023 (2) (2023) 220,
http://dx.doi.org/10.1007/JHEP02(2023)220.
[175] E. Bermot, C. Zoufal, M. Grossi, J. Schuhmacher, F. Tacchino, S. Vallecorsa, I. Tavernelli, Quantum generative adversarial networks for anomaly detection
in high energy physics, 2023, http://dx.doi.org/10.48550/arXiv.2304.14439, arXiv. arXiv:2304.14439. URL http://arxiv.org/abs/2304.14439.
[176] S. Bordoni, D. Stanev, T. Santantonio, S. Giagu, Long-lived particles anomaly detection with parametrized quantum circuits, Particles 6 (1) (2023)
297–311, http://dx.doi.org/10.3390/particles6010016, arXiv:2312.04238.
[177] S. Chekanov, A new jet algorithm based on the k-means clustering for the reconstruction of heavy states from jets, Eur. Phys. J. C - Part. Fields (ISSN:
1434-6052) 47 (3) (2006) 611–616, http://dx.doi.org/10.1140/epjc/s2006-02618-3.
[178] J. Thaler, K. Van Tilburg, Maximizing boosted top identification by minimizing N-subjettiness, J. High Energy Phys. (ISSN: 1029-8479) 2012 (2) (2012)
93, http://dx.doi.org/10.1007/JHEP02(2012)093.
[179] I.W. Stewart, F.J. Tackmann, J. Thaler, C.K. Vermilion, T.F. Wilkason, XCone: N-jettiness as an exclusive cone jet algorithm, J. High Energy Phys. (ISSN:
1029-8479) 2015 (11) (2015) 72, http://dx.doi.org/10.1007/JHEP11(2015)072.
[180] J.Y. Araz, M. Spannowsky, Quantum-probabilistic Hamiltonian learning for generative modelling & anomaly detection, 2023, arXiv:2211.03803.
[181] J. Romero, J.P. Olson, A. Aspuru-Guzik, Quantum autoencoders for efficient compression of quantum data, Quantum Sci. Technol. (ISSN: 2058-9565) 2
(4) (2017) 045001, http://dx.doi.org/10.1088/2058-9565/aa8072, arXiv:1612.02806.
[182] I. Cong, et al., Quantum convolutional neural networks, Nat. Phys. 15 (2019) 1273–1278, http://dx.doi.org/10.1038/s41567-019-0648-8, (ISSN:
1745-2473, 1745-2481). URL http://www.nature.com/articles/s41567-019-0648-8.
[183] Q.T. Nguyen, L. Schatzki, P. Braccia, M. Ragone, P.J. Coles, F. Sauvage, M. Larocca, M. Cerezo, Theory for equivariant quantum neural networks, 2022,
arXiv:2210.08566.
[184] G. Verdon, T. McCourt, E. Luzhnica, V. Singh, S. Leichenauer, J. Hidary, Quantum graph neural networks, 2019, arXiv:1909.12264.
[185] J. Bowles, V.J. Wright, M. Farkas, N. Killoran, M. Schuld, Contextuality and inductive bias in quantum machine learning, 2023, http://dx.doi.org/10.
48550/arXiv.2302.01365, arXiv. arXiv:2302.01365 [quant-ph]. URL http://arxiv.org/abs/2302.01365.
[186] CMS Collaboration, Measurement of the top quark polarization and t̄t spin correlations using dilepton final states in proton-proton collisions at √s = 13 TeV, Phys. Rev. D 100 (7) (2019) http://dx.doi.org/10.1103/PhysRevD.100.072002, (ISSN: 2470-0010, 2470-0029).
[187] G. Aad, et al., ATLAS Collaboration, Observation of quantum entanglement in top-quark pairs using the ATLAS detector, 2023, arXiv:2311.07288.
[188] A. Cervera-Lierta, J.I. Latorre, J. Rojo, L. Rottoli, Maximal entanglement in high energy physics, SciPost Phys. (ISSN: 2542-4653) 3 (5) (2017) 036,
http://dx.doi.org/10.21468/SciPostPhys.3.5.036.
[189] C. Severi, C.D.E. Boschi, F. Maltoni, M. Sioli, Quantum tops at the LHC: from entanglement to Bell inequalities, Eur. Phys. J. C (ISSN: 1434-6052) 82
(4) (2022) 285, http://dx.doi.org/10.1140/epjc/s10052-022-10245-9.
[190] M. Fabbrichesi, R. Floreanini, E. Gabrielli, L. Marzola, Bell inequalities and quantum entanglement in weak gauge bosons production at the LHC and
future colliders, 2023, http://dx.doi.org/10.48550/arXiv.2302.00683, arXiv. arXiv:2302.00683. URL http://arxiv.org/abs/2302.00683.
[191] M. Fabbrichesi, R. Floreanini, G. Panizzo, Testing bell inequalities at the LHC with top-quark pairs, Phys. Rev. Lett. 127 (16) (2021) http://dx.doi.org/
10.1103/PhysRevLett.127.161801.
[192] Y. Afik, J.R.M.d. Nova, Quantum information with top quarks in QCD, Quantum 6 (2022) 820, http://dx.doi.org/10.22331/q-2022-09-29-820.
[193] D. Ghosh, R. Sharma, Bell violation in 2 → 2 scattering in photon, gluon and graviton EFTs, 2023, http://dx.doi.org/10.48550/arXiv.2303.03375, arXiv.
arXiv:2303.03375. URL http://arxiv.org/abs/2303.03375.
[194] The LHC Olympics challenges, 2023, https://lhco2020.github.io/homepage/. (Accessed 30 September 2023).
[195] M. Jiang, C. Hou, A. Zheng, X. Hu, S. Han, H. Huang, X. He, P.S. Yu, Y. Zhao, Weakly supervised anomaly detection: A survey, 2023, arXiv:2302.04549.