Practical Hadron Collider Physics
Christopher White
School of Physical and Chemical Sciences, Queen Mary University of London,
London, UK
Martin White
School of Physical Sciences, University of Adelaide, Adelaide, Australia
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system
or transmitted in any form or by any means, electronic, mechanical, photocopying, recording
or otherwise, without the prior permission of the publisher, or as expressly permitted by law or
under terms agreed with the appropriate rights organization. Multiple copying is permitted in
accordance with the terms of licences issued by the Copyright Licensing Agency, the Copyright
Clearance Centre and other reproduction rights organizations.
Permission to make use of IOP Publishing content other than as set out above may be sought
at [email protected].
Andy Buckley, Christopher White and Martin White have asserted their right to be identified as
the authors of this work in accordance with sections 77 and 78 of the Copyright, Designs and
Patents Act 1988.
DOI 10.1088/978-0-7503-2444-1
Version: 20211201
IOP ebooks
British Library Cataloguing-in-Publication Data: A catalogue record for this book is available
from the British Library.
US Office: IOP Publishing, Inc., 190 North Independence Mall West, Suite 601, Philadelphia,
PA 19106, USA
Preface xii
Author biographies xv
1 Introduction 1-1
1.1 Types of collider 1-4
1.2 Relativistic kinematics 1-5
1.3 Events, cross-sections and luminosity 1-8
1.4 Differential cross-sections 1-13
1.5 Particle detectors 1-14
Further reading 1-19
Exercises 1-19
13 Outlook 13-1
13.1 The future of the LHC 13-1
13.2 Beyond the LHC 13-3
Preface
The life of a particle physicist is, in many ways, a charmed one. Students entering the
field rapidly discover an amazing world of international collaboration and jet-
setting, plus fascinating physics problems that are at the frontier of human under-
standing. Furthermore, particle physics is extraordinarily broad in the range of
subjects and techniques it encompasses: from quantum field theory to statistics, to
computational methods and ‘big data’ processing, to detector development, oper-
ations, and engineering. This makes the field exciting for students with a wide range
of computational and mathematical backgrounds.
Nevertheless, there is one time of year that every supervisor of graduate students
dreads, and that is the arrival of new students. This has nothing to do with the
students themselves, and everything to do with the fact that much of what they need
to know is not written down anywhere. Despite a mountain of excellent books on
quantum field theory, statistics, and experimental particle physics, there are no
books that summarise the many pieces of received wisdom that are necessary to
actually do something with hadron collider data. For the supervisors, this entails
having the same conversations every year in an increasingly weary tone. For the
poor arriving students, this makes the journey into particle physics considerably
more stressful than it has to be, as they are left to piece together the toolkit they need
to get them through their day job, whilst suffering from the false impression that
everyone else is too clever for them.
The purpose of this book is to summarise the practical methods and tools of high-
energy collider physics in a convenient form, covering the minimum amount of
material that one needs to know about both theoretical and experimental methods
for hadron colliders. Although the book is divided into a mostly theoretical part,
followed by a mostly experimental part, we strongly encourage theorists and
experimentalists to read both parts to gain a complete understanding of how hadron
collider physics works in practice. It is impossible to properly interpret Large
Hadron Collider (LHC) data without a great deal of experimental knowledge, and
neither is it possible for experimentalists to properly complete their work without
knowing a reasonable amount of theory. In essence, our book digests the normal
supervisor–student mentoring conversations into a convenient form, so that students
can hit the ground running as they start a graduate project in the field. As a reflection
of this role, we have also tried to make this a practical manual, focused on concepts
and strategies needed to make progress on concrete research tasks, and—as far as we
can manage—with a minimum of formality. For supervisors, we hope that the book
will provide a handy reference for those senior moments when they cannot quite
remember how something works.
The first part of the book mostly covers theoretical topics. Chapter 1 provides a
brief, non-technical review of most of the concepts that we treat in detail later in the
book, including the particles of the Standard Model, the different types of particle
collider, relativistic kinematics and various concepts that are important to
and II as required. Martin is acknowledged for devising and kickstarting the project,
organising the writing schedule and leading the administrative work. Chris and
Martin wish to thank Andy for his amazing additional work on the typesetting of the
draft, plus his wonderful hand-drawn illustrations. We further thank Jan Conrad,
Christoph Englert, Chris Gutschow, Thomas Kabelitz, Zach Marshall, Josh
McFayden, Knut Morå, and Chris Pollard for their excellent feedback on the draft,
and give special thanks to Jack Araz for his illustrative implementation of top-quark
reconstruction performance.
Finally, we express our immense gratitude to our partners—Jo, Michael and
Catherine—and our children Alec, Edith, Toby, Alfie and Henry. Their enduring
love, support, and sanity were vital as we completed the book, large parts of which
were written during the bewildering events of 2020–21. Although this book has been
declared boring by a six-year-old, we hope that both commencing graduate students
and seasoned practitioners will find much to enjoy in it.
Author biographies
Andy Buckley
Andy Buckley is a Senior Lecturer in particle physics at the
University of Glasgow, Scotland. He began his career in collider
physics as a PhD student at the University of Cambridge,
developing Cerenkov-ring reconstruction and CP-violation data-
analyses for the LHCb experiment. Later he worked on MC event
generation, MC tuning, and data-preservation as a postdoctoral
researcher at the Institute for Particle Physics Phenomenology in
Durham, UK, before resuming experimental physics work on the
ATLAS experiment at University College London, the University of Edinburgh,
CERN, and finally Glasgow. His work on ATLAS has included responsibility for
MC event modelling, from detector simulation to event generation; measurements of
soft-QCD, b-jets and top-quark physics; and searches for the Higgs boson and new-
physics H → bb¯ decay channels. Outside ATLAS, he is leader of the Rivet and
LHAPDF projects, a work-package leader in the MCnet research & training
network, and active in BSM phenomenology studies via the Contur, TopFitter,
and GAMBIT collaborations.
Christopher White
Christopher White is a Reader in Theoretical Physics at Queen Mary
University of London. He obtained his PhD at the University of
Cambridge, specialising on high-energy corrections to the structure
of the proton. He then moved to Nikhef, Amsterdam (the Dutch
National Centre for Nuclear and High Energy Physics), where he
broadened his research into hadron collider physics, including
Monte Carlo simulation, the description of low momentum (‘soft’)
radiation in QCD, and various aspects of top-quark and Higgs
physics. Following further appointments at the Universities of Durham and
Glasgow, Chris moved to his current position, where he has continued his
collider-physics research alongside more formal work looking at relationships
between theories like QCD, and (quantum) gravity.
Martin White
Martin White is a Professor in particle astrophysics, and Deputy
Dean of Research in the Faculty of Sciences at the University of
Adelaide, Australia. He obtained a PhD in high energy physics at
the University of Cambridge, working as a member of the ATLAS
experiment with interests in silicon-detector physics, supersymme-
try searches, and supersymmetry phenomenology. He then moved
to the University of Melbourne before arriving at Adelaide in 2013
to start a particle astrophysics group. Within the ATLAS collab-
oration, he spent several years performing tests of the silicon tracker before and after
installation, whilst also searching for supersymmetric particles. Most recently, he
developed a new theoretical framework for modelling resonance searches in the
diphoton final state. Outside of ATLAS, he is Deputy Leader of the GAMBIT
collaboration, a team of 70 international researchers who perform global statistical
fits of new physics models, and is also a project leader in the DarkMachines
collaboration, which seeks to find new applications of machine learning techniques
in dark-matter research. He has broad interests in particle astrophysics phenomen-
ology and data-science, including the development of new data-analysis techniques
for collider and dark-matter search experiments, and studies of a wide range of
Standard Model extensions. He is the author of the Physics World Discovery book
What’s Next for Particle Physics?
Part I
Theory and methods for hadron colliders
Chapter 1
Introduction
The nature and fate of our universe have fascinated humankind for millennia, and we
remain fascinated by the big questions underlying our existence: why is there
something rather than nothing? How did the Universe get here, and how will it end?
What are the basic building blocks of Nature, that operate at the smallest possible
distance scales? Nowadays, such questions fall within the remit of fundamental
physics. In particular, particle physics deals with the basic constituents of matter and
their fundamental interactions, and forms part of the larger framework of high-
energy physics, which may include other exotic objects such as strings or branes.
These subjects overlap with astrophysics and cosmology, which describe the
Universe at its largest scales, since we know that the Universe expanded outwards
from a finite time in the past (the Big Bang). As we turn the clock back, the Universe
gets smaller and hotter, such that high-energy physics processes become relevant at
early times, whose imprints can be measured today.
Thousands of years of scientific research have culminated in our current under-
standing: that the Universe contains matter, that is acted on by forces. All forces that
we observe are believed to be a consequence of four fundamental forces: electro-
magnetism, the strong nuclear force (that holds protons and neutrons together), the
weak nuclear force (responsible for certain kinds of nuclear decay), and gravity. For
the first three forces, we know how to include the effects of special relativity and
quantum mechanics. The consistent theory that combines these two ideas is quantum
field theory (QFT). For gravity, our best description is general relativity (GR). It is
not yet known whether or not a sensible quantum theory of this force exists, but
there are hints that it is needed to describe extreme regions of the Universe where
classical GR breaks down, such as the centre of black holes, or the Big Bang itself. In
particle physics, we can safely neglect gravity for the most part, as it will be much
too weak to be noticed.
The matter particles can be divided into two types. First, there are the leptons,
that feel the electromagnetic and weak forces. These are the electron e−, together
Table 1.1. The matter particle content of the Standard Model (quarks and leptons), where the mass units are explained in the text. All particles have spin 1/2.
with its heavier partners (the muon μ− and tauon τ −). Associated with each of these
is a corresponding neutrino νi , whose name charmingly means ‘little neutral one’ in
Italian. Next, there are the quarks, which feel the electromagnetic, strong and weak
forces. These have odd historical names: up (u), down (d ), charm (c), strange (s), top
(t) and bottom (b). We can arrange all of these particles into three generations, where
the particles in higher generations have exactly the same quantum numbers as the
lower generations, apart from the mass. We summarise all masses and charges for
the quarks and leptons in table 1.1. Each of them also has a corresponding anti-
particle: this has the same mass, but all other quantum numbers (e.g. charge)
reversed.
As we will see later on, quarks carry a type of charge (not related to electro-
magnetic charge) called colour. Furthermore, free quarks are never observed—they
are instead confined into composite particles called hadrons, whose net colour charge
is zero. We may further subdivide hadrons into baryons, which contain three
quarks1, and mesons, containing a quark/anti-quark pair. Examples of baryons
include the proton p(uud ) and neutron n(udd ). For mesonic examples, you may have
heard of the pions π+(ud̄), π0 (a quantum superposition of uū and dd̄) and π−(dū).
The basic idea of QFT is that fields are described by equations that can have
wave-like solutions. The quantum aspect tells us that these waves cannot have
continuous energy, but instead travel in distinct particle-like ‘quanta’. Put more
simply, forces are themselves carried by particles2. When matter particles interact
due to a given fundamental force, they do so by exchanging the appropriate force-
carrying particle. The carrier of the electromagnetic force is the photon, usually
written γ (n.b. ‘gamma rays’ are a certain kind of high-energy electromagnetic wave).
The weak force turns out to be carried by three particles: the W −, W + and Z0 (‘W and
1 One can also have anti-baryons, containing three anti-quarks.
2 Likewise, matter particles also arise as quanta of fields, which are no more or less fundamental than the fields corresponding to the forces.
Table 1.2. The force-particle content of the Standard Model, where all particles have spin 1.

Force               Carrier       Mass (GeV)                         Electric charge
Electromagnetism    γ (photon)    0                                  0
Strong              g (gluon)     0                                  0
Weak                W±, Z0        80.4 (W bosons), 91.2 (Z boson)    +1, −1, 0
Z bosons’), where the superscript denotes the electromagnetic charge. The W + and
W − are mutual antiparticles, whereas the γ and Z0 are their own antiparticle. The
strong force is carried by the gluon3. Finally, if a quantum theory of gravity exists,
we would call the appropriate force carrier the graviton. The masses of the force
carriers are summarised in table 1.2. In particular, we see that the carriers of the
weak force are massive, but all other force carriers are massless. In most quantum
gravity theories, the graviton is also massless, a property which is heavily con-
strained by astrophysical data.
The theory that describes the various matter (anti-)particles and all forces
(excluding gravity) is called the Standard Model of Particle Physics, or SM for
short. It is a QFT, where all matter and force ‘particles’ are described by fields. As
well as the above content, it needs one more thing. It turns out that the equations of
the SM are mathematically inconsistent unless there is an additional field called the
Higgs field. The field has a particle associated with it—the Higgs boson (H), which
was discovered as recently as 2012. We will discuss these topics in much more detail
in later Chapters. For now, however, we note that a number of open puzzles remain,
including the following non-exhaustive list:
Many theories Beyond the Standard Model (BSM) have already been proposed to
answer some of these questions, but to date there is no unambiguous signature of
any of them. Our main way of testing such theories is to use particle accelerator, or
collider experiments. The history of particle accelerators goes back many decades,
and indeed such machines were crucial in establishing the Standard Model itself.
The current flagship experiment in the world is the Large Hadron Collider (LHC)
3 So called because it ‘glues the proton together’. Yes really.
at CERN, near Geneva. It is the most complicated machine ever built, and
contains a number of experiments looking at various aspects of high-energy
physics. It has already discovered the Higgs boson and, at the end of its second
run, has yet to take 96% of its projected data. At the time of writing, possible new
facilities are being discussed around the world, with a view to a new facility
coming online after the next decade or so. It is thus a good time to be a collider
physicist!
The aim of this book is to examine the theory that is used at modern particle
colliders, and to describe how modern experimental analyses are actually carried
out. This is a vast subject, and thus we will restrict our attention to the two general-purpose detector (GPD) experiments at the LHC, namely ATLAS and CMS. We will
not focus so much on heavy-ion physics experiments (e.g. ALICE), flavour physics
(e.g. LHCb) or neutrino physics, which are very exciting facilities, but are distinct in
many respects and are described in detail elsewhere.
there is little incentive to build a colliding beam experiment unless you really have to.
The great disadvantage of a fixed-target experiment, however, is that most of the
energy in the initial state goes into kinetic energy in the final state, and thus is not
available for making new particles. Thus, colliders designed to discover heavy new
particles have to involve colliding beams, in which all of the energy (in principle) is
available for new particle creation.
Colliders can be linear (e.g. SLAC, mentioned above), or circular (e.g. HERA,
the Tevatron, the LHC). In the latter case, powerful magnets are used to deflect the
beams: at the LHC, for example, the so-called ‘dipole magnets’ used for this purpose
weigh a couple of tons each4! The advantage of circular colliders is that the particle
can go round the circle multiple times, getting faster each time. Thus, one can get
away with less powerful electric fields to accelerate them. The main disadvantage of
circular colliders is that charged particles undergoing circular trajectories emit
synchrotron radiation, and lose energy at a rate

\frac{dE}{dt} \propto m^{-4}\, r^{-2},
where m is the mass of the particle, and r the radius of the circle. Because this effect
involves a high power of the inverse mass, it is much worse for light particles,
such as electrons and positrons. For this reason, modern (anti)-proton colliders are
circular, but future e ± colliders are more likely to be linear (although circular e ±
collider proposals are currently on the table). Note that the above formula suggests
we can mitigate the effects of synchrotron radiation by having a very large radius.
This is why the circumference of the LHC (which occupies the same tunnel as the
former e ± collider LEP) is 27 km!
In this book, we will not worry about how beams are produced and manipulated.
This belongs to the field of accelerator physics, which would easily fill an entire tome by
itself. Rather, we will only care about what particles each beam contains. Given that
these particles will be moving very fast, we will need to describe them using the
appropriate language of special relativity. This is the subject of the following Section.
4 If you visit CERN, you can see a spare dipole magnet on the grass outside the main cafeteria. The size is such that you will often see visiting children (and sometimes grown adults) climbing on it.
in which we have adopted Cartesian coordinates for the spatial components. Here μ
is an index that goes from 0 to 3, where 0 labels the time component, i.e.
x^0 = ct, \quad x^1 = x, \quad x^2 = y, \quad x^3 = z.
Given these components, we can also define the related components
x_\mu = (ct, -x, -y, -z). \qquad (1.2)
That is, a four-vector with a lower index has its spatial components flipped with
respect to the corresponding four-vector with an upper index. This is just a definition,
but is convenient when we talk about combining four-vectors to make dot products
etc. Note that the vector x μ describes the location of an ‘event’ in four-dimensional
spacetime, also known as ‘Minkowski space’. That this is the right language to use
essentially follows from the fact that Lorentz transformations in special relativity
mix up space and time, so that it no longer makes sense to separate them, as we do in
Newtonian physics.
Given the position four-vector, we can define the four-momentum of a particle
p^\mu \equiv m\, \frac{dx^\mu}{d\tau}, \qquad (1.3)
where x μ is the position of the particle in spacetime, m its rest mass, and τ its proper
time, namely the time that the particle experiences when at rest. If the particle is
moving, the proper time is dilated according to the formula:
t = \gamma\tau, \qquad \gamma = \left(1 - \frac{v^2}{c^2}\right)^{-1/2}, \qquad (1.4)
where v = ∣v∣ is the magnitude of the three-velocity of the particle. This allows us to
write
p^\mu = m\gamma\, \frac{dx^\mu}{dt} = (\gamma m c,\ \gamma m \mathbf{v}), \qquad (1.5)
which is usually written instead as
p^\mu = \left(\frac{E}{c},\ \mathbf{p}\right), \qquad (1.6)
where on the right-hand side we have used the component notation. The definition
of equation (1.2) (which generalises to any four-vector) allows us to write the dot
product neatly as
a \cdot b = a^\mu b_\mu \equiv b^\mu a_\mu, \qquad (1.9)
where we have adopted Einstein’s famous summation convention, in which repeated indices are summed over (i.e. there is an implicit \sum_{\mu=0}^{3} in equation (1.9)). Note that this dot product is not the same as the usual dot product for three-dimensional
Euclidean space. It includes timelike and spacelike components for a start, and also has
a relative minus sign between the timelike and spacelike terms. One may then prove that
the dot product of any two four-vectors is invariant under Lorentz transformations. In
particular, this means that any dot product of four-momenta can be evaluated in any
Lorentz frame we like, as the answer will always be the same. Following convention, we
will often write the dot product of a four-vector with itself5 as

a^2 \equiv a \cdot a = (a^0)^2 - (a^1)^2 - (a^2)^2 - (a^3)^2. \qquad (1.10)
From equation (1.6), we find (in a general frame where E, |\mathbf{p}| \neq 0)

p^2 = \frac{E^2}{c^2} - |\mathbf{p}|^2. \qquad (1.11)
In the rest frame of a massive particle, however, we have
p^\mu = (mc,\ \mathbf{0}) \;\Rightarrow\; p^2 = m^2 c^2, \qquad (1.12)
so that combining equations (1.11) and (1.12) yields the energy–momentum relation

E^2 - c^2 |\mathbf{p}|^2 = m^2 c^4. \qquad (1.13)
From now on, we will adopt natural units, which are ubiquitous throughout
relativistic quantum field theory. This involves ignoring factors of ℏ and c, which is
often stated by saying that one sets ℏ = c = ϵ0 = μ0 = 1, where the latter two
quantities are the permittivity and permeability of free space, respectively. This
simplifies virtually all equations we will come across in this book. However, natural
units can also be horribly confusing if you actually have to convert to numbers in SI
units. The rule is: given any quantity in natural units, put in the right factors of ℏ and
c to get the dimensions right for the quantity you are converting to. One can often
check which factors are needed by appealing to example equations relating the
quantities of interest. For now, note that equation (1.13) simply becomes
E^2 - |\mathbf{p}|^2 = m^2. \qquad (1.14)
Note that one might encounter numerical instabilities in the calculation of m when
∣p∣ ≫ m; we provide some helpful tips for avoiding this issue in appendix A.2.
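As a quick illustration of why this matters in practice, the minimal numpy sketch below (illustrative numbers only; appendix A.2 discusses remedies in more detail) reconstructs the mass of a high-momentum muon from its energy and momentum. In double precision the answer is fine, but in single precision, as used in many analysis file formats, the mass information is lost entirely, which is one reason the particle mass is often stored explicitly alongside the momentum components.

```python
import numpy as np

def mass_naive(E, px, py, pz):
    """Invariant mass from m^2 = E^2 - |p|^2 (prone to catastrophic cancellation)."""
    m2 = E**2 - (px**2 + py**2 + pz**2)
    return np.sqrt(np.maximum(m2, 0.0))  # clip tiny negative m^2 values caused by rounding

# A 500 GeV muon (true mass 0.106 GeV) travelling along x
m_true = 0.105658
px = 500.0
E = np.sqrt(px**2 + m_true**2)

print(mass_naive(E, px, 0.0, 0.0))  # double precision: ~0.1057, as expected

# In single precision the stored E and px are no longer distinguishable,
# and the computed mass collapses to zero
E32, px32 = np.float32(E), np.float32(px)
print(mass_naive(E32, px32, np.float32(0.0), np.float32(0.0)))  # 0.0
```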
5 The notation here is unfortunately ambiguous, as a^2 looks like the square of a number. Whether or not we are talking about the square of a number or a four-vector will hopefully always be clear from the context.
Another quantity we will use throughout is the invariant mass of a set of particles with four-momenta {p_i, p_j, \ldots, p_l} (i.e. where indices label particle numbers):

m_{ij\ldots l} = \sqrt{(p_i + p_j + \cdots + p_l)^2}\,. \qquad (1.15)
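As a minimal sketch of how this is used in practice (the four-momenta below are made up, with the muons treated as massless), summing the four-momenta of two lepton candidates and taking the invariant mass of the sum is the standard way to look for a parent resonance such as the Z boson:

```python
import numpy as np

def invariant_mass(momenta):
    """Invariant mass of a set of four-momenta, each given as (E, px, py, pz) in GeV."""
    E, px, py, pz = np.sum(np.asarray(momenta, dtype=float), axis=0)
    return np.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0))

# Two made-up, roughly back-to-back muon candidates (massless approximation)
mu_plus  = (45.45,  21.0,  37.0,  16.0)
mu_minus = (45.58, -23.0, -35.0, -18.0)
print(invariant_mass([mu_plus, mu_minus]))  # ~91 GeV, close to the Z-boson mass
```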
From equation (1.14), we see that mass and momentum both have dimensions of
energy in natural units. Also, the formula for the de Broglie wavelength
\lambda = \frac{h}{p}
implies that length has dimensions of (energy)−1. We can thus choose dimensions of
energy to express any quantity we encounter. It is conventional in high-energy physics to
measure energy in electron volts (eV), where the definition of 1 eV is the energy gained by
an electron moving through a potential difference of 1 volt. In SI units, 1 eV amounts to 1.6 × 10⁻¹⁹ J. For particle physics, 1 eV is a pathetic amount of energy6. For example, the
mass of the electron (a very light particle in the scheme of things) is 0.5 MeV , and the mass
of the proton is around 1 GeV . The current energy of an LHC beam is 6.5 TeV!
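As a worked example of the conversion rule above (the numerical constant is the standard value of ℏc, quoted here to four significant figures), a length of 1 GeV⁻¹ in natural units corresponds to roughly 0.2 fm:

```python
HBAR_C_GEV_FM = 0.1973  # hbar * c expressed in GeV * femtometres

def natural_length_to_fm(length_in_inverse_GeV):
    """Convert a length quoted in GeV^-1 (natural units) to femtometres."""
    return length_in_inverse_GeV * HBAR_C_GEV_FM

print(natural_length_to_fm(1.0))  # ~0.2 fm: roughly the distance scale probed by a 1 GeV momentum transfer
```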
To make the first of these more formal, consider a purely classical situation in which
you throw tennis balls at a target, as depicted in Figure 1.2(a).
6 To give some context, the energy levels of an atom are typically separated by a few eV.
Figure 1.2. (a) Throwing a tennis ball at a classical target; (b) one quantum mechanical particle scattering on
another.
The probability for the tennis ball to hit the target is clearly proportional to the
cross-sectional area of the latter. We want to talk instead about the collision of two
beams made of quantum mechanical ‘particles’. The details of this are a lot more
complicated and fuzzy than the classical collisions we are familiar with from
everyday life. However, we can simplify this in our minds by considering something
like Figure 1.2(b), where we identify the particle on the left as being the incident
particle, and the particle on the right as being the target. There is no well-defined
sharp edge for the target particle as there is in the tennis ball example. Instead, there
is some region of space such that, if the incident particle enters it, it will interact with
the target particle and thus possibly be deflected. This will proceed via the exchange
of force-carrying particles. Such a situation corresponds to an elastic collision, in
which the two scattering particles remain intact. Another possibility is an inelastic
collision, in which the particles may break up, or annihilate each other. In this case
too, however, the incident particle must enter some region around the target particle
in order to interact.
It follows from the above discussion that each target particle has an effective
cross-sectional area for interaction. More formally, the scattering probability is
related to some quantity σ, with dimensions of area (or (energy)−2 in natural units).
It is called the scattering cross-section, and depends on the properties of both
colliding particles (e.g. the probability may depend on the charges of both
particles, or their masses, etc). The rules of QFT tell us, at least in principle,
how to calculate the cross-section for a given scattering process, and we will see
how to do this in detail later in the book. For now, we can simply note that if N is
the total number of scattering events (i.e. excluding events where the particles do
not interact), the event rate, which is directly related to the interaction probability,
must satisfy
\frac{dN}{dt} \propto \sigma \quad\Rightarrow\quad \frac{dN}{dt} = L(t)\,\sigma, \qquad (1.16)
which defines the instantaneous luminosity L(t ). This depends on the parameters of
the colliding beams, and thus can depend on time in general. Given that N is
dimensionless, L has dimensions of (area)−1(time)−1. We see also that if L increases,
then the event rate increases. Thus, L measures the ‘brightness’ of the beams in some
sense. To quantify this further, consider a beam of particles of type a, incident on a
Figure 1.3. A beam of a-type particles incident on a target beam of b-type particles.
target beam of particles of type b, as depicted in Figure 1.3. Let each beam i have
cross-sectional area A and uniform speed vi, with ni particles per unit volume. In
time δt , each a-type particle moves a distance vaδt to the right, and each b-type
particle moves a distance vbδt to the left. Thus, there is a volume
V = (v_a + v_b)\, A\, \delta t \qquad (1.17)
in which the particles encounter each other, in time δt . The total number of b
particles in this volume is
N_b = n_b V = n_b (v_a + v_b)\, A\, \delta t, \qquad (1.18)
and the total probability of interaction is given by the fraction of the total cross-
sectional area taken up by the cross-sections of each b-type particle:7
N_b\, \frac{\sigma}{A}.
This is the probability that a single a-type particle interacts, and thus the total
interaction probability is given by multiplying this by the number of a-type particles
in the volume V, Na. Furthermore, the probability will be equal to the number of
expected events in time δt . This follows from the fact that processes with a constant
probability per unit time are described by a Poisson distribution, where the mean
number of events per unit time is equal to the probability per unit time, as we discuss
in more detail in Chapter 5. Provided we take δt small enough, we can treat the
probability per unit time as constant. We thus have that the expected number of
events in time δt is given by
\delta N = \frac{N_a N_b\, \sigma}{A} = N_b\, n_a (v_a + v_b)\, \sigma\, \delta t, \qquad (1.19)
7 Here we are assuming that the beams are sufficiently diffuse that the cross-sections associated with each particle do not overlap, i.e. the particles scatter incoherently.
where we can identify the quantity

F_a = n_a (v_a + v_b) \qquad (1.20)

as the flux of incident particles (i.e. the number per unit time, per unit area). To see
this, note that the number of a-type particles is given by
N_a = n_a V = n_a (v_a + v_b)\, A\, \delta t

and thus

F_a = \frac{N_a}{A\, \delta t},
consistent with the above interpretation. Then, we can rearrange equation (1.19) to
find that the cross-section is given by
\sigma = \frac{1}{N_b F_a}\, \frac{\delta N}{\delta t}. \qquad (1.21)
In words: the cross-section is the event rate per unit target particle, per unit flux of
incident particle. This is a very useful interpretation, and also starts to tell us how we
can calculate cross-sections in practice. Note that we considered the b-type particles
as targets above. We could just as well have chosen the a-type particles, and thus
interpreted the cross-section as being the event rate per unit a-type particle, per unit
flux of b-type particle. The denominator factor in equation (1.21) is symmetric under
interchanging a and b (as can be seen from the above derivation), and thus it does
not matter which interpretation we choose.
From the definition of luminosity (equation (1.16)), we find that the number of
events in some small time δt is
δN = Lσδt,
and equation (1.19) then implies
L\, \delta t = \frac{N_a N_b}{A}. \qquad (1.22)
Thus, the luminosity is proportional to the number of particles in each beam, and is
inversely proportional to the beam area. This makes sense given that we expected L
to somehow measure the ‘brightness’ of each beam: a brighter beam has more
particles in it, or has the particles concentrated in a smaller cross-sectional area.
The above discussion is clearly very schematic and simplified. However, it is a lot more
correct than you might think, when applied to real beams. At the LHC, for example,
protons collide in bunches with frequency f. If the beams are taken to have a Gaussian
profile with width and height σx and σy respectively, then the luminosity is given by
L = \frac{f N_1 N_2 N_b}{4\pi \sigma_x \sigma_y}, \qquad (1.23)
where Ni is the number of protons per bunch in beam i, and Nb is the number of
bunches. The denominator is a measure of the area of each beam, and thus equation
(1.23) looks a lot like equation (1.22).
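Plugging approximate nominal LHC design parameters into equation (1.23) gives a feel for the numbers involved (the values below are indicative only, and we take f to be the beam revolution frequency of roughly 11.2 kHz):

```python
import math

# Approximate nominal LHC design parameters (indicative values only)
f_rev   = 11245.0           # revolution frequency in Hz
N1 = N2 = 1.15e11           # protons per bunch in each beam
N_bunch = 2808              # number of bunches
sigma_x = sigma_y = 17e-4   # transverse beam widths at the collision point, in cm (~17 microns)

L_inst = f_rev * N1 * N2 * N_bunch / (4.0 * math.pi * sigma_x * sigma_y)
print(f"{L_inst:.1e} cm^-2 s^-1")  # ~1e34 cm^-2 s^-1, the nominal LHC design luminosity
```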
A given type of event i (e.g. top-pair, Higgs…) has its own associated cross-
section σi , such that the total cross-section is given by the sum over each (mutually
exclusive) event type:
\sigma = \sum_i \sigma_i. \qquad (1.24)
It is perhaps not clear how to think about each σi as representing an area. However,
we do not in fact have to do this: following equation (1.16), we can define each
partial cross-section in terms of the event rate for each process i, i.e.

\sigma_i = \frac{1}{L}\, \frac{dN_i}{dt}. \qquad (1.25)
Ultimately, the information content of this equation is that the event rate for process
i factorises into a part which depends only on the nature of the colliding particles
(the cross-section), and a part which depends on the structure of each beam (the
luminosity). That the cross-section happens to have units of area is interesting, but
we do not have to actually think about what this area means.
Given how complicated real beams are, the luminosity is never calculated from
first principles. Rather, it can be measured experimentally. Theorists never have to
worry about any of this: experimentalists usually present results for cross-sections
directly, for comparison with theory. For bizarre historical reasons8 the conven-
tional unit of cross-section is the barn, where 1 b ≡ 10⁻²⁴ cm² = 10⁻²⁸ m². For most
colliders, this is a stupendously large cross-section. For example, the total cross-
section at the LHC (which includes the vast number of events in which the protons
remain intact and don’t do anything particularly interesting) is about 0.1b.
Luminosity gives us a convenient way to talk about how much data a collider has
taken. From equation (1.16), we may define
N = σL, (1.26)
where
L = \int_0^T L(t)\, dt \qquad (1.27)
is the integrated luminosity. This is a measure of how much potential data has been
taken in time T, as the luminosity in some sense measures the possible number of
particle collisions. Conventionally, L is quoted in units of inverse femtobarns (fb⁻¹)—as
we shall see, this odd formulation in which smaller SI prefixes correspond to larger
amounts of integrated luminosity combines conveniently with the measurement of
process cross-sections in (femto)barns to allow easy estimation of event counts.
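For example (a rough, illustrative estimate with round numbers rather than a measured cross-section): a process with σ ≈ 50 pb recorded in a 139 fb⁻¹ dataset yields about seven million events before any acceptance or selection effects are applied:

```python
# Illustrative numbers only: a ~50 pb process and a 139 fb^-1 dataset
sigma_fb = 50.0 * 1000.0       # 50 pb expressed in fb (1 pb = 1000 fb)
lumi_fb_inv = 139.0            # integrated luminosity in fb^-1
print(sigma_fb * lumi_fb_inv)  # ~7e6 expected events, before acceptance and efficiency
```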
8 During the Manhattan project which created the first atomic bomb, American physicists chose the code word ‘barn’ to represent the approximate cross-sectional area presented by a typical nucleus, in order to obscure the fact that they were working on nuclear structure. A barn was considered to be a large target when using particle accelerators that had to hit nuclei.
Figure 1.4. Schematic depiction of a differential cross-section, for the angle between two top quarks.
9 We will more formally introduce concepts from statistics and probability in Chapter 5.
\int dO\, \frac{1}{\sigma_i}\, \frac{d\sigma_i}{dO} = 1.
[Figure: schematic of detector layers: tracking detector, electromagnetic calorimeter (ECAL), hadronic calorimeter (HCAL).]
thus give a pattern of dots which can be joined up to make a track. A magnetic field
is applied, and the curvature of each particle track then provides information on the
momentum of the particle. The calorimeters measure energy, and by comparing
deposits in the calorimeters with tracks, we can unambiguously identify most
particles. Note that almost all particles that pass through a given layer would leave
some deposit there, even if the layer is not designed to measure them. Thus, a muon
(as well as being seen in the tracker) would typically leave a small energy deposit in
the ECAL and HCAL, even though it is in the inner tracker and muon detectors that
are designed to measure the muon properties. The exceptions to this rule amongst
Standard Model particles are the (anti)-neutrinos, which are so weakly interacting
that they pass out of the detector completely without being seen. This gives rise to a
missing four-momentum, with both energy and three-momentum components. The
detector will also miss particles that pass down the beampipe, or through cracks/
gaps in the detector. Experimentalists typically have to understand the geometry of
their detector extremely well, to be able to correct for such effects where necessary.
How can we classify a given scattering event? Again from a highly simplified
theorist’s point of view, we can think of there being a list of particles (hadrons,
leptons, photons etc), each with a four-momentum as measured by the detector, plus
an additional four-vector representing the missing momentum. We need to choose a
coordinate system in which to talk about the four-momenta, and it is conventional
to choose this such that the incoming beams are in the +z and −z directions. The
(x , y ) plane is then transverse to the beam axis, as shown in Figure 1.7. It is then
convenient to split the three-momentum of a particle p = (px , py , pz ) into a two-
dimensional momentum transverse to the beam axis,

\mathbf{p}_T = (p_x, p_y),
and the longitudinal component pz. Note that the notation for the transverse
momentum p T is the same as if this were a three-vector rather than a two-vector.
Furthermore, people often say ‘transverse momentum’ to mean the magnitude of this
two-vector, rather than the vector itself. What is being talked about will usually be
clear from the context.
Figure 1.7. Coordinate system conventionally used for colliding beam experiments.
Figure 1.8. Collision of a gluon and quark, which emerge from two colliding protons.
where γ = (1 − v²)^{−1/2}, and β = v in natural units, with v the velocity associated with
the boost. We see that px and py do not change, and thus neither does p T . However,
pz does change (it mixes with the energy E), and thus it is convenient to replace it by
another variable. To this end, it is common to define the rapidity
y = \frac{1}{2} \ln\left(\frac{E + p_z}{E - p_z}\right). \qquad (1.32)
Figure 1.9. Various detector directions, and how they correspond to rapidity.
where θ is the angle between the particle direction and the beam axis, as shown in
Figure 1.9. Equation (1.32) then becomes
y \simeq \frac{1}{2} \ln\left(\frac{E(1 + \cos\theta)}{E(1 - \cos\theta)}\right) = \frac{1}{2} \ln\left(\frac{1 + \cos\theta}{1 - \cos\theta}\right), \qquad (1.33)
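The right-hand side of equation (1.33) is equal to −ln tan(θ/2), a purely angular quantity known as the pseudorapidity η; for a massive particle, y and η differ, but they converge once |p| ≫ m. A minimal numpy sketch with made-up numbers illustrates this:

```python
import numpy as np

def rapidity(E, pz):
    """Rapidity, equation (1.32)."""
    return 0.5 * np.log((E + pz) / (E - pz))

def pseudorapidity(px, py, pz):
    """Pseudorapidity eta = -ln tan(theta/2), built from the particle direction alone."""
    theta = np.arccos(pz / np.sqrt(px**2 + py**2 + pz**2))
    return -np.log(np.tan(0.5 * theta))

# A made-up particle with m = 5 GeV, pT = 20 GeV and pz = 40 GeV
px, py, pz, m = 20.0, 0.0, 40.0, 5.0
E = np.sqrt(px**2 + py**2 + pz**2 + m**2)
print(rapidity(E, pz), pseudorapidity(px, py, pz))  # 1.42 vs 1.44: close, but not equal, since m != 0
```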
\Delta R = \sqrt{\Delta y^2 + \Delta\phi^2}, \qquad (1.36)
Figure 1.10. (a) Each particle direction cuts a cylindrical surface surrounding the beampipe at a single point;
(b) we can unwrap this cylinder and use coordinates (y, ϕ ) in the resulting plane.
which is dimensionless (i.e. both y and ϕ are dimensionless). This distance will be
useful later in the book, particularly when we talk about jet physics. Note also that
you will often see this formula with η replacing y.
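When evaluating equation (1.36) in code, the only subtlety is that the azimuthal difference must be wrapped into the range [−π, π] before it is squared, since ϕ is a periodic coordinate. A minimal sketch (assuming y and ϕ are already known for each object):

```python
import numpy as np

def delta_r(y1, phi1, y2, phi2):
    """Delta R of equation (1.36), with the azimuthal difference wrapped into [-pi, pi]."""
    dphi = np.mod(phi1 - phi2 + np.pi, 2.0 * np.pi) - np.pi
    return np.sqrt((y1 - y2)**2 + dphi**2)

# Two made-up directions on either side of the phi = +/- pi seam
print(delta_r(0.1, 3.1, -0.1, -3.1))  # ~0.22, not ~6.2
```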
Further reading
Excellent introductions to particle physics at an undergraduate level can be found in
the following books:
• ‘Modern Particle Physics’, M Thomson, Cambridge University Press.
• ‘An Introduction to the Standard Model of Particle Physics’, D A
Greenwood and W N Cottingham, Cambridge University Press.
Exercises
1.1 For four-vectors, we draw a distinction between upstairs and downstairs
indices x μ and xμ. Why do we not usually bother with this for three-vectors
in non-relativistic physics (e.g. with components xi)?
1.2
(a) Consider a fixed-target experiment, in which a beam of particles of
mass m1 and energy E1 is incident on a stationary target of particles
of mass m2. Let p1 and p2 denote the four-momenta of the initial
state particles.
(i) If we want to make a new particle of mass M in the final state, explain why the squared invariant mass s = (p_1 + p_2)^2 satisfies s ⩾ M^2.
Chapter 2
Quantum field theory for hadron colliders
So far we have seen how to describe colliders, at least roughly. However, we have
not described in detail what actually happens when the particles collide with each
other. According to equations (1.16) and (1.25), this is given by the cross-section σi
for a given process, and thus we need to know how to calculate it. This involves
understanding the quantum field theory (QFT) for each force in nature as described
by the SM, as well as any possible BSM interactions that may correspond to new
physics. In particular, we need to know how to describe the strong force in great
detail, for two main reasons:
1. The LHC collides protons, made of quarks and gluons. Thus, any scattering
process involves the strong force somewhere.
2. Even at e± colliders, there is lots of quark and/or gluon radiation, given that
the strong force, as its name suggests, is the strongest force in nature. Thus,
radiation accompanying a given scattering process is typically dominated by
quarks and gluons, and to a lesser extent by photons.
The part of the Standard Model (SM) describing quarks and gluons is quantum
chromodynamics (QCD). It is a type of QFT called a non-abelian gauge theory, which
refers to the abstract mathematical symmetry underlying its structure. We will build
up to this complicated theory by first considering a simpler case, namely that of
quantum electrodynamics (QED). This will introduce some of the ideas necessary
for the more complicated QCD case, and we will then proceed to the latter theory,
and examine its consequences in detail. After we have done so, we will briefly explain
how similar ideas can be used in the rest of the SM, namely the combination of the
electromagnetic and weak forces to make the electroweak theory.
Our aim in this chapter is to introduce the minimal working knowledge that is
required for a good understanding of issues affecting contemporary experimental
analyses. We do not claim to be a specialist resource on quantum field theory, for
which we refer the reader to the further reading resources collected at the end of the
chapter.
1 In a slight abuse of notation, we follow convention in omitting an implicit (four-dimensional) identity matrix in spinor space on the right-hand side.
Figure 2.1. Representation of the phase of the electron field at a given point in spacetime, as an arrow on the
unit circle.
Figure 2.2. The red arrows correspond to the phase of the electron field throughout spacetime. The black
arrows correspond to a gauge transformation that is (a) global; (b) local.
spacetime, with one at each point. An example is shown in Figure 2.2(a), where the
red arrows might correspond to some set of phases at different points.
For constant α, the field of equation (2.3) is obtained from ψ by rotating all the
arrows by an angle α, and we are obviously allowed to do this. That is, we are always
free to redefine where on the unit circle the zero of phase is. This corresponds to
rotating the unit circle, or alternatively to keeping the circle fixed, but rotating each
arrow. This is called a global gauge transformation, where ‘global’ refers to the fact
that we change the phase by the same amount everywhere in spacetime. The word
‘gauge’ is an old-fashioned term to do with calibrating measurements, which relates
in this case to choosing where we set the zero of phase.
Global gauge symmetry is an interesting property by itself. But it turns out that
QED has a much more remarkable symmetry than this. Namely, the theory is
invariant under the transformation
\psi \rightarrow e^{i\alpha(x)}\, \psi \qquad (2.5)
i.e. where we have now allowed the phase shift α (x ) to be different at every point in
spacetime. This corresponds to rotating the arrows at each point by different
amounts, as shown in Figure 2.2(b). By analogy with the word ‘global’, this is
called a local gauge transformation. Let us try to impose it on the electron field and
see what happens. Again we have
\psi \rightarrow e^{i\alpha(x)}\, \psi \;\Rightarrow\; \bar\psi \rightarrow e^{-i\alpha(x)}\, \bar\psi.
The Dirac Lagrangian now transforms as

\bar\psi\,[i\gamma^\mu \partial_\mu - m]\,\psi \;\rightarrow\; \bar\psi\,[i\gamma^\mu \partial_\mu - m]\,\psi - \bar\psi \gamma^\mu \psi\, (\partial_\mu \alpha). \qquad (2.6)
We thus see that the electron theory by itself is not invariant under local gauge
transformations. However, something interesting happens if we try to patch this up.
If we look closely, the problem is that ∂μψ does not transform particularly simply:
\partial_\mu \psi \rightarrow \partial_\mu(e^{i\alpha}\psi) = e^{i\alpha}\,[\partial_\mu\psi + i(\partial_\mu\alpha)\psi].
If we could instead find some modified derivative Dμ such that
D_\mu \psi \rightarrow e^{i\alpha(x)}\, D_\mu \psi \qquad (2.7)
\mathcal{L}_{\rm QED} = -\frac{1}{4}\,(\partial_\mu A_\nu - \partial_\nu A_\mu)(\partial^\mu A^\nu - \partial^\nu A^\mu) + \bar\psi\,[i\gamma^\mu \partial_\mu - m]\,\psi - A_\mu j^\mu, \qquad (2.15)
where we introduced
j^\mu = e\, \bar\psi \gamma^\mu \psi. \qquad (2.16)
The equation of motion for Aμ is given by the Euler–Lagrange equation
\partial_\alpha \frac{\partial \mathcal{L}}{\partial(\partial_\alpha A_\beta)} = \frac{\partial \mathcal{L}}{\partial A_\beta}. \qquad (2.17)
It can then be shown that
\frac{\partial \mathcal{L}}{\partial(\partial_\alpha A_\beta)} = -\partial^\alpha A^\beta + \partial^\beta A^\alpha = -F^{\alpha\beta}; \qquad \frac{\partial \mathcal{L}}{\partial A_\beta} = -j^\beta,
so that equation (2.17) becomes
\partial_\alpha F^{\alpha\beta} = j^\beta. \qquad (2.18)
Furthermore, the definition of the field strength tensor, equation (2.12), implies the
so-called Bianchi identity
\partial_\alpha F_{\mu\nu} + \partial_\nu F_{\alpha\mu} + \partial_\mu F_{\nu\alpha} = 0. \qquad (2.19)
Equations (2.18) and (2.19) constitute the known forms of the Maxwell equations in
relativistic notation. If they are new to you, then we can use them to derive the usual
form (i.e. non-relativistic notation) by writing the components of the gauge field
explicitly as
A^\mu = (\phi,\ \mathbf{A}), \qquad (2.20)
where ϕ and A are the electrostatic potential and magnetic vector potential respectively, and
we have used natural units. The physical electric and magnetic fields are given by
\mathbf{E} = -\nabla\phi - \frac{\partial \mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla \times \mathbf{A}. \qquad (2.21)
We can also define
j^\mu = (\rho,\ \mathbf{j}), \qquad (2.22)
where we will interpret ρ and j in what follows. We can then find the components of
F μν . Firstly, the antisymmetry property F μν = −F νμ implies that the diagonal
components
F^{00} = F^{ii} = 0,
where i ∈ {x , y, z}, and for once we do not use the summation convention. For the
other components, recall that
\partial_\mu = \left(\frac{\partial}{\partial t},\ \nabla\right) \;\Rightarrow\; \partial^\mu = \left(\frac{\partial}{\partial t},\ -\nabla\right).
Then we have
F^{0i} = \partial^0 A^i - \partial^i A^0 = \left[\frac{\partial \mathbf{A}}{\partial t} + \nabla\phi\right]_i = -E_i = -F^{i0}.
In words: the component F 0i represents the ith component of the electric field vector.
Next, we have
F^{ij} = \partial^i A^j - \partial^j A^i = -F^{ji}.
As an example, we consider
F^{xy} = -\partial_x A^y + \partial_y A^x = -[\nabla \times \mathbf{A}]_z = -B_z = -F^{yx}.
Similarly, one may verify that
F^{yz} = -F^{zy} = -B_x, \qquad F^{xz} = -F^{zx} = B_y.
Putting everything together, the field strength tensor has upstairs components
F^{\mu\nu} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{pmatrix}. \qquad (2.23)
Now let us examine the first of our supposed Maxwell equations, equation (2.18).
We can split this into equations for each value of ν. The case ν = 0 gives
\partial_\mu F^{\mu 0} = \partial_0 F^{00} + \partial_i F^{i0} = \nabla \cdot \mathbf{E} = j^0 = \rho.
The case ν = x gives
\begin{aligned}
\partial_\mu F^{\mu x} &= \partial_0 F^{0x} + \partial_j F^{jx} \\
&= -\frac{\partial E_x}{\partial t} + \partial_x F^{xx} + \partial_y F^{yx} + \partial_z F^{zx} \\
&= -\frac{\partial E_x}{\partial t} + \partial_y B_z - \partial_z B_y \\
&= \left[-\frac{\partial \mathbf{E}}{\partial t} + \nabla \times \mathbf{B}\right]_x \\
&= j_x.
\end{aligned}
Note that in the final two lines, the subscript x refers to the component of a three-
vector, and thus can be written upstairs or downstairs. Similarly, one finds
\partial_\mu F^{\mu i} = \left[-\frac{\partial \mathbf{E}}{\partial t} + \nabla \times \mathbf{B}\right]_i = j_i
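The component bookkeeping above is easy to get wrong, so a short numerical cross-check can be reassuring. The numpy sketch below (with arbitrary illustrative field values) builds the matrix of equation (2.23) and confirms its antisymmetry and the identifications derived above:

```python
import numpy as np

def field_strength(E, B):
    """Upstairs components F^{mu nu} of equation (2.23), from the 3-vectors E and B."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return np.array([[0.0, -Ex, -Ey, -Ez],
                     [ Ex, 0.0, -Bz,  By],
                     [ Ey,  Bz, 0.0, -Bx],
                     [ Ez, -By,  Bx, 0.0]])

F = field_strength(E=(1.0, 2.0, 3.0), B=(4.0, 5.0, 6.0))  # arbitrary field values
print(np.allclose(F, -F.T))  # True: F^{mu nu} is antisymmetric
print(F[0, 1], F[1, 2])      # -1.0 and -6.0, i.e. F^{0x} = -E_x and F^{xy} = -B_z as in the text
```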
The above discussion suggests that, if this is true, we can introduce other types of
gauge invariance, and thus more force fields!
Indeed, such an extra arrow exists for quarks. They carry a type of conserved
charge called colour, not to be confused with electromagnetic charge. Whereas the
latter has two types ( +, −), colour charge has three types, conventionally called
(r, g, b ) (for ‘red’, ‘green’ and ‘blue’). Note that these are just labels, and have
nothing to do with actual colours. Any other arbitrary labels would have sufficed.
Quarks have spin 1/2, and thus can be described by a Dirac spinor field, as for
electrons. We can then write three such fields, one for each colour. It is then
convenient to collect these into a single vector-like object
\psi_i = (\psi_r,\ \psi_g,\ \psi_b), \qquad (2.25)
where i ∈ {r, g, b} is a colour index. The vector we have formed lives in an abstract
colour space at each point in spacetime, where an arrow in this space tells us how
much redness, greenness and blueness the (vector) quark field has at that point
(Figure 2.3).
The size or magnitude of the arrow then tells us the overall colour charge. This is
sometimes called an internal space (i.e. internal to the quark field), to avoid
confusion with actual spacetime. Similar to the phase of the electron field, we are
always free to redefine what we mean by redness, greenness and blueness. In other
words, the theory of quarks should be invariant under ‘rotations’ of the arrow in
colour space, at all points in spacetime simultaneously. This is a global gauge
transformation, directly analogous to the phase in QED. Any such rotation must act
on the quark field as
\psi_i \rightarrow \psi'_i = U_{ij}\, \psi_j, \qquad (2.26)
Figure 2.3. The quark field can be thought of as carrying different components, one for each colour. This leads
to an abstract vector space at each point in spacetime, where the arrow specifies how much of each colour is
present.
for some numbers {Uij}. Let us now denote by Ψ the three-dimensional column
vector whose components are given in equation (2.25). We can then write equation
(2.26) as
\Psi' = U\Psi, \qquad (2.27)
where U is a constant three-by-three matrix. Given ψi ∈ ℂ, U will also be complex.
Furthermore, if colour is conserved, the size of the arrow cannot change under these
transformations. We also want to exclude reflections, which correspond to inver-
sions of one or more of the axes. Thus, we require
det(U) = 1. (2.28)
Following what we learned in QED, we can now demand local gauge invariance, by
promoting U → U(x ), corresponding to different rotations of the colour arrow at
different spacetime points. To see how to do this in the present case, though, we need
to learn a bit more about the mathematics underlying such symmetries. This is the
subject of the following section.
(i) Closure. The product of any two elements in the group is also an element of
the group:
ab \in G, \qquad a, b \in G. \qquad (2.29)

(ii) Associativity. The product of group elements is associative:

(ab)c = a(bc), \qquad a, b, c \in G. \qquad (2.30)

(iii) Identity. There exists an identity element e \in G, such that

ea = ae = a, \qquad a \in G. \qquad (2.31)

(iv) Inverse. Every element a \in G has an inverse a^{-1} \in G, such that

a^{-1} a = a\, a^{-1} = e. \qquad (2.32)
These four properties are clearly satisfied by the phase rotations we encountered
in QED. We can also see that they apply to the colour rotations acting on the quark
field. First, note that we may write the product of two rotations acting on a quark
field as
U1U2Ψ ,
which means ‘do rotation 2, then rotation 1’. The effect of this will be equivalent to
some overall rotation
U3 = U1U2 ,
and the determinant of the latter is given by
det(U3) = det(U1) det(U2) = 1.
That is, the product of two matrices of unit determinant itself has unit determinant,
which fulfils the closure property of the group. For associativity, it is sufficient to
note that matrix multiplication is itself associative. For property (iii), the identity
element is simply the identity matrix I . Finally, for property (iv), we can use the fact
that all (complex) rotation matrices are invertible, where the inverse is given by the
Hermitian conjugate,
U † = (U T)*.
To see this, we can first use the fact that (complex) rotations do not change the size
of the colour arrow (Ψ†Ψ ) by definition. Under a rotation this transforms as
\Psi^\dagger \Psi \rightarrow \Psi'^\dagger \Psi' = (U\Psi)^\dagger(U\Psi) = \Psi^\dagger (U^\dagger U)\, \Psi,
which then implies
U^\dagger U = I \;\Rightarrow\; U^{-1} = U^\dagger,
as required. Such matrices (i.e. whose inverse is the Hermitian conjugate) are called
unitary, and we have now shown that these matrices fulfil all the properties necessary
to form a group. To summarise, it is the group associated with 3 × 3 complex unitary
matrices of unit determinant. This is conventionally called SU(3), where the ‘S’
stands for ‘special’ (unit determinant), the ‘U’ for ‘unitary’, and the number in the
brackets denotes the dimension of the matrix. Likewise, we can also give a fancy
name to the phase rotations of QED, by noting that any number of the type eiα is a
unitary 1 × 1 matrix, however daft it may feel to say so. Thus, the group associated
with QED is U(1).
In general, groups can consist of discrete or continuous transformations, or both.
For example, rotations about a single axis have a continuous parameter (an angle)
associated with them, but reflections do not, and are therefore discrete. In the
rotation example, the number of elements of the group is (continuously) infinite, but
the number of parameters needed to describe all the group elements (i.e. an angle) is
finite. Continuous groups are known as Lie groups, and the number of parameters
needed to describe all the group elements is called the dimension of the group. It is
not the number of group elements—this is infinite! Let us clarify this further with a
couple of examples.
1. U(1) is the group of rotations about a single axis, and there is a single
rotation angle associated with this, so that we have
dim[U(1)] = 1.
An important theorem says that any element of a Lie group G can be written in
terms of matrices called generators, where the number of generators is equal to the
dimension of the group (i.e. the number of independent degrees of freedom in each
group element). Following physics conventions, we will label the set of generators
{Ta}, where a = 1, … , dim(G ). Then the expression relating group elements U to
the generators is
U = \exp[i\theta^a T^a], \qquad (2.33)
where we have used the summation convention. Equation (2.33) is called Lie’s
theorem, and the numbers {θ a} are continuous parameters—one for each generator
—such that a given set of values uniquely corresponds to a given group element.
Note that the exponential of a matrix is defined by its Taylor expansion, where the
zeroth-order term is understood to be the identity matrix, i.e.
U = I + \sum_{n=1}^{\infty} \frac{1}{n!}\, (i\theta^a T^a)^n. \qquad (2.34)
There is not a unique choice for the generators Ta : all that matters is that they have
the right degrees of freedom to be able to describe every group element (e.g. they
must be linearly independent). Any change in the generators can be compensated by
a change in the definition of the continuous parameters, and vice versa.
The above discussion is very abstract, so let us give a concrete example, and
perhaps the simplest that we can consider. We saw above that the group U(1) has
elements that can be represented by
e^{i\alpha} = \exp[i\theta_1 T_1], \qquad \theta_1 = \alpha, \quad T_1 = 1.
On the right we see that each group element has precisely the form of equation
(2.33), where there is a single generator (which we may take to be the number 1), and
a single continuous parameter θ1, which we may identify with the parameter α. A
more complicated example is SU(3), where it follows from the above discussion that
we expect eight generators. A popular choice is the set of so-called Gell-Mann
matrices, and you will find them in the exercises. For actual calculations in QCD, we
will never need the explicit form of the generators, which stands to reason: they are
not unique, so the effect of making a particular choice must cancel out in any
physical observable.
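As a concrete check of Lie's theorem that is small enough to inspect by hand, one can take SU(2) rather than SU(3); a conventional choice of generators is half the Pauli matrices (an assumption made here for illustration, not a choice used in the text). Exponentiating an arbitrary parameter vector then produces a unitary matrix of unit determinant, and the commutators of the generators close on the generators themselves, a point we return to immediately below:

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices; T^a = sigma^a / 2 is a conventional choice of SU(2) generators
sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)
T = sigma / 2.0

theta = np.array([0.3, -1.2, 0.7])                # arbitrary continuous parameters theta^a
U = expm(1j * np.einsum('a,aij->ij', theta, T))   # Lie's theorem, equation (2.33)

print(np.allclose(U.conj().T @ U, np.eye(2)))     # True: U is unitary
print(np.isclose(np.linalg.det(U), 1.0))          # True: U has unit determinant
print(np.allclose(T[0] @ T[1] - T[1] @ T[0], 1j * T[2]))  # True: [T^1, T^2] = i T^3
```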
For any given Lie group G, the generators obey an important set of consistency
relations. To see why, note that the group closure property
U1U2 = U3,
together with Lie’s theorem (equation (2.33)) implies
e^{i\theta_1^a T^a}\, e^{i\theta_2^b T^b} = e^{i\theta_3^c T^c}, \qquad (2.35)
where {θia} is the set of continuous parameters associated with the group element Ui .
On the left-hand side, we can rewrite the factors as a single exponential, provided we
combine the exponents according to the Baker–Campbell–Hausdorff formula2
e^X e^Y = e^{X + Y + \frac{1}{2}[X,\,Y] + \cdots}, \qquad (2.36)
where
[X , Y] ≡ XY − YX
is the commutator of X and Y , and the neglected terms in equation (2.36) contain
higher-order nested commutators. Applying equation (2.36) to equation (2.35), we
find
\exp\left[i(\theta_1^a + \theta_2^a)\, T^a - \frac{1}{2}\,\theta_1^a \theta_2^b\, [T^a, T^b] + \cdots\right] = \exp[i\theta_3^c T^c]. \qquad (2.37)
By taking the logarithm of both sides, the only way this equation can be true is if
the commutator on the LHS is itself a superposition of generators, such that all
higher order nested commutators also ultimately reduce to being proportional to a
single generator. That is, we must have some relation of the form
[T^a, T^b] = i f^{abc}\, T^c, \qquad (2.38)
2 If you are unaware of this result, you should find a description of it in any good mathematical methods textbook.
where the summation convention is implied as usual, and the set of numbers {f abc }
(where all indices range from 1 to the dimension of the group) is such that the indices
match up on both sides of the equation. Equation (2.38) is indeed true for the
generators of any Lie group, and is called the Lie algebra of the group. The numbers
{f abc } are called the structure constants of the group. Whilst the generators are non-
unique, the structure constants are fixed for a given Lie group. That is, we may take
a given Lie algebra or group as being defined by its structure constants3, so that
different structure constants define different Lie groups. From equation (2.38) and
the fact that the commutator is antisymmetric, we see that
f^{abc} = -f^{bac}.
In fact, one may prove that f abc is antisymmetric under interchange of any two of its
indices. In full this implies
f^{abc} = f^{bca} = f^{cab} = -f^{bac} = -f^{acb} = -f^{cba}. \qquad (2.39)
One of the reasons that the Lie algebra is useful is that it allows us to check
whether a given choice of generator matrices is indeed an appropriate basis for a
given Lie group. We can choose any set of matrices that satisfy the Lie algebra, and
such a set is then called a representation of the group generators. There is clearly a
large amount of freedom involved in choosing a representation, which then in turn
implies a representation for the group elements themselves. For SU(N ), for example,
we can use N × N complex matrices of unit determinant, which corresponds to the
defining property of the group. However, this is not the only possibility. It is possible
to find generator matrices that obey the Lie algebra of SU(N ), even though they are
not N × N, which appears confusing at first sight. However, the confusion is
(hopefully) removed upon fully appreciating that a group is only defined by its Lie
algebra. Then the defining property of SU(N ) is that its Lie algebra is the algebra
associated with generators of N × N complex matrices of unit determinant. This does
not mean that a given representation of the group always has N × N matrices, as
some higher dimensional matrices may obey the same Lie algebra. Thus, the
representations of SU(N ) have many possible dimensionalities, only one of which
corresponds to the original defining property of the group.
Choosing an N × N representation (in line with the original group definition) has
a special name: it is called the fundamental representation. For example, the
fundamental representation of SU(3) consists of 3 × 3 complex matrices of unit
determinant. It is the representation that is relevant for quark fields, which live in a
complex three-dimensional colour space. Another commonly used representation
for a given Lie group G is the so-called adjoint representation, in which one uses
matrices whose dimension is equal to the dimension of the group, dim(G ). Indeed,
one may show (see the exercises) that the matrices
(T^a)_{bc} = −i f^{abc}    (2.40)
³ Strictly speaking, there may be more than one Lie group associated with the same Lie algebra, a complication that will not bother us here.
obey the Lie algebra4. The adjoint representation acts on vectors whose dimension is
that of the Lie group, and we will see such vectors later.
In any given representation R, the generators satisfy the following identities (not necessarily obvious!):
tr[T^a T^b] = T_R δ^{ab},       Σ_a T^a T^a = C_R I,    (2.41)
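To make equations (2.40) and (2.41) concrete, the sketch below (a minimal illustration added here, using SU(2) rather than SU(3) purely to keep the matrices small) builds the fundamental generators T^a = τ^a/2 and the adjoint generators (T^a)_{bc} = −iε^{abc}, checks that the latter obey the same Lie algebra, and evaluates T_R and C_R in each representation.

import itertools
import numpy as np

# SU(2) for illustration: f^{abc} = eps^{abc}.
tau = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]], dtype=complex),
       np.array([[1, 0], [0, -1]], dtype=complex)]
fund = [0.5 * m for m in tau]

def eps(a, b, c):
    return (a - b) * (b - c) * (c - a) / 2.0

# Adjoint representation, equation (2.40): (T^a)_{bc} = -i f^{abc}.
adj = [np.array([[-1j * eps(a, b, c) for c in range(3)] for b in range(3)])
       for a in range(3)]

# The adjoint matrices satisfy the same Lie algebra (cf. the exercises).
for a, b in itertools.product(range(3), repeat=2):
    comm = adj[a] @ adj[b] - adj[b] @ adj[a]
    assert np.allclose(comm, sum(1j * eps(a, b, c) * adj[c] for c in range(3)))

def invariants(gens):
    """Return (T_R, C_R), with T_R read off from tr[T^1 T^1] and C_R from sum_a T^a T^a."""
    dim = gens[0].shape[0]
    TR = np.trace(gens[0] @ gens[0]).real
    casimir = sum(g @ g for g in gens)
    CR = np.trace(casimir).real / dim
    assert np.allclose(casimir, CR * np.eye(dim))   # proportional to the identity
    return TR, CR

print("fundamental: T_R, C_R =", invariants(fund))   # (0.5, 0.75) for SU(2)
print("adjoint:     T_R, C_R =", invariants(adj))    # (2.0, 2.0)  for SU(2)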
Here we have assumed a single flavour of quark of mass m for now, and used vector
notation to describe the three components of the field in colour space. As in QED,
the key to making this Lagrangian locally gauge invariant is to replace the derivative
∂μ with a suitable covariant derivative, such that the latter acting on the quark field
transforms nicely under gauge transformations. In this case, the covariant derivative takes the form
D_μ = ∂_μ + i g_s t^a A_μ^a ≡ ∂_μ + i g_s A_μ,    (2.46)
⁴ Note that for colour indices, index placement (upstairs or downstairs) is irrelevant, and we therefore follow existing conventions.
where the factor of gs is the analogue of e in QED, and is called the strong coupling
constant—we will see why later. The first term in equation (2.46) generates a
translation as desired, and contracting the second term with some four-vector a μ
corresponds to an infinitesimal gauge transformation with
θ^a = g_s a^μ A_μ^a.    (2.47)
To see if the covariant derivative of equation (2.46) works, first note that equation
(2.45) implies
⁵ Here and in the following we have left identity matrices I implicit, as is done in many textbooks.
D_μΨ → U D_μ U^{−1} U Ψ,
and hence
D_μ → U D_μ U^{−1}.    (2.48)
One may show that this is satisfied provided the matrix-valued quantity Aμ(x )
behaves under a gauge transformation as
A_μ → A′_μ = U A_μ U^{−1} + (i/g_s)(∂_μ U) U^{−1}.    (2.49)
It is certainly possible to fulfil this condition, such that equation (2.48) then implies
that if we promote the partial derivative in equation (2.44) to be a covariant
derivative:
L_quark = Ψ̄(iγ^μ D_μ − m)Ψ,    (2.50)
Thus, we have found a quark Lagrangian that is invariant under local gauge
transformations, provided the covariant derivative of equation (2.46) is such that
equation (2.49) is true. In QED, we resolved the issue of how to find the quantity Aμ
by requiring that it be a field that actually exists, whose equations of motion provide
a solution for it. We can do exactly the same thing here, and say that
A_μ = A_μ^a t^a    (2.52)
is a field present in Nature: the gluon. A major difference with respect to QED is that
the field has an extra index a = 1, … , dim(G ), where G is the Lie group behind our
gauge transformations. This is commonly referred to as a ‘gauge index’. In
particular, G = SU(3) for QCD, and thus a = 1, … , 8. It is relatively straightfor-
ward to see why this must be the case. The reason the gauge field arises is that, if we
rotate the colour arrow by different amounts at every point in spacetime, we can
compensate for this by defining an additional quantity (the gauge field) at every
point, such that the total theory is gauge invariant. A given local gauge trans-
formation has dim(G ) independent degrees of freedom from equation (2.33). Thus,
the gauge field needs to have the same number of degrees of freedom in order to
compensate for all possible local gauge transformations.
We can in fact view QED as a special case of the general framework discussed
above, valid for any Lie group G. For QED, G = U(1), which has a single generator
t1 = 1. We can then write the gauge field as
A_μ = A_μ^a t^a = A_μ^1 ≡ A_μ,
i.e. it has only a single component, so we can drop the superscript. Furthermore, the
transformation of equation (2.49) reduces to
A′_μ = A_μ − (1/g_s) ∂_μα,    if U = e^{iα},
consistent with equation (2.11).
So far we have constructed a locally gauge invariant quark Lagrangian. To
complete the theory, we need to add the kinetic term for the gluon field Aμ, where for
QED we made use of the field strength tensor of equation (2.12), which is gauge
invariant by itself. This will not be true in QCD, as Fμν becomes matrix-valued in
colour space (given that the gauge field itself is). It turns out that we also need to
generalise the definition of equation (2.12), to something that transforms more nicely
under gauge transformations. To this end, consider the commutator of two QED
covariant derivatives acting on an arbitrary electron field:
[D_μ, D_ν]Ψ = (∂_μ + ieA_μ)(∂_ν + ieA_ν)Ψ − (μ ↔ ν)
= [∂_μ∂_ν + ieA_μ∂_ν + ie(∂_μA_ν) + ieA_ν∂_μ − e²A_μA_ν]Ψ − (μ ↔ ν)
= ie(∂_μA_ν − ∂_νA_μ)Ψ
= ieF_μν Ψ.
We therefore see that applying two covariant derivatives to the electron field, then
reversing the order and taking the difference, is the same as multiplying the electron
field with the electromagnetic field strength tensor (up to an overall factor).
Furthermore, the fact that the electron field we are considering is arbitrary means
that we can formally define the field strength tensor as
[D_μ, D_ν] = ieF_μν.    (2.53)
The reason this particular definition is so useful is that it is very straightforward to
generalise to QCD. Upon doing this, one may consider the commutator of two QCD
derivatives:
[D_μ, D_ν] = (∂_μ + ig_s A_μ)(∂_ν + ig_s A_ν) − (μ ↔ ν)
= [∂_μ∂_ν + ig_s(A_μ∂_ν + A_ν∂_μ) + ig_s(∂_μA_ν) − g_s² A_μA_ν] − (μ ↔ ν)
= ig_s(∂_μA_ν − ∂_νA_μ + ig_s[A_μ, A_ν])
= ig_s F_μν,
where we have used equation (2.38) in the second line. We may thus write
F_μν = F^a_μν t^a,    (2.55)
where
F^a_μν = ∂_μ A^a_ν − ∂_ν A^a_μ − g_s f^{abc} A^b_μ A^c_ν.    (2.56)
Note that this reduces to the QED definition of equation (2.12) if f abc = 0, which
makes sense: as we discussed above, for the group U(1) there is only a single
generator t1 = 1, and thus
[t a , t b] = 0
for all choices of a and b (both of which must be 1!). There are then no non-trivial
structure constants, so that the final term in equation (2.56) vanishes. Lie groups
where the elements all commute with each other (including the generators) are called
Abelian. Conversely, if f abc ≠ 0 for any a, b or c, the group is called non-Abelian.
Thus, QCD is an example of a non-Abelian gauge theory, and the form of the field
strength tensor, equation (2.56), is our first hint that QCD is much more complicated
than the Abelian gauge theory of QED!
Let us now see how to make a kinetic term out of the non-Abelian field strength
tensor. One may show (see the exercises) that
L_kin = −(1/2) tr[F_μν F^{μν}]    (2.57)
is gauge-invariant, where the trace takes place in colour space. Using equations
(2.42) and (2.55), we may write this in terms of the component fields as follows:
L_kin = −(1/4) F^a_μν F^{μν a}.    (2.58)
To construct the complete Lagrangian of QCD, we must add our kinetic term for the
gluon to the quark Lagrangian, where we sum over six copies of the latter: one for
each quark flavour out of (u, d , c, s, t, b ). We then finally arrive at
L_QCD = −(1/4) F^a_μν F^{μν a} + Σ_{flavours f} Ψ̄_f (iγ^μ D_μ − m_f)Ψ_f.    (2.59)
• Once we say how many quark fields there are, and the fact that these are in the
fundamental representation of a gauge group G, the Lagrangian is completely
fixed by local gauge invariance!
• The Lagrangian of equation (2.59) may look simple, but do not be fooled! QCD
is a spectacularly complicated theory that nobody fully understands. Much of
this lack of understanding relates to the fact that we cannot solve the quantum
equations of the theory exactly in many situations of physical interest.
• Equation (2.59) is the pure QCD Lagrangian, including only the strong
interactions of the quarks. The latter also have weak and electromagnetic
interactions, which need to be added separately.
To analyse the theory in more detail, we can write out all terms in full to obtain
L_QCD = −(1/4)(∂_μA^a_ν − ∂_νA^a_μ)(∂^μA^{νa} − ∂^νA^{μa}) + Σ_f Ψ̄_f (iγ^μ∂_μ − m_f)Ψ_f
    + g_s f^{abc} A^b_μ A^c_ν ∂^μA^{νa} − (g_s²/4) f^{abc} f^{ade} A^b_μ A^c_ν A^{μd} A^{νe}
    − g_s Σ_f Ψ̄_f γ^μ t^a Ψ_f A^a_μ.    (2.60)
The quadratic terms look similar to the corresponding terms in QED, and there is
also a term (in the third line) describing interactions of the gluon A μa with the (anti-)
quarks, similar to the coupling of the photon to electrons and positrons in QED. In
QCD, however, this interaction is more complicated as it involves a colour
generator. The physical meaning of this is that (anti-)quarks can change their
colour by emitting/absorbing a gluon.
The second line in equation (2.60) contains cubic and quartic terms in the gluon
field. Thus, the gauge field can interact with itself! This is consistent with what we
said above: if (anti)-quarks can change colour by emitting a gluon, then gluons
themselves must carry colour charge if colour is to be conserved overall. Then gluons
can interact with other gluons. We can see that the gluon self-interaction terms all
involve structure constants, and indeed vanish if f abc = 0, which is the case in QED.
This is why the photon does not interact with itself, and thus carries no electro-
magnetic charge. The gluon self-interactions may not look too innocuous here, but
such terms are the main reason why non-Abelian gauge theories are much more
complicated than Abelian ones.
where V (∣ϕ∣) is a potential energy. We have chosen that this depends only on the
magnitude of the field, in which case equation (2.61) is invariant under the global
gauge transformation
ϕ → e iαϕ ⇒ ϕ* → e−iαϕ*, (2.62)
which may be compared with the similar gauge transformation which acts on
fermions (equations (2.3) and (2.4)). Using the methods of Section 2.1, we may then
promote this symmetry to a local gauge symmetry, by replacing the ordinary
derivatives in equation (2.61) with the covariant derivative
Dμ = ∂μ + igAμ ,
where we use a different symbol g for the coupling constant, to emphasise the fact
that we are in a different theory to QED. We must then also add the usual kinetic
terms for the gauge field Aμ, upon which we arrive at the Lagrangian
L = (D^μϕ)†(D_μϕ) − V(∣ϕ∣) − (1/4) F^{μν}F_μν.    (2.63)
This describes the interaction of a charged scalar field with a photon-like particle.
Note that the gauge transformations in equation (2.62) crucially relied upon the fact
that the field was complex, which makes perfect sense: a complex field has two
degrees of freedom, so can give rise to the positive and negative charge states that
represent a scalar particle and its antiparticle.
So far we have left the form of the potential function V (∣ϕ∣) unspecified, but let us
now consider the specific choice
V(∣ϕ∣) = −μ²ϕ*ϕ + λ(ϕ*ϕ)²,    (2.64)
for arbitrary parameters μ and λ. For a potential energy bounded from below, we
require λ > 0. If the quadratic term were to have opposite sign (or μ² < 0), this
would act like a normal mass term for the scalar field, and the overall potential
energy would have a minimum for ϕ = ϕ* = 0. However, very different behaviour is
obtained with the term as written (where we assume μ² > 0). In that case, the
minimum potential energy for the scalar field does not occur for ∣ϕ∣ = 0, but for the
vacuum expectation value (VEV)
∣ϕ∣ ≡ v/√2 = √(μ²/(2λ)),    (2.65)
where the constant v has been introduced for later convenience. This minimum can
be seen in Figure 2.4(a), which shows the potential as a function of ϕ. One sees the
presence of what look like two discrete minima on the plot, but it must be
remembered that these are in fact continuously connected: the condition of equation
(2.65) describes a circle in the (ϕ, ϕ*) plane, such that the true form of the potential is
that of Figure 2.4(b).
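The location of this minimum, and the curvature of the potential around it, can be checked symbolically. The short sympy sketch below (an illustrative calculation added here, not part of the original text) treats V as a function of the modulus r = ∣ϕ∣, confirms that the minimum sits at r = √(μ²/(2λ)) = v/√2, and shows that the curvature in the radial direction, with the normalisation of equation (2.67), is 2λv² = 2μ², anticipating the χ mass term found below.

import sympy as sp

mu, lam, r = sp.symbols('mu lambda r', positive=True)
chi = sp.symbols('chi')

# Potential as a function of the field modulus r = |phi|, equation (2.64).
V = -mu**2 * r**2 + lam * r**4

# Minimise: dV/dr = 0 away from r = 0.
rmin = sp.solve(sp.diff(V, r), r)
print(rmin)                        # the non-trivial minimum, |phi| = sqrt(mu^2/(2 lambda))

v = sp.sqrt(mu**2 / lam)           # so that |phi|_min = v/sqrt(2)
assert sp.simplify(rmin[0] - v / sp.sqrt(2)) == 0

# Expand about the minimum with the normalisation of equation (2.67): r = (v + chi)/sqrt(2).
Vchi = V.subs(r, (v + chi) / sp.sqrt(2))
mass2 = sp.diff(Vchi, chi, 2).subs(chi, 0)
print(sp.simplify(mass2))          # 2*mu**2 = 2*lambda*v**2, the chi mass-squared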
Figure 2.4. (a) The scalar potential of equation (2.64); (b) form of the potential in the (ϕ, ϕ*) plane, showing the continuously connected circle of minima.

The vacuum of the theory will be associated with the minimum of the potential energy. However, unlike the previous cases of QED and QCD, we now have a situation in which there is no unique vacuum. Instead, there are infinitely many possible vacua, corresponding to all of the points on the circle in the (ϕ, ϕ*) plane such that V(∣ϕ∣) is minimised. To make this more precise, note that the general
complex field ϕ that satisfies equation (2.65) can be written as
ϕ = ϕ_0 e^{iΛ},    (2.66)
where the arbitrary phase Λ ∈ [0, 2π ] labels all possible points on the circle. Note
that different points on the circle—and thus different vacua—are related by the
gauge transformations of equation (2.62). Choosing a vacuum state then amounts to
fixing a particular value of Λ, after which gauge invariance is no longer manifest in
the Lagrangian of the theory. The conventional terminology for this is that the
gauge symmetry is spontaneously broken, although in practice the gauge invariance
takes a different form: we are still allowed to gauge transform the fields, provided we
also change the vacuum upon doing so.
Let us now examine the consequences of the above discussion, where without loss
of generality we may choose Λ = 0 to simplify the algebra. We can then expand the
complex field ϕ as follows:
ϕ(x) = (1/√2)(v + χ) e^{iη/v},    χ, η ∈ ℝ,    (2.67)
Here both χ and η represent deviations from the modulus and phase of the vacuum
field, respectively, where the inverse factor of v in the exponent keeps its argument
dimensionless, and factors of √2 have been introduced so as to simplify later
equations. We may substitute the expansion of equation (2.67) into equation (2.63),
yielding
L = −(1/4) F^{μν}F_μν − g v A^μ∂_μη + (g²v²/2) A^μA_μ
    + (1/2) ∂^μχ ∂_μχ − λv²χ² + (1/2) ∂^μη ∂_μη + …,    (2.68)
where the ellipsis denotes interaction terms (i.e. cubic and higher in the fields). In the first
and second lines, we can recognise the usual kinetic term for the gauge field, together
with kinetic terms for the two (real) scalar fields χ and η. In the second line, we see a mass
term for the field χ, where m_χ² = 2λv² is the squared mass. There is no such term for the η
field, indicating that the latter is massless. This turns out to be an example of a general
result known as Goldstone’s theorem: the spontaneous breaking of a continuous
symmetry inevitably gives rise to massless scalar fields known as Goldstone bosons,
where there is one such boson for every degree of freedom of the original symmetry that
has been broken. Here, the choice of vacuum broke a rotational symmetry with one
degree of freedom, thus there is only a single Goldstone boson.
There is a simple physical interpretation of Goldstone’s theorem, which we can
examine in the present case by looking at Figure 2.4(b). A given vacuum state
corresponds to a point on the circle of minimum energy, and the two scalar fields χ
and η correspond to perturbing the field in the radial and azimuthal directions,
respectively. The first of these takes one away from the minimum by climbing a
potential well that is quadratic to a first approximation: this is associated with the
second term in the second line of equation (2.68), i.e. the mass term for the χ field.
On the other hand, perturbations in the azimuthal direction take one around the
circle, costing no energy. This explains the lack of a mass term for η, and hence the
presence of a massless Goldstone boson.
As well as possible mass terms for the scalar fields, we also see a mass term for the
photon (the third term in the first line of equation (2.68)). Comparing with the
conventional mass term for a vector field, we find
(1/2) m_A² A^μA_μ    ⇒    m_A = g v.
This mass is conceptually very different from the case discussed in Section 2.1, in
which a mass term was added by hand. We saw that such a term was not gauge
invariant, which is not the case here: we have started with a fully gauge-invariant
theory of a scalar field, and broken the symmetry spontaneously. This generates the
mass for the gauge field, as can be seen explicitly in equation (2.5) given that the
mass is proportional to the vacuum expectation value of the scalar field. At high
energies, the field will no longer be constrained to have minimum energy, and the
manifest gauge symmetry will be restored.
We have not yet drawn attention to the second term on the right-hand side of
equation (2.68), which is puzzling in that it constitutes a mixing between the gauge field
and azimuthal perturbation η. This suggests we have not fully understood the physical
particle spectrum of the theory, and we can verify this by counting degrees of freedom.
Before the gauge symmetry is broken, we have a massless vector field, which has two
polarisation states. Combined with the complex scalar ϕ, this gives four degrees of
freedom in total. After the breaking, we have a massive vector field (three polarisation
states), and two real scalars, giving five degrees of freedom. One of these must then be
spurious, and can be gotten rid of. To do this, we can make the gauge transformation
ϕ → ϕ e−iη/v ,
⁶ That η can be completely removed should not surprise us: equation (2.67) tells us that it is the phase of the ϕ field, and it is precisely this phase that can be completely fixed by a gauge transformation.
⁷ Here we must carefully distinguish between vectors and pseudo-vectors such as the angular momentum L = x × p. The latter necessarily remain invariant under a parity transformation.
for left- and right-handed states, respectively. Parity transformations turn left-
handed spinors into right-handed ones and vice versa, and thus a violation of parity
invariance tells us that the left- and right-handed states of massive fermions in nature
must behave differently in the equations describing the theory. In particular, they
may behave differently under gauge transformations. One may write the Dirac
Lagrangian of equation (2.1) for a single fermion species of mass m in terms of the
spinors of equation (2.70), as follows:
L = Ψ̄_L(iγ^μ∂_μ)Ψ_L + Ψ̄_R(iγ^μ∂_μ)Ψ_R − m(Ψ̄_RΨ_L + Ψ̄_LΨ_R).    (2.72)
Here, we see that the left- and right-handed components decouple from each other as
far as the kinetic term is concerned. However, the mass term mixes the two
components, which potentially violates gauge invariance if they transform differ-
ently. The upshot is that fermion masses, similarly to gauge boson masses, must be
generated by spontaneous symmetry breaking.
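The statement that the mass term couples left- to right-handed components can be checked directly with explicit Dirac matrices. The numpy sketch below is an illustrative check added here (it assumes the Dirac representation and the standard projector convention P_{L,R} = (1 ∓ γ_5)/2, since the defining equations for Ψ_{L,R} are not reproduced above): it verifies that Ψ̄Ψ = Ψ̄_LΨ_R + Ψ̄_RΨ_L, whereas bilinears with a single γ^μ do not mix the chiralities.

import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
sig = [np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]),
       np.array([[1, 0], [0, -1]])]

# Dirac matrices in the Dirac representation.
gamma0 = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)
gammas = [np.block([[Z2, s], [-s, Z2]]).astype(complex) for s in sig]
gamma5 = 1j * gamma0 @ gammas[0] @ gammas[1] @ gammas[2]

# Chiral projectors (standard convention, assumed here).
PL = 0.5 * (np.eye(4) - gamma5)
PR = 0.5 * (np.eye(4) + gamma5)
assert np.allclose(PL @ PL, PL) and np.allclose(PR @ PR, PR) and np.allclose(PL @ PR, 0)

rng = np.random.default_rng(1)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)      # an arbitrary spinor
psiL, psiR = PL @ psi, PR @ psi

def bar(p):
    return p.conj() @ gamma0        # Dirac adjoint, psi-bar = psi^dagger gamma^0

# Mass term mixes chiralities: psibar psi = psibar_L psi_R + psibar_R psi_L.
assert np.isclose(bar(psi) @ psi, bar(psiL) @ psiR + bar(psiR) @ psiL)
# Kinetic-type bilinears do not mix them.
for g in [gamma0] + gammas:
    assert np.isclose(bar(psi) @ g @ psi, bar(psiL) @ g @ psiL + bar(psiR) @ g @ psiR)
print("mass term mixes L and R; gamma^mu bilinears do not")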
A third complication we will see is that the electromagnetic and weak forces
cannot be separated from each other, but must be combined into a single electro-
weak theory. Based on the above discussion, the recipe to be followed to construct
this theory is as follows:
It is clear that there are many possible choices at each stage of the above procedure,
and it is ultimately experiment that must decide which is correct. For this reason, the
correct electroweak theory took many decades to establish. It was first written down
in the 1960s, and rewarded with a Nobel Prize (to Sheldon Glashow, Abdus Salam
and Steven Weinberg) in 1979. Rather than reconstruct the long and rather tortuous
steps that led to the theory, we will simply summarise the main details here.
The gauge group of the electroweak theory is found to be SU(2) × U(1). Unlike
the previous examples we have seen, this is a product of two groups, containing
two distinct subgroup factors, each with their own associated charges. The charge
associated with the SU(2) subgroup is called weak isospin, and the various matter
particles form multiplets which are acted on by SU(2) transformations. To find what
these multiplets are, we can make an analogy with the study of angular momentum
in quantum mechanics. In the latter case, a particle with total angular momentum J
(in units of ℏ) is associated with a multiplet of (2J + 1) independent states. Each
component of the multiplet can be labelled by the component of the (vector) angular
momentum in some direction, which is conventionally taken to be the z-direction.
Writing this component as Jz, one has
Jz ∈ { −J , −J + 1, … , J − 1, J }.
A special case is that of J = 1/2, in which case one finds a doublet of states:
ζ = ( ∣↑⟩_z , ∣↓⟩_z )ᵀ,
where ∣↑⟩_z and ∣↓⟩_z denote states with J_z = ±1/2 respectively. These will mix with
each other if the coordinate axes (x , y, z ) are rotated. More specifically, a rotation
acts on the doublet according to
ζ → Rζ,    R = exp[ i Σ_{a=1}^{3} θ_a σ_a ],    (2.73)
where θi are rotation angles about the individual axes, and σi = τi /2, with {τi} the
Pauli matrices
τ_1 = ( 0  1 )    τ_2 = ( 0  −i )    τ_3 = ( 1   0 )
      ( 1  0 ),          ( i   0 ),          ( 0  −1 ).    (2.74)
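As a quick check that equation (2.73) really produces SU(2) transformations, the short sketch below (illustrative only; the rotation angles are arbitrary example values) exponentiates i Σ_a θ_a σ_a with σ_a = τ_a/2 and confirms that the result is a unitary 2 × 2 matrix with unit determinant.

import numpy as np
from scipy.linalg import expm

tau = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]], dtype=complex),
       np.array([[1, 0], [0, -1]], dtype=complex)]
sigma = [0.5 * t for t in tau]            # sigma_a = tau_a / 2

theta = np.array([0.3, -1.2, 0.7])        # arbitrary rotation angles
R = expm(1j * sum(th * s for th, s in zip(theta, sigma)))

assert np.allclose(R @ R.conj().T, np.eye(2))   # unitary
assert np.isclose(np.linalg.det(R), 1.0)        # unit determinant (the sigma_a are traceless)
print(np.round(R, 3))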
In the electroweak theory, the left-handed fermions form doublets, thus have weak
isospin T = 1/2. First, there is one doublet for each generation of leptons:
(ν_e, e⁻)ᵀ_L,    (ν_μ, μ⁻)ᵀ_L,    (ν_τ, τ⁻)ᵀ_L,
where the subscript L reminds us that this applies to the left-handed components of
the fields only. We see that each left-handed neutrino has T3 = 1/2, and the left-
handed electron, muon and tauon all have T3 = −1/2. Next, each generation of left-
handed quarks also forms a doublet:
Table 2.1. Electroweak quantum numbers for all SM matter particles, where L (R) denotes a left-handed
(right-handed) state, T the total weak isospin, T3 its component along the three-axis, Y the weak hypercharge
and Q the electromagnetic charge.
Field T T3 Y Q
differing quantum numbers in table 2.1. For the weak isospin doublet fields, we
have8
D_μ = ∂_μ + i g σ_a W^a_μ + i g′ (Y/2) B_μ,    (2.77)
Here the {W μa} are components of a gauge field associated with the SU(2) (weak
isospin) symmetry, where a is an index in weak isospin space. This is the analogue of
the gluon field in the non-Abelian gauge theory of strong interactions and, as in that
case, the gauge field must be contracted with generators in the appropriate
representation (here the fundamental representation {σa} that acts on fields with
weak isospin T = 1/2). We have also introduced an Abelian-like gauge field Bμ
associated with the U(1) symmetry, where Y is the weak hypercharge. Finally, g and
g′ are coupling constants.
We also need the form of the covariant derivative for the singlet fields (i.e. with
zero weak isospin). This simply corresponds to equation (2.77) without the second
term on the right-hand side, given that there is no need to generate weak isospin
transformations on fields that have no weak isospin:
D_μ = ∂_μ + i g′ (Y/2) B_μ.    (2.78)
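The pattern of hypercharges is easy to check numerically. The small sketch below is an illustration added here (the T₃ and Y values listed are the standard assignments in the Y/2 convention of footnote 8, quoted as assumptions since the body of table 2.1 is not reproduced above; the electric charges match table 2.2): it verifies the Gell-Mann–Nishijima relation Q = T₃ + Y/2 referred to below equation (2.85).

from fractions import Fraction as F

# Standard electroweak assignments in the Y/2 convention (quoted for illustration):
# each entry is (T3, Y); Q is the electric charge.
fields = {
    "nu_L": (F(1, 2), F(-1)),   "e_L":  (F(-1, 2), F(-1)),
    "u_L":  (F(1, 2), F(1, 3)), "d_L":  (F(-1, 2), F(1, 3)),
    "e_R":  (F(0), F(-2)),      "u_R":  (F(0), F(4, 3)),
    "d_R":  (F(0), F(-2, 3)),
    "phi+": (F(1, 2), F(1)),    "phi0": (F(-1, 2), F(1)),   # the Higgs doublet
}
charges = {
    "nu_L": F(0), "e_L": F(-1), "u_L": F(2, 3), "d_L": F(-1, 3),
    "e_R": F(-1), "u_R": F(2, 3), "d_R": F(-1, 3),
    "phi+": F(1), "phi0": F(0),
}

for name, (T3, Y) in fields.items():
    # Gell-Mann--Nishijima relation: Q = T3 + Y/2
    assert charges[name] == T3 + Y / 2, name
print("Q = T3 + Y/2 holds for all listed fields")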
To write the kinetic terms for the fermions, we simply use the Dirac Lagrangian of
equation (2.1), but with all partial derivatives replaced by their covariant counter-
parts. To write this compactly, let us introduce the vectors of doublets:
⁸ We have here included a conventional factor of 1/2 in the definition of the weak hypercharge. Other conventions exist in the literature in which this factor is absent. The values of Y in table 2.1 should then be halved.
L = [ (ν_e, e⁻)ᵀ_L ,  (ν_μ, μ⁻)ᵀ_L ,  (ν_τ, τ⁻)ᵀ_L ]    (2.79)
for the leptons, and
Q = [ (u, d)ᵀ_L ,  (c, s)ᵀ_L ,  (t, b)ᵀ_L ]    (2.80)
for the quarks. In index notation, L_i and Q_i then refer to the left-handed doublets for
each generation of leptons and quarks, respectively. Let us further introduce the
vectors of right-handed fields
E_R = (e_R, μ_R, τ_R)ᵀ,    ν_R = (ν_{e,R}, ν_{μ,R}, ν_{τ,R})ᵀ,    U_R = (u_R, c_R, t_R)ᵀ,    D_R = (d_R, s_R, b_R)ᵀ.    (2.81)
where we have utilised the fact—which can be verified from table 2.1—that the
groups of fields in each term share a common value of weak hypercharge.
As well as the fermion kinetic terms, we also need the kinetic terms for the newly-
introduced gauge fields Wμ and Bμ, where
Wμ ≡ W μa σ a
is matrix-valued in weak isospin space. By analogy with the QED and QCD cases of
equations (2.13) and (2.58), we would write these as
L_kin. = −(1/4) B^{μν}B_μν − (1/4) W^{a,μν}W^a_μν,    (2.83)
where B_μν and W^a_μν are the field strength tensors associated with B_μ and W_μ
respectively. However, we know that such kinetic terms give rise to massless gauge
fields, whereas in fact we want to capture the fact that some of the gauge bosons in
the electroweak theory are massive. As discussed above, this in turn means that we
must introduce a scalar field (or set of them) with non-zero vacuum expectation
values, so that we can generate the non-zero gauge boson masses through sponta-
neous symmetry breaking. We mentioned that Goldstone’s theorem implies that
there is a massless scalar field (or Goldstone boson) for each degree of freedom of a
continuous symmetry that is broken. Furthermore, these Goldstone bosons are then
identified with the extra polarisation states of the gauge bosons that become massive.
In the electroweak theory, we require three massive gauge bosons (the W ± and Z0
bosons), and one massless (the photon). As we saw in Section 2.1, the gauge group of
electromagnetism is U(1), and so the symmetry breaking that must take place is
SU(2) × U(1)_Y → U(1)_EM.    (2.84)
We have here appended a suffix to the U(1) groups to emphasise that they are not
the same: the U(1) group on the left-hand side corresponds to weak hypercharge Y,
whereas that on the right-hand side is associated with electric charge Q. The
dimension of the complete gauge group on the left (i.e. the number of independent
degrees of freedom needed to specify an arbitrary gauge transformation) is four, and
on the right-hand side is one, so this is consistent with three gauge bosons becoming
massive. To break the symmetry, we must introduce a set of scalar degrees of
freedom in a well-defined multiplet associated with the gauge group. It turns out that
the minimal choice is to introduce an SU(2) doublet of complex scalar fields
Φ = (ϕ⁺, ϕ⁰)ᵀ,    ϕ⁺, ϕ⁰ ∈ ℂ,    (2.85)
known as the Higgs multiplet. This has weak isospin T = 1/2, such that ϕ+ (ϕ0) has
isospin T3 = 1/2 (T3 = −1/2). Furthermore, the superscripts on the fields ϕ+,0 indicate
what turns out to be their electric charge. Requiring that the Higgs field satisfies the
Gell-Mann–Nishijima relation of equation (2.76) then implies that the Higgs
doublet has weak hypercharge Yϕ = 1, and we can then write down the gauge-
invariant Lagrangian
L_kin. = −(1/4) B^{μν}B_μν − (1/4) W^{a,μν}W^a_μν + (D^μΦ)†(D_μΦ) − V(Φ),    (2.86)
where the covariant derivative acting on the Higgs doublet is
D_μΦ = ( ∂_μ + i g W^a_μ σ^a + (i g′/2) B_μ ) Φ,    (2.87)
and we have introduced the Higgs potential9
V(Φ) = −μ²Φ†Φ + λ(Φ†Φ)².    (2.88)
This is a generalisation of the potential in the Abelian Higgs model of equation
(2.64). However, the fact that Φ is now a doublet of complex fields means that there
is a four-dimensional space of real scalar degrees of freedom, making the potential
hard to visualise. Nevertheless, it remains true that for μ2 > 0 and λ > 0, the
minimum of V (Φ) does not occur at zero field values, but at a non-zero VEV
given by
⁹ Equation (2.88) turns out to be the only potential function consistent with SU(2) × U(1)_Y gauge invariance, and renormalisability, where the latter concept is discussed in Section 2.11.
∣ϕ∣ = √(μ²/(2λ)) ≡ v/√2,    (2.89)
where we have again introduced a factor of √2 when defining the constant v, for
convenience. By analogy with equation (2.67), we may write a general Higgs field,
perturbed about the vacuum, as follows:
Φ(x) = (1/√2)(v + h(x)) exp[ (i/v) Θ^a(x) σ^a ] (0, 1)ᵀ,    (2.90)
We have chosen an arbitrary direction in the space of complex scalar fields, which is
then subjected to an arbitrary SU(2) gauge transformation, plus an arbitrary
perturbation of the overall normalisation. This gives four degrees of freedom in
total, as is appropriate for the Φ field. Were we to substitute equation (2.90) into
equation (2.86) and expand, we would find the fields Θ a(x ) (associated with the
broken SU(2) symmetry) appearing as Goldstone bosons, as occurred with the field
η(x ) in equation (2.67). As in the former case, these are spurious degrees of freedom,
and we may choose to remove them from the Lagrangian by working in the unitary
gauge, in which one sets Θ a(x ) = 0 for all a. The Higgs doublet in the unitary gauge
is thus
Φ(x) = (1/√2)(v + h(x)) (0, 1)ᵀ.    (2.91)
Upon substituting this into the Lagrangian of equation (2.86) and carrying out a lot
of messy algebra, one finds the following terms for the gauge fields (in addition to
their kinetic terms):
L_M = (v²/8)[ (gW^3_μ − g′B_μ)(gW^{3μ} − g′B^μ) + 2g² W^−_μ W^{+μ} ],    (2.92)
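The first term in equation (2.92) mixes W³_μ and B_μ; diagonalising it yields one massless state (the photon) and one massive state (the Z boson), with masses reproducing equation (2.99) below. The sketch below is an illustrative numerical check added here (the values of g, g′ and v are arbitrary examples, and the identification tanθ_W = g′/g for the weak mixing angle is a standard assumption consistent with m_Z = m_W/cosθ_W).

import numpy as np

g, gp, v = 0.65, 0.35, 246.0       # illustrative values only (v in GeV)

# Neutral piece of equation (2.92), written as (1/2)(W3, B) M2 (W3, B)^T:
M2 = (v**2 / 4.0) * np.array([[g**2, -g * gp],
                              [-g * gp, gp**2]])

masses2, vecs = np.linalg.eigh(M2)
masses = np.sqrt(np.clip(masses2, 0.0, None))
print("neutral boson masses:", masses)          # one vanishing (photon), one m_Z

mW = g * v / 2.0                                # equation (2.99)
thetaW = np.arctan2(gp, g)                      # tan(theta_W) = g'/g (assumed convention)
mZ = mW / np.cos(thetaW)

assert abs(masses2[0]) < 1e-6 * masses2[1]      # the photon stays massless
assert np.isclose(masses[1], mZ)                # the heavy state matches m_Z of equation (2.99)
print("m_W =", mW, " m_Z =", mZ)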
+ Z^ν( −W^{+μ}∂_νW^−_μ + W^{−μ}∂_νW^+_μ + W^{+μ}∂_μW^−_ν − W^{−μ}∂_μW^+_ν ) ]
− ie[ ∂^μA^ν ( W^+_μW^−_ν − W^+_νW^−_μ )
+ A^ν( −W^{+μ}∂_νW^−_μ + W^{−μ}∂_νW^+_μ + W^{+μ}∂_μW^−_ν − W^{−μ}∂_μW^+_ν ) ],    (2.97)
and the quartic terms
L_4 = −(1/2)(e²/sin²θ_W) W^{+μ}W^−_μ W^{+ν}W^−_ν
    + (1/2)(e²/sin²θ_W) W^{+μ}W^{−ν}W^+_μ W^−_ν
    − e² cot²θ_W ( Z^μW^+_μ Z^νW^−_ν − Z^μZ_μ W^{+ν}W^−_ν )
    + e² ( A^μW^+_μ A^νW^−_ν − A^μA_μ W^{+ν}W^−_ν )
    + e² cot θ_W ( A^μW^+_μ W^{−ν}Z_ν + A^μW^−_μ Z^νW^+_ν − W^{+μ}W^−_μ A^νZ_ν ).    (2.98)
Here F_μν and Z_μν are field strength tensors for the A_μ and Z_μ fields, and we have
defined
m_W = g v / 2,    m_Z = m_W / cos θ_W,    (2.99)
L_Vh = (2h/v)( m_W² W^{+μ}W^−_μ + (1/2) m_Z² Z^μZ_μ )
     + (h/v)²( m_W² W^{+μ}W^−_μ + (1/2) m_Z² Z^μZ_μ ).    (2.102)
Together with equations (2.96)–(2.98) and equation (2.101), this completes all terms
that arise from the Lagrangian of equation (2.86), after spontaneous symmetry
breaking. We have also provided kinetic terms for the matter particles, in equation
(2.82). Our final task is to provide mass terms for the latter.
Let us start with the lepton sector. In the original SM, the neutrinos are massless,
and so we only have to provide mass terms for the (e−, μ− , τ −), all of whose left-
handed parts sit in the lower components of the doublets in equation (2.79). Above,
we saw that we cannot simply introduce fermion mass terms by hand, as this will not
be gauge-invariant by itself. However, we can form the gauge-invariant combination
L_{l,Yuk.} = −Σ_i y_i L̄_i Φ e^i_R + h.c.,    (2.103)
where ‘h.c.’ refers to the Hermitian conjugate, and each y_i is an arbitrary real
parameter. This is called a Yukawa interaction, and it will indeed generate lepton
masses after electroweak symmetry breaking. To see this, we can substitute in the
form of the Higgs doublet in the unitary gauge, equation (2.91), to obtain
L_{l,Yuk.} = −Σ_l [ m_l l̄_L l_R + (m_l/v) h l̄_L l_R ] + h.c.,    m_l = y_l v/√2,    (2.104)
where the sum is over the leptons l ∈ (e, μ, τ ). The first term in the brackets is a mass
term, where the mass of each lepton is proportional both to the Higgs VEV v, and
the Yukawa coupling yl. Note that this does not explain why we see such a range of
lepton masses in the SM: instead this question is simply rephrased in terms of why
the Yukawa couplings are so different.
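To get a feel for the numbers, one can invert equation (2.104) to obtain the Yukawa couplings from the lepton masses, y_l = √2 m_l / v. The sketch below is an illustration added here (the rounded lepton masses and the VEV v ≈ 246 GeV are approximate values quoted only for this purpose), and shows just how different the couplings are.

import math

v = 246.0                                               # Higgs VEV in GeV (approximate)
masses = {"e": 0.000511, "mu": 0.1057, "tau": 1.777}    # lepton masses in GeV (approximate)

# Invert m_l = y_l v / sqrt(2), equation (2.104):
yukawas = {l: math.sqrt(2.0) * m / v for l, m in masses.items()}

for l, y in yukawas.items():
    print(f"y_{l} = {y:.2e}")
print("ratio y_tau / y_e =", yukawas["tau"] / yukawas["e"])   # roughly 3500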
The same mechanism as above can be used to give masses to the down-type
quarks, given that these are also associated with the lower components of SU(2)
doublets. However, the up-type quarks also need masses, and they are projected out
by the above construction, so that it must be generalised. To this end, we may use the
conjugate Higgs doublet
Φ̃ = iτ_2 Φ* = (1/√2)(v + h) (1, 0)ᵀ,    (2.105)
where τ2 is a Pauli matrix, and the second equality holds only in the unitary gauge. It
turns out that equation (2.105) corresponds to a charge conjugation operation acting
on the Higgs doublet, and thus reverses its quantum numbers. In particular, Φ̃ has
weak hypercharge Y = −1, given that Φ has Y = 1. Armed with equation (2.105), the
most general set of mass terms we can write down for the quark sector is
L_{q,Yuk.} = −Y^d_{ij} Q̄_i Φ D^j_R − Y^u_{ij} Q̄_i Φ̃ U^j_R + h.c.,    (2.106)
Table 2.2. Electromagnetic and weak parameters entering the fermion Lagrangian of equation (2.107).

Fermion            Q_f      V_f                        A_f
(u, c, t)          2/3      1/2 − (4/3) sin²θ_W        1/2
(d, s, b)          −1/3     −1/2 + (2/3) sin²θ_W       −1/2
(ν_e, ν_μ, ν_τ)    0        1/2                        1/2
(e, μ, τ)          −1       −1/2 + 2 sin²θ_W           −1/2
where summation over repeated indices is implied, and {Y^d_ij, Y^u_ij} are arbitrary
(complex) parameters for the down- and up-type mass terms, respectively. These
coefficients are not fixed from first principles, and must ultimately be measured from
experiment. At first sight, this looks like a large number of parameters: given that
there are three generations, two separate 3 × 3 complex matrices constitute 18
parameters. However, it may be shown that many of these parameters can be
eliminated by performing linear transformations of the fields, the net effect of which
is that the complete Lagrangian for the fermion sector can be written as
L_f = Σ_f [ Ψ̄_f ( iγ^μ∂_μ − m_f − (m_f/v) h )Ψ_f − e Q_f Ψ̄_f γ^μ A_μ Ψ_f
    − (g/(2 cos θ_W)) Ψ̄_f γ^μ Z_μ (V_f − A_f γ_5) Ψ_f ]
    − (g/(2√2)) Σ_l [ l̄ γ^μ (1 − γ_5) W^+_μ ν_l + h.c. ]
    + (g/(2√2)) [ V_ij W^+_μ Ū_i γ^μ (1 − γ_5) D_j + h.c. ].    (2.107)
Here the first line contains the kinetic and mass terms for all fermion fields Ψf , as
well as their coupling to the Higgs boson and photon, where Qf is the relevant
electric charge. The second line contains the coupling of all fermions to the Z boson,
where Vf and Af are the so-called vector and axial-vector couplings, respectively.
Their values are listed in table 2.2. There are also the couplings of the leptons (and
neutrinos) to the W ± bosons, where we have made the role of γ5 explicit.
The fourth line of equation (2.107) contains the couplings of the up- and down-
type quarks to the W bosons10, where {Vij} is a set of parameters called the Cabibbo–
Kobayashi–Maskawa (CKM) matrix. If this matrix were diagonal (i.e. such that only
the terms Vii are non-zero), a given up-type quark would couple only to the down-
¹⁰ Our notation for the up- and down-type quarks follows that of equation (2.81), where the absence of the R subscript implies that both right- and left-handed components are included.
type quark within the same generation. However, this is not observed experimen-
tally, so that an up-type quark can change into an arbitrary down-type quark by
emitting a W boson (and vice versa), where the probability for this to happen
depends upon the appropriate CKM matrix element.
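Since that probability is governed by ∣V_ij∣², the near-diagonal structure of the CKM matrix translates directly into the suppression of cross-generation transitions. The sketch below is an illustration added here (the magnitudes ∣V_ij∣ are rounded, approximate experimental values quoted only for this purpose, with the complex phase ignored): it checks that each row is consistent with unitarity and prints the relative transition weights for an up quark.

import numpy as np

# Approximate CKM magnitudes |V_ij| (rows: u, c, t; columns: d, s, b). Illustrative only.
V = np.array([[0.974, 0.225, 0.004],
              [0.221, 0.973, 0.041],
              [0.009, 0.040, 0.999]])

# Unitarity implies each row of |V_ij|^2 sums to 1 (up to the precision of these numbers).
row_sums = (V**2).sum(axis=1)
print("row sums of |V_ij|^2:", np.round(row_sums, 3))
assert np.allclose(row_sums, 1.0, atol=0.01)

# Relative weights for u -> {d, s, b} transitions via W emission:
weights = V[0]**2 / (V[0]**2).sum()
print("u -> d, s, b weights:", np.round(weights, 4))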
We have finally finished our description of the electroweak theory, whose
complete Lagrangian is
L_EW = L_2 + L_3 + L_4 + L_Higgs + L_Vh + L_f.    (2.108)
The full SM is then obtained by adding in the couplings of the quarks to the gluon,
and the kinetic terms for the latter, as described in Section 2.4. There is no need to
add the quark mass terms discussed previously, as these are already contained within
equation (2.108).
The SM is a curious theory. On the one hand, it is strongly constrained by
mysterious and powerful theoretical underpinnings (e.g. gauge invariance). On the
other, it contains parts that—to be frank—are merely cobbled together to fit
experiments. Indeed, the latter must be the case, given that it is impossible for
theory alone to tell us what the gauge group of Nature has to be, or which multiplets
the fermions have to sit in. Nevertheless, it is difficult to shake off the impression
that the SM, with its cumbersome structure, cannot be the final answer for what is
really happening in our universe at the most fundamental level. We will return to this
subject later in the book.
Figure 2.5. A scattering process, in which two beam particles interact, and then produce n final state particles.
associated with a free QFT. Likewise, the final state consists of a bunch of outgoing
plane waves at time t → +∞, which is also a state of a free QFT. At intermediate
times, the particles interact (shown as the grey blob in Figure 2.5), and there we must
take the full Lagrangian of the QFT into account. We never have to worry, though,
about what the states look like at intermediate times, as the only states we measure
are associated with the incoming and outgoing particles.
For our given initial and final states, the probability of a transition from state ∣i 〉
to state ∣f 〉 is given by the usual rules of quantum mechanics, as
P = ∣⟨f∣i⟩∣² / ( ⟨f∣f⟩ ⟨i∣i⟩ ).    (2.109)
That is, there is some inner product between the states that measures the overlap
between them. One then takes the squared modulus of this to give a real number.
Finally, the denominator corrects for the fact that the states themselves may not
have unit normalisation. Indeed, it is common to normalise single particle states of
given four-momentum p and spin λ according to the formula
⟨p′, λ′∣p, λ⟩ = (2π)³ 2E δ_{λλ′} δ^(3)(p − p′),    E = p⁰.    (2.110)
Here the delta functions tell us that states of different momenta and/or spins are
mutually orthogonal. Furthermore, the normalisation includes some conventional
numerical constants, plus a factor of the energy whose role will become clearer in the
following. Equation (2.110) implies that the inner product of a state with itself is
given by
⟨p, λ∣p, λ⟩ = (2π)³ 2E δ^(3)(0) = 2E (2π)³ lim_{p′→p} ∫ d³x/(2π)³ e^{i(p−p′)·x}.    (2.111)
To make sense of the apparently infinite delta-function integral, let us assume that
our particle is contained in a cubic box whose sides have length L, such that the
volume of the box V = L3. At the end of our calculation we will be able to take
L → ∞, recovering the physical situation we started with. With this trick, equation
(2.111) becomes
〈p , λ∣p , λ〉 = 2EV . (2.113)
This is for a single particle state. For an m-particle state, carrying through similar
arguments leads to
⟨p_1, λ_1; p_2, λ_2; …; p_m, λ_m ∣ p_1, λ_1; p_2, λ_2; …; p_m, λ_m⟩ = ∏_{i=1}^{m} 2E_i V.    (2.114)
Here the first term on the right-hand side is non-zero only if the initial and final
states are equal, and thus corresponds to nothing particularly interesting happening.
In the second term, the factors of i and (2π )4 are conventional, and the delta function
imposes overall momentum conservation, where
P_i = p_1 + p_2,    P_f = Σ_{k=3}^{n} p_k
are the total four-momenta in the initial and final states, respectively. The remaining
quantity A is called the scattering amplitude, where we have suppressed indices i and
f (showing that it depends on the nature of the initial and final states) for brevity. We
will see later how to calculate A , but for now simply note that the probability of
something interesting occurring (i.e. non-trivial scattering between the initial and
final states) is given, from equations (2.109), (2.115), (2.116), by
P = [(2π)⁴ δ^(4)(P_f − P_i)]² ∣A∣² / ( 4E_1E_2V² ∏_{k=3}^{n} 2E_k V ).    (2.117)
where we have again used equation (2.112), and introduced a finite time T for the
interaction, as well as the finite volume V. Hence, we can rewrite equation (2.117)
as
P = (2π)⁴ δ^(4)(P_f − P_i) T ∣A∣² / ( 4E_1E_2V ∏_{k=3}^{n} 2E_k V ).    (2.119)
Note that this is the probability for fixed momenta {p3 , p4 , …pn } in the final state
(for a given initial state). In a real experiment, we have no control over what
happens, and so we should sum over all possible values of the final state momenta
that sum to the same Pf, to get a total probability per unit time. To see how this
works, recall that we have normalised our plane wave states to be in a box with sides
of length L. From the de Broglie relationship between momentum and wavenumber
p = ℏk  →  k    (ℏ → 1),    (2.121)
it follows that the momenta in each direction are quantised, with
p = (2π/L)(n_x, n_y, n_z),    n_i ∈ ℤ    (2.122)
(i.e. we can only fit complete multiples of the particle wavelength in the box).
Summing over all possible momentum states then amounts to summing all possible
values of the (integer) vector n = (nx , ny , nz ), where
Σ_n = (L/(2π))³ Σ_p  ⟶  V ∫ d³p/(2π)³    as L → ∞.
We then have
We said before that we should only sum over final state momenta that give the right
total final state momentum. This is fine though, as the delta function will enforce the
correct result. Interestingly, we see that the integral over the three-momentum of
each final state particle is accompanied by a factor (2Ek )−1, whose origin can be
traced back to the normalisation of the particle states. It turns out that this is
particularly convenient: as you can explore in the exercises, the combination
∫ d³p_k / E_k
turns out to be Lorentz-invariant by itself. Given that final results for cross-sections
have to be Lorentz invariant, it is useful to break the overall calculation into Lorentz
invariant pieces.
From known properties of Poisson statistics, the above total probability per unit
time will be the same as the event rate for transitions between our initial and
momentum-summed final states. To clarify, it is the event rate for a single incident/
target particle pair. In equation (1.21), we saw that the cross-section is the event rate
per target particle, per unit flux of incident particle. Thus, we can convert equation
(2.123) into the cross-section by dividing by the flux factor for a single particle. From
equation (1.20) (converted to the present notation), this is given by
F1 = n1(v1 + v2 ),
where¹¹
v_i = ∣p_i∣ / E_i
is the speed of particle i, and n_1 is the number of particles in beam 1 per unit volume.
Here we have considered a single particle in volume V, and thus
n_1 = 1/V.
Putting things together, we have
F_1 = (1/V)( ∣p_1∣/E_1 + ∣p_2∣/E_2 ).
This will combine with the factor 4E1E2V in equation (2.123) to make the
combination
F = 4E_1E_2V F_1 = 4(E_1∣p_2∣ + E_2∣p_1∣).
For collinear beams that collide head-on, one may show (see the exercises) that this
can be written as
F = 4[ (p_1 · p_2)² − m_1²m_2² ]^{1/2}.    (2.124)
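Equation (2.124) can be verified numerically: for head-on collinear beams it reproduces 4(E_1∣p_2∣ + E_2∣p_1∣), and, unlike that frame-dependent expression, it takes the same value after a boost. The sketch below is an illustrative check added here (the masses, momenta and boost velocity are arbitrary example values).

import numpy as np

def minkowski(p, q):
    """Minkowski product with metric (+,-,-,-); p, q are (E, px, py, pz)."""
    return p[0] * q[0] - np.dot(p[1:], q[1:])

def boost_z(p, beta):
    """Boost a four-vector along the z axis with velocity beta."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    E, pz = p[0], p[3]
    return np.array([gamma * (E - beta * pz), p[1], p[2], gamma * (pz - beta * E)])

m1, m2 = 0.3, 1.2                    # arbitrary masses
pz1, pz2 = 5.0, -3.0                 # head-on collinear momenta along z
p1 = np.array([np.sqrt(m1**2 + pz1**2), 0.0, 0.0, pz1])
p2 = np.array([np.sqrt(m2**2 + pz2**2), 0.0, 0.0, pz2])

def F_invariant(p1, p2, m1, m2):
    return 4.0 * np.sqrt(minkowski(p1, p2)**2 - m1**2 * m2**2)

def F_frame(p1, p2):
    return 4.0 * (p1[0] * abs(p2[3]) + p2[0] * abs(p1[3]))

# The two expressions agree for head-on collinear beams, equation (2.124)...
assert np.isclose(F_invariant(p1, p2, m1, m2), F_frame(p1, p2))

# ...and the invariant form gives the same answer after a boost along the beam axis.
q1, q2 = boost_z(p1, 0.6), boost_z(p2, 0.6)
assert np.isclose(F_invariant(q1, q2, m1, m2), F_invariant(p1, p2, m1, m2))
print("F =", F_invariant(p1, p2, m1, m2))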
This is called the Lorentz invariant flux. It is not the actual flux measured in a given
frame, but a combination of factors occurring in the cross-section calculation that
happens to be Lorentz invariant, and thus useful! Finally, we have arrived at an
important result, namely that the cross-section is given by
σ = (1/F) ∫ dΦ ∣A∣²,    (2.125)
where
∫ dΦ = (2π)⁴ ( ∏_i ∫ d³p_i / ((2π)³ 2E_i) ) δ^(4)(P_f − P_i)    (2.126)
¹¹ If you are not familiar with the expression for the speed, recall that the relativistic energy and momenta are given in natural units by E = mγ and p = γmv respectively.
All of this is easier said than done of course, but we have at least broken down the
problem into what appear to be manageable ingredients. In order to go further, we
need to know how to calculate the scattering amplitude A .
Figure 2.6. An example of a Feynman diagram.
Figure 2.7. Feynman diagram symbols for: (a) a fermion (e.g. lepton or quark); (b) a gluon; (c) a photon, W or
Z boson; (d) a scalar (e.g. Higgs boson).
The importance of Feynman diagrams is that they are much more than handy
pictures to help us visualise scattering processes. Each one can be converted into a
precise mathematical contribution to the amplitude A , using so-called Feynman
rules. We will not derive the Feynman rules here. Instead, we will simply quote and
use the rules we need. We will work in momentum rather than position space, given
that the incoming/outgoing particles at a collider are typically set up and measured
in terms of their definite energy and momentum. Then, the Feynman rules consist of
factors associated with each external line, internal line and vertex.
Beginning with the external lines, the rules state that each incoming or outgoing
fermion of four-momentum p contributes a basis spinor (or adjoint spinor), whose
spinor index will combine with other spinor indices in other parts of the diagram.
Figure 2.8. Feynman rules for incoming and outgoing (external) particles.
Figure 2.9. Propagators for internal lines: i/(p² − m² + iε) for a scalar, and i(p̸ + m)/(p² − m² + iε) for a fermion of mass m.
This is summarised in Figure 2.8, where we use conventional notation for the basis
spinors. Each incoming photon or gluon of four-momentum p contributes a
polarisation vector ϵμ(p ), which includes information on its polarisation state.
Outgoing photons/gluons contribute the Hermitian conjugate of the appropriate
polarisation vector. Typically, in all calculations we will end up summing over the
possible polarisation and spin states of the external fermions and vector bosons. If
we did not do this, we would be calculating an amplitude with fixed spins/helicities.
Note that there are no external line factors for scalars.
For each internal line, we need to include a function called the ‘propagator’, and
the relevant functions are shown in Figure 2.9, where in the fermion case we have
used the Feynman slash notation
p̸ ≡ p_μ γ^μ,    (2.127)
with γ μ a Dirac matrix. For completeness, we have also included a small imaginary part
iε , ε≪1
in the denominator of each propagator. This ensures that the position-space
propagator has the correct causal properties. Throughout this book, we will safely
be able to ignore this issue, and so will omit the +iε from now on.
The rules for the propagators involve an inverse power of the squared four-
momentum of the particle being exchanged, minus the mass squared. Thus, they have
p2 ≠ m2 in general. In other words, virtual particles do not necessarily obey the
physical relation
p² = E² − ∣p∣² = m².
A common term used for such particles is that they are ‘off the mass shell’, or just
off-shell for short. Conversely, real particles are on-shell. You will see these terms
being used in many research papers and textbooks.
We have so far seen the propagators for scalars and fermions. This also covers
anti-fermions, as we simply reverse the direction of the four-momentum. However,
what about vector bosons such as photons and gluons? This is a bit more
complicated, and in fact the propagators for such particles turn out not to be
unique: they change if we make a gauge transformation of the photon or gluon. We
can get round this by breaking the gauge invariance, or ‘fixing a gauge’. This
amounts to imposing extra constraints on the field Aμ, where we consider the
(simpler) QED case for now. Then the photon propagator—and thus the results of
individual Feynman diagrams—will depend upon the choice of gauge. However,
provided we fix the gauge in the same way for all possible Feynman diagrams,
gauge-dependent factors will cancel out when we add all the Feynman diagrams
together, leaving a gauge-invariant result for the amplitude A . Common choices of
constraint that are imposed include
n μAμ = 0, (2.128)
where n μ is an arbitrary four-vector. This is called an axial gauge, as the vector n μ is
associated with a definite axis in spacetime. An alternative choice is
∂ μAμ = 0, (2.129)
which is known as the Lorenz gauge. In practice, these constraints can be
implemented by adding a so-called gauge-fixing term to the QED Lagrangian.
Examples include
L_axial = −(λ²/2)(n^μ A_μ)²    (2.130)
for the axial gauge, and
L_covariant = −(λ²/2)(∂_μ A^μ)²    (2.131)
for the family of so-called covariant gauges, which includes the Lorenz gauge as a
special case. In each example, the arbitrary parameter λ acts as Lagrange multiplier,
enforcing the constraint. Another way to see this is that gauge invariance requires
that we be insensitive to the value of λ, which implies
∂L/∂λ = 0,
where L is the full Lagrangian. The only dependence on λ is in the gauge-fixing term,
and thus this equation enforces the gauge-fixing condition. A given virtual photon in
an arbitrary Feynman diagram will have Lorentz indices μ and ν associated with its
Figure 2.10. (a) The photon propagator will depend on the Lorentz indices at either end of the photon line; (b) the gluon propagator will also depend on the adjoint indices a and b at either end of the line.
endpoints, as shown in Figure 2.10. The propagator function then depends upon the
gauge, as discussed above. In the axial gauge, it turns out to be
D_μν = −(i/p²)[ η_μν + ((p² + λ²n²)/(λ²(n·p)²)) p_μ p_ν − (n_μ p_ν + p_μ n_ν)/(n·p) ].    (2.132)
Note that we can choose to absorb λ into n μ if we want, a fact that is already evident
from the gauge-fixing term of equation (2.130). In the covariant gauge, we have
D_μν = −(i/p²)[ η_μν − (1 − λ⁻²) p_μ p_ν / p² ].    (2.133)
A special case of this is the so-called Feynman gauge (λ = 1), which has the
particularly simple propagator
D_μν = −i η_μν / p².    (2.134)
This is by far the simplest choice for practical calculations, and so we will adopt the
Feynman gauge from now on. In QCD, the gluon propagator in the Feynman gauge
is
D^{ab}_μν = −i η_μν δ^{ab} / p².    (2.135)
This depends on the spacetime indices as before, but also depends in principle on the
adjoint (colour) indices associated with the endpoints of a given gluon line (see
Figure 2.10). However, we see from equation (2.135) that this colour dependence is
trivial, in that the colour charge of the gluon does not change as it propagates.
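As a tiny sanity check on these expressions, the sketch below (an illustration added here; it simply implements equations (2.133) and (2.134) as numpy matrices) confirms that the covariant-gauge propagator collapses to the Feynman-gauge form −iη_μν/p² when λ = 1, and differs from it for other values of λ.

import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])          # Minkowski metric, signature (+,-,-,-)

def D_covariant(p, lam):
    """Photon propagator in a covariant gauge, equation (2.133)."""
    p2 = p @ eta @ p
    pl = eta @ p                                # lower the index: p_mu
    return (-1j / p2) * (eta - (1.0 - lam**-2) * np.outer(pl, pl) / p2)

def D_feynman(p):
    """Feynman-gauge propagator, equation (2.134)."""
    p2 = p @ eta @ p
    return -1j * eta / p2

p = np.array([3.0, 1.0, -2.0, 0.5])             # an arbitrary off-shell four-momentum
assert np.allclose(D_covariant(p, 1.0), D_feynman(p))        # lambda = 1: Feynman gauge
assert not np.allclose(D_covariant(p, 2.0), D_feynman(p))    # other gauges differ
print("lambda = 1 reproduces -i eta_{mu nu} / p^2")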
Such is the simplicity of the Feynman gauge propagator compared with e.g. the
axial gauge propagator of equation (2.132), that you may be wondering why people
ever bother discussing alternative gauge choices at all. One use of them is that it can
be useful to have the arbitrary parameters λ (in covariant gauges) or n μ (in axial
gauges) in the results of individual Feynman diagrams. If any dependence on these
quantities remains after adding all the diagrams together, we know we have made a
mistake somewhere, as the resulting calculation for the amplitude is not gauge
invariant. Thus, alternative gauge choices allow us to check calculations, at least to
some extent.
There is also another complication associated with non-axial gauges. The
propagator describes how a gluon or photon of off-shell four-momentum p travels
Figure 2.11. The propagator for the ghost field in the Feynman gauge.
Figure 2.12. Feynman rules for the interaction of a lepton l and quark q with a photon/gluon, where i and j denote the colours of the incoming/outgoing quark. Here Q_f is the electromagnetic charge of the fermion in units of the magnitude of electron charge e, g_s the strong coupling constant, γ^μ a Dirac matrix (spinor indices not shown), and t^a_ji an element of an SU(3) generator in the fundamental representation.
throughout spacetime. As p2 → 0 the particle goes on-shell, and we then expect that
only two physical (transverse) polarisation states propagate. In covariant gauges
(and in particular the Feynman gauge), however, this does not happen: there is an
unphysical (longitudinal) polarisation state, that leads to spurious contributions in
Feynman diagrams with loops. In non-Abelian theories, these contributions survive
even when all diagrams have been added together, such that they have to be
removed. It is known that one can do this by introducing a so-called ghost field,
which is a scalar field, but with fermionic anti-commutation properties. The latter
leads to additional minus signs relative to normal scalar fields, that end up
subtracting the unwanted contributions, to leave a correct gauge invariant result.
Furthermore, the ghost field has an adjoint index a (like the gluon), so that it can
cancel spurious contributions for all possible colour charges.
From the above discussion, it is clear that ghosts are not needed in QED, as the
spurious contributions only survive in non-Abelian theories. In QCD, ghosts turn
out to be unnecessary in the axial gauge, which makes sense: the gauge condition
n μAμ = 0 means that only states transverse to nμ can propagate, i.e. only physical
transverse states are kept in the on-shell limit. In the Feynman gauge, ghosts are
indeed required, and the relevant propagator in the Feynman gauge is shown in
Figure 2.11.
As well as the Feynman rules for external particles and internal lines, we also need
the rules for vertices. A full list of Feynman rules can be found in any decent QFT
textbook. Here, we provide a partial list of the more commonly used rules. Those
involving fermions and photons/gluons can be found in Figure 2.12. Vertices for the
gluon self-interactions, and ghost-gluon interactions, are shown in Figure 2.13. With
adjoint indices, four-momenta, and Lorentz indices labelled as in the figure, the
three-gluon vertex rule is
Figure 2.13. Vertices for gluon self-interactions, and the ghost-gluon interaction, in the Feynman gauge. Here {a, b, c} label adjoint indices, {p_i, p} four-momenta, and {μ, μ_i} Lorentz indices.
Earlier we referred to gs as the strong coupling constant. We can see why from the
fact that all vertices involving gluons have at least one power of gs. Thus, it
represents the strength of interaction between quarks and gluons (or between gluons
and other gluons). We can also directly verify the statement made earlier that ghosts
decouple in QED: the ghost-gluon vertex of equation (2.137) contains the structure
constants f abc , which are zero for an Abelian theory.
Having stated more than the Feynman rules we will need for later, the complete
procedure for calculating the scattering amplitude A is as follows:
Figure 2.14. (a) Feynman diagram for muon production via electron-positron annihilation; (b) a related Feynman diagram for quark/antiquark production, where i and j label colours.
6. Add all diagrams together to get the scattering amplitude A , where some
relative minus signs are needed for fermions (again, see your QFT books).
A_QED = [ v̄(p_2)(−ieγ^μ)u(p_1) ] × ( −iη_μν/(p_1 + p_2)² ) × [ ū(p_3)(−ieγ^ν)v(p_4) ],
where the first bracket contains the incoming e⁺ spinor, the vertex and the incoming e⁻ spinor, the middle factor is the photon propagator in the Feynman gauge, and the final bracket contains the outgoing μ⁻ spinor, the vertex and the outgoing μ⁺ spinor.
A_ij = (i e² Q_q δ_ij / s) [ v̄(p_2)γ^μ u(p_1) ][ ū(p_3)γ_μ v(p_4) ] = δ_ij Q_q A_QED,    (2.140)
where Qq is the electromagnetic charge of the quark q, and we have labelled by A QED
the muon amplitude from above.
where we have used the fact that each fermion or antifermion has two independent
spin states. For photons or gluons, we may use the fact that these have (d − 2)
transverse spin states in d spacetime dimensions, so that averaging gives a sum
(1/(d − 2)) Σ_pols  →  (1/2) Σ_pols    (d → 4),
where the subscript on the summation stands for ‘polarisations’. We will see later
why we might want to consider values of d that are not equal to 4.
In any given experiment, we have no control over the spin and/or colour of the
produced particles. Thus, analogous to the sum over final state momenta that we
performed earlier, we have to sum over final state spins and colours. It is then
conventional to label by ∣A∣2 the squared amplitude averaged/summed over initial/
final state spins and colours, as distinct from equation (2.141), which is for fixed
spins and colours. Following equation (2.125), we may write the unpolarised cross-
section as
σ = (1/F) ∫ dΦ ∣A∣².    (2.142)
Σ_spins u(p)ū(p) = p̸ + m,    p̸ ≡ p_μγ^μ.    (2.144)
Here we have not shown explicit spinor indices, but note that both sides are a matrix
in spinor space. If we do insert spinor indices, it has the form
Σ_spins u_α(p) ū_β(p) = p_μ (γ^μ)_{αβ} + m δ_{αβ}.    (2.145)
Here μ is a Lorentz (spacetime) index, and {α , β} are indices in spinor space. The
confusing nature of multiple indices is the primary reason why we do not show
spinor indices more often. Another completeness relation concerns the sum over the
spin states of an antifermion¹²:
Σ_spins v(p)v̄(p) = p̸ − m.    (2.146)
For a massive vector boson, the sum would be over three physical polarisation states:
two transverse, and one longitudinal. This gives the known completeness relation
\sum_{\rm pols} \epsilon^\dagger_\mu(p)\,\epsilon_\nu(p) = -\eta_{\mu\nu} + \frac{p_\mu p_\nu}{M^2}, \qquad (2.149)
where M is the mass of the vector boson. To get the corresponding result for
(massless) photons and gluons, note that we cannot simply take M → 0 in equation
(2.149): the second term diverges in that limit. What this ultimately tells us is that there is no unambiguous answer
for the spin sum for massless vector bosons, and indeed this makes sense based on
12
Again we follow convention in omitting an identity matrix in spinor space in the second term.
what we saw earlier, namely that for virtual particles, the states that propagate
depend on which gauge we are in. Unphysical states can be cancelled by adding
ghosts, and one way to proceed for external photons/gluons is indeed to sum over
all polarisations (the physical transverse ones, and the unphysical longitudinal
one), which turns out to give
\sum_{\rm pols}\epsilon^\dagger_\mu(p)\,\epsilon_\nu(p) = -\eta_{\mu\nu}. \qquad (2.150)
We can then correct for the inclusion of the unphysical longitudinal polarisation
state by adding external ghosts, using the appropriate vertices. An alternative to this
procedure is to introduce an arbitrary lightlike vector n μ such that
n μAμ (p ) = p μ Aμ (p ) = 0, (2.151)
where the second equality follows from the covariant gauge condition of equation
(2.129). These two conditions ensure that only polarisation states transverse to both
n μ and p μ survive, which indeed gives the correct number of degrees of freedom
((d − 2) in d spacetime dimensions). Then it can be shown that, if we sum over
physical polarisations, we get
\sum_{\rm pols}\epsilon^\dagger_\mu(p)\,\epsilon_\nu(p) = -\eta_{\mu\nu} + \frac{p_\mu n_\nu + p_\nu n_\mu}{p\cdot n}. \qquad (2.152)
(n.b. each quantity in square brackets is a number, such that the ordering of the
square-bracketed terms is unimportant). Next, we can use the fermion completeness
relations of equations (2.144) and (2.146). Consider for example the combination
\sum_{\rm spins}\,[\bar v(p_2)\gamma^\mu u(p_1)]\,[\bar u(p_1)\gamma^\nu v(p_2)] = \sum_{\rm spins}\,\bar v_\alpha(p_2)\,\gamma^\mu_{\alpha\beta}\,u_\beta(p_1)\,\bar u_\gamma(p_1)\,\gamma^\nu_{\gamma\delta}\,v_\delta(p_2),
where we have inserted explicit spinor indices. Upon doing so, we can move factors
around in this equation, given that each component (labelled by specific indices) is
just a number. In particular, we can rearrange the expression as
\sum_{\rm spins}\,[v_\delta(p_2)\bar v_\alpha(p_2)]\,\gamma^\mu_{\alpha\beta}\,[u_\beta(p_1)\bar u_\gamma(p_1)]\,\gamma^\nu_{\gamma\delta} = (\slashed{p}_2)_{\delta\alpha}\,\gamma^\mu_{\alpha\beta}\,(\slashed{p}_1)_{\beta\gamma}\,\gamma^\nu_{\gamma\delta},
where we have used the completeness relations in the second line, and neglected the
electron and muon masses for convenience (me ≃ mμ ≃ 0 in any experiment at current
energies). We can now note that the remaining spinor indices have arranged themselves
into a string representing a matrix multiplication, where the initial and final index δ is the
same (and summed over). Thus, we can recognise the final result as a trace in spinor space:
(\slashed{p}_2)_{\delta\alpha}\,\gamma^\mu_{\alpha\beta}\,(\slashed{p}_1)_{\beta\gamma}\,\gamma^\nu_{\gamma\delta} = \mathrm{tr}\!\left[\slashed{p}_2\,\gamma^\mu\,\slashed{p}_1\,\gamma^\nu\right]
(again neglecting the fermion masses). As you practise with more squared ampli-
tudes, you will see that spinor factors and Dirac matrices always combine to make
traces in spinor space, and in fact this must be the case: the squared amplitude is a
number, and thus cannot carry any free spinor indices.
Our averaged and summed squared amplitude is now
\overline{|\mathcal{A}|^2} \simeq \frac{e^4}{s^2}\,\frac{1}{4}\,\mathrm{tr}[\slashed{p}_2\gamma^\mu\slashed{p}_1\gamma^\nu]\,\mathrm{tr}[\slashed{p}_3\gamma_\mu\slashed{p}_4\gamma_\nu]
= \frac{e^4}{s^2}\,\frac{1}{4}\left[4\left(p_2^\mu p_1^\nu - p_1\cdot p_2\,\eta^{\mu\nu} + p_2^\nu p_1^\mu\right)\right]\left[4\left(p_{3\mu}p_{4\nu} - p_3\cdot p_4\,\eta_{\mu\nu} + p_{4\mu}p_{3\nu}\right)\right]
= \frac{e^4}{s^2}\left[2(u^2 + t^2) + (d-4)\,s^2\right],
where we have used equation (2.147) in the second line, and in the third line
introduced the Mandelstam invariants
s = (p_1 + p_2)^2 \simeq 2p_1\cdot p_2, \qquad t = (p_1 - p_3)^2 \simeq -2p_1\cdot p_3, \qquad u = (p_1 - p_4)^2 \simeq -2p_1\cdot p_4. \qquad (2.156)
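As a quick numerical cross-check of these definitions, the short Python sketch below builds massless four-momenta for the 2 → 2 process above and verifies that s + t + u = 0 in the massless limit. The beam energy and scattering angle are arbitrary illustrative values.

import numpy as np

def dot(p, q):
    """Minkowski product p.q with metric (+,-,-,-)."""
    return p[0]*q[0] - np.dot(p[1:], q[1:])

# Illustrative massless momenta for e-(p1) e+(p2) -> mu-(p3) mu+(p4)
# in the centre-of-mass frame: beam energy E, scattering angle theta.
E, theta = 45.0, 0.3
p1 = np.array([E, 0.0, 0.0,  E])
p2 = np.array([E, 0.0, 0.0, -E])
p3 = np.array([E, E*np.sin(theta), 0.0, E*np.cos(theta)])
p4 = p1 + p2 - p3           # momentum conservation

s = dot(p1 + p2, p1 + p2)
t = dot(p1 - p3, p1 - p3)
u = dot(p1 - p4, p1 - p4)
print(s, t, u, s + t + u)   # for massless particles, s + t + u = 0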
Note that we can also use this to get the squared amplitude for quark–antiquark
production. The only difference in the amplitude for the latter with respect to muon
production is the presence of the electromagnetic charge of the quark, and a
Kronecker symbol linking the colours of the outgoing quark and antiquark, as we
saw in equation (2.140). Then the squared amplitude, summed over colours, is
\sum_{\rm colours}\overline{|\mathcal{A}_{ij}|^2} = N_c\,Q_q^2\,\overline{|\mathcal{A}_{\rm QED}|^2},
where Nc is the number of colours. For QCD, we have Nc = 3, but many people
leave factors of Nc explicit so that the origin of such factors is manifest. This makes it
easier to check calculations.
This is Lorentz invariant as we saw before, but to actually carry out the integral we
must choose a frame. Let us choose the lab frame, in which the incoming electron
and positron have equal and opposite momenta. This corresponds to the frame in
which the collider’s detector is at rest, and examples of such experiments include the
old LEP experiment at CERN, and all proposed linear colliders. In this frame, the
incoming beams of energy E have four-momenta
p1μ = (E , 0, 0, E ), p2μ = (E , 0, 0, −E ), (2.160)
where we have taken the z direction to correspond to the beam axis as usual, and
also used the fact that we are neglecting masses. For the outgoing momenta, we can
parametrise p3 in a spherical polar coordinate system:
p3μ = (E ′ , E ′ sin θ cos ϕ , E ′ sin θ sin ϕ , E ′ cos θ ), 0 ⩽ θ ⩽ π , 0 ⩽ ϕ ⩽ 2π , (2.161)
where E ′ is the energy. We can then find p4 from the momentum conservation condition
p1 + p2 = p3 + p4 , (2.162)
which gives
p_4^\mu = (E',\, -E'\sin\theta\cos\phi,\, -E'\sin\theta\sin\phi,\, -E'\cos\theta). \qquad (2.163)
We then find
\int d\Phi^{(2)} = \frac{(2\pi)^4}{4(2\pi)^6}\int\frac{d^3p_3}{E'}\int\frac{d^3p_4}{E'}\;\delta^{(4)}(p_3 + p_4 - p_1 - p_2)
= \frac{1}{16\pi^2}\int\frac{d^3p_3}{(E')^2}\;\delta(p_3^0 + p_4^0 - p_1^0 - p_2^0)
= \frac{1}{16\pi^2}\int_0^\pi d\theta\int_0^{2\pi}d\phi\int_0^\infty d|\mathbf{p}_3|\;\frac{|\mathbf{p}_3|^2\sin\theta}{(E')^2}\;\delta(2E - 2E'),
where we have used the momentum-conserving delta function to carry out one of the
momentum integrals, and inserted the usual volume element in spherical polars. We
can use the fact that ∣p3 ∣ = E ′, as well as the useful delta-function relation
\delta(ax) = \frac{1}{|a|}\,\delta(x), \qquad (2.164)
to get
\int d\Phi^{(2)} = \frac{1}{32\pi^2}\int_0^\pi d\theta\int_0^{2\pi}d\phi\int_0^\infty dE'\,\sin\theta\,\delta(E - E'). \qquad (2.165)
Inserting the squared amplitude, this gives
\int d\Phi^{(2)}\,\overline{|\mathcal{A}|^2} = \frac{e^4E^4}{2\pi^2 s^2}\,2\pi\int_{-1}^{1}dx\,(1 + x^2), \qquad x = \cos\theta,
where we have used the fact that the integrand is independent of ϕ to carry out the
azimuthal angle integral. Carrying on, we find
\int d\Phi^{(2)}\,\overline{|\mathcal{A}|^2} = \frac{e^4E^4}{\pi s^2}\int_{-1}^{1}dx\,(1 + x^2)
= \frac{e^4E^4}{\pi s^2}\left[x + \frac{x^3}{3}\right]_{-1}^{1}
= \frac{e^4E^4}{\pi s^2}\,\frac{8}{3}.
Next, note that
s = (p_1 + p_2)^2 = (2E)^2 = 4E^2,
so that we finally obtain
\int d\Phi^{(2)}\,\overline{|\mathcal{A}|^2} = \frac{e^4}{6\pi}.
To get the cross-section, it remains to divide by the Lorentz invariant flux factor,
which in the present case (from equation (2.124)) is
F = 4\left[(p_1\cdot p_2)^2 - m_1^2 m_2^2\right]^{1/2} \;\xrightarrow{\ m_{1,2}\to 0\ }\; 2s.
Figure 2.15. (a) Radiation of a real photon or gluon in quark/antiquark production; (b) a virtual correction.
\sigma_{q\bar q} = \frac{4\pi N_c\,Q_q^2\,\alpha^2}{3s}. \qquad (2.168)
Summing over quark flavours, we can define the convenient ratio
R = \frac{\sum_q \sigma_{q\bar q}}{\sigma_{\mu^+\mu^-}} = N_c\sum_q Q_q^2. \qquad (2.169)
Historically, this ratio was measured using the cross-sections for e +e− to hadrons and
muons respectively, and was an important verification of QCD, given that it is
sensitive both to the number of quark colours and the charges of the quarks.
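The following minimal Python sketch evaluates equation (2.169) for the quark flavours accessible at a given centre-of-mass energy. The threshold values used to decide which flavours contribute are rough, illustrative numbers rather than precise quark masses.

# Quark charges (in units of e) and rough pair-production thresholds in GeV.
quarks = {"u": (2/3, 0.005), "d": (-1/3, 0.01), "s": (-1/3, 0.2),
          "c": (2/3, 3.0),  "b": (-1/3, 9.5),  "t": (2/3, 350.0)}
Nc = 3

def R(sqrt_s):
    """Leading-order R = sigma(e+e- -> hadrons) / sigma(e+e- -> mu+mu-)."""
    return Nc * sum(Q**2 for Q, threshold in quarks.values() if sqrt_s > threshold)

for E in (2.0, 5.0, 20.0):
    print(f"sqrt(s) = {E:5.1f} GeV:  R = {R(E):.2f}")

Below the charm threshold this gives R = 2, rising to 10/3 and 11/3 as the charm and bottom thresholds are crossed, which is the step-like behaviour seen in the historical measurements.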
Note that we have so far only considered Feynman diagrams with the smallest
number of vertices for a given initial/final state (i.e. the diagrams of Figure 2.14).
However, in general we also need to consider the emission of additional radiation,
such as the Feynman diagrams in Figure 2.15(a). This looks like a different process
than before due to the presence of the extra gluon or photon. But we may not see the
extra particle in the detector if its energy is too low, or if it is very close to one of the
other particles, or goes down the beampipe etc. Then the process looks similar to the
one we started with. We may also wish to consider an observable in which we sum
over any number of extra radiated particles. Then we would have to include all the
relevant radiative Feynman diagrams.
As well as real radiation, there are also loop diagrams corresponding to extra
virtual particles. An example is shown in Figure 2.15(b), which shows a gluon being
emitted and absorbed by the outgoing quark/antiquark pair. This process has the
same initial and final state as the one we started with, and so is certainly some sort of
correction to it.
Both real and virtual diagrams with extra radiation involve more vertices, and
thus more powers of the coupling constants e and gs. Hence, drawing Feynman
diagrams with more loops/legs amounts to an expansion in the coupling, and this is
called ‘perturbation theory’. It is valid only if e and gs are numerically small enough,
otherwise the series is not convergent. The simplest diagrams for any process give the
leading order (LO) in the coupling. Corrections are then called next-to-leading order
(NLO), next-to-next-to-leading order (NNLO) and so on. For most processes, LO in
the QED coupling is sufficient, given that e is numerically small. However, the QCD
coupling is larger (it is the strong force after all), so that we definitely have to include
higher orders in practical cross-section calculations, in order to get accurate results.
Unfortunately, it turns out that as soon as we go to NLO, quantum field theory
suddenly becomes infinitely weirder and more complicated! We will study this first
for the simpler case of QED, before discussing how to generalise the results to QCD,
which is of more practical importance. The results will then allow us to calculate
higher-order Feynman diagrams, and thus to begin to describe measurable quanti-
ties at particle colliders. We will also have to understand how to calculate processes
with incoming (anti)-protons, rather than electrons or positrons.
\cdots\times\frac{i(\slashed{p}_1 - \slashed{k} + m)}{(p_1 - k)^2 - m^2}\,\gamma^\alpha u(p_1)\;\bar u(p_3)\gamma^\nu v(p_4)
= \frac{e^4}{s}\,[\bar u(p_3)\gamma^\mu v(p_4)]\;\bar v(p_2)\,\gamma^\alpha\, I\,\gamma_\alpha\, u(p_1),
where the integral in the first line is over the undetermined loop momentum k, and
we have defined the integral (matrix-valued in spinor space)
I = \int\frac{d^dk}{(2\pi)^d}\;\frac{(\slashed{p}_2 + \slashed{k} + m)\,\gamma_\mu\,(\slashed{p}_1 - \slashed{k} + m)}{k^2\,[(p_2 + k)^2 - m^2]\,[(p_1 - k)^2 - m^2]}. \qquad (2.170)
The full integral is very complicated, but we can see roughly what happens if all
components of the loop momentum k^\mu are large. To do this, we can let K denote a
generic component of k^\mu, and pretend that \slashed{k} \sim K and k^\mu \sim K are just numbers. We can
Figure 2.16. Virtual correction to muon production in electron/positron scattering.
then understand how the integrand scales with K. This procedure is called ‘power
counting’, and taking K large yields
I \sim \int d^4K\;\frac{K^2}{K^6} \sim \int d^4K\;K^{-4}, \qquad (2.171)
as K → ∞. We see that the integrand behaves like k −4 as k μ → ∞, but there are four
integrals over the components of k μ. By comparing with the integral
\int dx\; x^{-1} \sim \ln x,
we thus expect that the integral I is logarithmically divergent for large momenta. A
more rigorous treatment confirms this, which is why the Feynman diagram of
Figure 2.16 evaluates to infinity. Such divergences are called ultraviolet (UV), as they
are associated with high energies/momenta.
Having seen the origin of the infinity, we can now ask what it actually means, and
thus whether there is a genuine problem that prevents us using QFT at higher orders.
To this end, note that the infinity arises because we integrate up to arbitrarily high
values of ∣k μ∣, which itself involves a particular assumption, namely that we
understand QED up to infinite energies or momenta. It is surely quite ridiculous
to assert this, given that any real particles that we can measure have some finite
momentum, and thus no previous experiments allow us to claim that we understand
QED at very high energies. Indeed, at some high enough energy/momentum
(equivalent, from the uncertainty principle, to very short spacetime distances),
QFT may get replaced with a completely different theory.
The above discussion suggests that we put a cutoff Λ on the momenta/energies
that we consider, below which scale we trust QED, but above which we make no
assumptions. This will work provided that physics at collider energies is independent
of physics at much higher energies. This sounds reasonable, and indeed there is a
wide family of theories—including the SM—where this ‘separation of scales’ works.
However, the precise cutoff is arbitrary: given a cutoff Λ in momentum/energy, we
can clearly choose a different value Λ′ < Λ , provided this is still much higher than
the collider energy. Then the only way that low energy physics can be independent of
the cutoff choice is if the behaviour in Λ can be ‘factored out’ of the parameters of
the theory.
To make this idea more precise, consider writing our original QED Lagrangian as
\mathcal{L}_{\rm QED} = -\frac{1}{4}F_{0\,\mu\nu}F_0^{\mu\nu} + \bar\Psi_0\left(i\slashed{\partial} - m_0\right)\Psi_0 - e_0\,\bar\Psi_0 A_{0\,\mu}\gamma^\mu\Psi_0. \qquad (2.172)
What we are then saying is that each field, mass or coupling can be written in terms of a rescaled ('renormalised') counterpart,
A_0^\mu = \sqrt{Z_3}\,A_R^\mu, \qquad \Psi_0 = \sqrt{Z_2}\,\Psi_R, \qquad m_0 = Z_m\,m_R, \qquad e_0 = Z_e\,e_R, \qquad (2.173)
where A_R^\mu and \Psi_R are the
renormalised fields. Likewise, mR and eR are the renormalised mass and coupling,
respectively. By contrast, the original fields (A_0^\mu, \Psi_0) and parameters (m_0, e_0)
occurring in the classical Lagrangian are called bare quantities. They are infinite,
but this is not actually a problem: provided we can set up perturbation theory purely
in terms of (finite) renormalised quantities, everything we calculate will be finite.
Furthermore, there is no cheating or black magic involved in this procedure: if we
believe that physics at collider energies should be independent of much higher
energies, then it must be possible that we can redefine the bare quantities in this way!
The redefinitions of equation (2.173) are all very fine, but they are not unique. The
quantities {Zi} depend on the regulator we used to make the loop integral finite (in
this case the cutoff scale Λ). Furthermore, we are free to absorb any finite
contributions into the {Zi}, and not just the contributions that become infinite as
the regulator is removed (Λ → ∞). Put another way, the renormalised quantities
(ARμ , ΨR, mR , eR ) are not unique. Different choices are called different renormalisa-
tion schemes. Once we fix a scheme, we can measure the (finite) renormalised
quantities from experiment. We thus lose the ability to predict e.g. the value of the
electron charge and mass from first principles. But this is after all expected: their
values would depend on the underlying high energy theory, which we do not claim to
know about.
The complete renormalisation procedure has several steps:
and
i\left[\slashed{p}\,\delta_2 - (\delta_m + \delta_2 + \delta_m\delta_2)\,m_R\right]
respectively.
13
You may also see a different terminology in some textbooks, in which the coefficient of each of the terms in
equation (2.179) is called a counterterm, rather than each δ_i. However, the two sets of counterterms are simply
related by a redefinition.
Figure 2.17. Example counterterm Feynman rules.
Figure 2.18. (a) A UV divergent diagram at NLO; (b) A counterterm diagram at the same order.
From now on, we will always be working with renormalised fields, masses and
couplings, and so we will drop the subscript R, for brevity.
where d < 4, there will be no (UV) divergence. Normally, it only makes sense to
calculate such integrals if d is an integer, i.e. it represents the number of components
of a position in spacetime. However, it turns out that the results of all Feynman
integrals in QFT end up being a continuous function of d. Thus, even if we start with
d only being defined at integer values, we can choose to use any non-integer value of
d we like! A common choice is to set
d = 4 − 2ϵ , ϵ > 0, ϵ ≪ 1. (2.180)
Here ce and cm are arbitrary constants, corresponding to the fact that one may
absorb finite contributions into the counterterms, alongside the formally divergent
part (i.e. the terms ∼ϵ−1). Different choices for (ce, cm ) constitute different renorm-
alisation schemes, and common choices include
ce = cm = 0,
which is known as minimal subtraction, or the MS scheme. More commonly used is
modified minimal subtraction (also known as the \overline{\rm MS} scheme), which has
c_e = c_m = \ln(4\pi) - \gamma_E,
where γE = 0.5772156… is the Euler–Mascheroni constant (a known mathematical
quantity). These numbers look a bit bizarre when plucked out of the air like this, but
it turns out that all integrals in dimensional regularisation involve factors of
(4\pi e^{-\gamma_E}\mu^2)^{\epsilon},
which generates the above terms at O(ϵ 0 ). Choosing to absorb these contributions
into the counterterms then simplifies the form of higher-order perturbative
corrections.
Leaving the factors of (ce, cm ) arbitrary for generality, the bare coupling is related
to its renormalised counterpart by
e_0 = \mu^\epsilon e\,(1 + \delta_e) = \mu^\epsilon e\left[1 + \frac{e^2}{24\pi^2}\left(\frac{1}{\epsilon} + c_e\right)\right] + \mathcal{O}(e^4). \qquad (2.184)
However, the bare coupling corresponds to the original coupling entering the
classical Lagrangian, before any quantum corrections have been computed. It
thus knows nothing about the regulators we use to make loop integrals finite, which
amounts to the condition
\mu\,\frac{de_0}{d\mu} = 0, \qquad (2.185)
where we have included an additional factor of μ to match the way this equation is
usually presented. As you can explore in the exercises, equations (2.184) and (2.185)
then imply
\mu\,\frac{de(\mu)}{d\mu} \equiv \beta(e), \qquad (2.186)
where (taking ϵ → 0)
\beta(e) = \frac{e^3}{12\pi^2} + \mathcal{O}(e^5). \qquad (2.187)
This is called the beta function, and is defined by equation (2.186). Formally
speaking, equation (2.186) tells us how the renormalised coupling changes if we
make a different choice of the parameter μ entering regulated Feynman integrals.
However, we can interpret this a lot more physically as follows. Imagine we make
some choice μ = Q0, where Q0 is some energy scale. Then we can measure the value
of e(Q0 ) from experiment. We could instead have chosen μ to be some different
energy scale Q1, in which case we could have measured e(Q1). From equation
(2.186), the two results are related via
\int_{e(Q_0)}^{e(Q_1)}\frac{de}{\beta(e)} = \int_{Q_0}^{Q_1}d\ln\mu. \qquad (2.188)
We see that the renormalised ‘coupling constant’ e is not constant after all, but varies
with energy. The absolute value of e at some scale cannot be predicted from first
principles—we gave up the ability to do this when we claimed that high energy
physics should not influence low energy physics, in setting up renormalisation.
However, how the coupling changes with energy is completely predicted by QFT
perturbation theory! To see what the dependence on energy looks like, the shape
implied by equation (2.188) is shown in Figure 2.19. We see that the QED coupling
increases with energy scale or, from hand-wavy uncertainty principle arguments,
decreases with distance scale. There is even a nice (albeit rather silly) physical
interpretation of this. Consider a point charge (e.g. a single real electron) located
somewhere in space. In a quantum theory, virtual e ± pairs can be created in the
vacuum, provided they disappear after a suitable timescale dictated by the
uncertainty principle. These e ± pairs would form dipoles, that would line up, and
effectively ‘screen’ the charge as we move away from it, as shown in Figure 2.20. In
other words, quantum effects lead to the coupling (i.e. the strength of the charge of
the electron) decreasing with distance, which is precisely the behaviour we found
above.
Note from equation (2.188) that the coupling apparently diverges when
\frac{e^2(Q_0)}{6\pi^2}\,\ln\!\left(\frac{Q_1}{Q_0}\right) = 1 \quad\Rightarrow\quad Q_1 = Q_0\,\exp\!\left[\frac{6\pi^2}{e^2(Q_0)}\right].
Experiments tell us that
\frac{e^2(Q_0)}{4\pi} \simeq \frac{1}{137} \quad\text{at}\quad Q_0 \simeq 91\ {\rm GeV},
which in turn implies that e diverges at ≃2 × 10^{282} GeV. This is known as the
‘Landau pole’, and does not tell us in practice that the coupling would diverge: as the
coupling gets larger, we would have to take higher orders into account in the beta
function, which may modify the behaviour.

Figure 2.19. Behaviour of the QED running coupling e²(Q) with energy scale Q.

Figure 2.20. Virtual electron–positron pairs can screen a real charge, so that the coupling decreases at large
distances.

At some point, perturbation theory may
cease to be valid at all. This argument does, however, act as a rough guide as to the
energy scale at which QED may break down, and has to be replaced with something
else (that may or may not be a QFT). In the SM, QED does not exist in isolation, but
mixes with the weak force at energy scales O(102) GeV . However, there remains a
coupling constant which diverges at high energy. This energy is much higher than
the Planck scale at which we expect quantum gravitational effects must be taken into
account, and thus the Landau pole is not a problem in practice.
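The following short Python sketch makes the above estimate concrete: it uses the one-loop solution implied by equations (2.187) and (2.188) for e²(Q), and locates the scale at which the one-loop coupling blows up. It assumes a single fermion species in the running, exactly as in the simplified discussion above.

import numpy as np

alpha_0 = 1.0 / 137.0            # e^2(Q0)/(4 pi), as quoted in the text
e2_0 = 4.0 * np.pi * alpha_0     # e^2 at the reference scale
Q0 = 91.0                        # GeV

# One-loop solution of mu de/dmu = e^3/(12 pi^2):
#   1/e^2(Q) = 1/e^2(Q0) - ln(Q/Q0)/(6 pi^2)
def e2(Q):
    return 1.0 / (1.0/e2_0 - np.log(Q/Q0)/(6.0*np.pi**2))

# Scale at which the one-loop coupling diverges (the Landau pole)
log10_Landau = (6.0*np.pi**2/e2_0) / np.log(10.0) + np.log10(Q0)
print(f"e^2(1 TeV)  = {e2(1000.0):.4f}")
print(f"Landau pole ~ 10^{log10_Landau:.0f} GeV")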
We have seen that the renormalised coupling depends on an energy scale, and the
same argument can be used for renormalised masses. That is, the bare mass must be
independent of the dimensional regularisation parameter μ:
\mu\,\frac{dm_0}{d\mu} = 0, \qquad (2.189)
which turns out to imply an equation of the form
\mu\,\frac{dm(\mu)}{d\mu} = \gamma_m\,m(\mu), \qquad (2.190)
where m is the renormalised mass, and γm is called its anomalous dimension. From
similar arguments to those applied to the coupling, we see that masses also vary with
energy scale, where this variation is entirely predicted by perturbation theory.
The variation of couplings and masses with energy is also known as running, so
that e(μ) and m(μ) are sometimes referred to as the running coupling and running
mass, respectively.
14
Confusingly, both αS and gs are commonly referred to as the strong coupling constant.
Figure 2.21. Behaviour of the QCD running coupling αS(Q ) with energy scale Q.
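The qualitative behaviour in Figure 2.21 can be reproduced with the standard one-loop running of α_S, as in the short Python sketch below. The one-loop coefficient (33 − 2n_f)/(12π) and the reference value α_S(M_Z) ≈ 0.118 are standard inputs that are not derived in the excerpt above, and flavour thresholds are ignored for simplicity.

import numpy as np

def alpha_s(Q, alpha_mz=0.118, MZ=91.1876, nf=5):
    """One-loop QCD running coupling, evolved from alpha_s(MZ); nf is held fixed."""
    b0 = (33.0 - 2.0*nf) / (12.0*np.pi)
    return alpha_mz / (1.0 + b0*alpha_mz*np.log(Q**2/MZ**2))

for Q in (2.0, 10.0, 100.0, 1000.0):
    print(f"alpha_s({Q:6.1f} GeV) = {alpha_s(Q):.3f}")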
X\!\left(\alpha_S(\mu_R^2),\left\{\frac{p_i\cdot p_j}{\mu_R^2}\right\}\right) = \sum_{n=0}^{\infty} c_n\!\left(\left\{\frac{p_i\cdot p_j}{\mu_R^2}\right\}\right)\alpha_S^{\,n}(\mu_R^2), \qquad (2.195)
where \{p_i\} are the four-momenta entering the scattering process. At N^mLO in
perturbation theory, we should use the running coupling calculated at N^{m-1}LO (e.g.
at LO the coupling does not run at all; the running starts at NLO). Then the μ_R
variation is a N^{m+1}LO effect.
This is all very fine, but the question remains of what we should choose for the
arbitrary scale μR , and at first sight it looks like our predictions are completely
arbitrary. However, the key point to realise is that the scale dependence is directly
related to the fact that we have neglected higher orders in perturbation theory, and
so the size of the scale variation should be comparable with the size of the terms we
neglect. A common practice is then to choose μR2 = Q 2 , where Q2 is some
representative squared energy scale in the process (e.g. the centre-of-mass energy
in a e ± collider, or the top quark mass in top quark pair production). One can then
vary μR within some range of the default choice e.g.
\frac{Q}{2} \leqslant \mu_R \leqslant 2Q, \qquad (2.196)
and use the resulting envelope for observable X to estimate the size of higher-order
corrections, and thus a theoretical uncertainty.
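Schematically, the envelope procedure can be coded as below. The 'cross-section' used here is a toy function with a logarithmic μ_R dependence, standing in for the output of a real fixed-order calculation; only the bookkeeping of equation (2.196) is being illustrated.

import math

def envelope(xsec_at_scale, Q):
    """Central value and asymmetric spread from mu_R = Q/2, Q, 2Q."""
    central = xsec_at_scale(Q)
    variations = [xsec_at_scale(Q/2), xsec_at_scale(Q), xsec_at_scale(2*Q)]
    return central, max(variations) - central, central - min(variations)

# Toy 'prediction' with a logarithmic scale dependence (placeholder numbers in pb).
def toy_xsec(muR, Q=100.0):
    return 50.0 * (1.0 + 0.12*math.log(Q/muR))

central, up, down = envelope(toy_xsec, 100.0)
print(f"sigma = {central:.1f} +{up:.1f} -{down:.1f} pb (scale uncertainty)")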
\sigma = \sum_{i,j}\int_0^1 dx_1\int_0^1 dx_2\; f_i(x_1)\,f_j(x_2)\;\hat\sigma_{ij}. \qquad (2.198)
Here \hat\sigma_{ij} is the cross-section for (anti)-quarks and/or gluons, where i and j label the
species of colliding particle. The function fi (xk ) labels the probability to find a
particle of type i with momentum fraction xk inside incoming proton k. Then to get
the total cross-section, we have to sum over all possible momentum fractions, which
takes the form of integrals over the variables {xi}. Equation (2.198) predates QCD,
and indeed was first proposed by Feynman at the end of the 1960s. He called it the 'parton
model’, where his ‘partons’ were some set of postulated constituents of the proton. It
was later realised that these partons correspond to the (anti-)quarks and gluons of
QCD, but to this day the word ‘parton’ is used to mean a general (anti)-quark or
gluon, as it is useful to have such a word! Likewise, σ̂ij in equation (2.198) is referred
to as the partonic cross-section. The functions fi (xk ) are called parton distribution
functions (PDFs). They involve strongly interacting physics (e.g. they are sensitive to
Figure 2.22. Drell–Yan production of a vector boson, which decays to a lepton pair: (a) full process, with
incoming protons; (b) LO partonic cross-section.
how partons are confined within hadrons), and thus cannot be calculated in
perturbation theory. However, they can be measured in experiments, and then
used to predict the results of future experiments. This is a whole subfield in itself, and
there are a number of collaborations worldwide that extract parton distributions
from data.
When it was first proposed, the parton model was indeed just a model used to fit
experimental data. However, it can be fully justified (with a few caveats) from first
principles in QCD, and in particular something called the operator product
expansion. Here, however, we shall merely try to show you why trying to calculate
the partonic cross-section unavoidably leads to requiring the existence of parton
distribution functions.
As an example, let us consider so-called Drell–Yan production of a virtual photon,
that decays to a lepton pair. The Feynman diagram for this process (complete with
incoming protons!) is shown in Figure 2.22(a), and the corresponding LO partonic
cross-section in Figure 2.22(b), where we have focused on a given quark flavour q.
Note that this is almost identical to the process e−e + → μ−μ+ of Figure 2.14, whose
squared amplitude (summed and averaged over spins) is given by equation (2.157).
To get the LO amplitude for DY production, we may first note that
\mathcal{A}_{\rm DY} = \delta_{ij}\,Q_q\,\mathcal{A}_{\rm QED},
where A QED is the QED process, i and j the colour indices of the incoming quark and
anti-quark, respectively, and Qq the electromagnetic charge of the quark. When
forming the summed and averaged squared amplitude, we must include an average
over the incoming colours, given that we have no control over e.g. what colour of
quark comes out of the proton. This gives us
\overline{|\mathcal{A}_{\rm DY}|^2} = \frac{1}{N_c^2}\sum_{i,j}\delta_{ij}\delta_{ij}\,Q_q^2\,\overline{|\mathcal{A}_{\rm QED}|^2}
= \frac{1}{N_c^2}\sum_i \delta_{ii}\,Q_q^2\,\overline{|\mathcal{A}_{\rm QED}|^2}
= \frac{2Q_q^2 e^4}{N_c}\,\frac{u^2 + t^2}{s^2}, \qquad (2.199)
where θ is the angle between the gluon and incoming quark (shown schematically in
Figure 2.23). Then the above propagator factor becomes
\frac{1}{-2p_1\cdot k} = \frac{1}{-2|\mathbf{p}_1||\mathbf{k}|(1 - \cos\theta)}.
To get the cross-section, we have to integrate over the phase-space of all final state
particles, but the amplitude clearly diverges if
∣k∣ → 0 and/or θ → 0.
The first of these is called a soft divergence, where ‘soft’ refers to a gluon whose four-
momentum components are all vanishingly small relative to some other momentum
scale in the problem (here the energy/momentum of the incoming quark). The
second is called a collinear divergence, as it arises when the angle between the
radiated gluon and the incoming quark vanishes. Collectively, these singularities are
referred to as infrared (IR) divergences, as they are associated with a region of low
energy/momentum. For the collinear case, this is the transverse momentum of the
radiated gluon relative to the quark direction. By their nature, IR divergences are
distinct from the UV divergences that we already encountered, that are removed via
renormalisation. The latter only occurred in loops, and were ultimately not a problem
because they involved high energies, which we could not claim to know anything
about. IR divergences, on the other hand, involve real particles, and also energies
that we certainly do probe at colliders. At first glance, then, we seem to have a
Figure 2.23. (a) Radiation of a gluon from an incoming quark leg; (b) final state radiation from an outgoing
quark.
genuine problem, namely that all cross-sections involving real radiation will be
infinite!
Above we saw that infrared divergences occur when we have radiation from
incoming particles. However, the problem is more general than this, and also occurs
for radiation from final state particles. An example is shown in Figure 2.23(b), in
which a gluon is radiated off the final state quark in the process of Figure 2.14. This
produces an extra propagator, which involves a factor
\frac{1}{(p_3 + k)^2} = \frac{1}{2|\mathbf{p}_3||\mathbf{k}|(1 - \cos\theta)},
where again θ is the angle between the quark and gluon. Studying this final state
radiation in more detail will provide clues as to how to proceed. First, note that in
the cross-section, we must integrate over the final-state phase-space, which will
behave as follows:
\int d\Phi\,\overline{|\mathcal{A}|^2} \sim \int\frac{d^3k}{2|\mathbf{k}|(2\pi)^3}\;\frac{1}{|\mathbf{k}|^2},
where the factor on the far right-hand side comes from squaring the extra
propagator we have found above. We thus find an integral
\sim \int\frac{d^3k}{|\mathbf{k}|^3}
as ∣k∣ → 0, whose lower limit (zero) produces a logarithmic divergence.15 As for UV
divergences, a practical way to regularise such divergences is to use dimensional
regularisation. In this case, we must raise the number of dimensions slightly, to
reduce the divergence as ∣k∣ → 0. Thus, we can set d = 4 − 2ϵ as before, but where
now ϵ < 0 rather than ϵ > 0. This is in fact one of the main reasons why dimensional
regularisation is so widely used: we can use it to simultaneously regularise UV and
IR divergences, where the necessary sign of ϵ tells us which type of divergence we are
dealing with!
Figure 2.23(b) is a contribution to the process
e− + e+ → q + q¯ + g. (2.200)
Including all contributions (e.g. radiation from the anti-quark as well as from the
quark), the total cross-section for real radiation turns out to be
\sigma_{\rm real} = \sigma_0\,\frac{\alpha_S C_F}{2\pi}\,H(\epsilon)\left(\frac{2}{\epsilon^2} + \frac{3}{\epsilon} + \frac{19}{2} - \pi^2\right) + \mathcal{O}(\alpha_S^2), \qquad (2.201)
15
There is no logarithmic divergence at high values of k , as these will be cut off by the physical centre-of-mass
energy.
Given that we expect a problem if the additional gluon is collinear with one of the
incoming momenta, let us take k∥p1 and see what happens. However, we cannot just
set k ∝ p1, as then the squared amplitude will diverge. Instead, we must find a way to
parametrise k μ, which will allow us to smoothly approach the limit in which it
becomes collinear to p1. To this end, a useful trick is a so-called Sudakov
decomposition, which starts with the observation that it is generally true that we
can write
k^\mu = c_1\,p_1^\mu + c_2\,p_2^\mu + k_T^\mu,
Figure 2.25. An initial state involving three partons.
where the first integral on the right-hand side corresponds to the additional gluon,
and the remaining integral is that of the LO process, but with p1 → p1 − k . This in
turn implies that the partonic cross-section for the radiative process has the form
\hat\sigma_{q\bar q g} \;\xrightarrow{\ k\parallel p_1\ }\; \int_0^1 dz\int\frac{d|k_T|^2}{|k_T|^2}\,\hat P_{qq}(z)\,\hat\sigma_{q\bar q}(zp_1, p_2), \qquad (2.212)
where
\hat P_{qq}(z) = \frac{\alpha_S C_F}{2\pi}\,\frac{(1 + z^2)}{1 - z}. \qquad (2.213)
In words: the cross-section including an extra gluon factorises in the collinear limit
k∥p1, where the additional gluon is described by the function P̂qq(z ), where z
represents the momentum fraction of the incoming quark that is carried by the
quark after the gluon emission. This multiplies the LO cross-section with shifted
kinematics p1 → zp1. There is then an integral over all possible momentum fractions
z, and gluon transverse momenta kT . The collinear singularity we were expecting
shows up in the ∣kT∣2 integral, as this variable approaches zero.
According to equation (2.198), the hadronic cross-section in this limit is given by
\sigma \;\xrightarrow{\ k\parallel p_1\ }\; \int_0^1 dx_1\int_0^1 dx_2\; f_q(x_1)\,f_{\bar q}(x_2)\left[\hat\sigma_{q\bar q}(p_1, p_2) + \int_0^1 dz\int\frac{d|k_T|^2}{|k_T|^2}\,\hat P_{qq}(z)\,\hat\sigma_{q\bar q}(zp_1, p_2)\right], \qquad (2.214)
where the first term in the square brackets is the LO contribution. Given that
p1 = x1P1 (where P1 is the proton momentum), we can shift x1 → x1/z in the second
term, which after some work gives
\sigma \;\xrightarrow{\ k\parallel p_1\ }\; \int_0^1 dx_1\int_0^1 dx_2\;\hat\sigma_{q\bar q}(p_1, p_2)\,f_{\bar q}(x_2)\left[f_q(x_1) + \int_{x_1}^1\frac{dz}{z}\int\frac{d|k_T|^2}{|k_T|^2}\,\hat P_{qq}(z)\,f_q\!\left(\frac{x_1}{z}\right)\right], \qquad (2.215)
where μF is called the factorisation scale. Equation (2.216) amounts to absorbing the
collinear divergence as ∣kT∣ → 0 into the bare PDF. If we then write the hadronic
cross-section in terms of modified PDFs at this order in perturbation theory, it
becomes
\sigma \;\xrightarrow{\ k\parallel p_1\ }\; \int_0^1 dx_1\int_0^1 dx_2\;\hat\sigma_{q\bar q}(p_1, p_2)\,f_{\bar q}(x_2)\left[f_q(x_1) + \int_{x_1}^1\frac{dz}{z}\int_{\mu_F^2}\frac{d|k_T|^2}{|k_T|^2}\,\hat P_{qq}(z)\,f_q\!\left(\frac{x_1}{z}\right)\right]. \qquad (2.217)
In writing this formula, we have used the fact that we do not need to modify the anti-
quark PDF in this limit, so have simply set f q¯0 ≡ fq¯ . We see that equation (2.217) no
longer has a collinear singularity as k∥p1. It has been removed by the factorisation
scale μF , which acts as a cutoff on the transverse momentum integral. However,
there is still a potential problem in equation (2.217): we must integrate over all
values of z, but the function P̂qq(z ) of equation (2.213) has a divergence as z → 1. We
said that (1 − z ) represents the momentum fraction of the incoming quark
momentum that is carried by the gluon in the collinear limit, and thus z → 1
corresponds to the gluon being soft, with k μ → 0. We indeed expect a singularity in
this limit. However, it turns out to cancel when we add virtual diagrams. This is a
long calculation by itself, but the result is quite simple: virtual corrections have the
effect of modifying
\hat P_{qq}(z) \;\to\; P_{qq}(z) = \frac{\alpha_S C_F}{2\pi}\left[\frac{1 + z^2}{(1 - z)_+} + \frac{3}{2}\,\delta(1 - z)\right],
which contains a type of mathematical distribution called a plus distribution. This is
defined by its action on an arbitrary test function f (x ):
\int_0^1 dx\,\frac{f(x)}{(1 - x)_+} \equiv \int_0^1 dx\,\frac{f(x) - f(1)}{1 - x}. \qquad (2.218)
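Numerically, the subtraction in equation (2.218) is straightforward to implement, as in the following sketch (which uses SciPy for the quadrature). For the test function f(x) = x², the integral evaluates to −3/2, which the code reproduces.

from scipy.integrate import quad

def plus_integral(f):
    """Integrate f(x)/(1-x)_+ over [0,1] via the subtraction in equation (2.218)."""
    integrand = lambda x: (f(x) - f(1.0)) / (1.0 - x)
    val, _ = quad(integrand, 0.0, 1.0)
    return val

# Example: f(x) = x^2. Analytically, int_0^1 (x^2 - 1)/(1 - x) dx = -3/2.
print(plus_integral(lambda x: x**2))   # -> -1.5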
The full set of regularised splitting functions is
P_{qq}(z) = \frac{\alpha_S C_F}{2\pi}\left[\frac{1 + z^2}{(1 - z)_+} + \frac{3}{2}\,\delta(1 - z)\right]; \qquad (2.219)
P_{gq}(z) = \frac{\alpha_S C_F}{2\pi}\left[\frac{1 + (1 - z)^2}{z}\right]; \qquad (2.220)
P_{qg}(z) = \frac{\alpha_S T_R}{2\pi}\left[z^2 + (1 - z)^2\right]; \qquad (2.221)
P_{gg}(z) = \frac{\alpha_S}{2\pi}\left\{2C_A\left[\frac{z}{(1 - z)_+} + \frac{1 - z}{z} + z(1 - z)\right] + \frac{(11C_A - 4n_f T_R)}{6}\,\delta(1 - z)\right\}. \qquad (2.222)
We derived Pqq(z ) above, and will not bother deriving the other splitting functions
explicitly. However, they follow a similar method. One may simply replace q → q¯ in
these expressions to get the corresponding splitting functions for antiquarks. Then
the fully general expression for modifying any bare PDF is
f_i^0 = f_i(x_i, \mu_F^2) - \sum_j\int_0^{\mu_F^2}\frac{d|k_T|^2}{|k_T|^2}\int_{x_i}^1\frac{dz}{z}\,P_{ij}(z)\,f_j\!\left(\frac{x_i}{z}\right), \qquad (2.223)
where the sum includes all splittings consistent with the chosen PDF. The
factorisation scale μF is arbitrary. However, once we have fixed a choice, we can
measure the PDFs from data. In fact, we can say a lot more than this, as we discuss
in the following section.
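For reference, the regular (z < 1) parts of the splitting functions (2.219)–(2.222) are easily coded up, as in the sketch below. The plus-prescription pieces must be handled via the subtraction of equation (2.218) and the δ(1 − z) pieces collected separately; the overall α_S/(2π) prefactor is also left out here so that it can be attached at the point of use. The colour factors C_F = 4/3, C_A = 3 and T_R = 1/2 are the conventional QCD values.

CF, CA, TR = 4.0/3.0, 3.0, 0.5

def P_qq_reg(z):
    """Regular part of P_qq; the plus-prescription and delta terms are treated separately."""
    return CF * (1.0 + z*z) / (1.0 - z)

def P_gq(z):
    return CF * (1.0 + (1.0 - z)**2) / z

def P_qg(z):
    return TR * (z*z + (1.0 - z)**2)

def P_gg_reg(z):
    """Regular part of P_gg; the delta(1-z) term is treated separately."""
    return 2.0*CA * (z/(1.0 - z) + (1.0 - z)/z + z*(1.0 - z))

print(P_qq_reg(0.5), P_gq(0.5), P_qg(0.5), P_gg_reg(0.5))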
Figure 2.26. Possible parton splittings, and their associated splitting functions.
Since the bare PDFs cannot depend on the arbitrary factorisation scale, we may differentiate equation (2.223) with respect to \mu_F^2 to obtain
\frac{d f_i^0}{d\mu_F^2} = \frac{\partial f_i(x_i, \mu_F^2)}{\partial\mu_F^2} - \frac{1}{\mu_F^2}\sum_j\int_{x_i}^1\frac{dz}{z}\,P_{ij}(z)\,f_j\!\left(\frac{x_i}{z}, \mu_F^2\right) = 0,
so that
\mu_F^2\,\frac{\partial f_i(x_i, \mu_F^2)}{\partial\mu_F^2} = \sum_j\int_{x_i}^1\frac{dz}{z}\,P_{ij}(z)\,f_j\!\left(\frac{x_i}{z}, \mu_F^2\right), \qquad (2.224)
where Q2 is some process-dependent upper limit of ∣kT∣ (i.e. in practice any emitted
gluon has a finite upper limit on its transverse momentum). If we want to keep
perturbative corrections fairly small, it makes sense to choose μF ∼ Q , i.e. to
associate the factorisation scale μF with some hard energy or momentum scale in
the process. We can then interpret the DGLAP equations as telling us how the PDFs
vary with energy scale. Here, we used a momentum cutoff in ∣kT∣ to regularise
collinear singularities, which also allowed us to interpret the factorisation scale μF.
However, we could have used a different regulator, and we would still have found
the existence of μF , and the DGLAP equations. As discussed before, the most
commonly used regulator for partonic cross-sections is dimensional regularisation.
In that case, the scale μF arises as the scale that is used to keep the coupling gs
dimensionless in d dimensions. Furthermore, one can choose to absorb an arbitrary
finite contribution to the partonic cross-section into the PDFs, as well as the pure
collinear singularity. Different choices constitute different so-called factorisation
schemes, and a common choice is to use dimensional regularisation and the \overline{\rm MS}
scheme, in which a particular choice of numerical constants is removed. Once such a
choice is fixed, the partons can be measured from data.
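To give a feel for how equation (2.224) is used in practice, the sketch below performs a single explicit Euler step in ln μ_F² for a toy non-singlet quark density, keeping only the P_qq term, with the plus-prescription and δ(1 − z) pieces treated via the standard subtraction and the α_S/(2π) prefactor of (2.219) written out explicitly. Real evolution codes solve the full coupled system with much better numerics; this is purely illustrative.

import numpy as np
from scipy.integrate import quad

CF = 4.0/3.0
alpha_s = 0.2   # held fixed here for simplicity; in reality it runs with the scale

def toy_quark(x):
    """A made-up non-singlet quark density at the starting scale (illustrative only)."""
    return x**0.5 * (1.0 - x)**3

def dglap_rhs(f, x):
    """mu_F^2 d f(x)/d mu_F^2 for a non-singlet density, i.e. the P_qq term of (2.224)."""
    def integrand(z):
        g = (1.0 + z*z) * f(x/z) / z       # regular part of the convolution
        return (g - 2.0*f(x)) / (1.0 - z)  # plus-prescription subtraction
    integral, _ = quad(integrand, x, 1.0)
    return alpha_s/(2.0*np.pi) * CF * (integral + f(x)*(2.0*np.log(1.0 - x) + 1.5))

# One explicit Euler step in t = ln(mu_F^2): f(x, t + dt) = f(x, t) + dt * rhs
dt = 0.1
evolved = {x: toy_quark(x) + dt*dglap_rhs(toy_quark, x) for x in (0.01, 0.1, 0.3, 0.7)}
print(evolved)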
The above discussion will hopefully remind you of something—the idea of factorisa-
tion is very similar to renormalisation. In both cases, divergences are absorbed by
redefining bare quantities. Then, the modified/renormalised quantities cannot be pre-
dicted from first principles, but their change in energy scale is calculable in perturbation
theory. This analogy is nice to think about, but it should be remembered that the nature of
the divergences in factorisation (IR) and renormalisation (UV) are fundamentally
different. Consequently, the factorisation scale μF and μR are different in principle,
although can be chosen to be the same in practice. Dependence on μF cancels between the
partonic cross-section and the (modified) partons, up to the order in perturbation theory
that we are working at. Thus, at a given order O(αSn ), dependence on μF and μR is O(αSn+1).
Varying μF and μR independently in some range
\frac{\mu_0}{2} \leqslant \mu_F,\, \mu_R \leqslant 2\mu_0
for a suitable default choice μ0 then gives a measure of theoretical uncertainty (i.e.
an estimate of higher-order corrections).
Our original interpretation of the PDFs is that they represent the probability to find
a given parton with a defined momentum fraction inside the parent proton. Once
higher-order QCD corrections are taken into account and we choose a particular
factorisation scheme, we lose this interpretation in general. For example, the DGLAP
equations imply that at a sufficiently low scale, the PDFs may become negative. This is
not a problem in principle—hadronic cross-sections are obtained by combining the
partons with partonic cross-sections, and will be positive provided we are at energy
scales where perturbation theory is valid. Nevertheless, physical properties of PDFs
survive in the form of sum rules. For example, if we look at the up quark PDF, it will
naïvely have contributions from two sources: (i) the two up quarks that are meant to be
in the proton; (ii) additional up quarks due to the froth of uu¯ pairs that can be made
due to quantum effects. Given that matter and antimatter are always created in equal
amounts, it follows that the second source of up quarks should be equal to the anti-up
distribution ū(z, μF2 ). Thus, subtracting the latter from the up distribution, we can form
the so-called valence up-quark distribution
u_V(x, \mu_F^2) \equiv u(x, \mu_F^2) - \bar u(x, \mu_F^2), \qquad (2.225)
where integrating over all momentum fractions for a fixed scale μF2 should tell us that
there are indeed two ‘net’ up quarks:
\int_0^1 dx\; u_V(x, \mu_F^2) = 2. \qquad (2.226)
If the partons indeed represent probability densities (i.e. per unit longitudinal
momentum fraction), the total momentum fraction carried by all partons must be
equal to one:
\int_0^1 dx\; x\sum_i f_i(x, \mu_F^2) = 1, \qquad (2.228)
where the sum is over all flavours of quark and anti-quark, and also includes the
gluon. This sum rule survives even at higher orders in perturbation theory in
arbitrary factorisation schemes. The reason for this is that higher terms in
perturbation theory merely constitute splitting of the partons into other partons,
which always conserves momentum. Put another way, the DGLAP equations at
arbitrary order can always be shown to preserve equation (2.228).
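As a simple illustration, the sketch below builds a toy valence distribution whose normalisation is fixed so that the sum rule (2.226) holds, and then checks the integral numerically. The functional form and parameters are made up purely for illustration.

from scipy.integrate import quad
from scipy.special import beta

# Toy valence distribution u_V(x) = N x^a (1-x)^b, normalised so that (2.226) holds.
a, b = 0.5, 3.0
N = 2.0 / beta(a + 1.0, b + 1.0)   # since int_0^1 x^a (1-x)^b dx = B(a+1, b+1)
u_V = lambda x: N * x**a * (1.0 - x)**b

total, _ = quad(u_V, 0.0, 1.0)
print(f"int u_V dx = {total:.6f}  (should be 2)")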
Despite the complicated nature of the above calculations, we have still only given
a simplified treatment of PDFs! A fully rigorous treatment shows that hadronic
cross-sections actually have the form
\sigma = \sum_{i,j}\int_0^1 dx_1\int_0^1 dx_2\; f_i(x_1, \mu_F^2)\,f_j(x_2, \mu_F^2)\;\hat\sigma_{ij}(x_i, \mu_F^2, \{p_i\cdot p_j\}) + \mathcal{O}\!\left(\frac{\Lambda_{\rm QCD}^2}{Q^2}\right), \qquad (2.229)
which differs from the naïve parton model of equation (2.198) in that it includes terms
depending on the ratio of the energy scale Λ QCD at which the QCD coupling becomes
strong, and the typical hard energy scale Q of the scattering process. These additional
terms are called power corrections, and we will not worry too much about them.
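To illustrate the structure of equation (2.229), the following sketch convolves toy quark and antiquark densities with a toy partonic cross-section for a single channel, using naive Monte Carlo integration over the momentum fractions. None of the inputs are realistic; the point is only the bookkeeping of the double integral.

import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins -- NOT realistic PDFs or a real partonic cross-section.
f_q    = lambda x: 0.5 * x**(-0.5) * (1.0 - x)**3
f_qbar = lambda x: 0.1 * x**(-0.5) * (1.0 - x)**5
S      = 13000.0**2       # hadronic squared centre-of-mass energy in GeV^2
Q2_min = 1000.0**2        # toy production threshold for the hard process

def sigma_hat(shat):
    """Toy partonic cross-section, falling like 1/shat above threshold."""
    return 1.0 / shat

# Leading term of (2.229) for one channel, by naive Monte Carlo over (x1, x2).
n = 200_000
x1, x2 = rng.random(n), rng.random(n)
shat = x1 * x2 * S
mask = shat > Q2_min
weights = np.zeros(n)
weights[mask] = f_q(x1[mask]) * f_qbar(x2[mask]) * sigma_hat(shat[mask])
print("toy hadronic cross-section (arbitrary units):", weights.mean())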
below), the various groups use something like the following procedure. First, we
may note that the DGLAP equations of Section 2.16 tell us that if we know the
PDFs at some low (squared) energy scale Q02 , we can predict their values at a higher
scale Q2. This suggests the following algorithm:
1. Choose a starting scale Q02 which is high enough for perturbation theory to
be valid, but lower than the characteristic energy scale of all scattering
processes for which data is available. A typical value is Q0 = 1 GeV .
2. At the starting scale, parametrise the PDFs as a function of x. For example,
early parton fits used functions such as
x f_i(x, Q_0^2) = (1 - x)^{\eta_i}\,(1 + \epsilon_i x^{0.5} + \gamma_i x)\,x^{\delta_i}. \qquad (2.230)
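For illustration, the example form of equation (2.230) can be evaluated for made-up parameter values, to see the kind of x-dependence such a parametrisation describes; the numbers below are not taken from any real fit.

import numpy as np

def xf(x, eta, eps, gam, delta):
    """The example starting-scale form of equation (2.230): x f(x, Q0^2)."""
    return (1.0 - x)**eta * (1.0 + eps*np.sqrt(x) + gam*x) * x**delta

params = dict(eta=3.0, eps=-1.5, gam=5.0, delta=0.8)   # illustrative values only
for x in (1e-3, 1e-2, 0.1, 0.3, 0.7):
    print(f"x = {x:7.3f}   xf = {xf(x, **params):.4f}")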
This is called a global (parton) fit, where the word ‘global’ here means that one
tries to fit all free parameters simultaneously, using many different types of data. It
is clearly a highly intensive computational task, not least given that contemporary
PDF fits contain a cornucopia of datasets, from the LHC and previous colliders.
Furthermore, the above algorithm can be implemented at any given order in
perturbation theory, where the same order must be chosen for the DGLAP
splitting functions as for the other theory input to the fit (e.g. partonic cross-
sections, running couplings and masses etc). The state of the art for such fits is
NNLO, but collaborations continue to produce partons at (N)LO. This is due to
the fact that theory predictions that are used for many scattering processes remain
limited to lower orders in perturbation theory, and one should arguably use a
consistent set of partons16.
Parton fitting is a highly dynamic subfield of high energy physics, involving
constant theoretical developments in order to improve the precision of the PDFs one
obtains. Often, groups will produce a number of different PDF sets, where differing
assumptions have been made in the fitting procedure. These PDFs will then be suited
to a particular purpose, and it is important for experimentalists to know which
PDFs they should be using, and when. It is worth dwelling on a couple of issues that
can be important when choosing and using PDFs.
PDF uncertainties
There are clearly a number of sources of error in extracting the PDF distributions
from data:
• The datasets one uses have error bars associated with each data point,
comprising both statistical and systematic uncertainties. There may also be
correlations in uncertainty between different points within the same dataset,
or between different datasets.
• Traditional goodness of fit measures (e.g. the χ 2 mentioned above) assume
Gaussian uncertainties for the data, which may not actually be the case.
Furthermore, datasets might be incompatible due to historically poorly
understood systematic effects, with no way of telling which dataset(s) one
should ultimately trust.
• There may be a systematic bias introduced due to the fact that the para-
metrisation used for the PDFs at the starting scale is insufficiently flexible to
include all features of the genuine (unknown) PDFs.
For such reasons, all collaborations that produce PDFs include a mechanism for
calculating their uncertainties. They might provide a single set representing the best
estimate of the central value of each PDF, together with additional sets that represent
the uncertainty envelopes arising from e.g. a fixed increase in the χ 2 arising in the fit.
Detailed discussion of PDF uncertainties can be found in the publications produced by
each collaboration, although it is worth noting that the various systematic uncertainties
outlined above prompted the NNPDF collaboration to use a different fitting procedure
to that described in the previous section. Their approach involves simulating a number
Nrep of replicas of the combined datasets entering the fit, obtained by randomly varying
each data point within its uncertainty band. Then, a neural network (a type of fancy
interpolating function with a large number of free parameters, see Section 5.8.2) is
trained to each data replica, so as to parametrise the PDFs at the starting scale. This in
turn results in Nrep replica sets of PDFs, which can be used to obtain central values and
uncertainty bands, thereby alleviating the problems of parametrisation bias, incompat-
ible datasets, and non-Gaussian errors. However, broadly similar results for PDF
central values and uncertainties are obtained by other fitting groups.
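The essence of the replica method can be sketched in a few lines, as below: pseudo-data are generated by smearing each data point within its uncertainty, a flexible function is fitted to each replica (here an ordinary polynomial stands in for the neural network), and the central value and uncertainty are read off from the spread of the replica fits. All numbers are invented for illustration, and this is not the actual NNPDF machinery.

import numpy as np

rng = np.random.default_rng(42)

# Pseudo-data: central values and uncorrelated uncertainties (all made up).
x_data = np.linspace(0.1, 0.8, 8)
y_data = 2.0*(1.0 - x_data)**3 + rng.normal(0.0, 0.05, x_data.size)
y_err  = np.full_like(x_data, 0.05)

n_rep = 200
fits = []
for _ in range(n_rep):
    # 1) Smear each data point within its uncertainty to make a replica dataset.
    y_rep = y_data + rng.normal(0.0, y_err)
    # 2) 'Fit' each replica; a simple polynomial stands in for a neural network.
    fits.append(np.polyfit(x_data, y_rep, deg=3))

# 3) Central value and uncertainty band from the spread of the replica fits.
values = [np.polyval(c, 0.3) for c in fits]
print(f"f(0.3) = {np.mean(values):.3f} +/- {np.std(values):.3f}")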
16
Strictly speaking, it is not incorrect to use e.g. NLO partons, when calculating a hadronic cross-section at
LO. However, the opposite way around would indeed be incorrect.
Figure 2.27. Perturbative production of a charm quark pair (double line) at an e−p collider.
Further reading
There are a large number of textbooks on quantum field theory, spanning many
decades. Examples include:
17
Partons for the top quark are not necessary. It is much more massive than either the charm or bottom
quarks, and is also incredibly short-lived.
Exercises
2.1 (a) Show that the local gauge transformation of an electromagnetic field
Aμ(x ) acts on the electrostatic and magnetic vector potentials as
\phi \to \phi - \frac{1}{e}\frac{\partial\alpha}{\partial t}, \qquad \mathbf{A} \to \mathbf{A} + \frac{1}{e}\nabla\alpha,
where α is an arbitrary function of space and time.
(b) Hence show that the electric and magnetic fields
\mathbf{E} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla\times\mathbf{A}
are gauge invariant.
2.2 (a) Define the electromagnetic field strength tensor F μν in terms of the
gauge field Aμ.
(b) Hence prove the Bianchi identity
∂ αF μν + ∂ νF αμ + ∂ μF να = 0.
(c) Show that the Bianchi identity implies the Maxwell equations
\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad \nabla\cdot\mathbf{B} = 0.
2.3 (a) Explain why the set of rotations in two spatial dimensions forms a
group.
(b) Is this a discrete group or a Lie group? If the latter, how many
generators do you expect?
transforms according to
Fμν → U(x ) Fμν U−1(x ).
(d) Hence show that the kinetic Lagrangian term
\mathcal{L} = -\frac{1}{2}\,\mathrm{tr}\!\left[F_{\mu\nu}F^{\mu\nu}\right]
is gauge invariant.
2.7 Under a boost with speed v in the z direction, the Lorentz transformation
for energy and momentum is
E ′ = γ (E − βpz )
px′ = px
py′ = py
pz′ = γ ( −βE + pz ),
2.9 (a) The gluon propagator in an axial gauge (with vector n μ) is given by
D^{ab}_{\mu\nu}(q) = -\frac{i\delta^{ab}}{q^2}\left[\eta_{\mu\nu} + \frac{q^2 + n^2}{(n\cdot q)^2}\,q_\mu q_\nu - \frac{(n_\mu q_\nu + q_\mu n_\nu)}{n\cdot q}\right].
Show that, as q 2 → 0, one has
q^\mu D^{ab}_{\mu\nu}(q) = n^\mu D^{ab}_{\mu\nu}(q) = 0,
and interpret this result.
(b) What happens in the Feynman gauge?
2.10 Draw all the leading order Feynman diagrams for the process
gg → tt¯,
i.e. top pair production in gluon–gluon collisions.
2.11 The Dirac matrices {γ μ} are defined by the relation
{γ μ, γ ν} = 2η μν ,
where η μν is the metric of Minkowski space. Prove the result
\gamma^\mu\,\slashed{a}\,\gamma_\mu = (2 - d)\,\slashed{a},
(e) Use the results of parts (c) and (d) to show that at NLO one obtains
(in four spacetime dimensions)
\frac{dZ_e}{d\ln\mu} = -\frac{e_R^2}{12\pi^2},
and hence
\frac{de_R}{d\ln\mu} = \beta(e_R), \qquad \beta(e_R) = \frac{e_R^3}{12\pi^2}.
(e) Explain why this result is divergent as one integrates over ∣kT∣2 , and
how this divergence can be removed.
(f) Explain why the remaining result is divergent as z → 1, and how
this divergence can also be removed!
2.17 For the unregularised DGLAP splitting functions {P̂ij}, explain why
Pˆqq(z ) = Pˆgq(1 − z ),
and also why
Pˆqg(z ) = Pˆqg(1 − z ), Pˆgg(z ) = Pˆgg(1 − z ).
2.18 The bare PDFs are related to the modified PDFs via
f_i^0(x_i) = f_i(x_i, \mu_F^2) - \sum_j\int_0^{\mu_F^2}\frac{d|k_T|^2}{|k_T|^2}\int_{x_i}^1\frac{dz}{z}\,P_{ij}(z)\,f_j\!\left(\frac{x_i}{z}, |k_T|^2\right).
2.19 What factorisation scale would you choose for the top pair production
process of problem 2.10?
2.20 Varying the renormalisation and factorisation scales around some default
choice μR = μF = μ0 is a common procedure used to estimate the effect of
neglected higher-order corrections. Should one always choose a common
scale, or vary them completely independently of each other?
Chapter 3
From theory to experiment
Putting together everything we have learned so far, we have the following recipe for
calculating cross-sections for hadron colliders, at arbitrary orders in perturbation
theory:
1. Take your desired beyond the Standard Model (BSM) theory or the SM, and
calculate Feynman diagrams with incoming (anti-) quarks and gluons, to get
the partonic cross-section σ̂ij ;
2. Remove UV divergences via renormalisation, and fix the renormalisation
scale μR2 ;
3. Absorb initial state collinear divergences in the parton distribution functions
(PDFs), and set the factorisation scale μ F2 ;
4. Combine σ̂ij with the PDFs, to get the hadronic cross-section.
To get differential rather than total cross-sections, we can leave some of the phase-
space integrals undone. As we have seen, all of this constitutes a huge amount of work.
And unfortunately, the results of such calculations look almost nothing like what
comes out of a particle accelerator! First of all, the order in perturbation theory to
which we are able to calculate may be insufficient for some observables that are
measured. This may apply to total cross-sections, or more subtly to certain bins or
groups of bins in differential quantities. Related to this, the incoming/outgoing partons
radiate a large number of other partons—many more than can be calculated by using
exact Feynman diagrams. Secondly, free partons do not exist, but are confined within
colour-singlet hadrons. Thus, the (anti-)quark and gluon radiation must somehow
clump together to form hadrons, of which there are many different types.
These typically form well-collimated jets of particles, which are measurable in the
silicon tracker and hadronic calorimeters. Both of these effects are shown schemati-
cally in Figure 3.1, which is still a crude simplification, as we have also ignored the
remnants of the colliding protons. Some of the remaining particles go down the
Figure 3.1. (Anti)-quarks and gluons emerge from incoming protons, and radiate before colliding in a hard
process. Outgoing partons also radiate, and the radiation combines to make colour singlet hadrons (shown as
blue blobs).
beam-pipe, but there remains a messy underlying event made of hadrons originating
from the break-up of the protons. A further problem is that there are multiple
protons in each colliding bunch at the LHC, so that scattering events between pairs
of protons can overlap. This is called pile-up, and the effects increase with the
instantaneous luminosity (which grows with, for example, the number of protons per bunch). Finally, the detectors have a finite
resolution, and non-trivial coverage (e.g. gaps, cracks etc). Thus, if we are really
serious about comparing theory with data, we have to face a huge range of
complications!
It is clearly not possible to calculate all of these effects from first principles.
Typically, the comparison of theory with data from collider experiments involves:
Often the results of the second and third of these activities are included in general
purpose software packages called (Monte Carlo) event generators, where the ‘Monte
Carlo’ label refers to the statistical technique used in the underlying computations.
These programs mimic scattering events at colliders, producing simulated events
consisting of a list of stable particles (e.g. hadrons, photons, leptons...), with
associated four-momenta. On the experiment side, these simulated events can be used both
to correct the data for complicated effects, and as a method for comparing data with theory. This often
involves running the event generators mentioned above, and thus it is important to
understand the systematic uncertainties that are implied by their use. There are
various ‘levels’ at which we can try to compare theory with data:
• Parton level: This consists of having theory calculations with quarks and
gluons in the final state, where additional radiation has been corrected for in
the data. Furthermore, unstable particles such as top quarks and W bosons
may have been reconstructed.
• Particle/hadron level: Such calculations have hadrons or jets in the final state,
and all unstable particles will have been decayed. There may also be some
modelling of the underlying event.
• Detector/reconstruction level: Here, the theory results differ from particle
level in including an additional detector simulation, that takes finite resolution
effects into account, and possibly also detailed aspects of particular detectors
(e.g. where the gaps are).
Clearly, the further down this list we go, the closer a theory calculation gets to what
is seen by an experiment. Many physics analysis studies are indeed performed at
reconstruction level, particularly searches for new physics as will be discussed in
Chapters 9 and 10. But as we will discuss in Chapter 11 there are also ways of taking
the ‘raw’ data, and correcting it back to particle level—or for some purposes parton
level, at the cost of some model dependence—so that it can be directly compared
with a simpler theory calculation. There are many different uncertainties and
ambiguities involved in this process, and you will often see detailed discussions
and/or arguments at conferences about what assumptions have gone into a
particular experimental analysis or theory calculation, and whether they are valid.
The aim of this chapter is to give a first taste of some of the methods and
algorithms involved in state-of-the-art tools that people actually use at colliders like
the LHC, and the theory calculations that underpin them. Although we will see
some calculations, we will take a mainly schematic approach, that will hopefully be
sufficient to help you decide what tool should be used for what purpose.
Table 3.1. The current highest-calculated perturbative orders for some important
processes at the Tevatron and LHC.
remove SM events that can fake the signal, which are known as background
processes, or simply backgrounds. Typically, we have to understand backgrounds
a lot more precisely than signals because they occur in greater quantities: uncer-
tainties in the prediction, if faithfully accounted for, can hence drown out analysis
sensitivity to deviations in data from the SM expectations.
The calculation of QCD LO processes has to a large extent been automated. For
example, the publicly available codes MadGraph_aMC@NLO (currently in version 5)
and Sherpa allow users to specify the scattering process they want to calculate, after
which the program can generate events for this process, and calculate total (or
differential) cross-sections. A variety of codes also exist for calculating events at
NLO for specific processes, such as MCFM, BlackHat, Rocket, and VBFNLO.
Another code, Prospino, can calculate NLO total cross-sections for supersymmetric
particle processes. NLO automation of arbitrary process types is also well advanced,
with both Sherpa and MadGraph_aMC@NLO able to automatically generate events
for almost any scattering process, including a wide range of BSM models. Not all
codes for fixed-order calculations are publicly available, which is particularly the
case for calculations at QCD NNLO and beyond. In this case, one can often contact
the authors of a particular research paper with requests for numbers, if carrying out
a particular analysis.
In practice, all fixed-order calculations must be truncated at some low power of
the coupling constant, due to obvious limitations of computing power. It is
obviously desirable to include as many orders as possible, but the question naturally
arises of whether it is possible—or indeed necessary—to try and include additional
information at higher orders, even if this may be incomplete. This is the subject of
the following section.
In discussing higher-order corrections, we have mainly focused on those involving
the QCD coupling constant αS. However, there will also be electroweak (EW)
corrections, which are becoming increasingly important at the frontier of precision
theory predictions. Although the electroweak coupling constants are much smaller
than the strong coupling constant, if one calculates to sufficiently high order in αS,
then subleading strong interaction effects can be comparable in size to NLO EW
effects, meaning that the latter also have to be included. Automation of these
computations is proceeding along similar lines to those for QCD.
Figure 3.2. (a) LO diagram for DY production of an off-shell photon; (b) some contributions to the NLO
cross-section in the qq¯ channel.
3.2 Resummation
There are certain cases in which perturbation theory becomes unreliable. To see how
this might happen, let us recall that the perturbation expansion for a general
observable X will look something like equation (2.195)1, where the coefficients of
powers of the coupling constant themselves depend on the momenta entering the
scattering process. There may then be regions of momentum space where the
coefficients of the perturbation series become large, even though the coupling itself
may be small. However, reliability of a perturbation expansion relies on both having
a small parameter to expand in, and well-behaved coefficients. If the latter criterion
fails, we are in trouble!
Let us examine a classic example where this happens, using a process that we have
already encountered in Section 2.15, namely Drell–Yan (DY) production of a lepton
pair, accompanied by any amount of hadronic radiation. To simplify the discussion,
we can further ignore the fact that the virtual photon decays to a lepton pair, and
simply consider the process
q(p1 ) + q¯(p2 ) → γ *(Q ) + H , (3.1)
where the notation γ * implies an off-shell photon, and H denotes any amount of
hadronic radiation. The LO Feynman diagram for this process is shown in
Figure 3.2(a), and to discuss the cross-section it is conventional to define a
dimensionless variable
z = \frac{Q^2}{\hat{s}}. \qquad (3.2)
In words, this is the ratio of the squared photon invariant mass to the squared partonic centre-of-mass energy, which can be loosely thought of as the fraction of the final-state energy carried by the photon.
Armed with this definition, the differential cross-section corresponding to
Figure 3.2(a) with a given quark flavour turns out to be2
¹ Although strictly we should also add a dependence on the factorisation scale μF for a partonic quantity.
² We have here quoted the cross-section in four spacetime dimensions. In 4 − 2ϵ dimensions (as would be used in dimensional regularisation at higher orders), one must insert an additional factor of (1 − ϵ) in equation (3.3).
\frac{d\hat{\sigma}^{(0)}_{q\bar{q}}}{dz} = \sigma_0\,\delta(1 - z), \qquad \sigma_0 = \frac{e_q^2\,\pi}{3\hat{s}}, \qquad (3.3)
where eq is the electromagnetic charge of the quark. Note that this has dimensions of
area as required. Furthermore, we should not be surprised about the occurrence of
the Dirac delta function, which fixes z to be one: at LO there is only the vector boson
in the final state and thus there is no option but for it to be carrying all the energy.
At NLO, there will be a number of different choices for the initial state particles
(commonly referred to as different partonic channels). Let us focus on the qq¯ channel
(i.e. the only channel that occurs at LO), for which example Feynman diagrams are
shown in Figure 3.2(b), comprising both real and virtual gluon radiation.
Combining all contributions and removing all collinear singularities (in the MS
scheme), one finds the following result for the differential cross-section:
\frac{1}{\sigma_0}\frac{d\sigma^{(1)}_{q\bar{q}}}{dz} \equiv K^{(1)}(z) = \frac{\alpha_S C_F}{2\pi}\left[4(1+z^2)\left(\frac{\ln(1-z)}{1-z}\right)_+ - 2\,\frac{1+z^2}{1-z}\,\ln(z)\right. \qquad (3.4)
\left.\qquad +\ \delta(1-z)\left(\frac{2\pi^2}{3} - 8\right)\right]. \qquad (3.5)
We have here expressed the result in terms of the so-called K-factor, which divides
out the normalisation of the LO cross-section. We see that this contains a single
power of the strong coupling αS as expected at NLO, dressed by a function of z,
where the latter may now vary in the range
0 ⩽ z ⩽ 1.
The physical reason for this is that gluon radiation may carry away energy, so that
the virtual photon carries less than the total energy in the final state. For most values
of z, the value of the K-factor will be small, constituting a well-behaved correction to
the LO process. However, there is an issue for extremal values of z. Examining, for
example, the behaviour as z → 1, one finds
K^{(1)}(z) \xrightarrow{z\to 1} \frac{4\alpha_S C_F}{\pi}\left(\frac{\ln(1-z)}{1-z}\right)_+ + \ldots, \qquad (3.6)
Unfortunately, higher orders in perturbation theory show that the problem gets
worse! Indeed, the asymptotic limit of the K-factor at O(αSn ) has the form
K^{(n)}_{q\bar{q}}(z) \xrightarrow{z\to 1} 2\left(\frac{2\alpha_S C_F}{\pi}\right)^{n} \frac{1}{(n-1)!}\left(\frac{\log^{2n-1}(1-z)}{1-z}\right)_+ + \ldots \qquad (3.7)
What we see is that as the power of αS increases, so does the power of the logarithm
of (1 − z ), which more than compensates for the smallness of the coupling. If we
then try to find the total integrated cross-section for the process of equation (3.1), it
appears as if all orders in perturbation theory are becoming equally important, so
that truncation at a fixed order in the coupling is meaningless.
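To get a feel for the problem, the short Python sketch below evaluates the size of the leading z → 1 term of equation (3.7) for the first few orders in αS, with the common 1/(1 − z) plus-distribution factor stripped off so that successive orders can be compared directly. The coupling value is an illustrative assumption, and the point is only qualitative: close enough to threshold, the logarithms overwhelm the extra powers of αS.
```python
import math

# Illustrative inputs (assumptions, not values taken from the text)
alpha_s = 0.118    # strong coupling, roughly its value at the Z mass
C_F = 4.0 / 3.0    # quark colour factor

def threshold_term(n, z):
    """Size of the leading z->1 term of K^(n) in equation (3.7),
    omitting the common 1/(1-z) plus-distribution factor."""
    L = abs(math.log(1.0 - z))
    return 2.0 * (2.0 * alpha_s * C_F / math.pi) ** n * L ** (2 * n - 1) / math.factorial(n - 1)

for z in (0.9, 0.99, 0.999):
    terms = [threshold_term(n, z) for n in range(1, 6)]
    print(f"z = {z}: " + ", ".join(f"{t:.3g}" for t in terms))
```
For moderate z the successive terms shrink as expected, but for z very close to one they grow with the order, which is the breakdown described above.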
The solution to this problem is that one can indeed isolate the problem terms of
equation (3.7) at arbitrary order in perturbation theory, and sum them up to all
orders in the coupling constant. This is known as resummation, and is possible
because we can often understand the origin of the most sizeable terms, such that
their properties can be completely characterised. In the present case, for example, we
know that the large logarithmic terms are associated with the kinematic limit z → 1,
corresponding to the virtual photon carrying all the energy in the final state. This in
turn implies that any additional real (gluon or quark) radiation must be soft i.e.
carrying negligibly small four-momentum. From Section 2.15, we know that this is a
limit in which infrared singularities occur, which will also be the case when virtual
radiation—that is, either soft or collinear to one of the incoming partons—is
included. Upon combining all virtual and real radiation, and absorbing initial-state
collinear singularities into the parton distribution functions, the formal infrared
divergences will cancel. However, one is left with the numerically large terms of
equation (3.7), which can be thought of as echoes of the fact that infrared (IR)
singularities were once present!
The explanation of the previous paragraph—relating large logarithms to soft and/
or collinear radiation—suggests that this behaviour is not limited to DY production,
but will be present in many different scattering processes. Another example is deep
inelastic scattering at an e−p collider, in which an off-shell photon combines with a
parton from the proton to produce one or more partons in the final state:
\gamma^*(q) + f(p) \to \sum_i f_i(k_i). \qquad (3.8)
Here f and {fi } denote general initial- and final-state partons, respectively, and a LO
Feynman diagram is shown in Figure 3.3. In this process, it is conventional to define
the so-called Björken variable
x_B = \frac{Q^2}{2 p \cdot q}, \qquad Q^2 = -q^2 > 0, \qquad (3.9)
and one then finds that the differential cross-section in this variable contains terms
similar in form to equation (3.7), but with z replaced by xB, and different constants
in front of each logarithm. The kinematic limit giving rise to such large contributions
Figure 3.3. LO diagram for the deep inelastic scattering (DIS) process, in which a virtual photon scatters off a quark.
Squaring both sides and implementing the definition of equation (3.9), one finds
\frac{Q^2(1-x_B)}{x_B} = \sum_{i,j=1}^{n} |k_i|\,|k_j|\,(1 - \cos\theta_{ij}),
where ki is the three-momentum of final-state parton i, and θij the angle between
partons i and j. One now sees that the limit xB → 1 corresponds to real radiation that
is either soft (ki → 0), or collinear to an existing parton leg (θij → 0).
In general, large logarithms arising from the presence of soft and/or collinear
radiation are called threshold logarithms (e.g. in the DY case, they corresponded
precisely to the off-shell photon carrying all the energy, and thus being produced at
threshold). They will arise in any scattering process containing one or more heavy or
off-shell particles, and in all such cases we may define a dimensionless threshold
variable
0 ⩽ ξ ⩽ 1,
which will be given by ratios of Lorentz invariants in the process of interest, and is
such that ξ → 0 at threshold. In the above examples, we had ξ = 1 − z for DY
production, and ξ = 1 − xB for DIS. Note that the definition of such a threshold
variable is not unique, and different choices for a given process may exist in the
literature3. However, once a choice is fixed, the differential cross-section in the
threshold variable ξ has the following asymptotic form as ξ → 0:
³ In the above examples of DY and DIS, one may clearly reparametrise the variables z and xB in such a way that the leading behaviour in the limit z, xB → 1 remains unchanged.
\frac{d\sigma}{d\xi} \sim \sum_{n=0}^{\infty} \alpha_S^{n} \sum_{m=0}^{2n-1} \left[ c_{nm}^{(-1)} \left(\frac{\log^{m}\xi}{\xi}\right)_+ + c_n^{(\delta)}\,\delta(\xi) + \ldots \right]. \qquad (3.10)
We have here neglected an overall normalisation factor, that may itself contain
additional factors of (non-QCD) couplings. At each order in the coupling constant,
there is a series of large logarithmic terms, each of which is regularised as a plus
distribution, and an additional delta function contribution. At any order in
perturbation theory, the highest power of the logarithm is m = 2n − 1, and such
terms are referred to as leading logarithms (LL), as they are usually the ones that are
the most numerically significant. Terms which have m = 2n − 2 are referred to as
next-to-leading logarithms (NLL), and so on. As the order of perturbation theory
increases, one becomes sensitive to progressively subleading logarithms (NmLL).
Given that the NLL terms should be numerically smaller than the LL terms as
ξ → 0 (and similarly for the NNLL terms etc), a new form of perturbation
expansion suggests itself in the threshold limit: one can sum up the LL terms to
all orders in αS, followed by the NLL terms, and so on. This is orthogonal to
conventional perturbation theory, in which one limits oneself to fixed orders in the
coupling constant, but includes all behaviour in z. Indeed, the difference between
these approaches can be understood as a different choice of small parameter: fixed-
order perturbation theory (LO, NLO, …) involves neglecting terms which are
suppressed by powers of the coupling constant, whereas resummed perturbation
theory (LL, NLL, …) involves neglecting terms which are suppressed by inverse
logarithms of the threshold variable, and keeping only particular enhanced terms to
all orders in αS. The latter thus constitutes a ‘reordering’ of the perturbation
expansion, and this is what the prefix ‘re-’ in ‘resummation’ is intended to signify.
Of course, neither of these approaches is ideal. In most situations of interest at
colliders, one is faced with the need to calculate a particular total or differential
cross-section in perturbation theory, but not necessarily in a kinematic regime where
the threshold-enhanced terms are most important. The most common approach is
then to use fixed-order perturbation theory to as high an order as possible, and then
to supplement the result with additional threshold terms to a given logarithmic order
(i.e. where the latter are included to all orders in the coupling)4. One hopes that the
latter contributions will help improve the estimate for the cross-section, and you will
often see such results quoted in experimental analysis papers. For example, if a given
theory calculation is described as at NNLO + NNLL order, this means that a fixed-
order calculation at next-to-next-to-leading order in αS (with full kinematic
dependence) has been supplemented with additional terms at arbitrarily high orders,
up to and including the third largest tower of logarithms.
There are a number of approaches for resumming threshold logarithms, all of
which ultimately make use of the fact that the properties of soft and collinear
radiation are, in a sense, independent of the underlying process: the fact that the
(transverse) momentum of the radiation is negligible in the threshold limit means, in
⁴ In combining the fixed-order and resummed results, one must take care not to count any radiation twice.
Figure 3.4. LO Feynman diagrams for top quark pair production in QCD.
position space, that the radiation has an infinite Compton wavelength. Thus, it
cannot resolve the details of the underlying scattering process, and should somehow
factor off. We have seen this already in Section 2.15, where we used the factorising
property of collinear radiation to set up the language of parton distribution
functions. This factorisation allows certain equations to be derived that describe
the structure of infrared divergences to all orders in perturbation theory. Finally, the
fact that IR divergences are intimately related to the large logarithms appearing in
perturbation theory allow the latter to be summed up. In recent years, an approach
called soft collinear effective theory (SCET) has become popular. It provides a way
to rewrite the Lagrangian of QCD to contain separate fields for soft and collinear
gluons or (anti-)quarks, and can be used to devise similar resummation formulae to
those arising from more traditional approaches. A full discussion of how to derive
and apply these resummation approaches is sadly beyond the scope of this book, but
a number of excellent review articles and textbooks are available.
Above, we have examined the resummation of large contributions in differential
cross-sections with respect to the threshold variable. This is not the only place where
you may see threshold resummation being used:
s4 = s + t + u − 2mt2 ,
where (s, t, u ) are the Mandelstam invariants defined in equation (2.156), and
mt the top mass. Secondly, there is the parameter
\beta = \sqrt{1 - \frac{4 m_t^2}{\hat{s}}}.
One may show that both of these tend to zero at threshold, although their
behaviour away from the threshold limit differs.
Figure 3.5. The transverse momentum of lepton pairs in DY production, as measured by the ATLAS
collaboration. The green curve shows a fixed-order result at NNLO, matched with a resummed calculation at
NNNLL order, which agrees well with the data. Reproduced with permission from Eur. Phys. J. C 80 616
(2020).
Figure 3.6. An amplitude for n partons, where one parton splits into two. All partons are shown as solid lines, which may be (anti-) quarks or gluons.
and k are collinear (θ → 0), the partonic differential cross-section for the
(n + 1)-parton process factorises:
d\hat{\sigma}_{n+1} = d\hat{\sigma}_{n} \sum_i \int_{k_{\min}}^{k_{\max}} \frac{d|k_T|^2}{|k_T|^2} \int_{z_{\min}}^{z_{\max}} \frac{dz}{z}\, P_{ji}(z). \qquad (3.11)
Here kT is the transverse momentum of parton k with respect to j, and Pij the
DGLAP splitting function associated with the splitting i → jk , where we have to
sum over all possible partons i that can lead to the given final state parton j. We have
put a lower limit on the transverse momentum so that the integral converges, and the
upper limit will depend on some physical energy scale in the process. Likewise, we
have placed limits on the momentum fraction, which will also be dictated by
kinematic constraints. Equation (3.11) can be derived similarly to equation (2.212),
and clearly has a very similar form. The only difference is that there is no shift in
kinematics associated with the n-parton process, due to the fact that the momentum
fraction is associated with the final state partons, and not the internal line (in
contrast to the case of initial state radiation).
The factorised form of equation (3.11) suggests that the extra parton k is
somehow independent of the rest of the process, and indeed there is a nice hand-
wavy explanation for this, that we already saw in the previous section when
discussing threshold resummation. As the transverse momentum of parton k goes
to zero, its Compton wavelength becomes infinite. Particles can only resolve things
that are smaller than their Compton wavelength, and thus the collinear parton k
cannot resolve the details of the underlying scattering process that produced it. It
should thus factorise from the n-parton process, as indeed it does.
We have so far argued that the (n + 1)-parton process factorises in terms of the n-
parton one, but we can iterate this argument in the limit in which a whole load of
emissions are collinear, as they will all be mutually independent. This gives us a way
to approximate cross-sections with a large number of final state partons. Assume
that we have some underlying hard scattering process that produces a set of well-
separated partons in the final state. If these partons then radiate, we can describe this
radiation exactly in the limit in which it is collinear with one of the original hard
partons, using the above factorisation. In practice, extra radiation will have some
Figure 3.7. A parton emerges from a hard scattering process, and cascades by emitting partons of species {jn}, changing its own identity into species {in} in the process. At each step, the cascading parton has virtuality ti, and a fraction zi of its original momentum.
non-zero angle with the hard partons. However, we still know that radiation is
enhanced in the collinear limit, so is more likely to be close to one of the hard
partons. Then the collinear limit gives a reasonable approximation for the extra
radiation, even if it is not always strictly collinear.
To see how to generate the extra radiation in practice, we can first note that
equation (3.11), and the fact that cross-sections are related to probabilities, allows us
to interpret the combination
\sum_i \int_{k_{\min}}^{k_{\max}} \frac{d|k_T|^2}{|k_T|^2} \int_{z_{\min}}^{z_{\max}} \frac{dz}{z}\, P_{ij}(z)
and papers, but this is a tad misleading—the parton can still emit, as long as we do
not resolve the emissions! In any case, let us denote this probability by Δ(t0, t1). To
find it, note that we can write
\Delta_i(t_0, t + \delta t) = \Delta_i(t_0, t)\left[1 - \sum_j \int_{t}^{t+\delta t} \frac{dt}{t} \int dz\, P_{ji}(z)\right]. \qquad (3.12)
This is a large equation, which we can make sense of as follows. The left-hand side is
(by definition) the probability that there are no resolvable emissions between
virtualities t0 and t + δt . The first factor on the right-hand side is the probability
that there are no resolvable emissions between t0 and t. Finally, the quantity in the
square brackets is the probability of anything happening (i.e. unity) minus the
probability of a single emission between t and t + δt , where we have used the
probabilistic interpretation from above. Provided δt is small enough, at most a single emission can occur in the interval δt, so that the square brackets give the probability that there is no resolvable emission between t and t + δt. Putting everything
together, equation (3.12) simply amounts to the statement
(Probability of no emission in range (t0, t + δt)) = (Probability of no emission in range (t0, t)) × (Probability of no emission in range (t, t + δt)),
which is certainly true, provided the emissions are independent. We can use equation
(3.12) to solve for the function Δi (t0, t ). First, one may Taylor-expand both sides in
δt to give
\Delta_i(t_0, t) + \delta t\,\frac{d\Delta_i(t_0, t)}{dt} + \ldots = \Delta_i(t_0, t)\left[1 - \frac{\delta t}{t}\sum_j \int dz\, P_{ji}(z)\right],
so that, taking the limit δt → 0, the Sudakov factor satisfies a first-order differential equation whose solution is
\Delta_i(t_0, t) = \exp\left\{-\int_{t_0}^{t} \frac{dt'}{t'} \sum_j \int dz\, P_{ji}(z)\right\}. \qquad (3.13)
This important result is called the Sudakov factor. As stated above, it describes the
probability that a parton evolves from some virtuality t0 to some other virtuality t
without emitting any resolvable partons. Note that, although it looks like we have
only used properties of real emissions in the above result, it actually includes virtual
corrections too. In the derivation, we explicitly used the fact that the probability of
no resolvable emission, plus the probability of resolved emissions, adds up to one.
We need to add virtual and real diagrams together in quantum field theory to
conserve probability, and thus using the fact that probabilities add up to one
amounts to including some appropriate set of virtual contributions in the no-
emission probability. We have not specified explicitly the limits (zmin, zmax ) on the
momentum fraction integral. These will depend on our criterion for what constitutes
a resolvable emission, and can also depend on the virtuality.
A given parton cascade will be characterised by some set of values (zi , ti ) denoting
the momentum fraction and virtuality before each branching. Armed with the
Sudakov factor of equation (3.13), we can then associate such a cascade with the
following probability:
P^{j_1, j_2, \ldots, j_n}_{i_1, i_2, \ldots, i_n}(\{z_i\}, \{t_i\}) = \left[\prod_{m=1}^{n} \Delta_{i_m}(t_m, t_{m+1})\right]\left[\prod_{m=1}^{n} \int d\tilde{z}_{m+1}\, P_{i_{m+1} i_m}(\tilde{z}_{m+1})\right]. \qquad (3.14)
Here the second square bracket includes the probability of each resolvable emission,
where
\tilde{z}_{m+1} = \frac{z_{m+1}}{z_m}
is the fraction of the momentum of the mth parton that is carried by parton (m + 1).
The first square bracket represents the probability that the parton emits no
resolvable emissions between each branching. Again, we have used the fact that
the branchings can be treated independently, which also enters the derivation of the
Sudakov factor. It follows that provided we can generate sets of numbers (zi , ti )
weighted by the probability of equation (3.14), we can obtain a list of radiated four-
momenta as if they had come from a cross-section calculation! This is quite a
remarkable achievement, so let us say it again more slowly. We have shown that
radiation in the collinear limit can be treated as multiple uncorrelated emissions,
where we expect this to provide a reasonable approximation even if the radiation is
not quite collinear with the hard partons emerging from the scattering process.
Instead of having to calculate complicated Feynman diagrams to work out how the
four-momenta of the radiated particles are distributed, we can simply generate pairs
of numbers representing the momentum fraction and virtuality of each parton after
emission, with some probability distribution (equation (3.14)). This distribution
involves fairly simple functions, which can be implemented numerically to generate
vast amounts of radiation quickly and easily.
In practice, the final-state partons will never become exactly on-shell, as free partons do not exist. This suggests that we put some lower cutoff t0 on the
allowed virtuality of a parton after branching. Then we can use the following parton-
shower algorithm to simulate the radiation:
How understandable this is to you depends a bit on how familiar you are with
computational methods for sampling from given probability distributions. It is an
example of a Monte Carlo algorithm, and the result (after applying momentum
conservation at each branching) is a list of four-momenta of emitted particles, and
thus a simulated scattering event. This explains the name ‘Monte Carlo event
generators’ for the available software programs that implement such algorithms.
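To make the sampling idea concrete, here is a minimal toy sketch (in Python) of how the (t, z) pairs of a cascade can be generated by solving Δ(t_current, t) = r for a uniform random number r. It assumes a single parton species with only the soft-enhanced part of a gluon splitting kernel, a fixed coupling folded into the kernel normalisation, and z limits that do not depend on the virtuality; all of these are drastic simplifications compared to what Pythia, Herwig or Sherpa actually do.
```python
import math
import random

random.seed(42)
alpha_s, C_A = 0.118, 3.0       # fixed coupling and gluon colour factor (assumed)
z_min, z_max = 0.01, 0.99       # assumed resolution limits on the momentum fraction

# integral of (alpha_s / 2 pi) * P(z) over the resolvable range, with the toy kernel
# P(z) ~ 2 C_A / (1 - z) keeping only the soft enhancement
I_z = (alpha_s / (2 * math.pi)) * 2 * C_A * math.log((1 - z_min) / (1 - z_max))

def next_branching(t_current, t_cutoff):
    """Sample the next (t, z), or return None if the parton reaches the cutoff
    without a resolvable emission. For this toy kernel the Sudakov factor is
    Delta(t_current, t) = (t / t_current)**I_z, which we set equal to a random r."""
    r = random.random()
    t_new = t_current * r ** (1.0 / I_z)
    if t_new < t_cutoff:
        return None
    # sample z from P(z) ~ 1/(1-z) on [z_min, z_max] by inverse transform
    u = random.random()
    z = 1.0 - (1.0 - z_min) * ((1.0 - z_max) / (1.0 - z_min)) ** u
    return t_new, z

# evolve a single parton from t = 1000 GeV^2 down to a cutoff of t0 = 1 GeV^2
t, t0, emissions = 1000.0, 1.0, []
while True:
    step = next_branching(t, t0)
    if step is None:
        break
    t, z = step
    emissions.append((round(t, 2), round(z, 3)))
print(emissions)
```
Each entry in the output list is a branching, ordered in decreasing virtuality, exactly as in the cascade of Figure 3.7.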
The above algorithm corresponds to evolving downwards in virtuality, for a final
state parton. One can use a similar algorithm for incoming partons, where one
typically evolves backwards from the parton entering the hard scattering process, i.e.
again from the most off-shell parton, to the most on-shell. For completeness, we
note that the Sudakov factor for incoming partons is different to that for final-state
partons, in that it includes a ratio of parton distribution functions:
\Delta_i(t_0, t_1) = \exp\left\{-\int_{t_0}^{t_1} dt' \sum_j \int_x^1 \frac{dx'}{x'}\, P_{ij}\!\left(\frac{x}{x'}\right) \frac{f_i(x', t')}{f_j(x, t')}\right\}.
If you are feeling adventurous you can in principle derive this from everything we
have said above!
So far we have used the collinear limit to estimate the effect of radiation, which
we justified by saying that radiation is anyway enhanced in this limit, so that we
should be capturing the most likely radiated partons by doing so. In fact, this is not
quite true. We saw in Section 2.15 that radiation is also enhanced when it is soft
(kμ → 0), but not necessarily collinear with any outgoing partons. Such radiation is
called wide-angle soft, to distinguish it from radiation that is both soft and collinear.
For simplicity, we can consider a quantum electrodynamics example, in which a hard
virtual photon branches into an electron–positron pair. The electron and positron
can then radiate, as shown in Figure 3.8. The complete amplitude will be given by
the sum of these diagrams, such that the squared amplitude includes the interference
between them. If both angles satisfy θi ≪ 1, we can neglect this interference and treat
the emissions as incoherent/independent, as we have discussed in detail above.
However, if the angles are not small, then emissions from different legs do indeed
interfere with each other, and thus are not mutually independent. This then looks like a
serious threat to our parton-shower algorithm, which assumed throughout (for both
the resolved emissions and the Sudakov factor), that all emissions are indeed
mutually independent! We could simply choose not to include wide angle soft
radiation, but the fact that it is enhanced means that we are neglecting potentially
large effects, so that our estimate of higher-order radiation is unlikely to be reliable.
It turns out that there is a remarkably simple fix: if we add and square both
diagrams in Figure 3.8, the result looks like a sum of independent emissions confined
to a cone around each external particle, as exemplified in Figure 3.9. There is a
simple (although not exactly rigorous) quantum mechanical explanation: wide-angle
photons with Δθ (labelled in the figure) necessarily have a larger Compton wave-
length, so cannot resolve the individual electron and positron. Instead, they see the
overall net charge of the system, which is zero. Thus, photon emission at large angles
is suppressed, a phenomenon known as the Chudakov effect.
There is an analogous property in QCD, with the only difference being that the
net colour charge of a system of partons can be non-zero. Consider a hard parton k
that branches into two partons i and j, as shown in Figure 3.10(a). Each of the
partons i and j can radiate, such that summing and squaring the diagrams produces radiation that, to a good approximation, looks like independent emission from i and j confined to cones bounded by the opening angle of the ij pair, together with wider-angle emission that is sensitive only to the total colour charge of the pair, i.e. that of the parent parton k.
This effect is called colour coherence, and implies that soft radiation can be modelled
by independent emissions that are strongly ordered in angle, as shown in
Figure 3.10(b). Our original PS used virtuality as an ordering variable, but one
may show that
\int \frac{dt}{t} = \int \frac{d\theta^2}{\theta^2},
Figure 3.9. Taking interference of wide-angle soft radiation into account, the overall effect is that radiation is
confined to a cone around each hard particle.
i
1 2 3 4
(a) (b)
Figure 3.10. (a) A parton splits into two other partons, where the lines may represent (anti-) quarks or gluons;
(b) multiple parton radiation can be modelled by independent emissions which are ordered in angle.
where t is the virtuality of the branching parton and θ the emission angle (i.e. the
angular separation of the partons after the branching). Thus, by simply replacing t with θ as the ordering variable in the above PS algorithm (so-called angular ordering), wide-angle soft radiation is automatically included!
This is only one way of setting up PSs with correct soft/collinear behaviour.
Another is to use an algorithm based on emissions recoiling against a pair of
partons, rather than a single parton. This is called a dipole shower, and has become
increasingly popular in recent years for various technical reasons (such as ease of
conserving energy and momentum, as well as of incorporating soft gluon interfer-
ence effects). A number of publicly available Monte Carlo Event Generators
implement various PS algorithms, either on specific scattering processes defined in
the program, or supplied by the user (e.g. from MadGraph). Such programs are
widely used at the LHC, and were also used at previous colliders. The main public
codes are Herwig (which offers the choice of an angular-ordered PS, or a transverse-momentum-ordered dipole shower), and Pythia and Sherpa (which both have transverse-momentum-ordered dipole showers).
At the end of the above PS algorithms, the final state as predicted by theory will
contain a large number of (anti-)quarks and gluons, in addition to any colour-singlet
particles that have emerged from the interaction. However, this is still not what
would be seen in an actual collider experiment! For one thing, we know that no free
quarks or gluons exist in nature, but are instead confined within hadrons, with no net
colour. We must thus attempt to describe the hadronisation of final-state partons,
which is the subject of the following section.
3.4 Hadronisation
Hadronisation is the process by which a collection of partons (i.e. (anti)-quarks and/
or gluons) forms a set of colour-singlet hadrons, comprising (anti-)baryons and
mesons. Given that this process involves momentum scales of order of the confine-
ment scale Λ, where the strong coupling αS becomes strong, it cannot be fully
described in perturbation theory. The closer we get to experimental data, the more
we must rely on well-motivated phenomenological models. A number of these have
been developed, and they usually contain a number of free parameters so that they
can be tuned to data, allowing confident prediction of hadronisation effects in
subsequent experiments.
Before reviewing the models that are in widespread current use, it is worth
pointing out that a more precise approach can be taken in the simple situation in
which one considers a single final state hadron h. We can illustrate this using the
process of electron–positron annihilation to hadrons, whose leading-order diagram
is shown in Figure 2.14(b). Higher-order corrections will involve QCD radiation
from the final state (anti)-quark, which may be real or virtual. Ultimately, the final
state radiated partons will form hadrons, and in choosing to isolate the particular
hadron type h, we can consider the process
e+e−→hX , (3.15)
where X denotes any other radiated particles (e.g. photons, hadrons) that end up in
the final state. Let q2 be the squared four-momentum—the virtuality—of the
exchanged vector boson, and x the fraction of the total centre-of-mass energy
carried by the hadron h. Then one may define the fragmentation function5
F^{h}(x, q^2) = \frac{1}{\sigma_0}\frac{d\sigma^{h}}{dx}, \qquad (3.16)
where σ0 is the LO cross-section, and is included for a convenient normalisation. We
see that the fragmentation function is directly related to the cross-section for
production of a particular hadron, and thus gives us the expected distribution of
momentum fractions x of hadrons h in a given scattering process, and at a given
virtuality q2. Clearly we cannot calculate this quantity from first principles, as the
hadronisation process is not perturbatively calculable. However, it may be shown
that the fragmentation function of equation (3.16) may be decomposed into the
following form:
⁵ Here we have defined a total fragmentation function, irrespective of the polarisation of the exchanged vector boson. One may also define individual functions for each polarisation separately.
F^{h}(x, q^2) = \sum_{i\,\in\, q,\bar{q},g} \int_x^1 \frac{dz}{z}\, C_i\!\left(z, \alpha_S(\mu^2), \frac{q^2}{\mu^2}\right) D_i^{h}\!\left(\frac{x}{z}, \mu^2\right), \qquad (3.17)
where the sum is over all species of parton (including different flavours), and Dih is a
partonic fragmentation function for each species. The coefficients Ci for each species
are calculable in perturbation theory, and at LO equation (3.17) has the interpre-
tation that the total fragmentation function splits into a number of partonic
contributions, with Dih representing the probability that a given parton i in the
final state produces the hadron h. The energy fraction z of the parton i must exceed
that of h, hence the requirement that z > x . Equation (3.17) closely resembles the
factorisation of total cross-sections into perturbatively calculable partonic cross-
sections, and parton distribution functions that collect initial state collinear
singularities, as discussed in Section 2.15. Indeed, the physics leading to equation
(3.17) is very similar. Were one to try and calculate the partonic cross-section for the
process of equation (3.15) in perturbation theory, one would find infrared singular-
ities associated with the emission of radiation that is collinear with a given outgoing
parton i that could end up in the hadron h. In the total cross-section, these
singularities would cancel upon adding virtual graphs, but that is not the case
here given that we have chosen to isolate a single particle in the final state, and
observables with a fixed number of particles in their definition are not infrared-safe.
The only way to get rid of such singularities is to absorb them into some sort of non-
perturbative distribution function, and this is precisely the role played by the
partonic fragmentation function Dih(x , q 2 ). Analogous to the case of parton
distributions, there is an energy scale μ that separates the radiation that is deemed
to be perturbative, from that which is non-perturbative and thus is absorbed into the
fragmentation function. The arbitrariness of this scale leads to evolution equations
for the partonic fragmentation functions, that are similar to the DGLAP equations
for parton distributions. One may then measure the fragmentation functions from
data.
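As a purely illustrative sketch of the structure of equation (3.17), the following snippet evaluates a convolution of the same form for a single parton species, using invented toy shapes for the coefficient and fragmentation functions (they are not fits to data, and no evolution in μ² is included).
```python
import math

def C_toy(z):
    # invented stand-in for a perturbative coefficient function C_i
    return 1.0 - 0.2 * (1.0 - z)

def D_toy(x):
    # invented stand-in for a non-perturbative fragmentation function, ~ x^a (1-x)^b
    return 2.0 * math.sqrt(x) * (1.0 - x) ** 3

def F_h(x, npts=2000):
    """Trapezoidal estimate of int_x^1 (dz/z) C(z) D(x/z), as in equation (3.17)."""
    dz = (1.0 - x) / npts
    total = 0.0
    for i in range(npts + 1):
        z = x + i * dz
        weight = 0.5 if i in (0, npts) else 1.0
        total += weight * C_toy(z) * D_toy(x / z) / z
    return total * dz

for x in (0.1, 0.3, 0.5, 0.7):
    print(f"x = {x:.1f}:  F_h(x) = {F_h(x):.4f}")
```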
Whilst the above fragmentation approach is completely systematic, it is clearly
insufficient to describe the complete hadronisation of a complex scattering event
including large numbers of partons that have to be hadronised. For example, it takes
no account of the fact that multiple partons from a given event must end up in a
single hadron, in order to produce a colour singlet object. To this end, one may
formulate more comprehensive descriptions, albeit at the expense of moving away
from more precise field theory statements. All hadronisation models have a number
of features in common, e.g.
Figure 3.11. Potential V(r) between a qq̄ pair, as a function of the distance r between them.
• Each model must decide how to divide the colour of the final state partons
between (colour-singlet) hadrons. Usually this is done using approximations
that reduce the vast number of possible colour flows.
• Each model has a number of free parameters, that can be tuned to data. After
tuning a given model to data, it can be used to predict results in subsequent
experiments. However, models may have to be retuned if they start to
systematically disagree with data, or have additional parameters added.
The two main hadronisation models in current use are described below.
(a) (b)
Figure 3.12. (a) A QCD flux tube or ‘string’ connecting a quark and antiquark; (b) at large enough distances,
the string can break, creating a new (anti)-quark, and thus two strings.
Figure 3.13. (a) A blue quark turns into a red quark by emitting a gluon, where the latter can be thought of as
carrying a blue charge, and an anti-red charge; (b) colour flow diagram representing the same situation;
(c) colour flow that dominates for multigluon emission in the Nc → ∞ limit, where the individual colours at
each gluon emission are randomly chosen.
they separate, will have a QCD string stretching between them. As this string breaks,
it creates new (anti-)quarks, which also fly apart, and the string-breaking process
iterates until no more breaks are kinematically possible. Hadrons can then be
formed by grouping adjacent quarks and anti-quarks. Of course, this will only
produce mesons. Baryons can be formed by introducing the concept of a diquark,
namely a composite state of two quarks (anti-quarks) that has the same colour
quantum numbers as a quark (anti-quark). Grouping together a quark with a
diquark gives a baryon, and similarly for anti-baryons. Finally, one must consider
gluon radiation from quarks. From a colour point of view, a gluon carries both a
colour and an anti-colour, as illustrated in Figure 3.13. Considering the colour
charges, there will now be a flux tube stretching from the quark to the gluon, and
from the gluon to the anti-quark. This looks like a single flux tube with a kink, and
thus one may model the effects of gluons by considering the dynamics of kinked
strings. A possible complication when multiple gluons are emitted is that there are
many possible choices for where the various (anti)-colour charges end up. However,
it may be shown in the approximation that the number of colours is large (Nc → ∞)
that a single colour flow dominates, namely that of Figure 3.13(c), in which the
pattern of Figure 3.13(b) is repeated along the string. The assignment of individual
colour charges to hadrons is then unambiguous, and corrections to this approx-
imation turn out to be suppressed by a factor 1/Nc² ≃ 0.1, where we have used the
physical value Nc = 3.
There are a number of free parameters in the string model. For example, the
probability for a string to break is described by a certain fragmentation function,
which may be modified for different quark flavours (including heavy flavours), for
diquarks, or for strings of very low invariant mass. At each string break, the (anti-)
quarks are assigned a transverse momentum according to a Gaussian distribution,
which may also be modified in certain circumstances. All of the above functions
contain parameters whose optimal values may be fit by tuning to data.
⁶ The domination of a single colour flow in the PS as Nc → ∞ is the same property as that used in the string model, in Figure 3.13(c).
describe below will turn out to be intertwined with the physics already included
above, so that there is no clear means of formally separating them!
As might be expected as we get closer to experiment, the issues become very
complex, and our means of saying anything concrete about them—using well-
established ideas from field theory—dwindles. Our aim here is not to give a highly
technical summary of different approaches for coping with the sheer mess of the
underlying event. Excellent reviews may be found in the further reading at the end of
the chapter, and in any case the state of the art in underlying-event modelling is constantly evolving. Instead, we will content ourselves with pointing out
what some of the main issues are.
Figure 3.14. The transverse momentum spectrum of lepton pairs produced in DY production at the Tevatron.
Data from the CDF experiment is shown alongside various theory predictions, where some amount of intrinsic
k T of the colliding partons is needed in order to match the data. Reproduced with permission from A. Buckley
et al, Phys. Rep. 504 145–233 (2011).
this would be zero, and thus this distribution is highly sensitive to the transverse
momentum of the incoming partons. One sees that allowing the latter to have
intrinsic kT gives a much better fit to the data, and that this value is not particularly
small (i.e. it is appreciably larger than the confinement scale Λ).
One of the simplest models for generating intrinsic kT for each incoming parton is
to assume a Gaussian distribution with a fixed width. One might expect such a
model to be correct if the only source of this effect was non-perturbative effects
within the proton itself. However, there is clearly a delicate interplay between
perturbative and non-perturbative physics, not least in that changing the nature or
parameters of the PS will affect the transverse momentum of the incoming partons,
thus changing the amount of intrinsic kT that is needed! More sophisticated models
have been developed in recent years to try to take such issues into account.
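A minimal sketch of the simplest such model is given below: each incoming parton receives a transverse kick drawn from a two-dimensional Gaussian of fixed width. The width used here is an arbitrary illustrative number, not a tuned parameter of any particular generator.
```python
import math
import random

random.seed(1)
sigma_kT = 2.0  # GeV, assumed Gaussian width of the intrinsic kT

def intrinsic_kT():
    """Draw a transverse kick (kx, ky) for one incoming parton."""
    return random.gauss(0.0, sigma_kT), random.gauss(0.0, sigma_kT)

# kicks for the two incoming partons; their vector sum must be absorbed by the
# beam remnants so that the total transverse momentum of the event is conserved
k1, k2 = intrinsic_kT(), intrinsic_kT()
recoil = math.hypot(k1[0] + k2[0], k1[1] + k2[1])
print(k1, k2, "remnant recoil pT =", round(recoil, 2), "GeV")
```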
Figure 3.15. Distribution of the number of charged particles with pT > 100 MeV, as measured by the ATLAS experiment at a centre-of-mass energy of √s = 7 TeV. Reproduced with permission from A. Buckley et al, Phys. Rep. 504 145–233 (2011).
Figure 3.16. Two separate hard scattering processes may occur if two partons emerge from each incoming
proton (shown in different colours for convenience). The black gluon links the two processes, so that these are
not necessarily independent.
dictated by the parton that has left. Furthermore, the diquark and scattering parton
must form a net colour singlet state, and must combine to reconstruct the total
energy and momentum of the incoming proton. This includes assigning a transverse
momentum to the beam remnant, to offset any intrinsic kT given to the scattering
parton.
The beam remnants must be included when applying hadronisation models to the
entire final state, as it is only upon including them that colour is conserved.
Furthermore, treatment of the beam remnants clearly overlaps with MPI modelling,
given that what looks like a beam remnant from one scattering can turn out to be the
seed for a subsequent scattering. It may be necessary, for example, to keep track of
colour flows between the beam remnants and different MPI scatterings. There is
usually more than one way to do this, and models must decide on a particular colour
assignment in any given event.
Figure 3.17. Example decay of a top quark.
Figure 3.18. A scattering event from the ATLAS detector. CREDIT https://atlas.cern/updates/news/search-new-
physics-processes-using-dijet-events.
To be more precise, a jet algorithm is a systematic procedure for clustering the set
of final state hadrons into jets. That is, it is a map
{pi } → {jl }
from the particle four-momenta {pi } (where i runs from 1 to the number of particles),
to some jet four-momenta (where l runs from 1 to the number of jets). There can be
more than one particle in each jet, so that the number of jets is less than or equal to
the number of particles we started with. As usual in this book, what looks like a
simple idea has by now become a hugely complicated and vast subject in itself.
We saw earlier that observables should be infrared (IR) safe, meaning that theory
results should not change if arbitrary soft/collinear particles are added, which
constitutes a physically indistinguishable state. Unfortunately, many early (and widely
used) jet algorithms were not IR safe, which caused serious problems when trying to
compare theory with data at subleading orders in perturbation theory (where the latter
are needed for precise comparisons). Recent years have seen the development of many
IR safe algorithms, and a large class includes so-called sequential recombination
algorithms. They involve associating a distance measure dij between any two particles i
and j. This may be as simple as their separation in the (y, ϕ ) plane of Figure 1.10(b),
or it may be more complicated. We can also associate a distance diB between particle i
and the beam axis. A typical algorithm is then as follows:
Historically, different distance measures have been used in such algorithms. Often
they are of the form
d_{ij} = \min\!\left((p_{Ti}^2)^{n}, (p_{Tj}^2)^{n}\right)\frac{\Delta R_{ij}^2}{R^2}, \qquad d_{iB} = (p_{Ti}^2)^{n}, \qquad (3.18)
for some integer n. Here pTi is the magnitude of the transverse momentum of particle
i, ΔRij is the distance in the (y, ϕ ) plane between particles i and j (see equation
(1.36)), and R is a constant parameter of the algorithm called the jet radius. Special
cases in the literature are n = 1 (the kT algorithm), n = 0 (the Cambridge–Aachen
algorithm), and n = −1 (the anti-kT algorithm). All such algorithms are IR-safe.
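As an illustration of how the clustering proceeds, the following Python sketch implements a simple (and deliberately slow) generalised-kT algorithm using the distance measures of equation (3.18), with four-momentum addition for the recombination step. The toy event at the bottom is invented, and for any real application one should use FastJet rather than code like this.
```python
import math

def pt2(p):      return p[0] ** 2 + p[1] ** 2
def rapidity(p): return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))
def phi(p):      return math.atan2(p[1], p[0])

def delta_R2(p, q):
    dphi = abs(phi(p) - phi(q))
    if dphi > math.pi:
        dphi = 2 * math.pi - dphi
    return (rapidity(p) - rapidity(q)) ** 2 + dphi ** 2

def cluster(particles, R=0.4, n=-1):
    """particles: list of (px, py, pz, E); returns the list of jet four-momenta.
    n = 1, 0, -1 gives kT, Cambridge-Aachen and anti-kT behaviour respectively."""
    objs, jets = [list(p) for p in particles], []
    while objs:
        diB = [pt2(p) ** n for p in objs]               # beam distances
        best = ("beam", min(range(len(objs)), key=lambda i: diB[i]), None)
        dmin = diB[best[1]]
        for i in range(len(objs)):                      # pairwise distances
            for j in range(i + 1, len(objs)):
                dij = min(pt2(objs[i]) ** n, pt2(objs[j]) ** n) * delta_R2(objs[i], objs[j]) / R ** 2
                if dij < dmin:
                    dmin, best = dij, ("pair", i, j)
        if best[0] == "beam":
            jets.append(objs.pop(best[1]))              # declare it a jet
        else:
            i, j = best[1], best[2]
            objs[j] = [objs[i][k] + objs[j][k] for k in range(4)]   # recombine
            objs.pop(i)
    return jets

# toy event: two nearby hard particles, one hard particle elsewhere, plus soft activity
event = [(50, 1, 10, 51.0), (48, 2, 9, 48.9), (-30, 5, -3, 30.6),
         (1, 0.05, 0.2, 1.05), (0.3, -0.2, 0.1, 0.4)]
for jet in cluster(event, R=0.4, n=-1):
    print([round(x, 2) for x in jet], "pT =", round(math.sqrt(pt2(jet)), 2))
```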
If you apply the algorithm to a set of particle four-momenta, it will indeed generate
some set of jet four-momenta. However, it is not at all obvious, having read the above
set of steps, what the jets will look like at the end of this! One way to visualise them is
to take an example event, and plot the resulting jets in the (y, ϕ ) plane. Each jet will
have an associated catchment area, where particles in this area will end up in that
particular jet. Plotting these areas then helps us visualise the shapes such jets carve out
in the particle detector. Results for a number of jet algorithms are shown in
Figure 3.19, including the algorithms mentioned above. In particular, we see that
jets obtained using the anti-kT algorithm have a nice circular shape, whose radius is
roughly equal to the parameter R entering the algorithm—this is because the most
energetic particles are naturally clustered first, leading to a very stable centroid around
which the soft structure is added up to ΔR = R . It is for this reason that R is often
referred to as the jet radius. Some of the other jet shapes are more jagged, however,
and their constituents can be located further than R from the eventual jet centroid, due
to its more significant migration (compared to anti-kT ) during the clustering sequence.
Circular jet shapes tend to be preferred by experimentalists, as they can be rapidly
approximated by simplified hardware algorithms used in event triggering, and the simple
shape makes it easier to correct jet energies for underlying event and pile-up effects.
However, other algorithms can be useful for different reasons, particularly since the kT
and CA algorithms correspond, respectively, to the inverses of transverse-momentum-ordered and angular-ordered PSs.
Figure 3.19. The results of clustering a sample event using different sequential recombination jet algorithms, where particles associated with each jet are shown as areas in the (y, ϕ) plane. Reproduced with permission from Eur. Phys. J. C 67 637–86 (2010).
In the rest of this book, we will simply refer to ‘jets of radius R’, where it is understood that these have come from a suitable algorithm with a jet
radius parameter R. Sets of jet algorithms are implemented in the publicly available
FastJet software package, which can be interfaced with event generators and event
reconstruction and analysis codes.
Given a jet, it is useful to be able to define its area in the (y, ϕ ) plane, which is
more complicated than you might think. Firstly, the natural shape of jets can be
non-circular, as discussed above. Secondly, each jet consists of a discrete set of
points in the (y, ϕ ) plane, representing the four-momenta of individual particles. We
are then faced with the problem of how to define a continuous ‘area’ associated with
a set of points, and there are different choices about how to proceed. For example,
FastJet implements the following options:
(i) Active area. Given that any sensible jet algorithm is IR safe, one may flood
any event (simulated or measured) with artificial soft ‘ghost’ particles,
uniformly distributed in the (y, ϕ ) plane. The active area of a given jet is
then defined according to the number of ghost particles that end up in it.
(ii) Passive area. Instead of using a large number of ghost particles, one can add them
one at a time, and then check which jet each ghost particle joins. The passive area is
defined according to the probability that ghosts end up in a particular jet.
(iii) Voronoi area. Given the points in the (y, ϕ ) plane defining the constituents
of a particular jet, one may form their Voronoi diagram. This divides the
plane into cells containing a single particle, such that any other point in
each cell is closer to that particle than to any other (an example is shown in
Figure 3.20). One may then define the Voronoi area of a single particle to be
the intersection of its Voronoi cell with a circle of radius R (the parameter
entering the jet algorithm). Finally, the Voronoi area of the jet is the sum of
the Voronoi areas of its constituents.
Figure 3.20. Example Voronoi diagram, for a set of points in the (y, ϕ ) plane.
Figure 3.21. (a) Decay of a heavy new physics particle X into a top pair; (b) a highly boosted top, whose decay
products are contained in a single fat jet.
These area definitions are different in general, and a given definition may be chosen
for particular reasons. For example, passive area measures how sensitive a jet is to
noise that is localised in the (y, ϕ ) plane, due to its definition in terms of adding
single ghost particles. The active area instead measures sensitivity to noise that is
more spread out.
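The ghost idea can be illustrated in a few lines. The sketch below throws soft ghosts uniformly in the (y, ϕ) plane and counts those lying within ΔR < R of each of two assumed jet axes; in a genuine active-area calculation (e.g. in FastJet) the ghosts are instead added to the event and the full clustering is re-run, so that non-circular catchment areas are captured, but the counting logic is the same.
```python
import math
import random

random.seed(7)
R = 0.4
jet_axes = [(0.1, 1.2), (-1.5, 4.0)]       # assumed (y, phi) axes of two jets
y_range, n_ghosts = 4.0, 200000            # ghosts thrown over |y| < 4, 0 < phi < 2 pi

def delta_R(a, b):
    dphi = abs(a[1] - b[1])
    if dphi > math.pi:
        dphi = 2 * math.pi - dphi
    return math.hypot(a[0] - b[0], dphi)

counts = [0] * len(jet_axes)
for _ in range(n_ghosts):
    ghost = (random.uniform(-y_range, y_range), random.uniform(0.0, 2 * math.pi))
    dists = [delta_R(ghost, axis) for axis in jet_axes]
    j = min(range(len(jet_axes)), key=lambda k: dists[k])
    if dists[j] < R:
        counts[j] += 1                      # this ghost ends up in jet j

ghost_density = n_ghosts / (2 * y_range * 2 * math.pi)   # ghosts per unit (y, phi) area
for j, c in enumerate(counts):
    print(f"jet {j}: estimated area = {c / ghost_density:.3f}  (pi R^2 = {math.pi * R**2:.3f})")
```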
In recent years, the study of jets has become increasingly sophisticated. A
significant development is the ability to systematically look inside jets, to find
evidence of intermediate particle decays. Consider, for example, the production of a
new physics particle X, that is so heavy that it can decay into a tt¯ pair. The (anti-)top
quarks may then be highly boosted, such that their decay products are collimated
into a single ‘fat jet’, as shown in Figure 3.21. If one were to recluster the event with
a smaller jet radius, one would expect to see three smaller jets inside the original fat
jet, whose invariant mass reconstructs the top mass. However, two of these jets
(corresponding to the ud¯ pair in Figure 3.21(b)) will have an invariant mass that
reconstructs the W mass. Looking for such a pattern of subjets potentially allows
experimentalists to tag top quarks in events, with much greater efficiency than has
been done in the past. In general, methods like this are known as jet substructure
techniques. There are many different algorithms in the literature for examining jet
substructure, some of which are implemented in packages such as FastJet. Typically,
they combine reclusterings of events with jets of different radii, with additional
methods to ‘clean up’ the constituents of the fat jet (e.g. by removing softer subjets),
in order to make the signal-like patterns of the subjets more visible. We will see
explicit examples of substructure observables later on.
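As a schematic illustration of the subjet pattern described above, the snippet below takes a set of subjet four-momenta and asks whether some triplet is compatible with the top mass while containing a pair compatible with the W mass. The mass windows and the example momenta are invented for illustration (the momenta are written in the top rest frame for simplicity, but the invariant masses are frame-independent), and real taggers are considerably more sophisticated.
```python
import math
from itertools import combinations

M_W, M_TOP = 80.4, 172.5            # GeV
W_WINDOW, TOP_WINDOW = 15.0, 30.0   # GeV, assumed half-widths of the mass windows

def inv_mass(momenta):
    """Invariant mass of a set of four-momenta given as (px, py, pz, E)."""
    E  = sum(p[3] for p in momenta)
    px = sum(p[0] for p in momenta)
    py = sum(p[1] for p in momenta)
    pz = sum(p[2] for p in momenta)
    return math.sqrt(max(E ** 2 - px ** 2 - py ** 2 - pz ** 2, 0.0))

def looks_like_top(subjets):
    if len(subjets) < 3:
        return False
    for trio in combinations(subjets, 3):
        if abs(inv_mass(trio) - M_TOP) > TOP_WINDOW:
            continue
        # require that some pair inside the triplet reconstructs the W
        if any(abs(inv_mass(pair) - M_W) < W_WINDOW for pair in combinations(trio, 2)):
            return True
    return False

# three toy subjets: a 'b-like' one plus two from a W decay
subjets = [(0.0, 0.0, -67.5, 67.5), (40.2, 0.0, 33.8, 52.5), (-40.2, 0.0, 33.8, 52.5)]
print("top-tagged:", looks_like_top(subjets))
```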
⁷ The term ‘matrix element’ refers to the squared amplitude, and is used extensively throughout the literature on Monte Carlo event generators.
Table 3.2. Relative advantages and disadvantages of two methods for estimating additional radiation: PSs, and higher-order tree-level MEs (squared amplitudes).
Parton shower (PS) | Tree-level MEs
Can include large numbers of partons. | Limited to a few partons only, by computing power.
Bad approximation for widely separated particles. | Exact for well-separated particles.
Exact when particles are close together. | Fails for close particles (collinear singularities).
Includes only partial interference (angular ordering). | Includes all quantum interference.
MadGraph and Sherpa. There are pros and cons of both approaches, as we show in
table 3.2.
What we see is that the regions in which the PS and MEs do well are
complementary: the former strictly applies when particles are close together, and
the latter when they are well-separated. This strongly suggests that we should
somehow combine the descriptions, to get the best of both! A number of schemes for
doing this exist in the literature. Here, we will describe the one that is implemented in
MadGraph.
Naïvely, one might think of simply taking a set of MEs with different numbers
of partons, with some appropriate minimum distance between the latter, and
showering them. However, a problem arises in that some radiation will be counted
twice. Consider the simple case of DY production of a (colour-singlet) vector
boson, shown in the top left-hand corner of Figure 3.22. We can generate extra
radiation using either the PS, or the ME, and the effect of each is shown by the two
directions in the figure. If we simply add together all showered tree-level MEs, we
generate multiple configurations with the same number of partons, indicating a
double counting of contributions. This double counting is easily removed, in
principle at least, by putting a cutoff Qcut on the transverse momentum pT of each
emitted parton (relative to the emitting parton). For pT > Qcut one should use the
ME to generate the radiation, whereas for pT < Qcut one should use the shower.
However, the precise value of Qcut one chooses is arbitrary, and thus it is important
to ensure that results are insensitive to the choice. In other words, we must make
sure that the ME and PS descriptions smoothly match near the scale Qcut, when
removing the double counted contributions. This motivates the following
algorithm:
1. Start with a set of MEs generated with 0, 1, 2 ,... N additional partons, where
each one has a minimum transverse momentum with respect to other partons
to avoid collinear singularities.
2. Cluster each event into jets, and adjust its probability so that it looks more
parton-shower like.
3. Shower the event, and cluster it into jets.
Figure 3.22. Radiation can be generated from MEs and/or PSs. Naïvely adding all such contributions together
results in configurations that are counted twice (the dashed boxes).
4. If the original event (before showering) has N jets, keep events with
additional jets with pT > Qcut , unless they are harder than the softest jet
from the ME.
5. If the original event had <N jets, veto all events containing additional jets
with pT > Qcut .
Here, step 2 ensures smooth matching between the ME/PS descriptions, and thus
insensitivity with respect to the matching scale Qcut. In practice, making the ME
probability more PS-like involves reweighting the coupling αS and/or PDFs, to make
scale choices the same as in the PS. Step 4 allows extra jets beyond those generated
by the hard ME, but makes sure that the hardest (i.e. most well-separated) jets are all
generated by the ME, as they should be. Step 5 again makes sure that the hardest N
jets all come from the ME. Note that despite our best efforts, some residual
dependence on the matching scale Qcut may remain. It should be chosen so as to give
minimal sensitivity when varied.
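The veto logic of steps 4 and 5 can be sketched as follows, representing ME partons and reconstructed jets simply by their transverse momenta. This is only a schematic of the accept/reject decision with an assumed matching scale; a real MLM implementation also matches jets to partons geometrically, and the details are documented in the MadGraph literature.
```python
def keep_event(me_parton_pts, shower_jet_pts, q_cut, n_max):
    """me_parton_pts: pT of the additional partons in the hard ME (length 0..n_max);
    shower_jet_pts: pT of the jets reconstructed after showering;
    q_cut: matching scale; n_max: highest ME multiplicity generated."""
    n_me = len(me_parton_pts)
    hard_jets = sorted((pt for pt in shower_jet_pts if pt > q_cut), reverse=True)
    extra = len(hard_jets) - n_me
    if extra <= 0:
        return True                  # no additional hard jets: always keep
    if n_me < n_max:
        return False                 # lower-multiplicity sample: veto extra hard jets
    # highest-multiplicity sample: extra jets allowed only if softer than the softest ME jet
    softest_me = min(me_parton_pts) if me_parton_pts else 0.0
    return all(pt < softest_me for pt in hard_jets[n_me:])

# examples with an assumed matching scale of 30 GeV and samples with up to 2 extra partons
print(keep_event([80.0, 45.0], [75.0, 50.0, 35.0], q_cut=30.0, n_max=2))   # kept: 35 < 45
print(keep_event([80.0],       [75.0, 40.0],       q_cut=30.0, n_max=2))   # vetoed
```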
The above scheme is a variant of so-called MLM matching. Another common
scheme you will see in the literature is CKKW matching, which is implemented e.g. in
Sherpa. For a given observable, the difference between matching schemes can be
treated as an additional source of theoretical uncertainty, given that it represents an
ambiguity in how we describe extra radiation.
Here the first term in the square brackets is the LO cross-section. The second term is
the exact virtual contribution, with the overlap from the PS removed. The final term
is the real radiation contribution with the PS overlap removed, integrated over the
additional phase space of the extra parton. Finally, the whole contents of the square
brackets is interfaced to the PS algorithm. One reason for showing this rather
schematic formula is that it makes clear that there are subtractions involved in
forming the modified NLO ME. Consequently, some events generated by
MC@NLO have negative probability weights. This is never a problem in practice,
as all generated distributions must be positive within uncertainties. Note that the
subtraction terms have to be worked out separately for each PS algorithm, although
this only has to be done once in each case. Impressively, an automatic implementa-
tion of this algorithm exists (called aMC@NLO), which is part of the publicly
available code MadGraph_aMC@NLO. This allows users to specify an arbitrary hard
scattering process (up to a couple of caveats), upon which the software automatically
creates an MC@NLO event generator! The Sherpa generator can also be used to
generate NLO matching to a PS, using a variant of the MC@NLO algorithm.
The second main matching prescription for interfacing NLO MEs with PSs is
called POWHEG, by Frixione, Nason and Oleari. In this approach, a special Monte
Carlo event generator is used to generate the hardest radiated parton, that uses the
following modified Sudakov factor:
\Delta_i \sim \exp\left[-\int_{k_T > p_T} d^4 k\, \frac{R(\{p_i\}, k)}{B(\{p_i\})}\right]. \qquad (3.20)
Here the numerator is the exact real-emission ME, which is then normalised to the
LO contribution B ({pi }). This is then integrated over the momentum of the extra
parton, which is enforced to have a higher transverse momentum than the hardest
emission. In more simple terms: for the hardest emission, the no-emission proba-
bility includes the full NLO ME, rather than its collinear limit (which is what a
normal PS does). Events from this generator can then be interfaced with a normal
transverse-momentum ordered PS. There is no double counting of radiation (or
virtual corrections), provided one requires that the PS only radiates partons with
transverse momenta less than the hardest emission. The POWHEG approach is
implemented for a variety of processes in the POWHEG-BOX NLO generator code,
to be interfaced with various PS generators.
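To illustrate the sampling logic implied by equation (3.20), the toy sketch below builds a Sudakov factor from an invented one-dimensional stand-in for the ratio R/B (imagined to be already integrated over the remaining emission variables), and solves Δ(pT) = r numerically to generate the transverse momentum of the hardest emission. None of the functions or numbers correspond to a real process, and this is not the actual POWHEG implementation.
```python
import math
import random

random.seed(3)
pT_max, pT_min = 100.0, 1.0   # GeV: assumed upper scale and infrared cutoff

def ratio(kT):
    # invented stand-in for R({p_i}, k)/B({p_i}) after integrating the other variables
    return 0.3 / kT * (1.0 + 0.2 * math.log(pT_max / kT))

def sudakov(pT, npts=2000):
    """Delta(pT) = exp(- int_pT^pT_max dkT ratio(kT)), by the trapezoidal rule."""
    h = (pT_max - pT) / npts
    vals = [ratio(pT + i * h) for i in range(npts + 1)]
    integral = h * (0.5 * (vals[0] + vals[-1]) + sum(vals[1:-1]))
    return math.exp(-integral)

def hardest_emission():
    """Solve Delta(pT) = r for pT by bisection; None means no emission above the cutoff."""
    r = random.random()
    if sudakov(pT_min) > r:
        return None
    lo, hi = pT_min, pT_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sudakov(mid) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print([round(pt, 2) if pt else None for pt in (hardest_emission() for _ in range(5))])
```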
An advantage of the POWHEG approach is that it works with any transverse-momentum-ordered shower. For angular-ordered showers, however, care is needed to get the matching to work; this involves so-called truncated showers (see the original literature for details). Furthermore, the POWHEG approach is such that all simulated events have manifestly positive probability weights. However, the exponentiation of the real ME in the Sudakov factor for the hardest emission can create spurious terms at $\mathcal{O}(\alpha_S^2)$ and beyond, which can be numerically sizeable for some processes.
The difference between the MC@NLO and POWHEG approaches is of formally
higher perturbative order, and thus acts as a further source of theoretical uncer-
tainty. Some experimental analyses, for example, will run codes implementing both
algorithms, to check the systematic uncertainty due to the matching prescription.
In this section, we have seen many different examples of software tools, in
increasing order of sophistication. Which tool we should use for a particular analysis
very much depends on what we are looking at, and the general answer is that you
should use the tool that gives the most accurate answer. General rules include:
(i) If the observable we are calculating is sensitive to lots of jets, we should use
higher-order tree-level MEs to get these right.
(ii) If there are few jets (e.g. one extra), but the observable still depends on
complicated details of the final state, we should use NLO matched to a PS
e.g. MC@NLO or POWHEG.
(iii) If the observable is very inclusive (e.g. a total cross-section), then fixed-order perturbation theory may be the best thing to do. Even for differential cross-sections, we can often rescale them so that their integral matches a higher-order total cross-section calculation.
You may be wondering whether it is possible to combine NNLO MEs with PSs, or
to combine multiple NLO MEs with showers etc. Indeed this is possible, and people
are starting to develop algorithms for doing so, which are somewhat hampered
by current computing power. The basic idea is to try to include as much QCD
theory as possible, whilst still having software tools that are fast enough to produce
theory predictions that we can compare with data in a suitable timescale (weeks or
months).
Further reading
As well as the books already listed in Chapter 2, the following books are useful for
advanced topics in collider physics (e.g. resummation, Monte Carlo generators):
Exercises
3.1 (a) What are the advantages of an NLO calculation over an LO one?
How about higher orders?
(b) What are the disadvantages of higher-order calculations?
3.2 Consider a parton i that branches into two partons j and k.
(a) Assuming the partons j and k to be approximately on-shell and
carrying momentum fractions z and (1 − z ) respectively, of the
energy Ei of parton i, show that the virtuality of parton i is given by
$$t_i \equiv p_i^2 \simeq z(1-z)\,E_i^2\,\theta^2,$$
if θ is small.
(b) Show further that the transverse momentum of parton k relative to
parton j is given by
$$|k_T| \simeq (1-z)\,E_i\,\theta.$$
Figure 3.23. (a) Two seed directions for an iterative cone algorithm, separated by twice the cone radius; (b) the
same, but with an additional soft gluon exactly halfway between the two seed directions.
3.3 (a) Explain what is meant by the Sudakov form factor Δi (t0, t1) for
parton species i and virtualities t0 and t1.
(b) Explain why this quantity obeys the equation
$$\Delta_i(t_0, t+\delta t) = \Delta_i(t_0, t)\left[1 - \sum_j \int_t^{t+\delta t}\frac{\mathrm{d}t'}{t'}\int \mathrm{d}z\, P_{ji}(z)\right],$$
where suitable limits for z have been imposed. What determines these
limits?
3.4 (a) When do you expect the PS approximation of additional radiation to
be accurate, and when not?
(b) Why do some PSs order successive emissions by angle, rather than
virtuality?
3.5 (a) Why are jet algorithms used in comparing theory calculations with
data?
(b) An early example of a jet algorithm is an ‘iterative cone algorithm’.
This starts by assigning so-called ‘seed four-momenta’ to the event.
Then the four-momenta within a cone of distance ΔR (in the (y, ϕ )
plane) of each seed are summed, and the resultant used to define a new
seed direction. The process is iterated until all seed directions are stable.
Consider the two seed directions shown in Figure 3.23(a), sepa-
rated by twice the cone radius ΔR . How many jets will the algorithm
return?
Chapter 4
Beyond the Standard Model
The Standard Model (SM) has done an amazing job of explaining all physics up to
(and including) LHC energies, and some calculations in particular match our current
experimental measurements with outrageous precision. Nevertheless, no-one in the
field of particle physics believes it is the final answer, for a variety of reasons. Firstly,
the theory is subject to a number of theoretical challenges which, although
subjective, give very clear hints that the SM is probably the low-energy limit of a
more fundamental theory. There are also a number of experimental challenges,
including both the parts of the model that have yet to be verified, and a growing
number of observed phenomena that cannot be explained by the SM as it stands.
$$-\frac{\alpha_S}{8\pi}\,\theta_{\rm QCD}\,F^a_{\mu\nu}\tilde F^{\mu\nu,a}, \tag{4.1}$$
where $\theta_{\rm QCD}$ is an unknown parameter, $\alpha_S$ is the strong coupling constant, $F^a_{\mu\nu}$ is the gluon field strength tensor with colour index $a$, and
$$\tilde F^{\mu\nu,a} = \epsilon^{\mu\nu\alpha\beta}F^a_{\alpha\beta}/2$$
is called the dual field strength tensor. The extra term of equation (4.1) is allowed by
all symmetries of the SM, and receives a contribution from a mechanism known as
the chiral anomaly that we will not go into in detail.
You should now be wondering why we did not include this term in our earlier
discussion of the QCD Lagrangian. It transpires that such a term would give rise to an electric dipole moment for the neutron, in conflict with the very stringent experimental upper bound, which is consistent with no dipole moment at all. The only solution within the SM is to tune $\theta_{\rm QCD}$ so that the term effectively vanishes. This is an example of fine-tuning, which
occurs whenever we have a parameter that can take a large range a priori, but which
must be set to a very precise value in order to be compatible with later experimental
measurements. Throughout the history of physics, such fine-tuning has usually told us
that we are missing a more fundamental explanation. In this specific case, the problem
with the missing QCD Lagrangian term is dubbed the strong CP problem, and it is the
starting point for extending the SM with mysterious new particles called axions.
Another fine-tuning problem of the SM is the so-called hierarchy problem, which is
particularly relevant to the search for new physics at TeV energy scales. To explain this,
let us first recall the results of Section 2.11, in which we found that fermion masses run
with energy scale in QFT. Furthermore, this running is logarithmic, which we can find
by putting a momentum cutoff Λ on loop integrals, and then examining how the results
depend on Λ. It turns out that for scalar particles rather than fermions, this running is
quadratic. The Higgs boson in the SM is a fundamental scalar. If its mass is governed
by some underlying new-physics theory at high energies, we therefore expect the Higgs
mass to depend on the energy scale of new physics, Λ, according to
$$m_H^2 \sim \Lambda^2. \tag{4.2}$$
Small changes in the new-physics scale lead to large changes in the Higgs mass, so it is
puzzling that we observe a Higgs mass of mH ≃ 125 GeV . This seems to imply a huge
amount of fine tuning: if Λ is large, the Higgs mass should also naturally want to be large.
If we take $\Lambda$ two orders of magnitude larger than $m_H$ as a rule of thumb for where the SM begins to be finely tuned, this implies that we expect new physics to appear at energies of $\mathcal{O}(10\,{\rm TeV})$, i.e. LHC energies. However, this is a purely theoretical argument, and it has a high degree of subjectivity: why 10 TeV and not 100 TeV?
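To make the quadratic sensitivity of equation (4.2) concrete, a standard one-loop estimate (quoted here rather than derived) is that a fermion coupling to the Higgs with strength $\lambda_f$ contributes a correction of the form
$$\delta m_H^2 \simeq -\frac{|\lambda_f|^2}{8\pi^2}\,\Lambda^2 + \cdots,$$
so that for the top quark, with $\lambda_t \simeq 1$, a cutoff of $\Lambda \sim 10\,{\rm TeV}$ already generates shifts roughly two orders of magnitude larger than the observed $m_H^2$, unless they are cancelled by fine-tuning.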
• Is the Higgs boson really the same as that predicted by the SM? So far, the
measured Higgs boson production and decay rates match the SM predictions.
However, the uncertainty on these measurements increases dramatically for
some modes, and deviations from SM behaviour may be observed as the
uncertainties shrink with the addition of more LHC data. Recall that, in the
SM, loop effects must be included when calculating the rate of a given process
to a given accuracy. In the SM, we know exactly which particles can act in the
loops of our Feynman diagrams, and thus we generally get unambiguous
calculations of rates that can be compared to experimental data. If there are
new particles in Nature, they would be expected to appear in loops, and thus
would change the rates of processes involving particle production and decay.
Since any new physics is likely to affect the Higgs sector of the SM, we can
thus use precision measurements of Higgs quantities at the LHC to provide a
powerful indirect window on the presence of new particles.
• Is the Higgs field the same as in the SM? The SM Higgs mechanism involves the
potential given in equation (2.88). We still have no idea if the Higgs field obeys
this equation or not, since the observed Higgs properties could be consistent
with a different form generated by new physics. To answer this question, it turns
out that we will need to observe the Higgs boson interacting with itself, and then
measure the strength of that self-interaction. This is unfortunately exceptionally
challenging to do at the LHC. Nevertheless, there is a growing literature that
suggests that this is not impossible by the end of the LHC’s run. Moreover, such
a measurement is a very strong motivation for a future, higher-energy collider.
• Are there extra Higgs bosons? The SM contains a doublet of Higgs fields, but
there is no reason at all why there should not be more. A model with two
Higgs doublets gives rise to two extra neutral Higgs bosons (conventionally
referred to as H and A), and two electrically-charged Higgs bosons H + and
H−. Although current searches have not uncovered evidence for these
particles, much of the predicted mass range remains unexplored at the
LHC, and will be covered in more detail over the next decade or so.
1 In a burst of acronym whimsy, the alternative dark-matter hypothesis of astrophysical ‘massive compact halo objects’ was given the name MACHOs as a humorous complement to WIMPs.
physics associated with this particle can be expected to manifest itself at similar
energies, giving rise to a potential collection of new particles at the TeV scale.
Cosmology also tells us other interesting things. For example, most of the energy
budget of the Universe is comprised of a mysterious ‘dark energy’ component that
probably requires new particle physics at high energy scales to explain. It is more or less
accepted that the Universe also underwent a rapid expansion in its first moments, which
is called inflation. Such behaviour cannot easily be explained without extending the SM
with new fields. The growing field of particle cosmology relates cosmological observa-
tions to the various high-energy particle physics models that might explain them.
Finally, neutrinos are assumed to be massless in the SM, but it has been known
for decades that they have a small mass. Generating this mass technically requires an
extension of the SM.
It is worth noting that we could have extended this list of experimental challenges
to include tentative observations that are not yet confirmed. These include measure-
ments of various properties of b-meson decays that currently show substantial
tension with the SM, and the current best measurement of the magnetic moment of
the muon which does not agree with the SM prediction. In both cases, there are
theoretical uncertainties associated with non-perturbative QCD calculations that are
very hard to pin down, with the result that experts are still split on whether these
measurements allow us to definitively reject the SM. Nonetheless, there are some
b-meson decay observables that are much less affected by systematic uncertainties,
and these also show persistent anomalies.
processes with incoming and outgoing SM particles, even if they are too massive to be
produced directly in LHC collisions. Implicit searches for new physics can therefore
access energy scales much higher than those that we can access in explicit searches.
In practice, the implicit search programme at the LHC is performed by measuring
a huge range of observables in an attempt to find cracks in the SM. There are many
different types of observable that one may choose to measure, including:
• Inclusive cross-section measurements: Given the SM, one can calculate the
total cross-section for the production of specific particles in the final state (e.g.
a single W boson, a pair of W bosons, a top and anti-top quark, etc). For
each process, one can then isolate it in the LHC data, and extract a
measurement of the production cross-section. Comparison of the theoretical
and experimental results then provides a powerful test of the SM. In
Figure 4.1, we show various inclusive cross-section measurements performed
by the ATLAS experiment, and it can be observed that they match the
theoretical predictions closely. Isolation of each process typically relies on
selecting specific decay modes of the particles in the final state, for example
leptonic decay modes of the Z boson. The measurements are then corrected
for the known branching fractions of the particles in the SM. Note also that
measurements have been performed with the LHC running at different centre-of-mass energies.
Figure 4.1. Summary of SM total production cross-section measurements performed by the ATLAS
collaboration, as of June 2020. The measurements are corrected for branching fractions, and are compared
with the corresponding theoretical predictions. CREDIT: https://atlas.web.cern.ch/Atlas/GROUPS/
PHYSICS/PUBNOTES/ATL-PHYS-PUB-2020-010/fig_01.png
all events pass the trigger conditions and are reconstructed with 100% efficiency.
This is given by $N = \sigma_{\rm NP} \times L_{\rm int}$, and if we obtain only a handful of events, it is not possible to discover that process with the assumed integrated luminosity. If, instead, N is greater than $\mathcal{O}(100)$ events, then we might wish to consider the scenario in more
detail.
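The arithmetic is trivial, but worth seeing once. In the sketch below the 2 fb cross-section is purely hypothetical, the luminosity is taken to be similar to the roughly 139 fb$^{-1}$ collected by ATLAS in Run 2, and efficiencies and acceptances are ignored.

```python
sigma_np_fb = 2.0      # hypothetical new-physics cross-section [fb]
lumi_fb_inv = 139.0    # assumed integrated luminosity [fb^-1]

n_expected = sigma_np_fb * lumi_fb_inv   # N = sigma_NP x L_int
print(f"expected events before selection: {n_expected:.0f}")   # ~278
```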
The next step is to consider how our new particles decay. There are three broad
categories of particle decay that must be considered separately in LHC searches:
• Prompt decay to visible particles: It might be the case that a particle decays at
the interaction point purely to SM particles that are visible to the ATLAS and
CMS detectors, possibly by first producing intermediate particles that
themselves decay visibly. In this case, the invariant mass formed from the
sum of the four-momenta of the final decay products has a peak at the mass
of the new particle, and we can perform a resonance search as described in
Chapter 9.
• Prompt decay to one or more invisible particles: Many theories of physics
beyond the SM feature a particle that has a small interaction strength with
SM fields, either due to interactions via the weak force, or via a new force
with similar strength to the weak force. Such particles would leave the
ATLAS and CMS detectors without being seen, therefore giving rise to
missing transverse momentum. Such particles may be produced directly, in
which case one must make use of initial state radiation to see them at all, or
they may be produced in the decays of SM particles or new particles. In both
cases, one must rely on special techniques for semi-invisible particle searches
that we describe in Chapter 10.
• Non-prompt decay: The new particles predicted by some BSM theories may
not decay promptly, meaning that they leave the interaction point before
decaying. If the lifetime of a particle is sufficient for it to leave the ATLAS or
CMS detector before decaying, and it does not carry colour or electric charge,
the phenomenology is identical to that of semi-invisible particle searches.
However, if the particle does carry colour or electric charge, or it decays
within the detector volume, one can generate a host of novel signatures, such
as significantly displaced decay vertices, tracks with unusual properties, or
short track segments. We briefly summarise techniques for such searches in
Chapter 10.
In all of the above cases, the signal of new particle production does not occur in
isolation, but instead is hidden in a background of SM processes that produce a
similar detector signature. To take a trivial example, if we predict that a new particle
will, once produced, decay immediately to two photons, we can look for evidence of
the particle by looking at all LHC events that contain two photons. But this sample
will be contaminated by any physics process that produces two photons, and we
must find a way to distinguish the events arising from the new particle from the more
boring events of the SM. The situation is not improved in the case of semi-invisible
decays, since missing four-momentum can be produced by any SM process that
results in neutrinos. To complicate things further, the backgrounds for new-physics
searches at the LHC typically give us between three and six orders of magnitude
more events than the signals we are looking for, based on the considerable hierarchy
of cross-sections for SM processes that we saw in Figure 4.1.
Figure 4.2. The s-channel exchange of a new particle X between an incoming quark–antiquark pair (with momenta p1 and p2) and an outgoing lepton pair (with momenta p3 and p4).
The X particle has the propagator
$$D_{\mu\nu} = \frac{-i}{q^2 - M_X^2}\left[\eta_{\mu\nu} - \frac{q_\mu q_\nu}{M_X^2}\right],$$
where q is its (off-shell) four-momentum. The diagram of Figure 4.2 then leads to the
scattering amplitude
$$A = \frac{g_q g_l}{s - M_X^2}\,[\bar v(p_2)\gamma^\mu u(p_1)]\,[\bar u(p_3)\gamma^\nu v(p_4)]\left[\eta_{\mu\nu} - \frac{q_\mu q_\nu}{M_X^2}\right], \tag{4.3}$$
with
$$s = (p_1 + p_2)^2, \qquad q^\mu = (p_1 + p_2)^\mu = (p_3 + p_4)^\mu.$$
The second term in the propagator in equation (4.3) (i.e. involving qμqν ) vanishes if
we substitute in for qμ and qν and use the Dirac equations
$$\bar v(p_2)\,\slashed{p}_2 = \slashed{p}_1\,u(p_1) = 0,$$
where we have taken the quarks and leptons to be approximately massless. Then
equation (4.3) simplifies to
$$A = \frac{g_q g_l}{s - M_X^2}\,[\bar v(p_2)\gamma^\mu u(p_1)][\bar u(p_3)\gamma_\mu v(p_4)]. \tag{4.4}$$
Now let us look what happens if we take the mass of the new particle to be much
larger than the collision energy, i.e. $M_X^2 \gg s$. We can then Taylor expand the prefactor in equation (4.3):
$$\frac{1}{s - M_X^2} = -\frac{1}{M_X^2}\left[1 + \mathcal{O}\!\left(\frac{s}{M_X^2}\right)\right],$$
to give
$$A = \frac{c}{\Lambda^2}\,[\bar v(p_2)\gamma^\mu u(p_1)][\bar u(p_3)\gamma_\mu v(p_4)]. \tag{4.5}$$
Here we have defined the (dimensionless) constant
$$c = g_q g_l,$$
and also introduced the energy scale Λ at which new physics occurs, which in the
present case is simply
$$\Lambda = M_X.$$
Interestingly, the amplitude of equation (4.5) looks like it comes from a single
Feynman rule, associated with the interaction Lagrangian
$$\mathcal{L} \sim \frac{c}{\Lambda^2}\,[\bar\Psi_q\gamma^\mu\Psi_q][\bar\Psi_l\gamma_\mu\Psi_l]. \tag{4.6}$$
Here Ψq and Ψl are the quark and lepton fields, respectively, and equation (4.6) is
an example of a four-fermion operator, in which a pair of fermions and a pair of
antifermions interact at the same point in spacetime. Provided we are at energies that
are much less than Λ, we can use this interaction instead of the full theory involving
the X particle. Indeed, there is actually a historical example of this: Fermi’s original
theory of weak nuclear decay in the 1930s involved just such four-fermion
interactions. However, the theory breaks down at energies around the W boson
mass, above which we have to use the full electroweak theory that was developed by
Glashow, Weinberg and Salam in the 1960s.
We can make some further useful comments about this example:
• The effective interaction at low energy involves the SM fields only. Thus, we
do not need to know the precise nature of the new physics to know that the
new physics is there! It will show up as new interactions involving the SM
fields.
• Assuming that the coefficient c ≲ 1 (i.e. that the couplings in the new-physics
theory are small enough to be perturbative), the interaction will be weak if Λ
is large. Thus, the effective interaction of equation (4.6) will constitute a small
(but non-zero) correction to the SM.
• Because of the inverse square of the new-physics scale in equation (4.6), the
mass dimension of the four-fermion operator is 6 (recall that the Lagrangian
has mass dimension 4 in four spacetime dimensions).
• Higher terms in the Taylor expansion of the propagator involve higher
inverse powers of the new-physics scale, i.e. Λ−4 , Λ−6 and so on. Thus, they can
be systematically neglected.
We have considered a particular example here, but the above story turns out to be
fully general: any new physics looks like a set of effective interactions at energies
≪ Λ , where Λ is the energy scale of the new physics. All of these interactions involve
operators of mass dimension > 4, and containing SM fields only. Thus, we can
account for the presence of new physics by modifying the SM Lagrangian as follows:
$$\mathcal{L}_{\rm BSM} = \mathcal{L}_{\rm SM} + \sum_{n=1}^{\infty}\sum_i \frac{c_i^{(n)}}{\Lambda^n}\,\mathcal{O}_i^{(n)}. \tag{4.7}$$
Here the first sum is over $n$, such that the operators have mass dimension $4 + n$ (i.e. greater than four), and the second sum is over the set of possible operators $\mathcal{O}_i^{(n)}$ with that mass dimension. Each operator
2 Here we have assumed a common new-physics scale as a way of organising the expansion. We could have
taken a different new-physics scale for each operator. However, we could then simply absorb the difference in
the scales into the coefficients ci(n) , leading to the same result as equation (4.7).
A common convention is to define rescaled coefficients
$$\bar c_i = \frac{c_i v^2}{\Lambda^2}, \tag{4.8}$$
where v ≃ 246 GeV is the Higgs vacuum expectation value. Given that new physics
is expected to have something to do with the origin of the Higgs boson, this is a
particularly convenient way to measure deviations from the SM, i.e. each c̄i is
dimensionless, and $\bar c_i \simeq 1$ would indicate new physics around the energy scale of
electroweak symmetry breaking, provided the couplings in the new physics theory
are O(1). This is just a choice, however, and constraints can be obtained using a
different normalisation. What is clear, though, is that we cannot independently
constrain the new physics scale and the operator coefficients: they always occur
together in the EFT expansion. A typical procedure for measuring the {c̄i} is then as
follows:
1. Calculate the Feynman rules from each higher dimensional operator.
2. Choose observables which are sensitive to the new Feynman rules (e.g. total
and differential cross-sections, decay widths…), and find data for them.
3. Calculate theory predictions for each observable, including the new
Feynman rules.
4. For all data points X, construct a goodness of fit measure. A common one is
$$\chi^2(\{c_i\}) = \sum_X \frac{\left(X_{\rm th}(\{c_i\}) - X_{\rm exp}\right)^2}{\Delta_X^2}, \tag{4.9}$$
where the numerator contains the squared difference between the theory and
experimental values at each data point X, and the denominator contains the
theory and experimental uncertainties added in quadrature:
$$\Delta_X^2 = \Delta_{\rm th}^2 + \Delta_{\rm exp}^2. \tag{4.10}$$
Here the theory uncertainty includes all of the sources of uncertainty that we
have seen throughout the previous chapters, e.g. renormalisation and
factorisation scale variations, quoted uncertainties on the parton distribution
functions, ambiguities in matching of matrix elements to parton showers, etc.
The experimental uncertainty is sometimes quoted with separate statistical
and systematic uncertainties, which can be added in quadrature themselves to
produce a total. For differential cross-sections, experiments sometimes
present correlation matrices describing the interdependence of uncertainties
associated with different bins of the distribution. These correlations can be
simply included by modifying equation (4.9) to include the quoted correla-
tion matrix.
5. Maximise the likelihood function $L = \exp[-\chi^2(\{\bar c_i\})]$ to find the best-fit
point, using the frequentist approach to inference described in Chapter 5.
The result of such a fit will be a set of measured values {c̄i}, with some uncertainty
bands. An example is shown in Figure 4.4, from an EFT fit in the top-quark sector.
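A minimal sketch of such a fit for a single coefficient is given below. The binned 'measurements', the uncertainties, and the assumed linear-plus-quadratic dependence of the theory prediction on the coefficient are all invented for illustration; a real analysis would use many observables and coefficients, plus the correlations discussed above.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical measurement of a differential cross-section in three bins [pb]
x_exp = np.array([12.1, 5.3, 1.9])
delta_exp = np.array([0.8, 0.4, 0.2])   # experimental uncertainties
delta_th = np.array([0.5, 0.3, 0.1])    # theory uncertainties

# SM prediction plus assumed linear (interference) and quadratic EFT terms per bin
x_sm = np.array([11.5, 5.0, 2.0])
x_lin = np.array([1.2, 0.9, 0.5])
x_quad = np.array([0.3, 0.4, 0.3])

def x_theory(cbar):
    return x_sm + cbar * x_lin + cbar ** 2 * x_quad

def chi2(cbar):
    delta2 = delta_exp ** 2 + delta_th ** 2    # uncertainties added in quadrature
    return np.sum((x_theory(cbar) - x_exp) ** 2 / delta2)

result = minimize(lambda c: chi2(c[0]), x0=[0.0])
print(f"best-fit coefficient: {result.x[0]:+.3f},  chi2_min: {result.fun:.2f}")
```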
Figure 4.4. Constraints on operator coefficients from a fit of an effective field theory in the top-quark sector.
The second of these options is the more realistic one, given that all EFT operators
are potentially present. However, it is conventional to present results for both
choices, which itself can act as an important consistency check. Note that Figure 4.4
contains the results of both LHC and Tevatron data (left panel), and data from the
Tevatron only (right panel). This illustrates the dramatic impact that LHC data is
already having on constraining new physics.
One can also look at correlations between different operators (i.e., whether a high
value for one coefficient might induce a high value for another). To this end,
Figure 4.5 displays some two-dimensional slices through the space of operator
coefficients, where the different coloured shapes represent the bounds on the value of
coefficients arising from data, at different statistical confidence levels. The red star is
the best fit point arising from the fit which, interestingly, is not the SM! However, all
best fit points are statistically compatible with the origin: they lie within the dark blue region, whose boundary lies one standard deviation away from the origin. It will be very interesting to see how such results evolve as more data from the
LHC is implemented in the fits. This is particularly so given that EFT effects are
expected to be more visible at higher energies. To see why, note that dimensional
analysis implies that the factors Λ−n involving the new physics scale Λ in the EFT
expansion must be compensated by extra powers of energy/momentum in the
operator itself—note that momenta appear as derivatives in position-space. These
momenta end up in the Feynman rules, and often have the effect of boosting the high-energy tails of kinematic distributions, as illustrated in Figure 4.6.
3 A related approach is to use all coefficients but to average rather than fit over the unshown ones: these are
called marginalised constraints.
Figure 4.5. Two-dimensional slices through the operator coefficient space, showing constraints on pairs of
operators. The red star is the best fit point, and the SM corresponds to the origin in each plot.
Figure 4.6. The invariant mass of the top-pair, and the transverse momentum of the top quark, in top pair
production. The red distributions are obtained using the SM only, and the blue distributions include an
additional four-fermion operator.
Figure 4.7. Variation of the inverse couplings of the SM with energy, showing that they do not quite meet each
other at high energy.
As shown in Figure 4.7, the inverse gauge couplings of the SM run with energy and get close to each other, but do not quite meet, at an energy scale of $\simeq 10^{14}$ GeV.
If they did meet, this could indicate that the three forces that we see at low energy are
in fact multiple facets of a single force, such that the SM forces are unified into a
single theory. This is conceptually very appealing, not least because it avoids having
to answer the question of why the SM has three separate forces in it. Furthermore,
the whole history of physics is that disparate phenomena are often unified through a
few simple fundamental principles. One example of this is the unification of
electricity and magnetism, which we now know is a consequence of gauge symmetry!
The idea that the SM forces could emerge from a single theoretical description is
called grand unification, and such theories are known as grand unified theories
(GUTs). This scenario was originally explored in the 1970s, and the original GUT
theory proposed a single gauge group SU(5), which breaks down to
SU(3) × SU(2) × U(1) at low energies. We have seen that non-Abelian gauge
transformations act on vector-valued fields, and the vectors in the SU(5) theory
contain different SM fields in the same vector, which then mix up under SU(5) gauge
transformations. When one tries to put the SM fields into SU(5) vectors such that the
correct quantum numbers emerge at low energies, one finds that quarks and leptons
have to be in the same vector, so that e.g. a quark can change into a lepton by
emitting an SU(5) gauge boson. This leads to the decay of the proton, and one can in
principle calculate the decay rate from the SU(5) theory, which gives a direct bound
on the energy scale at which the unification happens. Unfortunately, there are very
strong constraints on proton decay, which rule out the simplest form of SU(5)
unification. However, the idea has survived in other, more general, scenarios.
Although there are a lot of prototype GUT theories, it is possible to collect some
common features. Firstly, the interactions that allow quarks to turn into leptons are
mediated by particles that correspond either to the gauge bosons of the unified gauge
group, as in our SU(5) example above, or the Higgs sector of the unified theory. The
former are called vector leptoquarks, whilst the latter are called scalar leptoquarks.
The vector leptoquark masses are typically of the order of the scale at which the SM
gauge couplings unify, and they can only be directly accessible at the LHC if this
unification scale is low enough. For the case of scalar leptoquarks, one can write a
low energy effective field theory that describes their interactions, independently of
the exact physics at higher energies. For vector leptoquarks, one must add
assumptions regarding the higher-scale physics, since the couplings of vector
leptoquarks to the SM gauge sector are not completely fixed by their quantum
numbers under the SM gauge group. If leptoquarks have a mass of the order of a few
TeV, i.e. within the LHC energy reach, they can be produced either singly or in
pairs, through strong interactions. Whereas the cross-section for pair production is
mostly model-independent, single leptoquark production always depends on details
of the GUT model parameters. Experimental searches for both pair and single
leptoquark production are based on their decay to a jet and a lepton, which can be
reconstructed as a resonance. Many grand unified theories also include heavy sterile
neutrinos that can be used to generate masses for the SM neutrinos. Such heavy
neutrinos are also popular targets for LHC searches, which target processes such as
pp → W → Nℓ → Wℓℓ → ℓℓjj , where N denotes the sterile neutrino, ℓ denotes a
charged lepton, and j denotes a jet.
A number of GUT frameworks predict the existence of charged W ′ or neutral Z′
vector bosons and/or a plethora of different scalar states, including such exotica as
doubly-charged scalars. These can all be searched for using resonance searches.
4.4.3 Compositeness
Another way to extend the SM comes from revisiting the assumption that particles
are fundamental, and thus completely point-like. This could fail at very short
distances (equivalent to high energy scales): the SM particles could have a
substructure, analogous to how the proton and neutron were once thought to be
fundamental, until they were found to contain quarks and gluons. No collider has
yet seen any evidence for fermion compositeness, but each increase in centre-of-mass
energy of a collider allows us to zoom slightly further into the SM fermions. We can
refer to the energy scale at which we would first start resolving substructure as Λ, the compositeness scale, which can be thought of as the
scale associated with interactions between the hypothesised particle constituents of
the SM fermions.
The LHC signatures of fermion compositeness depend on Λ. For example, if Λ is
much higher than the centre-of-mass energy of colliding quarks at the LHC, then
compositeness will appear at the LHC as an effective four-fermion contact
interaction (two of which fermions are the incoming quarks), similar to that of
the effective field theory that we described earlier. One could assume various forms
for this interaction, but it is common to restrict experimental studies to searching for
evidence of specific subsets of the general behaviour, such as assuming that one only
has flavour-diagonal, colour-singlet couplings between quarks. In that case, one can
write the effective Lagrangian
$$\mathcal{L}_{qq} = \frac{2\pi}{\Lambda^2}\Big[\eta_{LL}(\bar q_L\gamma^\mu q_L)(\bar q_L\gamma_\mu q_L) + \eta_{RR}(\bar q_R\gamma^\mu q_R)(\bar q_R\gamma_\mu q_R) + 2\eta_{RL}(\bar q_R\gamma^\mu q_R)(\bar q_L\gamma_\mu q_L)\Big], \tag{4.11}$$
where the subscripts L and R refer to the left- and right-handed quark fields, and ηLL ,
ηRR and ηRL are parameters that can take values of −1, 0 or +1. Different choices of
the η parameters correspond to different contact interaction models, that can be
constrained separately from LHC observations. This Lagrangian supplies us with
the method of searching for quark compositeness in the case of high Λ: we look for
events with two jets, and compare the kinematics of the jets in the final state with
those expected in the case of the SM. We could do this using a variety of kinematic
variables, but a well-motivated choice is to look at the dijet angular distribution
$$\frac{1}{\sigma_{\rm dijet}}\frac{\mathrm{d}\sigma_{\rm dijet}}{\mathrm{d}\chi_{\rm dijet}},$$
where $\chi_{\rm dijet} = \exp|y_1 - y_2|$, and $y_1$ and $y_2$ are the rapidities of the two jets. This looks rather obscure, but it can be shown that, in the limit of massless scattering quarks,
$$\chi_{\rm dijet} = \frac{1 + |\cos\theta^*|}{1 - |\cos\theta^*|},$$
where θ * is the polar scattering angle in the centre-of-mass frame of the scattering
quarks. For normal QCD dijet processes, dσdijet /dχdijet is approximately uniform,
whereas composite models predict a distribution that is strongly peaked at low
values of χdijet . This makes it a very effective distribution for revealing evidence of
compositeness. Note that one could easily extend equation (4.11) with terms that
feature leptons and quarks, or leptons and leptons, in which case one can also search
for compositeness in dilepton final states.
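The identity quoted above is easy to verify numerically: for massless back-to-back jets in their centre-of-mass frame, $\cos\theta^* = \tanh(|y_1 - y_2|/2)$, so that $(1 + |\cos\theta^*|)/(1 - |\cos\theta^*|) = e^{|y_1 - y_2|}$. The short check below uses randomly chosen rapidities purely for illustration.

```python
import numpy as np

# Numerical check that chi_dijet = exp|y1 - y2| equals (1+|cos theta*|)/(1-|cos theta*|),
# using cos(theta*) = tanh(|y1 - y2|/2) for massless back-to-back jets in their CM frame.
rng = np.random.default_rng(42)
y1, y2 = rng.uniform(-2.5, 2.5, size=(2, 5))    # arbitrary jet rapidities

chi_from_y = np.exp(np.abs(y1 - y2))
cos_ts = np.tanh(np.abs(y1 - y2) / 2.0)
chi_from_angle = (1.0 + cos_ts) / (1.0 - cos_ts)

print(np.allclose(chi_from_y, chi_from_angle))   # True
```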
Compositeness models also predict the existence of excited quarks and leptons,
which are analogous to the excited states of atoms or nuclei that you may be familiar
with from your undergraduate studies. These states are usually referred to as q* and
ℓ * for the quark and lepton case, respectively, where an excited fermion is a heavy
fermion that shares quantum numbers with one of the existing SM fermions. The
mass of the excited fermions is expected to be similar to the compositeness scale,
whilst their interactions with SM particles are model-dependent. Thus, if Λ is within
LHC reach, one could search for these excited states directly. For both excited
quarks and excited leptons, a search can be performed by looking for a narrow
resonant peak in the invariant mass distribution of the excited fermion decay
products. For excited quarks, these decay products are typically two jets (instigated
by a quark and a gluon), a jet and a photon, or a jet and a weak gauge boson. For
excited leptons, one can instead search for decays to a lepton and a photon, or to a
lepton and a weak gauge boson.
A separate class of composite theories comes from considerations of the hierarchy
problem, which ultimately results from the fact that the Higgs boson is posited to be
a fundamental scalar. It transpires that the quantum corrections which shift the
Higgs boson mass to very large values (unless we apply fine-tuning) can be evaded if
the Higgs boson is a composite particle. Several theories of a composite Higgs boson
have been developed over the past few decades, and we will here briefly describe how
a particular modern variant evades the hierarchy problem. The best way to explain it
is to take a quick detour into an effect that is encountered in QCD, called chiral
symmetry breaking.
Let us start with a quick revision of some basic QCD facts. Firstly, in the early
universe (at high temperatures), quarks and gluons existed as the relevant degrees of
freedom in the quark-gluon plasma. At some point, when the temperature of the
Universe dropped below TC ∼ Λ QCD, the quarks were confined into hadrons, and the
Universe became full of composite states of strongly-coupled particles. This is called
the QCD phase transition. The mesons contain a quark/anti-quark pair, and their
masses are typically clustered around the scale Λ QCD. The pions, however, are very
much lighter, and it will turn out that the explanation for this could also help to
explain why the Higgs boson is very much lighter than might naïvely be expected.
We will proceed by writing the QCD Lagrangian in a simplified form, including
only up and down quarks:
$$\mathcal{L} = -\frac{1}{4}\left(F^a_{\mu\nu}\right)^2 + i\bar u\,\slashed{D}\,u + i\bar d\,\slashed{D}\,d - m_u\bar u u - m_d\bar d d. \tag{4.12}$$
We will further assume for simplicity that these quarks are massless, which is in fact
a good approximation in the present case, since the only relevant fact is that the
masses are much smaller than Λ QCD. In that case, and using the fact that the chiral
components of the spinors are given by $\psi_{R/L} = \tfrac{1}{2}(1 \pm \gamma_5)\psi$, we can write the
Lagrangian as
$$\mathcal{L} = -\frac{1}{4}\left(F^a_{\mu\nu}\right)^2 + i\bar u_L\,\slashed{D}\,u_L + i\bar u_R\,\slashed{D}\,u_R + i\bar d_L\,\slashed{D}\,d_L + i\bar d_R\,\slashed{D}\,d_R. \tag{4.13}$$
This form has a curious property. It is invariant under separate SU(2) rotations of
the left- and right-handed quark components:
$$\begin{pmatrix} u_L \\ d_L \end{pmatrix} \to g_L\begin{pmatrix} u_L \\ d_L \end{pmatrix}, \qquad \begin{pmatrix} u_R \\ d_R \end{pmatrix} \to g_R\begin{pmatrix} u_R \\ d_R \end{pmatrix}, \tag{4.14}$$
where gL and gR are SU(2) transformations from the two groups SU(2)L (acting only
on left-handed components), and SU(2)R (acting only on right-handed components).
We can write the total symmetry as SU(2)L × SU(2)R , and this is called a chiral
symmetry since it acts differently on left- and right-handed components. It can be
shown4 that a general transformation of this kind can be written as
$$\begin{pmatrix} u \\ d \end{pmatrix} \to \exp\{i\theta_a\tau^a + \gamma_5\beta_a\tau^a\}\begin{pmatrix} u \\ d \end{pmatrix}, \tag{4.15}$$
for the doublet of Dirac spinors u and d. If we set βa = 0 and vary θa , we get a set of
transformations called isospin (not to be confused with the weak isospin that we saw
in Chapter 2!), which form a diagonal subgroup of the group SU(2)L × SU(2)R .
After the QCD phase transition, the ground state of QCD turns out to have5 a
non-zero vacuum expectation value for the quark bilinears $\bar u u$ and $\bar d d$, given by
$$\langle\bar u u\rangle = \langle\bar d d\rangle \sim \Lambda^3_{\rm QCD}. \tag{4.16}$$
No matter how this arose (and we have yet to prove this from QCD itself), we can
note that it is not invariant under the chiral symmetry, but it is invariant under the
isospin subgroup that operates on the left- and right-handed components in the same
way. Thus, the chiral symmetry of QCD is spontaneously broken as
SU(2)L × SU(2)R → SU(2)isospin . This symmetry breaking comes associated with
Goldstone bosons, and it is exactly these Goldstone bosons that we identify with
pions. In the limit of exact chiral symmetry, these pions would be massless, but the
small explicit breaking of chiral symmetry that results from the up and down quarks
having a small mass means that the pions are instead pseudo-Goldstone bosons that
have a finite (but small) mass. The beauty of effective field theory is that, as in our
previous examples, we can write a low-energy theory of the pions without knowing
the full details of the higher-energy physics, which means that we can determine
most of how the pions interact purely from the application of effective field theory,
combined with our knowledge of the symmetry breaking pattern.
How does this relate to the Higgs boson? Imagine that we have a new, strongly-
coupled sector of the theory of particle physics, which adds new QCD-like
phenomena at higher energies. Thus, there would be new fields like quarks and
gluons that kick in at high energies, but below a certain temperature those states
would confine into composite objects. Imagine also that this theory has some
accidental symmetry which is spontaneously broken to a subgroup, along with a
small degree of explicit symmetry breaking. We would get a tower of resonances that
correspond to the ‘mesons’ and ‘baryons’ of this new sector, and we would also get
4 The form of a general chiral transformation is a special case of Lie’s theorem of equation (2.33), where the
two terms in the exponent on the right-hand side of equation (4.15) constitute different generators (recall that γ5
contains a factor of i).
5 Evidence for this proposal comes from phenomenological models, as well as lattice QCD studies.
pseudo-Goldstone modes from the symmetry breaking that could play the role of the
Higgs boson. This is the essence of how modern composite Higgs theories work,
although the details become very complicated very quickly. The effect on LHC
physics is to modify the couplings of the Higgs boson, and also to introduce new
exotic particles such as vector-like quarks which have spin-1/2, and transform as
triplets under the SU(3) group, but which have exotic quantum numbers compared
to SM quarks. LHC searches for composite Higgs models thus partly rely on trying
to find the production and decays of these exotic new states, and partly on shrinking
the error bars on the Higgs decay modes.
4.4.4 Supersymmetry
Making the Higgs composite is one way to solve the hierarchy problem, but there is
another popular method. The SM is based heavily on symmetry, of which the two
main kinds are:
(i) Poincaré symmetry, described by the Poincaré group of Lorentz trans-
formations plus translations. These symmetry transformations are associ-
ated with spacetime degrees of freedom (e.g. positions and momenta).
(ii) Gauge symmetries, described by Lie groups acting on an abstract internal
space associated with each field.
Given how useful symmetry has been in guiding the construction of the SM, it is
natural to ponder whether one can make an even more symmetric theory, which
combines spacetime and internal symmetries in a non-trivial way. Unfortunately,
something called the Coleman–Mandula theorem tells us that this is impossible, and
that the only way to combine symmetries of types (i) and (ii) above is as a direct
product,
Symmetry group of any theory = (Poincaré) × (Internal Lie group),
with no mixing between them. Note that both of the symmetries are described by Lie
groups (i.e. the Poincaré group is itself a Lie group). However, the Coleman–
Mandula theorem only applies if we require the total symmetry group to be a Lie
group. There are in fact other types of interesting mathematical structure, and we
can use them to build a theory instead. In particular, one can extend the Poincaré
group to something called the super-Poincaré group, whose associated algebra is
called a graded Lie algebra rather than a standard Lie algebra. It has the normal
Poincaré group as a subgroup, but also includes extra transformations that relate
bosonic and fermionic degrees of freedom. To implement this structure, we have to
extend spacetime to something called ‘superspace’, which has additional fermionic
(anticommuting) coordinates, in addition to the usual spacetime ones.
If we build the SM on such a space, we get something that looks just like the SM
in four conventional spacetime dimensions, but where every bosonic field has a
fermionic counterpart, and vice versa. Whether we consider ‘superspace’ to be real or not is up to us; at the very least, we can regard it as a convenient
mathematical trick for generating QFTs with more symmetry in them.
[Table fragment (gauge supermultiplets): the gluino g̃ partners the gluon g; the winos W̃±, W̃0 partner the W bosons W±, W0; and the bino B̃0 partners the B boson B0.]
reasons for this results from the general structure of supersymmetric theories, in
which it can be shown that only a Y = +1/2 Higgs chiral supermultiplet can have the
Yukawa couplings necessary for charge +2/3 quarks, and only a Y = −1/2 Higgs can
give the right couplings for charge −1/3 quarks and charged leptons. This gives us
the two complex SU(2)L doublets Hu and Hd shown in table 4.1, containing eight
real, scalar degrees of freedom. When electroweak symmetry breaking occurs in the
MSSM three of them form Goldstone bosons which become the longitudinal modes
of the Z0 and W ± vector bosons, and the remaining five give us Higgs scalar mass
eigenstates consisting of one CP-odd neutral scalar A0, a charge +1 scalar H+ and
its conjugate H−, and two CP-even neutral scalars h0 and H0. Whilst the masses of
A0, H0 and H ± can be arbitrarily large, one can set an upper bound on the h0 mass;
the observed Higgs mass of 125 GeV is consistent with this upper bound. Had it been
observed to have a larger mass, the MSSM would already have been excluded.
The physical particles that exist in SUSY, and which we aim to discover at the
LHC, do not match the fields shown in tables 4.1 and 4.2. Instead, the fields mix in
the MSSM to produce the following physical eigenstates:
$$H_u^0,\, H_d^0,\, H_u^+,\, H_d^- \to h^0,\, H^0,\, A^0,\, H^\pm \quad \text{(Higgs)}$$
$$\tilde B^0,\, \tilde W^0,\, \tilde H_u^0,\, \tilde H_d^0 \to \tilde\chi_1^0,\, \tilde\chi_2^0,\, \tilde\chi_3^0,\, \tilde\chi_4^0 \quad \text{(neutralinos)}$$
$$\tilde W^\pm,\, \tilde H_u^+,\, \tilde H_d^- \to \tilde\chi_1^\pm,\, \tilde\chi_2^\pm \quad \text{(charginos)}$$
where the degree of mixing in the squark and slepton sectors is typically propor-
tional to the mass of the associated fermion, and thus assumed to be largest for the
third family.
Although the original motivation was rather formal (i.e. the Coleman–Mandula
theorem mentioned above), the MSSM turns out to have some remarkable features:
• The couplings in the MSSM unify at high energy, in contrast to the behaviour
of Figure 4.7. Furthermore, there is a mechanism (albeit usually put in by
hand) to get rid of proton decay.
• There is a natural dark matter candidate, in that the lightest superpartner is
stable.
• The hierarchy problem is resolved, as Higgs mass corrections become
logarithmic rather than quadratic, thus dramatically reducing the sensitivity
of the Higgs mass to the scale of new physics.
• Our only known candidate theory for quantum gravity plus matter (string
theory) seems to require SUSY to be consistent.
Just one of these features would be nice, but to have all of them feels extremely
compelling to many people. For this reason, SUSY theories have received a massive
amount of attention over the past 30 years or so.
If SUSY were an exact symmetry of Nature, the sparticles introduced above
would have the same masses as their SM counterparts and would have been seen
already in collider experiments. Sadly, this is not the case, and there remains no
direct experimental evidence for supersymmetry. Any valid supersymmetric theory
must therefore introduce a mechanism for supersymmetry breaking, and one can
introduce spontaneous symmetry breaking in a way directly analogous to the
treatment of electroweak symmetry breaking in the SM. Many models of sponta-
neous symmetry breaking have been proposed, and there is no general consensus on
which is the correct mechanism. One option is simply to parameterise our ignorance,
and write the most general gauge-invariant Lagrangian that explicitly adds SUSY-
breaking terms, and for the MSSM it can be shown that this adds 105 free
parameters to the SM, in the form of the masses of the new particles, and various
CP-violating phases and mixing angles. Some of these are already constrained to be
near-zero by experimental measurements (such as measurements of the lifetime of
the proton), and yet others have no effect on the behaviour of superpartners at the
LHC. There are roughly 24 parameters that are required to describe LHC
phenomenology, and these are often compressed to 19 parameters by, for example,
assuming that the first and second generation of squarks and sleptons have equal
masses (which would not be clearly distinguishable at the LHC in the case of
squarks). This 19-parameter model is frequently referred to as the phenomenological
MSSM (pMSSM), and it is common to interpret the results of LHC sparticle
searches in either the pMSSM, or some subset of its parameters.
of k ∼ MPl . Only gravity can propagate in the extra dimension, and the model
consists of a 5D bulk with one compactified dimension, and two 4D branes,
called the SM and gravity branes. It can be shown that the relationship
between the fundamental Planck mass and our usual one is now given by:
$$M_D = M_{\rm Pl}\,e^{-k\pi R}. \tag{4.19}$$
If $R \simeq 1 \times 10^{-32}$ cm, we get $M_D \sim 1$ TeV, which again eliminates the hierarchy
problem.
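Inverting equation (4.19) gives a feel for the numbers involved: generating the sixteen orders of magnitude between the Planck and TeV scales requires only a modest value of the product kR, as the short check below (using an order-of-magnitude Planck mass, chosen purely for illustration) shows.

```python
import math

m_pl = 1.2e19   # Planck mass in GeV (order of magnitude)
m_d = 1.0e3     # desired fundamental scale, ~1 TeV

k_times_r = math.log(m_pl / m_d) / math.pi    # invert equation (4.19)
print(f"required k*R ~ {k_times_r:.1f}")      # ~ 11.8: only a mild hierarchy in k*R
```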
Theories with extra spatial dimensions turn out to have a variety of consequences
at the LHC. The first class of observations results from the propagation of gravitons
through the compact extra dimensions. It can be shown that the virtual exchange of
these states would appear as massive new resonances, called Kaluza-Klein (KK)
excitations. In the ADD model, the mass of these resonances is given by
$$m_k^2 = m_0^2 + k^2/R^2, \qquad k = 0, 1, 2, 3, 4, \ldots \tag{4.20}$$
Note that these are regularly spaced and, for large R, the KK resonances are almost
continuous. At the LHC, one can search for the direct production of these KK
graviton resonances via processes such as $q\bar q \to gG$, $qg \to qG$ and $gg \to gG$, which
would appear as a monojet signature given that the graviton escapes without
interacting with the detector. One can also look for s-channel KK graviton
exchange, with a decay to diboson or dilepton resonances.
One can also search for KK graviton resonances in the warped extra-dimension
scenario, but the details are very different. Rather than being evenly-spaced, the KK
graviton masses are now given by $m_n = x_n k\,e^{-k\pi R}$, where $x_n$ are the roots of Bessel
functions. It is usually the case that only the first resonance (i.e. n = 1) is accessible at
the LHC, and it has a narrow width given by $k/M_{\rm Pl}$. On the plus side, it has a coupling to SM particles that is proportional to $1/(M_{\rm Pl}\,e^{-k\pi R})$, which is much stronger
than in the ADD model. Further phenomenology in the warped extra dimension
case comes from extending the simplest RS model to include the possibility that SM
fields can also propagate in the bulk. If that were possible, all of the SM fields would
create KK towers of resonances, which gives us much more to search for!
For the ADD model, there is a second class of observations that is particularly
exciting, namely the search for the formation of microscopic black holes. These are
able to form once the collision energy rises above a certain threshold Mthresh , which is
above MD, but typically well below MPl . Black holes that are produced with an energy
far above Mthresh (called the semi-classical case) would decay to a high multiplicity final
state via Hawking radiation, and one can search for the production of many high- pT
objects. If the production is instead near the threshold, the theory suggests that a
quantum black hole would form, which decays to a two-body final state. Although no
actual resonance is produced, the kinematics in the final state mimic a resonance,
producing a broad bump at a given invariant mass of the black hole decay products.
In this chapter, we have provided a brief summary of the kinds of new-physics theories
that could be measured at current and forthcoming collider experiments. In order to
understand how to analyse such experiments, however, we need to know much more
about how to interpret what we are doing. This unavoidably leads us to the subjects of
probability and statistics, which we examine in detail in the following chapter.
Further reading
• An excellent, LHC-centric review of grand unified theories can be found in
‘GUT Physics in the Era of the LHC’ by Djuna Croon, Tomás Gonzalo,
Lukas Graf, Nejc Košnik and Graham White (Frontiers in Physics, Volume 7).
• Many students over the years have learned supersymmetry from “A
Supersymmetry Primer” by Stephen Martin (available for free at https://www.niu.edu/spmartin/primer/). An excellent textbook for beginners that
covers the collider phenomenology in detail is “Weak Scale
Supersymmetry: From Superfields to Scattering Events” by Howard Baer
and Xerxes Tata (Cambridge University Press).
• A superb introduction to composite Higgs theories is given in “Tasi 2009
lectures: The Higgs as a Composite Nambu-Goldstone Boson” by Roberto
Contino, available for free at https://arxiv.org/abs/1005.4269.
• A pedagogical review of LHC searches for extra dimensions can be found in
the ever-reliable Particle Data Group review (see https://pdg.lbl.gov/2020/
reviews/rpp2020-rev-extra-dimensions.pdf for the most recent article from Y
Gershtein and A Pomarol).
Exercises
4.1 Consider a scalar particle of mass m with a quartic self-interaction, as
shown in Figure 4.8.
(a) If the Feynman rule for the self-interaction vertex is proportional to
some self-coupling λ, explain why the diagram leads to an expression
$$\sim \lambda \int \frac{\mathrm{d}^4k}{k^2 - m^2}.$$
(b) Let Λ be the scale at which the theory is expected to break down (e.g.
where new physics may enter). Explain why the mass counterterm for
the scalar field behaves as
$$\delta m^2 \sim \Lambda^2.$$
Figure 4.8. Loop diagram in a scalar theory with a quartic self-interaction term.
(c) Hence show that the scalar field equation from part (a) implies the
infinite set of equations
$$\left(\frac{\partial^2}{\partial t^2} - \nabla_4^2 + m^2 + \frac{n^2}{R^2}\right)\Phi_n(x, y, z, t) = 0,$$
where
$$\nabla_4^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}.$$
Interpret this result.
Chapter 5
Statistics for collider physics
where Ω is the set of all possible events. These axioms may be familiar to you from
your previous study of probability. What may be less familiar is the fact that these
really define an abstract, mathematical quantity, and anything that satisfies these
axioms can be said to be a probability. The definition is therefore not unique!
Kolmogorov’s analysis is based on set theory, and the three axioms above can be
used to derive a series of properties that any definition of probability must satisfy.
For example, imagine that we have two sets of events A and B, that are non-
exclusive subsets of our total event set Ω. Non-exclusive means that some of our
fundamental events Ei may be in both A and B. The probability of an event
occurring which is either in A alone, B alone, or both A and B is then given by
P(A or B ) = P(A) + P(B ) − P(A and B ), (5.1)
where A and B denotes events in both A and B, and P (A) means ‘an event has
occurred which is in the set A’. Below, we will shorten this to ‘A has occurred’. It is
necessary to subtract the last term to avoid double-counting events that are in both
A and B when we calculate the probability on the left-hand side.
$$P(C_i|B) = \frac{P(B|C_i)\,P(C_i)}{P(B)}, \qquad P(B) = \sum_k P(B|C_k)\,P(C_k),$$
where the left-hand side now represents the probability of an event in the set Ci
occurring, given that B has occurred.
and the latter starts from the Bayesian definition, and this in turn leads to different
spheres of applicability for Bayes’ theorem.
We were very careful in the preceding paragraph to note that Bayes’ theorem is
fine for both schools if we have repeatable, discrete events. Imagine instead that we
have a theory, and we want to talk about the probability of the theory being true. A
Bayesian can simply write down P(theory), knowing that this represents the degree
of belief in the theory. A frequentist, however, cannot write anything at all, since
there is no repeatable experiment that can be used to define the frequentist
probability of the theory being correct. As a consequence, the frequentist cannot
apply Bayes’ theorem1.
Let’s think a little more about what the Bayesian can do. After the observation of
some data, Bayes’ theorem can be written as
$$P({\rm theory}\,|\,{\rm data}) = \frac{P({\rm data}\,|\,{\rm theory})\,P({\rm theory})}{P({\rm data})}. \tag{5.7}$$
Some common terminology can be introduced here: the left-hand term is called the
posterior probability (or just ‘the posterior’) and P(theory) is the prior probability (or just
‘the prior’), sometimes denoted with a special π symbol. The other terms in the
numerator and denominator are formally known as the likelihood and the evidence,
respectively. The evidence and prior appear to be of equal importance—if they are equal
then the posterior and likelihood are identical—but in practice we only have one
observed dataset and hence P(data) is fixed, which means that the evidence is just a
normalisation factor. The likelihood is the object of primary importance for much
statistical inference, and as such is also often given a special symbol, L. Given some data,
we can usually calculate P(data∣theory), and indeed we will see examples of this later.
What we usually want to know, however, is the probability that our theory is correct
given the data, P(theory∣data). We now see that this can only be known if we have
specified our prior degree of belief in the theory, P(theory), which may be difficult or
impossible to quantify. This, in turn, makes our desired posterior probability subjective.
There does exist an objective Bayesian school, that claims to have methods for choosing
a suitable prior, but the controversy that surrounds such methods renders the point moot
for the foreseeable future. As it is, frequentist statisticians reject Bayesian methods due to
their apparent subjectivity, whilst Bayesians reject frequentist methods because we
cannot use the frequentist probability definition for many cases of real physical interest.
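To make these ingredients concrete, the short Python sketch below builds a posterior numerically for the rate parameter of a toy Poisson counting experiment; the observed count, the flat prior and the grid of rate values are illustrative assumptions only, not a recipe from any real analysis.

    import numpy as np
    from scipy.stats import poisson

    # Hypothetical observed event count (illustrative only)
    n_obs = 7

    # Grid of possible values for the Poisson rate parameter (the 'theory')
    lam = np.linspace(0.01, 30.0, 3000)

    prior = np.ones_like(lam)                # flat prior: a deliberate, debatable choice
    likelihood = poisson.pmf(n_obs, lam)     # P(data|theory) as a function of the rate

    unnormalised = likelihood * prior
    evidence = np.trapz(unnormalised, lam)   # P(data): acts only as a normalisation factor
    posterior = unnormalised / evidence      # P(theory|data)

    # Two possible (and different) summary statistics for the same posterior
    post_mean = np.trapz(lam * posterior, lam)
    post_mode = lam[np.argmax(posterior)]
    print(f"posterior mean = {post_mean:.2f}, mode = {post_mode:.2f}")

Changing the prior to anything other than a constant will shift the posterior, which is precisely the subjectivity discussed above.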
In this book, we will take a rather more pragmatic view. Neither the frequentist
nor Bayesian schools of thought are ‘correct’. They are simply different definitions
of probability that are consistent with the abstract mathematical definition, and a
choice of one or the other frames the sorts of question that you are able to ask and
answer. Provided you think carefully about what you are doing in any particular
analysis, it is easy to avoid criticism from genuinely knowledgeable colleagues.
However, you need to be prepared for the fact that you will be working in a world
¹ However, frequentist interpretations of experiments can constrain the estimates of theory parameters without invoking the concept of a degree of belief. The resulting expressions may hence differ on a purely philosophical level, affecting how they can be used more than the algebraic form of the probability.
where not everyone has grasped the distinction between the two schools, and where
terms such as ‘likelihood’, ‘posterior’ and ‘probability’ are used interchangeably in
ways that are obviously incorrect, and where elements of frequentist likelihoods may
be informally referred to, somewhat erroneously, as ‘priors’.
In particle physics, frequentist approaches tend to be canonical because the field’s
natural scenario is repeatable experiments from high-statistics trials. One of the
main motivations for using particle colliders, rather than waiting for nature to
supply high-energy collisions, is that a high flux of essentially identical collisions
occurs in a very controlled environment. By comparison with e.g. astrophysical
particle observations, which may reach higher energies but are not under human
control, the well-defined initial state reduces many systematic effects to the point
where the collisions themselves are most naturally viewed in a frequentist picture.
Following this, much of the basic statistics applied to the counting of events falling
into bins in physical observables is treated in a frequentist perspective. But for
processes where there is significant uncertainty—particularly rare ones where the
observed statistics are low—careful treatment is essential. In particular, cosmology
is concerned with our single universe, leading to a prevalence of Bayesian methods in
that field, and a wealth of interesting discussions in the rich area where collider
physics and astrophysics results intersect.
a value within the infinitesimal interval [x, x + dx]²'. Note that, if the experiment is
repeatable, this has a perfectly fine frequentist definition. We can define the
probability density function (pdf) f (x ) via:
P(x is in the interval [x , x + dx ]) = f (x )dx . (5.8)
For a frequentist, f (x )dx gives the fraction of times that x is observed within the
range [x , x + dx ] in the limit that our experiment is repeated an infinite number of
times. f (x ) itself is not constrained to be less than one everywhere, but the total
integral over all x values must be one:
∫Ω f (x)dx = 1. (5.9)
The probability for the random variable to take on a value less than or equal to x
is given by the cumulative density function (cdf), defined as:
F(x) = ∫_{−∞}^{x} f(x′) dx′.   (5.10)
² Note that we have used x here both for the random variable, and for the range that it might take. We hope that the meaning will remain clear from the context.
where Ω is, as usual, the total set of outcomes. If we project a two-dimensional pdf
f(x, y) down to, say, the x-axis, we define the marginal density function of x as

a(x) = ∫ f(x, y) dy,

and the conditional density function of y given x as

q(y|x) = f(x, y) / a(x),

where we have used the definition of the marginal density function of x. The
denominator exists to normalise the conditional distribution so that it gives unity
when integrated over y.
If we have a joint pdf f(x, y), marginal density functions a(x) and b(y), and
conditional density functions p(x|y) and q(y|x), we can write Bayes' theorem for
continuous variables as

q(y|x) = p(x|y) b(y) / a(x).   (5.18)
The expectation value of a function g(x) of the random variable is defined as

E[g(x)] = ∫_Ω g(x) f(x) dx,

where Ω specifies the entire range of x. The left-hand side is a single number which
gives the expectation value of the function g(x), and the right-hand side makes it
clear that the pdf f(x) is used as a weighting function to determine the contribution of
each value g(x) to the integral. When g(x) = x, this expectation value has a special
name, called the mean of x:

μ ≡ E[x].   (5.20)
The variance of x is defined as

σ² = E[(x − μ)²].   (5.21)

Expanding the bracket on the right-hand side and using equation (5.20), one finds
that the variance is equal to E[x²] − μ². As the notation suggests, σ, the square root
of the variance, is an important quantity reflecting the degree of spread of values in
the distribution: this is called the standard deviation.
The mean and the variance are actually special cases of more general quantities called
moments, which can be used to characterise the shapes of pdfs. E[xⁿ] defines the nth
algebraic moment, whilst E[(x − μ)ⁿ] gives the nth central moment. We thus see that the
mean is the first algebraic moment, and the variance is the second central moment.
Note that we can now define a mean and variance for both x and y, and these are
given by

μ_x = E[x],    μ_y = E[y],

and

σ_x² = E[(x − μ_x)²]   (5.25)
σ_y² = E[(y − μ_y)²].   (5.26)
The covariance of x and y is defined as

cov(x, y) = E[(x − μ_x)(y − μ_y)],   (5.27)

which generalises the one-dimensional variance to a form that captures the spread of
values between the x and y axes. A purer view of the extent of this interaction
between the variables can be found via the correlation coefficient, given by

corr(x, y) = ρ(x, y) = cov(x, y) / (σ_x σ_y).   (5.28)
The denominator of this last expression ensures that the correlation coefficient is
between −1 and 1. We say that x and y are independent if their pdf factorises, and
thus f (x , y ) = f (x )f (y ). From equation (5.27), the covariance and correlation
coefficient then vanish for independent variables.
When we have more than two random variables, we can define analogous
covariances and correlation coefficients for each marginal 2D joint distribution.
We can in fact write a matrix with the elements cov(xi , xj ) (for a set of variables xi ),
which is called the covariance matrix: when we later discuss covariance and
correlation in more depth we will always be referring to this n-dimensional version.
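In practice these objects are rarely computed by hand. The Python sketch below estimates the covariance and correlation matrices for three toy variables with NumPy; the variables, their built-in dependence and the sample size are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(42)
    n = 10000

    # Toy dataset: y is partially determined by x, z is independent of both
    x = rng.normal(0.0, 1.0, n)
    y = 0.6 * x + rng.normal(0.0, 0.8, n)
    z = rng.normal(5.0, 2.0, n)

    data = np.vstack([x, y, z])    # shape (n_variables, n_samples)

    cov = np.cov(data)             # covariance matrix, cov[i, j] = cov(x_i, x_j)
    corr = np.corrcoef(data)       # correlation matrix, entries between -1 and 1

    print(np.round(cov, 3))
    print(np.round(corr, 3))       # the (x, z) and (y, z) off-diagonal entries are ~0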
³ We shall shortly address the generalisation to non-independent Gaussians.
Figure 5.1. The Gaussian distribution of equation (5.29), where μ = 5 and σ = 0.8 (blue), σ = 1 (orange) and
σ = 2 (green).
f(x; a, μ, σ₁, σ₂) = (1/√(2π)) [ (a/σ₁) exp(−(x − μ)²/(2σ₁²)) + ((1 − a)/σ₂) exp(−(x − μ)²/(2σ₂²)) ].   (5.31)
Figure 5.2. The log-normal distribution, shown for various values of the width parameter σ.
Log-normal distribution: Our last distribution directly related to the Gaussian is the
log-normal: a Gaussian distribution in the logarithm of the parameter x. Like the
double-Gaussian it is rarely motivated a priori, but often used to force positivity in
cases where the infinitely long negative tail of a naïve Gaussian invalidates the
domain of the described variable—for example a positive particle momentum, mass,
or production rate. While convenient, the price paid is a skewing of the distribution,
which for distributions intended to be within a few standard deviations of the
physical boundary at zero can be very abrupt indeed. This skewing can be seen in
Figure 5.2, in which we plot the form of the log-normal distribution for various
parameter values. We will not dwell on this distribution any further, but it is good to
be aware of its availability—and side-effects if used without due care.
Poisson distribution: The Poisson distribution gives the probability density for
observation of a number of events k, produced by a stochastic process with constant
rate λ, i.e. an expected number of events λ in whatever continuous interval of time,
distance, integrated luminosity, etc is being considered. It has the form
P(k; λ) = λ^k e^(−λ) / k!.   (5.32)
Note the discreteness of k required by the factorial function on the denominator4:
the Poisson is strictly a probability mass function (pmf) rather than a distribution, but
we can be a bit sloppy with terminology provided we keep this restriction in mind. In
⁴ The Poisson can also be generalised to a continuous number of observations, via the relation n! = Γ(n + 1), but this is rarely used in physics applications. Even when events are continuously weighted, the 'right thing' is usually to use discrete Poisson statistics and multiply the event weight back in afterwards.
Figure 5.3. The Poisson distribution, shown for λ = 2 (blue), λ = 7 (orange) and λ = 12 (green).
Gaussian distribution with mean⁵ μ = λ and σ = √λ. Note that the relative width of
this stochastically assembled distribution, σ/μ = 1/√λ, reduces with larger rates: large
numbers of observed events produce better-defined results. This is an example of the Law
of Large Numbers: that the average of a large number of samples from a distribution
will become asymptotically closer to the expectation value.
Binomial distribution: The binomial distribution gives the probability of obtaining k
successes in a set of n independent binary (yes/no) trials,

B(k; n, P) = (n choose k) P^k (1 − P)^(n−k)   (5.34)
           = [n! / (k!(n − k)!)] P^k (1 − P)^(n−k),   (5.35)
where P is the probability of success in each test (e.g. the ‘yes’ probability in the binary
questions) and therefore (1 − P ) is the probability of failure/‘no’. The combinatorial
prefactor is called the combination function or binomial coefficient, and counts the
number of ways of distributing k successes and (n − k ) failures in n trials. To see where
it comes from, note that the number of ways of assigning a given list of yes and no
outcomes to a set of n trials is n!, given that we have n choices for the first entry in the
list, (n − 1) for the next and so on. However, we have then overcounted the total
number of distinct outcomes, as we can separately reorder the assignment of success
outcomes amongst themselves k! ways, and the number of failures (n − k )! ways. The
resulting distribution, in k for various n and fixed P, is illustrated in Figure 5.4.
As for the Poisson pmf, the large-number limit is interesting. Indeed, we can obtain
the Poisson distribution as a particular large-n limit of the binomial. To see how this
works, imagine that we have a finite time-interval divided into n smaller, equal-sized
bins, each with a success probability P_n. If we increase the number of bins while ensuring
that the mean success rate for the interval, μ = nP_n, remains fixed, then in the continuum
limit, n → ∞, the binomial distribution behaves as a Poisson with mean rate λ = μ:
B(k; n, P) → P(k; μ)   as n → ∞ with nP → μ.   (5.36)
As the binomial pmf in this specific limit approaches a Poisson pmf, it should be no
surprise that the Central Limit Theorem again contrives to make a large number of
samples distribute as a Gaussian, unsurprisingly with mean μ = nP and with
standard deviation σ = √(nP(1 − P)). We can see this limit behaviour in Figure 5.4,
⁵ Recall that we used the fact that the mean of a Poisson distribution was the same as the (constant) event rate in Chapter 1, when deriving the form of the cross-section.
Figure 5.4. The binomial distribution, shown for p = 0.3 and n = 5 (blue), n = 10 (orange) and n = 40 (green).
as the number of trials n increases. We may also note the scaling of the relative width
of the distribution, σ/μ = √((1 − P)/(nP)) ∝ 1/√n—the Law of Large Numbers
ensuring that the distribution becomes better defined with increasing statistics.
The binomial and Poisson distributions are hence closely related to one another
and to the Gaussian—you may find it useful to view the binomial and the Poisson
distributions as essentially the same concept, with one reflecting statistics aggregated
over a continuous period, and the other over a set of discrete trials.
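These limits are easy to verify numerically. The sketch below compares a large-n binomial pmf with the Poisson of the same mean, and both with the matching Gaussian; the particular n, P and range of k are arbitrary illustrative choices.

    import numpy as np
    from scipy.stats import binom, poisson, norm

    n, P = 1000, 0.007          # many trials, small per-trial success probability
    mu = n * P                  # the fixed mean rate

    k = np.arange(0, 25)
    b = binom.pmf(k, n, P)                       # binomial probabilities
    p = poisson.pmf(k, mu)                       # Poisson limit with lambda = nP
    g = norm.pdf(k, loc=mu, scale=np.sqrt(mu))   # Gaussian with mean mu, sigma = sqrt(mu)

    print("max |binomial - Poisson| :", np.max(np.abs(b - p)))
    print("max |Poisson  - Gaussian|:", np.max(np.abs(p - g)))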
Figure 5.5. The Cauchy distribution, with x0 = 5 and Γ = 1 (blue), Γ = 2 (orange) and Γ = 3 (green).
reflect the contribution of multiple core processes. The influence of processes which
distort the whole core distribution, generally by convolutions, is more subtle and it is
these which we will focus on here.
The first such distribution, since it is heavily used in particle physics, is the Crystal-Ball
distribution, named after the classic experiment that popularised it. It is essentially a
Gaussian distribution with one ‘heavy tail’, typically on the negative side of the Gaussian
peak as motivated by wanting a distortion capturing the effects of energy loss in detector
interactions and read-out. Such an effect could be encoded, for example, by convolution
of a truncated exponential or power law with a Gaussian, but convolutions are typically
not expressible in analytic closed form and instead the Crystal-Ball function is a
piecewise combination of a power-law and Gaussian: its pdf is
f_CB(x; α, n, x̄, σ) = (1/N) ×
    { exp(−(x − x̄)²/(2σ²))           for (x − x̄)/σ > −α,
    { A · (B − (x − x̄)/σ)^(−n)       for (x − x̄)/σ ⩽ −α,   (5.39)

where

A = (n/α)^n · exp(−α²/2),      B = n/α − α,
C = (n/α) · 1/(n − 1) · exp(−α²/2),      D = √(π/2) · (1 + erf(α/√2)),   (5.40)
Figure 5.7. Example Crystal-Ball distribution, in which a Gaussian at larger parameter values is joined onto a
power-law behaviour at low values.
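The piecewise form translates directly into code. The Python sketch below implements equations (5.39) and (5.40), taking the usual Crystal-Ball normalisation N = σ(C + D), which is not quoted explicitly above, as an assumption; the parameter values in the example call are arbitrary.

    import numpy as np
    from scipy.special import erf

    def crystal_ball_pdf(x, alpha, n, xbar, sigma):
        # Gaussian core with a power-law tail for (x - xbar)/sigma <= -alpha
        x = np.asarray(x, dtype=float)
        t = (x - xbar) / sigma
        A = (n / alpha) ** n * np.exp(-0.5 * alpha ** 2)
        B = n / alpha - alpha
        C = (n / alpha) / (n - 1.0) * np.exp(-0.5 * alpha ** 2)
        D = np.sqrt(np.pi / 2.0) * (1.0 + erf(alpha / np.sqrt(2.0)))
        N = sigma * (C + D)        # assumed standard normalisation
        f = np.empty_like(t)
        core = t > -alpha
        f[core] = np.exp(-0.5 * t[core] ** 2)
        f[~core] = A * (B - t[~core]) ** (-n)
        return f / N

    x = np.linspace(-4.0, 10.0, 500)
    f = crystal_ball_pdf(x, alpha=1.5, n=3.0, xbar=3.0, sigma=1.0)
    print("numerical integral over this range ~", np.trapz(f, x))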
5.2 Correlations
In the previous section, we saw a number of examples of probability density functions
that are commonly used in statistical applications. However, a real physical dataset—
consisting of various combinations of measured variables—will have its own very
complex pdf, that depends on various parameters. The dimensionality of this
parameter space can be huge: for example, confounding theory parameters like the
factorization and renormalization scales, pdf error sets, and parton shower ambiguities;
parametrisations of uncertainties in the detector response and reconstruction;
and finally the parameters of the fundamental physics model being tested, e.g. particle
masses, couplings and spins. Nevertheless, we can at least distinguish two different
types of parameter in the pdf. The parameters of interest (POIs) are those that
correspond to the physics we are interested in (e.g. the mass of the Higgs boson).
Nuisance parameters are those which are associated with the uncertainties, and in most
cases we know something about them already (e.g. we might know that the energy
measured by our detector is off by a certain amount, but we know that this
discrepancy roughly follows a Gaussian distribution with a narrow width).
The pdf is in general distributed in an arbitrary way through the high-dimensional
space of all these parameters, and thus not necessarily structured or aligned with any
of them. We cannot work in general with such an arbitrary form, but fortunately can
approach it systematically in terms of a moment expansion, starting with the mean
(first moment) and the variance (second central moment). In principle one can carry
on through third and higher moments until the whole distribution is characterised,
but in practice we usually stop at second order. These first two moments can be
interpreted conveniently as describing a multivariate Gaussian distribution, which
functions as a second-order approximation to the full pdf.
For a k-dimensional vector of variables x, with mean vector μ and covariance matrix Σ,
the multivariate Gaussian is

N(x; μ, Σ) = (1/√((2π)^k det Σ)) exp{ −(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ) }   (5.42)

           = (∏_i^k 1/(√(2π) σ_i)) · exp{ ∑_{i,j} −(1/2)(x_i − μ_i) Σ⁻¹_ij (x_j − μ_j) },   (5.43)
where Σ−1 is the inverse covariance matrix, sometimes called the precision matrix: the
inversion here is equivalent to the placing of the single-dimensional variance σ 2 in
the denominator of the 1D Gaussian exponent.
The interpretation of this is not yet clear, as the covariance matrix has in general
k × k non-zero entries. However, as a real symmetric matrix, Σ can always be
diagonalised by a similarity transform (basis rotation), to give its 'natural'
representation Σ_diag = Oᵀ Σ O = diag(σ̂₁², σ̂₂², …, σ̂_k²), where the σ̂² eigenvalues are
the variances of independent 1D Gaussians aligned along the orthogonal eigenvec-
tors of the covariance, as illustrated in Figure 5.8.
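In code, this diagonalisation is a single call to an eigen-solver. The sketch below rotates a toy 2 × 2 covariance matrix into its eigenbasis; the matrix entries are invented purely for illustration.

    import numpy as np

    # Toy covariance matrix for two correlated variables (illustrative values)
    Sigma = np.array([[4.0, 2.4],
                      [2.4, 3.0]])

    # A real symmetric matrix is diagonalised by an orthogonal rotation O
    eigvals, O = np.linalg.eigh(Sigma)    # eigenvalues are the sigma-hat^2 variances
    Sigma_diag = O.T @ Sigma @ O          # diag(eigvals) up to rounding error

    print("independent variances:", eigvals)
    print("principal directions (columns of O):\n", O)
    print("rotated covariance:\n", np.round(Sigma_diag, 12))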
Figure 5.8. Gaussian approximation to a general pdf in two dimensions, with the eigenvectors of the
covariance matrix defining the principal directions of independent Gaussians centred on μ, and with variances
given by the eigenvalues.
Figure 5.9. A (log-)covariance matrix between the binned signal regions of a CMS search for hadronic
signatures of supersymmetry (top), and a correlation matrix between the bins of multiple reconstructed
observables in an ATLAS measurement of semileptonic tt¯ events (bottom). The influence of the absolute scale
of bin populations is evident in the upper covariance figure, where large populations and hence absolute
uncertainties for low jet-multiplicity bins in the left and bottom of the plot give way to low populations and
absolute uncertainties in the top and right. By contrast, the lower correlation plot has divided out the absolute
scales of uncertainties as characterised by the diagonal covariance bins, more clearly exposing the correlation
structures of the observables. Reproduced with permission from CMS-SUS-16-033, supplementary figure 9;
and ATLAS TOPQ-2018-15, supplementary figure 106.
corr_ij ≡ ρ_ij = Σ_ij/(σ_i σ_j) = Σ_ij/√(Σ_ii Σ_jj),   (5.44)
which effectively includes the rotation angles between the variable basis and the diagonal
eigenbasis, but has discarded the relative scalings of the component 1D Gaussians.
Correlations, in this second-order approximation and perhaps beyond, are very
important objects in particle-physics data-interpretation, as without them we are
restricted to using single variables in model testing. The variables that are correlated
can be quite abstract: we can have, for example, correlations between observables,
between model parameters, or between observables and model parameters, allowing
linear correlation of observables with either each other or with the parameters of a
model to be exploited. Correlations between model parameters often show that
individual parameters may be unconstrained, with large variances, but that
combinations of them may be well constrained, with small variances. We shall see
later the use of correlation information in forming likelihood tests, and in methods
for estimating statistical 'connectedness' between observables.
Likelihood formalism to be discussed in Section 5.8.1. Note that the most extreme
case of marginalisation, dropping all rows and columns except for a single variable,
gives Σii : the total, diagonal variance of variable i as imprinted by the pdf joint
structure in the other k − 1 parameters.
Non-square covariance subsets can also be useful: for example, we regularly wish
to know the contributions of each systematic error component to the value of each
measurement bin (or model-parameter extraction, or whatever). Slicing to a
rectangular matrix whose first index runs over only the measurement bins, and
the second over only the systematics’ nuisance parameters, gives a measure of this:
this is known as a cross-covariance matrix. While useful, this conflates the spread of
the POI with the (usually) less interesting spread in the nuisance parameter—they are
called nuisances for a reason. Not to worry: we can use ‘half’ of the correlation
matrix construction of equation (5.44) to project out the uncertainty of POI p due to
nuisance parameter n:
σ_p = |Σ_pn| / √(Σ_nn).   (5.45)
Indeed, there are instances where such a combination is meaningful, e.g. independent
covariance matrices for a set of histogram bins can be combined in
this way—each covariance element is effectively a 'σ²', so their linear sum
implements combination in quadrature—but we cannot perform this elegant-looking
combination more generally for an important and rather educational
reason: relative signs.
Covariance-matrix elements can have relative minus signs. This tells us that when
the value of one parameter goes up, the other tends to go down, and vice versa. This
may have no more significance than the arbitrary sign between the nuisance parameter
and its impacts6. But when we combine over systematics, we no longer have a single
representative nuisance parameter with a meaningful direction, and are instead
treating the covariance elements as unsigned indications of the size of an uncertainty.
The sum therefore must at least be over the absolute values of the covariance elements,
∣Σpn∣. Failing to do so would mean that combining two oppositely-signed sources of
error would result in a reduced total uncertainty, even though any sign-combination of
the two elementary nuisance parameters could have been true.
Another important factor in combining sources of uncertainty within covariance
matrices is to ensure that they really are sources of error and not consequences. The
covariance matrix does not distinguish between correlation and causation, as is
trivially seen if one introduces an extra variable which is algebraically related to an
existing nuisance parameter: it will acquire the same correlations with the POIs as the
original, and it is tempting to sum this apparent contribution into the total
uncertainty. However, it represents no new information, but a reflection of an
established error source. This example is extreme, but more subtle versions are
introduced by performing a fit that correlates elementary nuisance parameters
directly as well as through their common influences on the POIs.
The combination of whole covariance matrices is a final useful concept for
breaking down groups of uncertainties into orthogonal components, the classic
decomposition being into several ‘systematic’ sources of uncertainty, and one
'statistical' component. In this mode, several smaller n_POI × n_POI covariance matrices
are estimated, each corresponding to the correlated uncertainties between the POIs
induced by a subset of the uncertainties. The limitation of attention to the
correlations between the POIs only is because, by construction, the nuisance
parameters are different in each group of uncertainties and hence cannot be combined
other than in a trivial block-diagonal form. To compute an effective total POI-covariance
matrix, these component covariances are added linearly (cf. standard deviations in
quadrature) over the set of error groups C:

Σ_ij^tot = ∑_{c∈C} Σ_ij^c.   (5.47)
⁶ For example, in QCD theory uncertainties, does an 'up' variation mean an increase in coupling, or an increase in scale (which decreases the strong coupling)? Either is fine, but the polarity of the labelling choice cannot affect the physical conclusions.
The (Shannon) entropy of a probability distribution is defined as

H = −∑_i P_i ln P_i,
H = −∫ dx p(x) ln p(x),

for discrete and continuous distributions on the first and second lines, respectively.
We state here without proof that the entropy of a variable can be viewed as its total
information content. This statement may seem more intuitive if you examine the
formulae and note that if a single value has probability 1 (so all other P_i = 0, or
p(x) → δ(x − x₀)), then the information content is zero: observing the 100%
predictable outcome of a sampling from it tells us nothing. Similarly, a set element
or value of x with probability zero contributes no information to the total. The real
interest, of course, lies away from these extremes, and has great application to many
topics from code-breaking to digital compression.
Let us now consider a joint probability distribution p(x , y ) in variables x and y.
The mutual information between these two variables is then
I(x, y) = ∫_X dx ∫_Y dy  p(x, y) ln[ p(x, y) / (p(x) p(y)) ],   (5.50)
where X and Y denote the domains (ranges of possible values) of the random
variables x and y, respectively7. The similarity with the entropy is clear, and
⁷ In a slight abuse of notation in equation (5.50), we use x and y both to denote the random variables themselves, as well as the dummy variables on the right, which take all possible values in X and Y, respectively.
equation (5.50) in fact captures a measure of ‘information distance’ between the two
variables, as expressed through their probability distributions. There is some of the
flavour of the covariance projection method here, too: in the argument of the
logarithm, the joint probability is normalised by the marginal probabilities of each
variable.
Again, some intuition about I (x , y ) can be gained through its extreme config-
urations: if the distributions of events in variables x and y are completely independent
of one another, then p(x , y ) = p(x∣y )p(y ) = p(x )p(y ) and the logarithm collapses to
zero: there is zero mutual information when the variables are completely disconnected.
At the other extreme, when one variable is completely dependent on the other,
p(x , y ) = p(x ) = p(y ), and the integral collapses to I (x , y ) = H (x ) = H (y ), i.e. the
full information content of either variable is shared with the other. The fractional
mutual information I (x , y )/H (x , y ) is hence a useful normalised measure of how
useful one variable is as a measure of the other: a powerful technique, without the
pitfalls of correlation, for assessing which variables have the most a priori power for
parameter inference, signal/background discrimination, or detector-bias corrections.
We should note, however, that mutual information is not a silver bullet—we still
have to be careful about how the events which define the probability measures are
generated. And continuous variables are more awkward in practice than they appear
here. Given a finite event sample, we cannot actually integrate but have to sum over
a set of discrete bins, whose sizes bias the calculation somewhat. Bias corrections or
fitting of integrable functions are needed, and this extra complexity has, thankfully,
been already implemented in various publicly available computational data-analysis
toolkits and packages.
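As a concrete example of such a naive binned estimate (with all the caveats just mentioned), the Python sketch below computes a plug-in mutual information and its fractional version from a 2D histogram of toy variables; the variables, binning and sample size are arbitrary, and no bias correction is attempted.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000

    # Toy variables: y is partially determined by x (illustrative only)
    x = rng.normal(0.0, 1.0, n)
    y = 0.7 * x + rng.normal(0.0, 0.7, n)

    # Joint and marginal probabilities from a finite (bias-inducing) binning
    pxy, _, _ = np.histogram2d(x, y, bins=40)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)

    mask = pxy > 0                                        # empty bins carry no information
    mi = np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask]))
    hxy = -np.sum(pxy[mask] * np.log(pxy[mask]))          # joint entropy

    print(f"I(x,y) ~ {mi:.3f} nats, fractional I/H ~ {mi/hxy:.3f}")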
noting that biased estimators may converge more quickly, i.e. the uncertainty on the
estimator value down-scales more rapidly with number of events than the best
unbiased estimator: in some applications this may be the critical factor, but here we
focus on the unbiased variety.
The most familiar example is the estimator of the mean, the sample mean

μ̃ = (1/N) ∑_i^N x_i,   (5.51)

for N events, each having value x_i. That this is a suitable estimator of the mean is
perhaps unsurprising, since you have undoubtedly seen the statistical mean many
times before. However, it is worth taking a moment to appreciate that it is not a
priori obvious. Similarly, a sample variance is derived directly from the pdf by
following equation (5.21):
σ̃² = Ẽ[x²] − μ²   (5.52)
   = (1/N) ∑_i^N x_i² − μ².   (5.53)

Since the true mean μ is not known, it must itself be replaced by its estimator μ̃, and
the factor 1/N by 1/(N − 1) to correct the bias that this substitution introduces, giving

   = (1/(N − 1)) ∑_i^N (x_i² − μ̃²).   (5.55)
This replacement is called Bessel’s correction, and the result is the unbiased estimator
for the population variance. But interestingly it does not fully unbias the standard
deviation estimator—in fact there is no general expression for an unbiased sample
standard deviation that can be used with all distributions! In practice this is rarely of
importance, but is worth being aware of if you are unable to avoid small sample
numbers (e.g. in studies of rare processes where despite large total data samples, the
number of signal candidates to pass selection criteria remains small).
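The behaviour of these estimators over many repeated 'experiments' can be checked numerically, as in the sketch below; the parent distribution, sample size and number of pseudo-experiments are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(7)
    true_mu, true_sigma = 5.0, 2.0
    N = 10                         # deliberately small sample size per experiment
    n_experiments = 100_000

    samples = rng.normal(true_mu, true_sigma, size=(n_experiments, N))

    mu_tilde = samples.mean(axis=1)            # sample mean estimator
    var_tilde = samples.var(axis=1, ddof=1)    # ddof=1 applies Bessel's correction
    std_tilde = samples.std(axis=1, ddof=1)    # still slightly biased for sigma itself

    print("average variance estimate :", var_tilde.mean(), "(true:", true_sigma**2, ")")
    print("average std-dev estimate  :", std_tilde.mean(), "(true:", true_sigma, ")")
    print("average mean estimate     :", mu_tilde.mean(), "(true:", true_mu, ")")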
All these concepts are essentially finite-statistics retreads of what we have already
discussed in probabilistic pdf/pmf language. You can convince yourself that in the
limit of infinite statistics, N → ∞, these quantities are unbiased, e.g. |μ̃ − μ| → 0. But we are
led to a new concept: the uncertainty of the estimator resulting from the finiteness of
the statistics. The canonical example of this is the uncertainty on the estimator of the
mean, μ̃, known as the standard error on the mean. This is given by
σ̂ = σ/√N,   (5.56)
where again we see the classic 1/√N asymptotic approach to perfect accuracy. This
expression tells us that the estimator of the mean has an uncertainty which will
disappear with sufficient statistics, and that intrinsically wider distributions require
higher statistics to achieve a given estimator precision. Technically, we should be
talking in terms of the estimator of the standard error, σ̃/√N, but this notation
is all getting rather heavy and we will shortly drop it except when essential for
clarity.
common choice: in this case the only effect is in normalisation, not shape. However,
you should be aware that the width-division is necessary to make a plot physically
comparable to theoretical differential cross-sections. Knowing this also frees you up
to use non-uniform binnings, which can be extremely useful to get comparable fill
populations (and hence statistical uncertainties) in all bins across e.g. a steeply
falling differential spectrum as found in ‘energy scale’ variables like (transverse)
momenta and masses.
The same idea generalises to 2D histograms and beyond: the variable measure is
now of higher dimensionality, so the relevant probability measure is the fill content
of the differential area dx dy in 2D, the volume dx dy dz in 3D, etc and their finite-
bin equivalents ΔxΔy , ΔxΔyΔz , etc. The relevant idealised density is now the
corresponding higher derivative, e.g. d 2σ /dx dy or d 3σ /dx dy dz —which may look
intimidating on plots but is just particle-physics code for ‘2D histogram’, ‘3D
histogram’, and so on, giving a constant reminder that the bin height (or colour-
scale) should now be divided by the bin area, volume, or whatever ∏i dxi quantity is
appropriate for its dimensionality. Be aware that the computer memory needed to
handle n-dimensional binnings scales as (N_axis bins)^n, which can easily get out of hand: in
practice one- and two-dimensional binnings are almost always all that you need.
In practice we need to be able to represent histograms as graphical data objects,
which largely reduces us to at most two-dimensional projections—three-dimensional
surface plots are sometimes used, but mostly for aesthetic excitement as they
contain no more (and sometimes less) information than a 2D 'heat map' plot,
usually with the same colour scheme as was used to paint the surface in the 3D view.
Most of the time, ‘1D’ plots of the binned density as a function of one variable are
used8, with multiple lines representing different slices (partial integrals) of the hidden
dimensionality. It is much easier to show multiple models or phase-space slices on
such a 1-parameter plot than to overlay multiple colour maps or surfaces.
Another very useful data type is the profile histogram, which is a compression of
one axis of a 2D histogram so that each e.g. x-bin contains the mean value of y in
that range of x, ⟨y⟩ = ∫_{−∞}^{+∞} dy ∫_{x0}^{x1} dx  y f(x, y) / ∫_{−∞}^{+∞} dy ∫_{x0}^{x1} dx  f(x, y) (again including the overflows).
Profiles are a very useful way for visually characterising how the typical value of
one variable depends on another, in a form which allows plotting and comparison of
multiple lines on a single graph.
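A profile can be built from raw (x, y) pairs with nothing more than per-bin sums, as in the sketch below; the toy dependence of y on x and the binning are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 50_000

    # Toy data: the typical y rises with x, with some scatter
    x = rng.uniform(0.0, 10.0, n)
    y = 2.0 * np.sqrt(x) + rng.normal(0.0, 1.0, n)

    edges = np.linspace(0.0, 10.0, 21)                   # 20 uniform x-bins
    sum_y, _ = np.histogram(x, bins=edges, weights=y)    # per-bin sum of y
    count, _ = np.histogram(x, bins=edges)               # per-bin fill count

    profile = sum_y / np.maximum(count, 1)               # mean y in each x-bin
    centres = 0.5 * (edges[:-1] + edges[1:])

    for c, p in zip(centres, profile):
        print(f"x ~ {c:4.2f}:  <y> = {p:5.2f}")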
The last feature of histograms to explore in this summary is that the bins of a
histogram are typically represented with an indicator of their statistical uncertainty,
the error bar. It is important to be careful about the meaning of this—despite the
name ‘error’, it does not mean that we might have made a mistake in the
measurement (that’s what systematic errors are for) but that, were we to perform
the experiment again with the same finite event statistics, we would get a slightly
different result just from randomness alone. Were we to make lots of equal-statistics
⁸ The plot itself is two-dimensional, of course: when we say a '2D histogram' we mean that there are two independent variable axes, plus the density values as functions of those two variables.
Σ̃_ij = (1/N) ∑_n^N x_{i,n} x_{j,n} − (1/N²) ∑_m^N x_{i,m} ∑_n^N x_{j,n},   (5.58)
where xi,n is the value of the ith parameter in event/sample n. In practice we will not
distinguish strongly between the true covariance Σ and its estimator Σ̃ , and will
generally use the un-tilde’d symbol.
⁹ Note that the uncertainty on ϵ_meas is zero: there is no ambiguity about how many objects passed or failed the selection. Uncertainties only appear when we try to extrapolate our finite-statistics observations to draw conclusions about the true, asymptotic parameters of the system.
= (1/N²) Nϵ(1 − ϵ)   (5.62)
≈ ϵ_meas(1 − ϵ_meas)/N   (5.63)
= (n/N)(1 − n/N)/N,   (5.64)

and hence obtain the estimator of the standard deviation,

σ̃[ϵ̃] = √(m(1 − m/N)) / N.   (5.65)
Neatly, this goes to zero at both the m → 0 and m → N extremes, in such a way
that the 1σ error bars never extend beyond the logical efficiency range [0, 1]. This is a
very useful property to have on our list, being the standard estimator of uncertainty
on an estimated efficiency.
But we should be a little critical: it is only by convention that we report 1σ
uncertainties, and e.g. the 2σ error bars can go outside the physical range. It is also
not true that the uncertainty on the efficiency really goes to zero at the extremes:
after all, the number of samples N was finite, and a larger sample might have
revealed small deviations from absolute pass/fail behaviour. What we should
actually do is construct asymmetric uncertainty estimators, such that the upper
uncertainty goes to zero as n → N , and vice versa for the lower uncertainty as
n → 0. Such a construction can be performed, for example, using the Poisson
bootstrap method, but for most purposes the binomial estimators are sufficient, as
long as they are not taken too literally.
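In code, the estimator of equation (5.65) is a one-liner, and its (too-)neat behaviour at the extremes is easy to see; the pass/fail counts below are arbitrary.

    import numpy as np

    def efficiency_with_error(m, N):
        # Estimated efficiency and its symmetric binomial uncertainty
        eff = m / N
        err = np.sqrt(m * (1.0 - m / N)) / N
        return eff, err

    for m, N in [(0, 50), (7, 50), (25, 50), (50, 50)]:
        eff, err = efficiency_with_error(m, N)
        print(f"m = {m:3d}, N = {N}:  efficiency = {eff:.3f} +/- {err:.3f}")
    # Note the zero uncertainty reported at m = 0 and m = N, as discussed above.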
The limitation to a counting experiment is responsible for this form, which may
differ from what you have encountered in physics. To generalise, and to understand
it better, we note that in the limit of large statistics the binomial distribution of the
expected count in each bin limits to a Gaussian distribution with variance
σ 2(x ) = f (x ). The test statistic is then equivalent to
χ² = ∑_{x∈{X}} (y(x) − f(x))² / σ²(x).   (5.67)
Moreover, the χ² is directly related to the log-likelihood of a set of independently
Gaussian-distributed bin values,

ln p(y|M) = ∑_{x∈{X}} ln N(y(x); f(x), σ(x))   (5.68)
          ∼ ∑_{x∈{X}} −(y(x) − f(x))²/(2σ²(x)) = −χ²/2,

where the '∼' indicates that constant normalisation terms have been dropped¹⁰.
¹⁰ The attentive reader may wonder why we dropped the constant offset ∑_x ln(1/(√(2π) σ_x)) arising from the Gaussian normalisation terms. Isn't this important? In fact, the χ² test statistic is implicitly comparing the model to the 'saturated model' where the prediction exactly equals the data. As this has the same normalisation constants, the extra terms cancel in the implied comparison. While this is a minor point here, the concept of cancellation between two models is an important one we shall return to.
variables y(x ) may be correlated with one another. Fortunately this does not pose
any fundamental problem to the χ 2 statistic, which is easily extended using the
covariance matrix between the bins, Σ, whose second-order characterisation of
correlations captures directly the structure of a multidimensional Gaussian pdf in
the space of bin values, with the principal axes now lying along linear combinations
of the bin axes. As we have shown that up to a factor of −2, the naïve χ 2 statistic is
the summed log-likelihood of the independent Gaussians, it is no surprise that in the
presence of bin-correlations, the χ 2 generalises to the similarly scaled exponent of
the multivariate normal distribution,
χ²_corr = (y − f)ᵀ Σ⁻¹ (y − f),   (5.70)

where the summation and 1/σ² factors have been handily absorbed into the vector
and matrix multiplications, and the inverse covariance Σ⁻¹, respectively¹¹.
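Equation (5.70) is a few lines of linear algebra in practice, as in the sketch below for a toy three-bin comparison; the yields, prediction and covariance values are invented, and the linear system is solved rather than Σ being explicitly inverted.

    import numpy as np

    # Toy observed yields, model prediction and bin-to-bin covariance (illustrative values)
    y = np.array([105.0, 98.0, 121.0])
    f = np.array([100.0, 100.0, 115.0])
    Sigma = np.array([[25.0,  5.0,  2.0],
                      [ 5.0, 16.0,  3.0],
                      [ 2.0,  3.0, 36.0]])

    r = y - f
    chi2_corr = r @ np.linalg.solve(Sigma, r)     # (y - f)^T Sigma^{-1} (y - f)
    chi2_naive = np.sum(r**2 / np.diag(Sigma))    # what one would get ignoring correlations

    print(f"chi2 with correlations: {chi2_corr:.2f}, ignoring them: {chi2_naive:.2f}")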
Equation (5.70) describes a quadratic-sided ‘valley’ in χ 2 as a function of
statistical deviations from the model, assuming that the model is true. We will
return to this picture later, with variations in the model parameters away from their
fitted maximum-likelihood estimators. A useful result follows that the curvature of
the χ 2 well, or more generally that of the negative log-likelihood, can be used to
derive an estimator for the covariance. Computing the Hessian matrix of second
derivatives of the negative log-likelihood gives an estimate of the inverse covariance
matrix:
−∂² ln L/(∂f_α ∂f_β) = ∂²(χ²/2)/(∂f_α ∂f_β) = (1/2) ∂²/(∂f_α ∂f_β) [ ∑_{i,j} (y_i − f_i) Σ⁻¹_ij (y_j − f_j) ] = Σ⁻¹_αβ.   (5.72)
This result is exact for a Gaussian likelihood, and is more generally useful as an
approximation in the immediate vicinity of the MLE model. Numerical evaluation
of these derivatives is used by numerical fitting tools to estimate the covariance in the
vicinity of the best-fit point, i.e. the MLE, of a likelihood model.
For the case of a signal discovery, there are some specific likelihood estimators
that are useful to know. Imagine that you wish to optimise an analysis designed to
search for a particular signal, and you want to know when you will have sensitivity
to a discovery. For any given analysis design, you can simulate the number S of
expected signal events, and the expected number B of background events. A general
statement of the significance s of the hypothetical signal observation is then given by
the number of Gaussian standard deviations of the background distribution to
¹¹ An important caveat must be borne in mind when using covariance (and hence correlation) matrices between bins of normalised distributions, i.e. whose integral has a fixed value. In that case, knowing the value of n_bin − 1 bins fixes the value of the remaining one, resulting in a singular covariance whose inverse cannot be computed. To make statistical progress one must either drop one bin, or re-obtain the unnormalised distribution.
being studied; the weights then have to recover the physical spectrum by cancelling
the enhancement factor, i.e. in this case w_i = p_T,i^(−4).
Whatever the motivation for the weights, the biased, weighted sample mean is

μ̃ = (1/∑_i w_i) ∑_i w_i x_i,   (5.74)
equal to the estimated mean of the true distribution rather than the MC one. Less
obviously, the weighted variance estimator (including a weighted Bessel correction)
becomes
σ̃² = [ (∑_i w_i x_i²)(∑_i w_i) − (∑_i w_i x_i)² ] / [ (∑_i w_i)² − ∑_i w_i² ].   (5.75)
Note that, in both of these cases, what matters is not the total scale of the weights, but
how they are distributed across the event set. A global rescaling of the form wi = w
allows the wi terms to be pulled out of the sums, where they cancel: any physical
quantity cannot be sensitive to the overall scale of weights, and indeed different event
generators use this freedom to set weight scales variously close to unity, or to around
the process cross-section in some generator-specific choice of units.
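The weighted estimators of equations (5.74) and (5.75) are simple sums over the event weights, and the insensitivity to a global rescaling can be checked directly, as in the sketch below; the toy values and weights are invented.

    import numpy as np

    rng = np.random.default_rng(11)
    x = rng.normal(10.0, 3.0, 5000)          # toy per-event values
    w = rng.exponential(1.0, 5000)           # toy per-event weights (illustrative only)

    def weighted_mean_var(x, w):
        sw, sw2 = w.sum(), np.sum(w**2)
        mean = np.sum(w * x) / sw
        var = (np.sum(w * x**2) * sw - np.sum(w * x)**2) / (sw**2 - sw2)
        return mean, var

    m1, v1 = weighted_mean_var(x, w)
    m2, v2 = weighted_mean_var(x, 42.0 * w)  # globally rescaled weights
    print(f"mean = {m1:.3f}, variance = {v1:.3f}")
    print(f"after rescaling all weights: mean = {m2:.3f}, variance = {v2:.3f}")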
¹² We here use this 'semicolon' likelihood-function notation to avoid confusion. In the literature a common convention is to write L(θ|D) ≡ p(D|θ), a strong candidate for the most counterintuitive notation of all time. We prefer not to muddy the water further with an inverted 'bar' notation, while still introducing the common L symbol.
¹³ Often referred to as 'integrating out' or 'marginalising over' the other parameters.
The key point here is that it is the pdf that is the real result. What if we want to
quote the ‘value’ of θ1? We could take the mode of p(θ1∣D ), or the mean, or some
other summary statistic, but all of these might be unsatisfactory in some circum-
stances. For example, if we have a spike in the marginal posterior at one value
(which is the mode), but a very broad region of probability density elsewhere
which has a higher volume, then it is more probable that our parameter exists
outside of the spike. The pdf, however, tells us everything, and results will typically
be presented in the form of the marginal posterior for each parameter of interest.
We can also quote values with uncertainties using the notion of credible regions.
Note that this is ambiguous: there are an infinite number of possible choices of θi
interval (especially when allowed to be disjoint). Which we prefer depends on the
circumstances: for example, if testing for the (non-)existence of a rare process, it may
make sense to build the interval on the lower tail of the distribution (integrating
rightward from −∞ in the parameter value) to set an upper limit on the rate; without
such a context, the highest-priority volume elements might be motivated by their
corresponding probability density rather than their parameter values. In general we
need a well-motivated ordering criterion for sequentially adding parameter-space
volume elements to the CrI.
Of the better-motivated orderings, an obvious choice is to add points to the
interval starting with those that have the highest posterior probability density
p(θi∣D ) (known as the maximum a posteriori probability or MAP), and continue
‘integrating down the probability density’ until CL coverage has been achieved. This
scheme can also be naturally generalised to the full (unmarginalised) likelihood,
again adding elements to the CrI—but now multidimensional dnθ elements rather
than one-dimensional marginal ones—from the mode of the pdf downwards, until
probability fraction CL is contained, cf. equation (5.81). This construction is
illustrated in Figure 5.10.
Note that disjoint CrIs also naturally emerge from this ‘top down’ scheme for
CrI construction, should the distribution be multimodal, and the secondary modes
have high-enough probability densities to be included in the CrI. This is an elegant
feature in general, but has an obvious downside should one wish to use the CrIs of
POIs to define ‘error bars’ in graphical or tabulated data: no such contiguous bar
can be drawn. This is more a reflection on the relative poverty of that uncertainty
Figure 5.10. Confidence interval construction by projection from the pdf p(θ|D). The vertical axis shows the
probability density as a function of two model parameters θi and θj , which are represented on the two other
axes: these could either be POIs or NPs. The global mode of the pdf is shown as a blue dot on top of the pdf
bump, and projected into the 2D θi − θj plane, and on to each axis. The whole pdf is also projected on to the
axes, giving the marginal pdfs: the modes of these need not match the global mode. Credible regions can be
made either globally (here in θi −θj ) or marginally by integrating from the mode downwards until a fraction CL
of the pdf lies in the CrI.
¹⁴ For graphical presentation, a richer view is provided by, for example, violin plots in which the error bar is replaced by a marginal pdf estimate.
important to also publish more complete estimates of the full pdf, since the graphical
or tabulated form is obviously not the whole story.
There is in fact a close relationship between the MLE method and the Bayesian
approach. Bayes’ theorem tells us that P (θ∣D ) ∝ P (D∣θ )P (θ ). Ignoring philosoph-
ical difficulties for now, this tells us that, if P(θ ) = 1, the maximum of the
likelihood will be the same as the mode of the posterior. So the MLE can actually
be thought of as the peak location for the Bayesian posterior pdf in the case of a
flat prior.
How do we obtain the MLE? Although it is possible analytically in limited
cases, it is almost always found numerically, and it is common to turn the problem
of maximising L into the equivalent problem of minimising −ln L, where the minus
sign ensures that a minimum will be found rather than a maximum. There may
easily be O(100) parameters or more, once nuisances are included, and in addition
there are few distributions for which the optimum can be computed analytically;
real-world likelihoods tend not to be among them. The exact way in which
numerical minimisation is performed is not a topic in which physicists typically
have a deep involvement, there being off-the-shelf tools to perform it, either in the
ROOT-oriented HEP software ecosystem, or in e.g. the rapidly growing data-
science world outside particle physics. But it is worth noting that function
minimisation in high dimensional spaces is generally not computationally
straightforward and introduces issues of fit convergence, stability, and the
possibility of finding a non-global minimum: manual ‘babysitting’ of fits is not
unusual.
One class of minimisation technique is ‘hill-climbing’ optimisation, which starts with
an arbitrary solution, then makes an incremental change ‘step’ to the solution based on
estimated function gradients. Several specific methods exist to efficiently estimate the
gradient, such as the BFGS method which iteratively improves an estimate of the loss
function’s Hessian curvature matrix. If a better solution is found after a step, another
step is made, and so on until the solution stops changing. This means that the search is
local, and the algorithm may easily get stuck in local minima of the negative log-
likelihood. In a real physics example, it is essential to run multiple minimisations with
different initial conditions to check that the results are robust. There is no general way
to be sure that the true global minimum has been found via this process.
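A minimal version of this multiple-starts strategy with an off-the-shelf gradient-based minimiser is sketched below; the deliberately multimodal toy negative log-likelihood, the parameter range and the number of restarts are all illustrative choices rather than recommendations.

    import numpy as np
    from scipy.optimize import minimize

    def nll(theta):
        # Toy multimodal 'negative log-likelihood' in two parameters (illustrative only)
        x, y = theta
        return (0.1 * (x**2 + y**2)
                - 2.0 * np.exp(-((x - 3.0)**2 + (y + 1.0)**2))
                - 1.5 * np.exp(-((x + 2.0)**2 + (y - 2.0)**2)))

    rng = np.random.default_rng(0)
    best = None
    for _ in range(20):                          # repeat the fit from random starting points
        x0 = rng.uniform(-5.0, 5.0, size=2)
        res = minimize(nll, x0, method="BFGS")   # local, gradient-based minimisation
        if best is None or res.fun < best.fun:
            best = res

    print("best minimum found at", best.x, "with value", best.fun)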
To overcome the problems of local search strategies, one chooses from a wide
range of global optimisation algorithms, which should provide more robust behav-
iour in the case of multimodal functions. One of these is differential evolution which
has recently provided lots of exciting results in particle physics and cosmology,
proving capable of exploring very complicated likelihood functions in up to 20
parameters. Differential evolution is an example of an evolutionary algorithm, where
a population of N_P individuals or 'target vectors' {X_i^g} is evolved through the
parameter space, for a number of generations. Here i refers to the ith individual, and
g corresponds to the generation of the population. Each X_i represents a vector of
parameters, i.e. one point in the parameter space.
The algorithm starts with an initial generation that is selected randomly from
within the range on each parameter. A generation is evolved to the next via three
steps that are inspired by genetics: mutation, crossover and selection. The simplest
variant of the algorithm is known as rand/1/bin; the first two parts of the name refer to
the mutation strategy (random population member, single difference vector), and
the third to the crossover strategy (binomial).
Mutation proceeds by identifying an individual Xi to be evolved, and constructing
one or more donor vectors Vi with which the individual will later be crossed over. In
the rand/1 mutation step, three unique random members of the current generation
Xr1, Xr2 and Xr3 are chosen (with none equal to the target vector), and a single donor
vector Vi is constructed as
Vi = X r1 + F (X r 2 − X r3), (5.83)
with the scale factor F a parameter of the algorithm. A more general mutation
strategy known as rand-to-best/1 also allows some admixture of the current best-fit
individual Xbest in a single donor vector,
Vi = λXbest + (1 − λ)X r1 + F (X r 2 − X r3), (5.84)
according to the value of another free parameter of the algorithm, λ.
Crossover then proceeds by constructing a trial vector Ui , by selecting each
component (parameter value) from either the target vector, or from one of the donor
vectors. In simple binomial crossover (the bin of rand/1/bin), this is controlled by an
additional algorithm parameter Cr. For each component of the trial vector Ui , a
random number is chosen uniformly between 0 and 1; if the number is greater than
Cr, the corresponding component of the trial vector is taken from the target vector;
otherwise, it is taken from the donor vector. At the end of this process, a single
component of Ui is chosen at random, and replaced by the corresponding
component of Vi (to make sure that Ui ≠ Xi ).
In the selection step, we pick either the trial vector or the target vector, depending
on which has the best value for the objective function.
A widely-used variant of simple rand/1/bin DE is so-called jDE, where the
parameters F and Cr are optimised on-the-fly by the DE algorithm itself, as if they
were regular parameters of the objective function. An even more aggressive variant
known as λjDE is the self-adaptive equivalent of rand-to-best/1/bin, where F , Cr and
λ are all dynamically optimised.
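Stripped of the refinements found in serious implementations (bound handling, convergence tests, self-adaptation), a bare-bones rand/1/bin algorithm can be written in a few lines, as sketched below; the population size, F, Cr, generation count and the toy objective function are all arbitrary illustrative settings.

    import numpy as np

    def de_rand1bin(f, bounds, n_pop=40, F=0.7, Cr=0.9, n_gen=200, seed=0):
        # Minimise f over a box using simple rand/1/bin differential evolution
        rng = np.random.default_rng(seed)
        lo, hi = np.array(bounds, dtype=float).T
        dim = len(lo)
        pop = rng.uniform(lo, hi, size=(n_pop, dim))     # initial random generation
        fit = np.array([f(x) for x in pop])
        for _ in range(n_gen):
            for i in range(n_pop):
                # Mutation: three distinct members, none equal to the target vector
                r1, r2, r3 = rng.choice([j for j in range(n_pop) if j != i], 3, replace=False)
                v = pop[r1] + F * (pop[r2] - pop[r3])    # donor vector
                # Binomial crossover, forcing at least one component from the donor
                cross = rng.random(dim) < Cr
                cross[rng.integers(dim)] = True
                u = np.where(cross, v, pop[i])           # trial vector
                # Selection: keep whichever of trial and target is better
                fu = f(u)
                if fu <= fit[i]:
                    pop[i], fit[i] = u, fu
        best = np.argmin(fit)
        return pop[best], fit[best]

    x_best, f_best = de_rand1bin(lambda x: (x[0] - 1.0)**2 + (x[1] + 2.0)**2,
                                 [(-5.0, 5.0), (-5.0, 5.0)])
    print(x_best, f_best)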
Figure 5.11. Illustration of a hypothetical sampling distribution g(θ˜∣θ 0 ) for a particular choice of the true
parameters θ 0 , for a 1D example. The distribution can be obtained via Monte Carlo sampling if we lack an
analytic expression.
Assume for simplicity that we only have one parameter θ. In the general case, we
do not know g(θ˜∣θ0 ) analytically. However, we can still obtain it by using the Monte
Carlo method. This consists of simulating a large number of experiments, computing
the MLE values each time, then looking at how these results are distributed. We can
then define a confidence interval (CoI) for each of our parameters, whose defining
characteristic is that it contains the true value of the parameter some fixed
proportion of the time (e.g. 95%) again given by the confidence level (CL). It is
worth reading this sentence again several times; many people think (and, even worse,
say in public) that a confidence interval and a Bayesian credible interval are the same
thing. They have quite distinct meanings. A frequentist interval says nothing about
the probability of a parameter taking a certain value (which is undefined for a
frequentist), but instead tells us that, in a large number of repeated experiments, the
derived CoI would contain the true value of the parameter a fraction of times given
by its CL. Note that the random variable here is not the true value, but the estimated
interval itself!
The classic construction of CoIs is the Neyman construction. This is more than a
little baroque, so let’s work through it piece by piece. First, assume that, for any
given true parameter value θ0 , we know g(θ˜∣θ0 ) by Monte Carlo simulation. A
hypothetical distribution for a particular choice of θ0 is shown for a 1D example in
Figure 5.11. We can then choose some value uα such that there is some probability α
to observe θ̃⩾uα . We can also choose a value vβ such that there is a probability β to
observe θ̃⩽vβ . We can write
α = ∫_{u_α(θ₀)}^{∞} g(θ̃|θ₀) dθ̃ = 1 − G(u_α(θ₀)),   (5.85)
where G is the cdf corresponding to g, and we have made it clear that various
quantities depend on our assumed true value of the parameter θ0. We can also write
a similar expression for β:
β = ∫_{−∞}^{v_β(θ₀)} g(θ̃|θ₀) dθ̃ = G(v_β(θ₀)).   (5.86)
Figure 5.12. The confidence belt in the θ₀–θ̃ plane: the curves v_β(θ₀) and u_α(θ₀) bound the belt, and the values a and b are defined by where the horizontal line at the observed θ̃_obs crosses it.
If we now vary the assumed θ0 , we get different uα(θ0 ) and vβ (θ0 ) values, and we
could plot these in the θ0 − θ˜ plane, as shown in Figure 5.12. The region between
uα(θ0 ) and vβ (θ0 ) is called the confidence belt, and the probability for θ̃ to be inside
the belt is given by
P(vβ (θ 0) ⩽ θ˜ ⩽ u α(θ 0)) = 1 − α − β. (5.87)
Let us now say that we have obtained a particular value of the estimator θ̃obs in
our experiment. This defines two values of a and b, shown in Figure 5.12, at the
points where the horizontal line for θ̃obs intersects the confidence belt15. The interval
[a, b] is then said to be a CoI, at a confidence level of CL = 1 − α − β . Every point in
the interval corresponds to a model with which the observed data is consistent at the
given CL. a and b are themselves random variables, since if we repeated our
experiment many times, we would get different θ̃obs values, which define different
values of a and b. The point of this construction is that, under many repeats, θ0 will
be within the CoI in a fraction of the experiments given by 1 − α − β .
You should think of this complex procedure as an elaborate workaround for the fact
that a frequentist cannot define a probability that a parameter takes a particular value, but
only the likelihood of the observed data under the hypothesis of that value. Sometimes, we
only want a one-sided interval, in which case we can use a as the lower limit on θ0 such that
θ0 ⩾ a with probability 1 − α , or we can use b as an upper limit on θ0. In the case of the
two-sided intervals, the confidence level 1 − α − β is not sufficient to uniquely determine
how to choose uα(θ0 ) and vβ (θ0 ) for fixed θ0 (i.e. we could shift uα(θ0 ) and vβ (θ0 ) around
¹⁵ More generally, but unusually, there could be more than two intersection points, producing a disjoint interval with the same CL.
and still get the same probability of θ̃ being between uα and vβ ). There are various
prescriptions that can be used (each called an ordering principle), including:
• add all parameter values greater than or less than a given value;
• draw a central region with equal probability of the measurement falling above
the region as below (e.g. α = β = γ /2) (the central confidence interval);
• start from the mode of g(θ̃|θ₀) and work downwards, adding points until the
region contains the required amount of ∫ g(θ̃|θ₀) dθ̃;
• the Feldman–Cousins prescription, in which points are added in order of the
decreasing likelihood-ratio:
R = g(θ̃|θ₀) / g(θ̃|θ_best),   (5.89)
where θ best is the value of θ (of all physically allowed values) that leads to the
absolute maximum of g(θ˜∣θ ).
Assuming that our particular experiment has led to an estimator value of θ̃obs ,
equations (5.85) and (5.86) can be used to implicitly define the CoI [a, b ],
α = 1 − G(θ̃_obs; a, σ_θ̃) = 1 − Φ((θ̃_obs − a)/σ_θ̃)   (5.91)

β = G(θ̃_obs; b, σ_θ̃) = Φ((θ̃_obs − b)/σ_θ̃),   (5.92)

where we have used the standard notation for the cdf of the Gaussian function Φ.
The actual values of a and b can be obtained by inverting these equations:

a = θ̃_obs − σ_θ̃ Φ⁻¹(1 − α)   (5.93)
b = θ̃_obs + σ_θ̃ Φ⁻¹(1 − β).   (5.94)
Table 5.1. Values of the confidence level that correspond to choices of Φ⁻¹(1 − γ/2) for a
central confidence interval, for a 1D, Gaussian-distributed estimator.

    Φ⁻¹(1 − γ/2)     Confidence level 1 − γ
    1                0.6827
    1.645            0.90
    1.960            0.95
    2                0.9544
    2.576            0.99
    3                0.9973
Here, Φ−1 is the inverse function of the Gaussian cdf, which is the quantile of the
standard Gaussian. Note that Φ−1(1 − α ) and Φ−1(1 − β ) tell us how far the edges of
our CoI a and b are from θ̃obs , in units of σ θ̃ . The choice of α and β then allows us to
define CoIs at different confidence levels. For a central confidence interval with
α = β = γ /2, the confidence level is given by 1 − γ , and a common choice is
Φ−1(1 − γ /2) = 1, 2 or 3. This means that the edges of the interval are 1, 2 or 3
standard deviations away from θ̃obs . Another option is that we choose 1 − γ to be a
round number, so that we get a nice round confidence level (e.g. 95% ∼ 2σ ). The
values of the confidence level for different values of Φ−1(1 − γ /2) are shown in
table 5.1.
The choice Φ−1(1 − γ /2) = 1 results in a ‘1σ ’ error bar, which allows us to quote
the CoI as
[a , b ] = [θ˜obs − σ θ˜, θ˜obs + σ θ˜ ]. (5.95)
If we do not know the value of σ θ̃ , the maximum likelihood estimate σ̃ θ̃ can usually
be used instead, provided we have a large enough set of observations.
g(θ̃|θ) = (1/√(2πσ_θ̃²)) exp( −(θ̃ − θ)²/(2σ_θ̃²) ).   (5.96)

However, it can indeed be shown that under the same conditions, the likelihood
function itself becomes Gaussian, centered around the maximum likelihood
estimate θ̃:

L(θ; D) = L_max exp( −(θ − θ̃)²/(2σ_θ̃²) ).   (5.97)
Note that we have not proved that the variance of both Gaussians is the same, but
this can be shown. The log of the likelihood is then given by

ln L = ln L_max − (θ − θ̃)²/(2σ_θ̃²).   (5.98)
We see that a one σ θ̃ variation of θ around the MLE value leads to a change in the
log-likelihood of 1/2 from its maximum value. We can turn this round and define σ θ̃
as the variation in θ̃ that results from varying the log-likelihood by 1/2 around its
maximum value. The 2-sigma region corresponds to Δ ln L = (1/2)(2)² = 2, and the 3-sigma region corresponds to Δ ln L = (1/2)(3)² = 4.5, so that in general we have
ln L(θ̃ ± Nσθ̃) = ln Lmax − N²/2.  (5.99)
Putting this together with equation (5.95) tells us that the 68.3% central confidence interval is given by the values of θ at which the log-likelihood function decreases by 1/2 from the maximum value, which is much easier to think about than
the exact confidence interval procedure given by the Neyman construction. In fact,
this concept remains useful even if the likelihood function is not Gaussian, since it
can be shown that the central confidence interval [a, b ] = [θ˜−c, θ˜+d ] is still
approximately given by
ln L(θ̃ − c) = ln L(θ̃ + d) = ln Lmax − N²/2,  (5.100)
where N = Φ−1(1 − γ /2) is the quantile of the standard Gaussian that corresponds to
the desired CL = 1 − γ .
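As an illustration of this rule, the following sketch (a toy example of our own, with the Gaussian width assumed known) finds the approximate 68.3% central interval for the mean of a Gaussian sample by scanning for the points where the log-likelihood drops by 1/2 from its maximum:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=2.0, size=200)   # toy dataset, known width
sigma = 2.0

def lnL(theta):
    # Gaussian log-likelihood for the mean, up to a theta-independent constant
    return -0.5 * np.sum((data - theta) ** 2) / sigma**2

theta_grid = np.linspace(2.0, 4.0, 2001)
lnL_vals = np.array([lnL(t) for t in theta_grid])
lnL_max = lnL_vals.max()
theta_hat = theta_grid[np.argmax(lnL_vals)]

# points where ln L >= ln L_max - N^2/2 with N = 1 define the ~68.3% interval
inside = theta_grid[lnL_vals >= lnL_max - 0.5]
a, b = inside.min(), inside.max()
print(f"theta_hat = {theta_hat:.3f}, 68.3% interval = [{a:.3f}, {b:.3f}]")
print(f"expected half-width sigma/sqrt(n) = {sigma / np.sqrt(len(data)):.3f}")
```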
Table 5.2. For different choices of the confidence level 1 − γ, this table gives the Qγ value that can be used with equation (5.103) to derive the confidence region, for m = 1, 2 and 3 fitted parameters.

1 − γ     Qγ (m = 1)   Qγ (m = 2)   Qγ (m = 3)
0.683     1.00         2.30         3.53
0.90      2.71         4.61         6.25
0.95      3.84         5.99         7.81
0.99      6.63         9.21         11.34
The argument generalises to the case of m fitted parameters: in the asymptotic limit, the likelihood becomes a multivariate Gaussian in θ, with the same covariance matrix that appears in the pdf for the estimator. Under these conditions, the confidence region with confidence level 1 − γ can be obtained by finding the values of θ at which the log-likelihood function decreases by Qγ/2
from its maximum value,
ln L(θ; D) = ln Lmax − Qγ/2,  (5.103)
where Qγ = F −1(1 − γ ; m ) is the quantile of order 1 − γ of the χ 2 distribution with m
degrees of freedom. Though this sounds highly obscure, the procedure is actually
fairly simple, thanks to the existence of look-up tables for this mysterious Qγ
number. For example, if we want a 90% confidence region, we can use table 5.2 to
tell us the Qγ number that can be used to derive the confidence region using equation
(5.103) (i.e. it tells us how far down in the log-likelihood to go when constructing the
confidence region).
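In practice the look-up is a one-line call to a statistics library. A minimal sketch using SciPy, which reproduces the Qγ values of table 5.2:

```python
from scipy.stats import chi2

# Q_gamma = F^{-1}(1 - gamma; m): the quantile of the chi-squared distribution
# with m degrees of freedom, used as Delta(ln L) = Q_gamma / 2
for cl in (0.683, 0.90, 0.95, 0.99):
    qs = [chi2.ppf(cl, df=m) for m in (1, 2, 3)]
    print(f"CL = {cl:.3f}:  Q_gamma(m=1,2,3) = "
          + ", ".join(f"{q:.2f}" for q in qs))
```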
If we depart from the asymptotic limit, this procedure can once again still be used, but the resulting confidence regions will be only approximate. In fact, the larger the number of estimated parameters m, the more observations are needed before the Gaussian limit is reached. It is always possible to use Monte Carlo simulation to derive the actual coverage of the confidence region, since the look-up table cannot be trusted if we are not in the Gaussian regime.
5.7 Sampling
In discussing estimators and parameter inference, we have often assumed that values
can be drawn from probability distributions in order to compute resulting statistical
moments, or map the shape of the likelihood or posterior-probability function.
While nature apparently does this all the time, it is not so obvious that we can
efficiently mimic the same process — yet we need to, because sampling is central to
particle physics, from event-generation, to parameter inference and hypothesis
testing, and to estimating corrections for detector biases.
We begin with the apparently simple (yet still potentially ruinously inefficient) task of drawing samples from a one-dimensional analytic or binned distribution, then move to the research frontier of efficiently sampling multi-dimensional likelihood and posterior functions.
For a complicated, multimodal likelihood or posterior function, locating the global optimum at all becomes a hard problem in need of a very global approach. Here, the marginalisation integral may be easier to perform than locating the global best-fit point. We briefly reviewed sampling strategies for global optimisation in Section 5.6, and hence will here focus our attention on the concrete case of sampling from a Bayesian posterior.
Assume that we have a posterior pdf p(θ∣D ) that, in the general case, can be
multimodal, with multiple “hot spots” of probability density in the parameter space.
It is important that a sampler visits all of these parameter regions in order for
marginal integrals to correctly converge. A central problem, however, is that as the
dimensionality of the parameter space increases, the chance of a uniformly sampled
point having a significant probability density becomes vanishingly small, since there
is a huge volume of improbable regions. In this section, we will review the worst
possible method of sampling (random uniform sampling from the prior), and suggest
two superior replacements that receive extensive use in the physics literature.
The simplest strategy is a grid scan, in which we evaluate the posterior on a regular grid of points θ* covering the prior range. Even in 1D, this will fail to give us an accurate evaluation of the posterior when it is sharply peaked compared to our grid
size. In larger numbers of dimensions (meaning more parameters), however, we meet
a different problem. If we have D parameters in total, the number of grid scan points
at which we need to evaluate the posterior scales as O(n D ), where n defines the grid
resolution. To get anything like a reasonable sampling density of a multidimensional
posterior, this becomes a very large number very quickly.
What if, instead of scanning points over a grid of equal resolution, we simply
threw samples at random from the prior range on our parameters, and evaluated the
posterior at each of those points? We now get a different problem, which is that our
samples will mostly be concentrated at the boundary of the prior range when D gets
large. To see this, imagine that we randomly choose a variable to be between 0 and 1, and we define P to be the probability of it not being near the boundary (i.e. not too close to 0 or 1). As we increase D, the probability that a randomly selected point in the D-dimensional volume lies near the boundary in at least one dimension is given by (see Figure 5.13)
P(boundary) = 1 − P(not boundary) = 1 − P^D.  (5.105)
Figure 5.13. Illustration of the “boundary dominance” problem when random sampling in D ≫ 1 dimensions, where the naïve fraction of the space contained in the outer parts of the parameter ranges, P(boundary) = 1 − P^D, tends to 1.
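The boundary-dominance effect is easy to demonstrate by brute force. In the following sketch (our own toy illustration), 'not near the boundary' is taken to mean lying within the central 90% of each parameter's range, so that P = 0.9 per dimension:

```python
import numpy as np

rng = np.random.default_rng(1)
P = 0.9            # per-dimension probability of being away from the boundary
n_samples = 100_000

for D in (1, 2, 5, 10, 20, 50):
    x = rng.uniform(size=(n_samples, D))
    # a point is "interior" only if every coordinate avoids the outer 5% on each side
    interior = np.all((x > (1 - P) / 2) & (x < 1 - (1 - P) / 2), axis=1)
    print(f"D = {D:3d}: P(boundary) = {1 - interior.mean():.3f}  "
          f"(prediction 1 - P^D = {1 - P**D:.3f})")
```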
A final problem with both random and grid sampling is that it is hugely wasteful to spend lots of time evaluating the posterior in regions of the space where the posterior probability density is very small. It would be far better to have a method
that somehow obtained samples in proportion to the probability density, so that we
concentrated our CPU time in interesting regions of the parameter space.
The first of these superior methods is Markov chain Monte Carlo (MCMC). The key ingredient is a proposal density Q(X; X^(t)) that depends on the current state of the system, which we will label X^(t) (this is just the vector of parameters that defines the current sample). The density Q(X; X^(t)) (where X is the next sample that we will consider adding to our chain of samples) can be any fixed density from which it is possible to draw samples; it is not necessary for Q(X; X^(t)) to resemble the posterior that we are sampling in order to be useful. It is common to choose a simple, easily-sampled distribution such as a multidimensional Gaussian.
We then apply the Metropolis-Hastings algorithm as follows. First, we choose a
random sample somewhere within the allowed parameter space (i.e. the parameter values fall within the range of the prior). This is our starting value of X^(t). We then use the proposal density Q(X; X^(t)) to obtain a new sample X. We choose whether to accept the sample or not, by computing the quantity
a = [p*(X∣D) Q(X^(t); X)] / [p*(X^(t)∣D) Q(X; X^(t))],  (5.106)
where p*(Xi∣D) is the unnormalised posterior for the parameter choice Xi. If a ⩾ 1 the new state is accepted, otherwise the new state is accepted with probability a (i.e. we can draw a random number on the interval [0, 1], and use that to decide whether we accept the sample or not). If we accept the new sample, it becomes the next point in our Markov chain, but if we reject it, we stay at the current point and try another sample. Note that, if Q is a symmetric function of X^(t) and X, then the ratio of Q
factors evaluates to 1, and the Metropolis-Hastings method reduces to the
Metropolis method, which involves a simple comparison of the posterior at the
two candidate points in the Markov chain.
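A minimal sketch of the Metropolis algorithm is given below (a symmetric Gaussian proposal, so that the ratio of Q factors drops out), applied to a toy two-dimensional unnormalised posterior of our own choosing; it is illustrative only, and not a substitute for a production-quality sampler:

```python
import numpy as np

rng = np.random.default_rng(7)

def log_post(x):
    # toy unnormalised log-posterior: a correlated 2D Gaussian
    a, b = x
    return -0.5 * (a**2 + b**2 - 1.6 * a * b) / (1 - 0.8**2)

def metropolis(log_p, x0, n_steps, step_size):
    chain = [np.asarray(x0, dtype=float)]
    lp = log_p(chain[-1])
    n_accept = 0
    for _ in range(n_steps):
        proposal = chain[-1] + step_size * rng.normal(size=len(x0))
        lp_new = log_p(proposal)
        # accept with probability min(1, p*(X)/p*(X^(t))); work in logs for stability
        if np.log(rng.uniform()) < lp_new - lp:
            chain.append(proposal)
            lp = lp_new
            n_accept += 1
        else:
            chain.append(chain[-1])
    return np.array(chain), n_accept / n_steps

chain, acc = metropolis(log_post, x0=[3.0, -3.0], n_steps=20_000, step_size=0.5)
burned = chain[2000:]            # discard the burn-in phase
print(f"acceptance rate = {acc:.2f}")
print("posterior mean  =", burned.mean(axis=0))
print("posterior covariance:\n", np.cov(burned.T))
```

Playing with the step_size argument gives a direct feel for the tuning issues discussed below: too small a step gives high acceptance but slow exploration, too large a step gives a chain that barely moves.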
It can be shown (rather remarkably) that the probability distribution of X^(t) tends to the posterior distribution as t → ∞, provided that Q is chosen such that Q(X; X′) > 0 for all pairs of states X, X′. Thus by choosing points via the Metropolis algorithm,
we obtain samples from the unnormalised posterior, spending most of our sampling
time in regions where the posterior is interesting. This is exactly what we wanted!
Note that the presence of the caveat t → ∞ implies that there is an issue of
convergence in the application of the Metropolis algorithm, and this is to be
expected from the Markov Chain nature of the method. Each element in the
sequence X (t ) has a probability distribution that is dependent on the previous value
X (t − 1) and hence, since successive samples are correlated with each other, the
Markov Chain must be run for a certain length of time in order to generate samples
that are effectively independent. The exact details of convergence depend on the
particular posterior being sampled, and on the details of Q , and hence there is some
element of tuning involved in getting the algorithm to run successfully. The initial
samples in the chain are part of a “burn-in” phase, where the chain is effectively
performing a random walk around the space until it finds an area of interest. It will
then start to explore the maximum of the posterior (which might be a local
maximum only, and we shall return to this shortly).
The purpose of the definition in equation (5.106) is to ensure that the Markov
Chain used in the Metropolis method is reversible. By this it is meant that the chain
satisfies the principle of detailed balance, i.e. that the probability T (X a; X b ) of the
chain making a transition from a state X a to a state X b is related to the reverse
transition via T (X a; X b )P (X a ) = T (X b; X a )P (X b ). This property is necessary if we
require the distribution of samples from the chain to converge to the posterior.
Before we move on, it is worth thinking about the issues with the Metropolis-
Hastings algorithm. The choice of proposal function is clearly very important.
Imagine that we have a single peak in the posterior, with some characteristic width.
Ideally, the step size of the proposal function would be similar to that width. If it is
much smaller, you will spend ages random-walking around the space before you find
the region of interest. If it is much bigger, you are likely to overshoot the maximum
in the posterior each time you make a step, and you will struggle to converge to the
posterior within a reasonable number of samples. Unfortunately, you do not know
the shape of the posterior in advance in most physics examples! Strategies for getting around this include performing preliminary scans with different step sizes, to obtain a rough mapping of the posterior that can guide future scans, or using more sophisticated MCMC techniques with adaptive step sizes. The latter have to be constructed carefully to ensure that detailed balance is preserved, but there are a number of techniques on the market.
Another problem is that, as the number of parameters increases, it can get harder
to find regions of interest if they occupy an increasingly small volume of the space.
Changing the proposal function helps, since a multidimensional Gaussian is actually
quite a spiky function in high numbers of dimensions. Again, however, some
preliminary insight (which may come from the physics of the problem) is required if
the results are to be optimum.
Finally, there is a classic issue with MCMC techniques, which is that they struggle
with multimodal posterior functions, i.e. functions with more than one peak.
Imagine that we have two peaks, one that is relatively narrow, and one that is
relatively broad. A Metropolis algorithm with a Gaussian proposal of a fixed width
is going to struggle here, since the Markov chain might find only one of the peaks,
and you will think that it has converged to the full posterior whilst remaining
ignorant of the second peak. Running different chains with different proposal
densities is one way around this, but it is then difficult to weight the samples
correctly in order to obtain the relative height of each peak. If a posterior has many
modes, it is better to use another technique.
An alternative that copes far better with multimodal posteriors is nested sampling. Recall that the posterior can be written as p(θ∣D) = L(θ; D)π(θ)/Z, where π(θ) is the prior and Z is the Bayesian evidence. In MCMC sampling, we threw Z away, because it is only a normalisation constant, and it cancels in the ratio in equation (5.106).
Figure 5.14. Cartoon illustrating (a) the posterior of a two dimensional problem; and (b) the transformed L(X )
function where the prior volumes Xi are associated with each likelihood Li .
The central object in nested sampling is the prior volume X(λ), defined by
X(λ) = ∫_{L(θ;D)>λ} π(θ) dθ,  (5.108)
where the integral extends over the region(s) of parameter space contained within the contour L(θ; D) = λ. To get a picture of what this means, imagine that we have two
parameters, and the likelihood function defines a hill in those parameters.
L(θ; D ) = λ defines a horizontal slice that is at some height up that hill. X (λ ) is
then given by the integral of the prior over the parameters, over the part of the hill
that is higher than that slice. Another way to state this is that equation (5.108) gives
the cumulant prior volume covering all likelihood values greater than λ . As λ
increases, the enclosed prior volume X decreases from 1 to 0.
The definition of the prior volume allows us to write the evidence integral as
Z = ∫₀¹ L(X) dX,  (5.109)
where L(X) is the inverse of the prior-volume function, i.e. the likelihood value whose enclosed prior volume is X. To see how this works, start from the lowest likelihood value, L0 ≈ 0, for which X0 = 1: the prior volume is just the integral of the prior over its full range, and it is normalised as a pdf (i.e. it integrates to unity). Now take a slightly higher likelihood value, L1. This defines a value of X = X1 that must be slightly smaller, since the prior volume above that likelihood value is now smaller. This is marked on the right-hand side in Figure 5.14. We can keep going in this fashion, choosing higher likelihood values (which define successive shells of the likelihood as we go up the hill), and charting their associated X values on the plot on the right. What we find is that the X values decrease as we increase the likelihood, until finally the X value goes to zero.
Now, the evidence Z is given by the area under the curve in the right-hand figure,
and we can get an approximate value of that using numerical integration. Assume
that, for some sequence of values Xi ,
0 < XM < ⋯ < X2 < X1 < X0 = 1 , (5.110)
we can evaluate the likelihoods Li = L(Xi ), shown schematically in Figure 5.14. The
evidence can then be approximated as a weighted sum:
Z ≈ Σ_{i=1}^{M} Li wi,  (5.111)
with weights wi given by the trapezium rule as wi = (X_{i−1} − X_{i+1})/2. An example of a two-dimensional posterior and its associated function L(X) is shown in Figure 5.14.
How can one actually perform the summation in equation (5.111)? A neat way to do it is the nested sampling algorithm. One starts by drawing a set of N 'live points' from the prior and evaluating the likelihood of each. At each iteration i, the live point with the lowest likelihood, Li, is removed from the live set and stored, and is replaced by a new point drawn from the prior subject to the constraint L > Li. Because each such step shrinks the prior volume enclosed by the live set by a factor of roughly N/(N + 1), the prior volume associated with iteration i can be estimated statistically as Xi ≈ exp(−i/N), without ever performing the integral in equation (5.108) explicitly. The terms Li wi are then accumulated as the algorithm proceeds, and the iteration stops once the remaining live points can no longer contribute appreciably to Z.
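The following sketch implements this loop for a deliberately simple toy problem (a uniform prior on the unit square and a narrow Gaussian likelihood), using a simple rectangle-rule weight rather than the trapezium rule of equation (5.111), and brute-force rejection from the prior to satisfy L > Li; real implementations such as MultiNest replace that step with something far more efficient:

```python
import numpy as np

rng = np.random.default_rng(3)

def loglike(theta):
    # toy likelihood: isotropic Gaussian of width 0.05 centred at (0.5, 0.5)
    return -0.5 * np.sum((theta - 0.5) ** 2) / 0.05**2 - np.log(2 * np.pi * 0.05**2)

N = 400                                  # number of live points
live = rng.uniform(size=(N, 2))          # uniform prior on the unit square
live_logL = np.array([loglike(t) for t in live])

logZ_terms = []
X_prev = 1.0
for i in range(1, 3001):
    worst = np.argmin(live_logL)
    L_i = live_logL[worst]
    X_i = np.exp(-i / N)                 # statistical estimate of the prior volume
    w_i = X_prev - X_i                   # simple (rectangle-rule) weight
    logZ_terms.append(L_i + np.log(w_i))
    X_prev = X_i
    # replace the worst live point by a prior draw with L > L_i (brute-force rejection)
    while True:
        cand = rng.uniform(size=2)
        cand_logL = loglike(cand)
        if cand_logL > L_i:
            live[worst], live_logL[worst] = cand, cand_logL
            break

# add the contribution of the final live points, spread over the remaining volume
logZ_terms.append(np.log(X_prev) + np.logaddexp.reduce(live_logL) - np.log(N))
logZ = np.logaddexp.reduce(logZ_terms)
print(f"estimated ln Z = {logZ:.3f}")    # analytic value is ln(1) = 0, up to edge effects
```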
Once the evidence is found by Monte Carlo integration, the final live points and the
full sequence of discarded points can be used to generate posterior inferences, by
assigning each point the probability weight
Li wi
pi = . (5.112)
Z
One can then construct marginalised posterior distributions or calculate inferences
of posterior parameters such as means, covariances, etc. You may be wondering how one can draw samples from the prior subject to the condition L > Li at each iteration i. Naively, one can simply draw from the prior and reject any point with L ⩽ Li, but since the constrained region shrinks exponentially as the iterations proceed, this quickly becomes ruinously inefficient, and we do not know the shape of the constrained region in advance.
The MultiNest algorithm that has been used in many physics studies tackles this
problem through an ellipsoidal rejection sampling scheme. This operates by first
enclosing the live point set within a set of ellipsoids (which may overlap), chosen to
minimize the sum of the ellipsoid volumes. New points are then drawn uniformly
from the region enclosed by these ellipsoids, which removes the need to sample from
the full prior volume at each iteration. The MultiNest algorithm has proven very
useful for tackling inference problems in cosmology and particle physics, partly
because the ellipsoidal decomposition is useful for dealing with posteriors with
curving features (typical of e.g. global fits of supersymmetric models), and for
dealing with multimodal posteriors (where ellipsoids that do not overlap can be
evolved independently as separate modes).
A hypothesis H that depends on a set of continuous parameters θ is known as a composite hypothesis, and we can write a function p(D∣θ)16. Under repeats of our experiment, we would observe different datasets, since our measurement is a stochastic process. If the hypothesis H were true, then we would expect our observed
dataset to be somewhere within the statistical fluctuations of the possible datasets
generated by the hypothesis. It is then natural to ask whether our particular observed
dataset falls within the expected statistical fluctuations or not, and to quantify the
level of consistency with the expected behaviour given the hypothesis.
The standard way to do this is to use a test statistic which is a function of the
measured data t(D ) that maps D either to a single number or a vector. One could in
principle use the original data vector itself as the test statistic (i.e. t = D ), but a
statistic of lower dimension is preferred since it reduces the computational overhead
without affecting the ability to discriminate hypotheses. For a goodness of fit test,
the test statistic is chosen to be a function that quantifies the agreement between the
observed data values D , and the values expected from the hypothesis, such that small
values of t indicate better agreement (e.g. the χ 2 statistic introduced in Section 5.3.5).
Assuming that we have defined a 1D test statistic t, we can define the pdf f (t (D )∣H)
of the test statistic, under variations of D . We can then state the probability P of
observing, given the assumption that H is true, a result that is as compatible with H as the observed dataset, or less so. Since larger t values correspond to lower compatibility of the data with the hypothesis (by construction), we can write this as the following integral over the pdf of the test statistic,
P = ∫_{t_obs}^∞ dt f(t∣H),  (5.113)
where tobs is the value of the test statistic observed in the data, and the left-hand side
is called the p-value of the observation given the hypothesis17. A small p-value
indicates that the measurement made is highly unlikely under the hypothesis,
meaning that some alternative hypothesis—which the statistics do not directly
help us to construct—may be preferred. The p-value tells us the significance of any
discrepancy between the observed data and the predictions of the hypothesis, but it is
not yet a hypothesis test on H itself. However, we can define the latter by comparing
(1 − P ) to a standard threshold, stating that H is rejected at a confidence level CL if
CLobs = (1 − P ) > CL. (5.114)
Using this equation alongside the definition of the p-value, we can write
CLobs = ∫_{−∞}^{t_obs} dt f(t∣H).  (5.115)
16 Note that this is not philosophically equivalent to the likelihood of the data that we described in Section 5.4. In the case of the likelihood of the data, we take the data as fixed, and we vary the parameters. In the present case, we are taking the hypothesis as fixed, and we are interested in variations of the data.
17 The lower-case p is a break with our convention of capital P for finite probabilities, but this is the standard notation for p-values.
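As a concrete sketch of equations (5.113) and (5.115), consider a toy counting experiment of our own devising: ten bins with an expected yield of five events each under the hypothesis H, a simple Pearson χ² as the test statistic, and the pdf f(t∣H) estimated from toy datasets generated under H:

```python
import numpy as np

rng = np.random.default_rng(11)

lam = np.full(10, 5.0)                               # expected yields under H
k_obs = np.array([3, 7, 6, 9, 4, 5, 2, 8, 5, 10])    # made-up observed yields

def chi2_stat(k):
    # Pearson chi-squared against the expected yields
    return np.sum((k - lam) ** 2 / lam, axis=-1)

t_obs = chi2_stat(k_obs)

# build f(t|H) from toy experiments drawn from the hypothesis H
toys = rng.poisson(lam, size=(200_000, len(lam)))
t_toys = chi2_stat(toys)

p_value = np.mean(t_toys >= t_obs)     # equation (5.113)
CL_obs = 1.0 - p_value                 # equation (5.115)
print(f"t_obs = {t_obs:.2f}, p-value = {p_value:.4f}, CL_obs = {CL_obs:.4f}")
```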
The fact that even a perfectly correct model will exceed the 68% χ² threshold 32% of the time is often brushed under the carpet! At the 95% CL, the χ² limits are 3.84 for k = 1, and 5.99 for k = 2.
What exactly is k? It is not just the number of bins, but the number of
unconstrained elements in the statistical system. If our model really came from
nowhere and has no free parameters, then every bin is free to fluctuate relative
to the model. But what if we introduce nθ parameters θ to the model and find the
combination θfit that fits best? In this case the model and data are no longer
decoupled and free to statistically fluctuate relative to each other: if a bin
fluctuates upward in data, the fitted model will try to follow it to the extent
permitted by the global fit measure. In principle every model parameter
subtracts one freely fluctuating bin, reducing the degrees of freedom to
k = nb − nθ . If one had as many free parameters as bins, i.e. nθ = nb and hence
k = 0, the fitted model could simply consist of setting f (x ) = y(x ) and then the
χ 2 would always be a delta-function at zero. Such models where f (x ) = y(x ) are
known as saturated models, and typically have as many parameters as there are
data points. In practice real statistical models are not so mercenary, but contain
some assumptions, e.g. of physics or at least of parametric smoothness. But the
statistical test assumes the worst and the chi-squared distribution is hence not
defined for k < 1.
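A minimal sketch of such a goodness-of-fit test in the Gaussian limit, assuming binned measurements y with uncertainties σ and a model prediction f obtained from a fit with nθ free parameters (the numbers below are made up purely for illustration):

```python
import numpy as np
from scipy.stats import chi2

def gof_pvalue(y, f, sigma, n_theta):
    """Goodness-of-fit p-value for a fitted model.

    y, f, sigma : observed values, fitted model values and uncertainties
    n_theta     : number of free parameters used in the fit
    """
    chi2_val = np.sum(((y - f) / sigma) ** 2)
    k = len(y) - n_theta                   # degrees of freedom, k = nb - ntheta
    return chi2_val, k, chi2.sf(chi2_val, df=k)

# made-up example: 12 bins and a 2-parameter fit
y     = np.array([21., 25., 30., 28., 35., 33., 40., 38., 45., 43., 50., 49.])
f     = np.array([20., 24., 28., 32., 36., 40., 44., 48., 52., 56., 60., 64.]) * 0.9
sigma = np.sqrt(y)
print(gof_pvalue(y, f, sigma, n_theta=2))
```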
Figure 5.15. Illustration of a frequentist hypothesis test of two hypotheses H0 and H1. t stands for a general test
statistic, of which log likelihood ratio λ is a particular instance.
The canonical choice of test statistic for comparing two hypotheses is the likelihood ratio λ(D) of the two hypotheses (by the Neyman–Pearson lemma, the most powerful choice for simple hypotheses), or any monotonic relative of it, most importantly the log likelihood ratio (LLR),18
LLR(D) = −2 ln λ(D).  (5.119)
18 Note, technically it is a ‘log likelihood-ratio’, and certainly not a ‘log-likelihood ratio’. It’s safest to avoid the hyphens.
19 In the ratio it doesn’t matter that ln p is negative for probabilities p ∈ [0, 1]—and this range anyway does not apply for probability densities.
The probability of a type-I error (rejecting H0 when it is in fact true) is α = ∫_c^∞ p(t∣H0) dt, i.e. assuming that H0 is true, we take the pdf of the test statistic p(t∣H0) and we integrate the tail above the cut where we reject H0. This is called the significance of the test of H0, in line with our previous section on testing a single hypothesis.
The probability of a type-II error is instead given by taking the pdf that assumes
H1 is true, and integrating over the lower tail to the left of the cut (i.e. where we
accept H0 instead of H1). The probability of a type-II error is thus given by
β = ∫₀^c p(t∣H1) dt,  (5.122)
where the lower limit of the integral would instead be −∞ for a test statistic that
could take arbitrarily negative values. The statistical power of our test is then 1 − β .
We can say that H0 is rejected at a confidence level CL = 1 − α if the likelihood
ratio λ(D ) > c; i.e. if H0 were true, a large fraction CL of the possible observed
datasets would have given a λ below the threshold c, yet one above the threshold was
measured. Note that this tells us how to choose c - we need to set it to the value that
generates our desired CL for the hypothesis test.
It is instructive to think carefully about how one would actually choose a value of
c for a given CL, for general hypotheses: if one does not know the form of the pdf for
the test statistic analytically, one can obtain it via Monte Carlo simulation. For the λ
statistic above, for example, one can assume that H0 is true, run a simulation of that hypothesis to get a toy dataset D′, and calculate λ(D′). On repeating this exercise for
many toy experiments, one gets p(λ∣H0). The CDF of this can be evaluated
numerically, and then inverted to find c for a given CL.
It is also useful to think about why the likelihood ratio construction is powerful.
It comes from cancellation of degrees of freedom: in our preceding testing of a
single hypothesis, the more random variables (e.g. bins of a distribution) there are,
the more sources of statistical fluctuation, the broader the test-statistic distribu-
tion, and the harder to exclude the model. But if we have two hypotheses, being
compared against the same data, then the same data fluctuations occur in both
hypothesis tests and can be effectively cancelled between them. The statistics being
finite, this cancellation is not perfect, and so the likelihood functions do not collapse to delta functions (which may be trivially discriminated!), but retain some statistical width and overlap, which necessitates a hypothesis-rejection threshold c on λ(D).
You may find it interesting to revisit this explanation and understand its
connection to the asymptotic result for nested hypothesis comparison in the
following section. Also note that in gaining discriminatory power we lost something
else: the ability to determine whether both hypotheses are inconsistent with the data:
this issue is addressed shortly.
A particularly important situation is when the two hypotheses live in the same parameter space, θ, which turns out to be a necessary condition of a likelihood ratio test. The typical situation is where one hypothesis—
conventionally the null, H0 —is fixed, and the other is varied through the space,
finding the regions where it is relatively preferred or disfavoured with respect to
H0 . This scenario is called a nested model, as the fixed hypothesis H0 is a subset,
nested inside the more general space of H1 hypotheses to be evaluated.
A typical example is the case of a model whose parameters θ include parameters of
interest ϕ, and some nuisance parameters ν. If we want to find a p-value for ϕ, we can
proceed as above and construct a test statistic tϕ , for which larger values indicate
increasing incompatibility of the data and some hypothetical choice ϕ. If we then observe
a value tϕ = tϕ,obs for our particular dataset, the p-value of the hypothesis ϕ is given by
p(ϕ, ν) = ∫_{t_{ϕ,obs}}^∞ p(tϕ∣ϕ, ν) dtϕ.  (5.123)
This necessarily depends on ϕ, since ϕ specifies the hypothesis that we are testing. The
problem is that it also depends on the nuisance parameters, and we can only reject the
hypothesis ϕ if the p-value is less than α for all possible values of the nuisance parameters.
We can instead define a test statistic that is approximately independent of ν as follows.
First, we define the conditional maximum likelihood, or profile likelihood, L(ϕ, ν̂ˆ ), where
the double hat denotes the value of the nuisance parameters that maximises the likelihood
given a particular value of ϕ. The procedure of choosing the nuisance-parameter values that maximise the likelihood for a given ϕ is itself called profiling. We can then define a
test statistic in which the choice of null hypothesis is the ‘best-fit’ MLE point in the space of
θ , for which we have a likelihood L ˆ = L(θˆ ) = L(ϕˆ , νˆ ):
λp(ϕ) = L(ϕ, ν̂̂) / L(ϕ̂, ν̂).  (5.124)
This is called the profile likelihood ratio, or sometimes (confusingly) just the
profile likelihood, and it now only depends on ϕ. Usually likelihood profiling is
taken a little further: the space of parameters ϕ may itself be larger than can be
shown in a 1D or 2D plot, which requires us to do something with the extra
parameters ϕ if we want to plot a 1D or 2D CL = 95% limit contour20. In this case,
we can include the ϕ parameters that we do not wish to plot in the list of
parameters that get profiled out.
The qualitatively interesting thing about this construction is that the null and test hypotheses have different degrees of freedom: for example, if ϕ is a two-dimensional parameter plane, L(ϕ, ν̂̂) has two fewer degrees of freedom than L̂ when we perform maximisation over the variables that carry hats or double hats, and hence its statistical goodness of fit is expected to be two degrees of freedom worse. An extremely useful result called Wilks’ theorem comes into play here: subject to certain regularity conditions21, in the asymptotic limit of a large data sample the profile LLR test statistic
20 Note that a Bayesian would instead marginalise over the extra parameters in projecting a large parameter set down to a 1D or 2D plot, and this difference between profiling and marginalisation is one of the key working differences between frequentist and Bayesian methods in practice.
tϕ = −2 ln λp(ϕ) = −2 ln [L(ϕ, ν̂̂)/L(ϕ̂, ν̂)]  (5.126)
is distributed, when ϕ is the true parameter point, as a χ² variable with a number of degrees of freedom equal to the dimensionality of ϕ. Asymptotic confidence regions on ϕ can therefore be read off from the profile LLR using equation (5.103), without any explicit Monte Carlo sampling of the test-statistic distribution.
21 A particularly important one is that the main probability density is located well away from physical boundaries in the parameter space, for example a positivity requirement on masses or cross-sections. If working with theories close to such a boundary, Wilks’ Theorem cannot be assumed to hold, and an explicit Monte Carlo method or similar must be used.
But what should we do if the data are a poor match to both hypotheses? How can we tell whether the null hypothesis might also be excluded by the data, when LLR methods, including the ubiquitous profile likelihood, are explicitly constructed as test statistics relative to a presumed-reliable null or best-fit reference model?
An approach known as the CLs method has proven useful in cases where we are searching for a small new signal in a dataset. We have two possible hypotheses in this case: the background-only hypothesis, in which the expected number of events is b (signal strength μ = 0); and the signal-plus-background hypothesis, in which the expected number of events is s + b (signal strength μ = 1), where s is the expected signal yield.
Given an observed number of events, we then want to determine which of these two
hypotheses is more favoured. For each value of μ, there is a pdf for the log likelihood
ratio test statistic under assumed variations of the observed data (exactly as we found
above for a general test statistic given that a particular hypothesis was true). To spell
this out completely, let’s assume that we have observed eight events in data, and that
our theory calculations give us b = 3 and s = 4. For the μ = 0 hypothesis, our likelihood
ratio is given by P(8; λ = 7)/P(8; λ = 3). We could then sample repeatedly from a
Poisson distribution P(k ; λ = 3) (which is our background model), to get different
hypothetical numbers of observed events. For each of these we could recalculate the
test statistic (i.e. −2 times the log of the likelihood ratio), and the set of these values for
different hypothetical datasets sketches out the pdf of the test statistic f (tμ∣μ = 0). For
the signal-plus-background hypothesis, we can follow a similar procedure, only we
now sample our hypothetical datasets from P(k ; λ = 7), which gives us the pdf
f (tμ∣μ = 1). By construction, hypothetical data samples drawn from the null hypoth-
esis μ = 0 will be more null-like (or background-like) and will produce a more positive-
valued tμ distribution than signal-like samples drawn from the μ = 1 hypothesis—but
being random processes, it is perfectly possible for the two pdf distributions f (tμ∣μ = 0)
and f (tμ∣μ = 1) to significantly overlap. Indeed, if the signal is small, the background
and signal-plus-background hypotheses might not look very different.
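This toy example is straightforward to carry through numerically. A minimal sketch using the numbers quoted above (b = 3, s = 4, eight observed events) and the test statistic t = −2 ln[P(k; s + b)/P(k; b)], with the two pdfs built from Poisson toy experiments (the CLs quantity constructed from them is discussed below):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(5)
b, s, n_obs = 3.0, 4.0, 8

def t_stat(k):
    # -2 ln [ P(k; s+b) / P(k; b) ]: more positive values are more background-like
    return -2.0 * (poisson.logpmf(k, s + b) - poisson.logpmf(k, b))

t_obs = t_stat(n_obs)

n_toys = 500_000
t_b  = t_stat(rng.poisson(b,     size=n_toys))   # samples of f(t | mu = 0)
t_sb = t_stat(rng.poisson(s + b, size=n_toys))   # samples of f(t | mu = 1)

# right-going integrals from the observed value, as in equations (5.128)-(5.129)
CLsb = np.mean(t_sb >= t_obs)
CLb  = np.mean(t_b  >= t_obs)
CLs  = CLsb / CLb
print(f"t_obs = {t_obs:.3f}, CLsb = {CLsb:.3f}, CLb = {CLb:.3f}, CLs = {CLs:.3f}")
```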
Figure 5.16 shows two example test statistic distributions, taken from the search
for the Higgs boson at the Large Electron Positron collider at CERN. For an
assumed Higgs mass of 115.6 GeV/c², the plot shows the two LLR distributions
for the background and signal+background hypotheses (where we note that the
original paper uses Q for the LLR instead of tμ). The actual observed dataset defines
a particular value of the test statistic t μobs, which is shown as the vertical red line.
Figure 5.16. Example of the LLR pdfs for the background and signal+background hypotheses, from the
Higgs-boson search at the Large Electron Positron collider. Q is the log likelihood ratio L(s + b )/L(b ) where,
in the absence of systematic uncertainties, the likelihood terms here would be Poissonian. Credit: http://cds.cern.ch/record/508857/.
In analogy with equation (5.115), we can define an observed CL value for each
hypothesis using
CLsb = ∫_{t_μ^obs}^∞ f(tμ∣μ = 1) dtμ,  (5.128)
and
CLb = ∫_{t_μ^obs}^∞ f(tμ∣μ = 0) dtμ.  (5.129)
In Figure 5.16, the observed test statistic lies well into the right-hand tail of the signal-plus-background distribution, so the p-value CLsb is small: a right-going integral that starts on the right-hand tail of the distribution. But be careful: while t_μ^obs is much more compatible with the background
than the signal, in absolute terms it has poor compatibility with either. In this case
perhaps the CLsb value alone is not the whole story and we should be less willing to
exclude signal models when the background model (or, more generally, some null
reference hypothesis H0) itself is on statistically shaky ground.
The classic example of this arises when the expected yield of events in our search
process is, say, three events, and none are observed. Whether the expectation arises from
the background-only or signal-plus-background model, it only has a P(0; 3) ∼ 4.98%
probability of producing a result with a rate this low. Let’s say that our expectation of
three events comes from the background model. An observation of 0 events is more
background-like than signal-like, but it is the extreme value of ‘background-like’, sitting
on the far right-hand tail of the background-only LLR distribution to such an extent that
it is arguably unrepresentative of the background hypothesis at 95% CL! The consensus was for a long time that experiments should not publish claims of signal-model exclusion in circumstances where the data can also be
argued to be incompatible with the null hypothesis. But this seems too harsh: a low
sensitivity signal process that would produce, say, a signal plus background expectation
of four events may be indistinguishable from the background hypothesis, but an
‘obvious’ signal that leads to 14 signal plus background events can surely still be
reasonably excluded. How do we quantify this intuition?
A neat solution, although strictly speaking ad hoc rather than a pure frequentist
result, is the CLs method. In this, the CL reported by the hypothesis test is not CLsb
as we would expect, but a modified version,
CLs ≡ CLsb / CLb,  (5.130)
where the labelling is just a mnemonic reminiscent of division rules, rather than
meaning that it is actually a CL for signal alone (without background). The effect of
this construction is to reduce CLsb by the extent to which CLb is less than unity: an
extreme observation, unrepresentative even of the background model, will have a
low CLb and hence the even lower CLsb will be somewhat inflated (Figure 5.17).
Figure 5.17. Demonstration of the construction of CLsb and CLb confidence levels from integration of the
respective s + b and b LLR distributions, across all entries less signal-like than the observed LLR. The ad hoc
CLs ≡ CLsb/CLb construction only deviates from CLsb if the s + b and b distributions significantly overlap
with each other and the observation, as in the bottom plot.
In a binned analysis, the data consist of a histogram of observed yields k = {ki}, and the likelihood is a product of Poisson terms, one per bin, with expected yields λi(θ):
L(k; λ(θ)) = ∏_i P(ki; λi) = ∏_i λi^{ki} e^{−λi} / ki!.  (5.131)
It is worth comparing this to the Poisson examples in the previous two sections. In those cases, we assumed that we had n repeated observations of a variable drawn from a single Poisson distribution. We are now dealing with a histogram, each bin of which has its own Poisson distribution.
Several features of equation (5.131) are immediately worthy of note. First, it is a
discrete distribution (a pmf) in ki, reflecting that real events are observed in integer
quantities. If there are non-integer event weights to be applied, they can be absorbed
into the expected rates λ rather than the observed yields k . And secondly, in the
large-k limit, as its individual Poisson terms tend towards Gaussians, the product in this likelihood tends towards a multivariate Gaussian.
In searches for new physics, this form is typically made more explicit, expressing the expected bin yields in terms of background-process and signal-process contributions, λi(θ) = μ si(θ) + bi(θ), where the signal-strength parameter μ scales the nominal signal expectation and is usually the parameter of interest.
Our primary use for this likelihood is usually to perform hypothesis testing to
label POI regions (in μ or a more general ϕ space) within a model by the (in)
compatibility of their λ(θ ) with the data observations k , at confidence level CL. Note
the mismatch between the dependence of the expected rates λ on the full set of model
parameters, θ , against our interest in constraining the POI subset, ϕ: as before,
frequentist and Bayesian interpretations differ here, the frequentist view being to
work purely with the likelihoods and to eliminate the nuisance parameters ν by
optimisation in the profile LLR construction; and the Bayesian being to construct
the full posterior probability from the likelihood via Bayes’ theorem and priors on
the components of θ , and then to marginalise (integrate) over ν.
As it is more common in collider physics, we focus on the frequentist approach
via the profile log likelihood ratio,
tϕ = −2 ln [L(ϕ, ν̂̂)/L(ϕ̂, ν̂)],  (5.134)
where we note that the discretising factorials from the denominators of the Poisson
pmfs have cancelled between the global and conditional profile (log-)likelihoods
since the observed k are the same in both cases. As before, a CL figure may be
computed from tϕ by a rightward integral from its observed value:
CLϕ = ∫_{t_ϕ^obs}^∞ f(tϕ∣ϕ) dtϕ.  (5.136)
In practice, several variants of this test statistic are in common use. The signal strength, for example, may be physically constrained to be non-negative, in which case it makes sense to ‘freeze’ the global maximum likelihood statistic at the conditional value for μ = 0, should the MLE estimator μ̂ < 0, giving rise to the test-statistic t̃ϕ. The q0 test statistic modifies t̃ϕ to truncate the statistic for testing the
μ = 0 hypothesis (whose disproof would indicate discovery of a new process), and a
qμ test statistic is used for placement of upper limits on the strength of searched-for
processes. In this presentation we will focus on the canonical tϕ test-statistic, whose
consequences qualitatively apply also to the variations, but the reader should be
aware of their existence and the need in general to use the variant appropriate to
their task. A comprehensive presentation can be found in the Asimov dataset paper
listed in the Further Reading of this chapter.
To go further, we need to distinguish carefully between two different distributions of the test statistic:
• f(tϕ∣ϕ), which is the pdf of the test statistic tϕ assuming that the hypothesis ϕ is true. This is what we worked with above to define CL values for testing the hypothesis ϕ, but it is worth parsing again to make sure that you really understand it. For example, if ϕ is equal to a signal strength parameter μ, we could define a log likelihood ratio for a particular value of μ, and then look at how it is distributed under the assumption that our data indeed arise from the hypothesis with that value of μ.
• f (tϕ∣ϕ′), which is the pdf of the test statistic under the assumption that some
other hypothesis ϕ′ is true. In this case, for example, we could draw sample
datasets from the hypothesis ϕ′, but still evaluate tϕ for each one, which
results in a different pdf to f (tϕ∣ϕ ).
Now assume that the true hypothesis from which our data are drawn has a parameter value equal to ϕ′. We then want to test some other value of the parameter ϕ.22 Using a result due to Wald it can be shown that, in the limit of an asymptotically large data sample, the test statistic takes the form tϕ ≈ (ϕ − ϕ̂)²/σ², where the estimator ϕ̂ is Gaussian-distributed around ϕ′ with standard deviation σ, and f(tϕ∣ϕ′) follows a non-central chi-squared distribution23 with non-centrality parameter Λ = (ϕ − ϕ′)²/σ². The variance σ² is the relevant element of the asymptotic covariance matrix V of the estimators, which is in turn given by the inverse of the Fisher information matrix:
22 A good example is again that of the Higgs search, for which our data are drawn from the hypothesis μ = 1, and we might want to test the hypothesis μ = 0 in order to see if we can reject it.
(V⁻¹)_ij = I_ij(θ) = −E[∂² ln L(θ; D)/∂θi ∂θj]  (5.138)
= E[(∂ ln L(θ; D)/∂θi)(∂ ln L(θ; D)/∂θj)],  (5.139)
where the expectation value is taken with our single parameter of interest set equal to
ϕ′, and it is an average over possible realisations of the data.
This is an extremely interesting object. As the second identity above shows, it is
the covariance matrix of the set of variables ∂ ln L/∂θi , which are called the
likelihood scores. Since
23 The non-central chi-squared distribution has the rather horrible form
f_X(x; k, λ) = Σ_{i=0}^∞ [e^{−λ/2} (λ/2)^i / i!] f_{Y_{k+2i}}(x),  (5.137)
where Y_{k+2i} follows a chi-squared distribution with k + 2i degrees of freedom. This very non-intuitive formula is stated for completeness only.
∂ ln L(θ)/∂θi = (1/L(θ)) ∂L(θ)/∂θi,  (5.140)
they express the relative sensitivity of the likelihood to parameter variations around θ. An intuitive way to understand the Fisher information is that if the likelihood is
sensitive to a parameter, it will sharply peak around the true value θ′: as a result its
score—the derivative—will undergo a large shift from large positive to negative
slopes as it passes through θ′. The score will hence have a large (co)variance, I , in
the vicinity of the true parameter values. (The same picture correctly leads to the
conclusion that the mean score is zero at θ′, where the likelihood is maximised, so
the second moment is automatically central.) A more geometric picture, cf. the
second equality in equation (5.138), is that the Fisher information is the Hessian
curvature of the log-likelihood function: indeed, the I ij matrix can be used as a
metric in a differential geometry approach to likelihoods, called information
geometry.
As an aside, one important application of this object for Bayesian statistical
inference is in defining ‘uninformative priors’ on model parameters, when there is no
a priori evidence to favour some parameter values over others. It is often assumed
that this role is played by a flat (uniform) prior probability density p(θ ), but in fact it
depends on the statistical model within which the parameter set θ plays its roles:
scale parameters, for example, generally propagate through models differently than
location parameters, and e.g. Gaussian models’ parameters propagate differently
from those of Poisson models. A neat way to view uninformative priors is that they
transform trivially under reparametrisation of their model, which can be achieved by
use of the Fisher information in the so-called Jeffreys prior construction:
πJ(θ) ∝ √det{I(θ)}.  (5.141)
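As a standard single-parameter illustration (not tied to any particular analysis in this book), consider a single Poisson-distributed count k with mean λ. The log-likelihood is ln L(λ; k) = k ln λ − λ − ln k!, so ∂² ln L/∂λ² = −k/λ², and taking the expectation value with E[k] = λ gives I(λ) = 1/λ. The Jeffreys prior is therefore πJ(λ) ∝ 1/√λ, which is emphatically not flat: a uniform prior in λ and the 'uninformative' prior for a Poisson rate are different objects.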
Returning to the binned Poisson likelihood of equation (5.131), the second derivatives of the log-likelihood are
∂² ln L(θ; k)/∂θj ∂θk = Σ_i [ (ki/λi − 1) ∂²λi/∂θj∂θk − (∂λi/∂θj)(∂λi/∂θk) ki/λi² ].  (5.143)
The maximum likelihood estimators for the parameters are obtained by setting
the scores to zero, ∂ ln L/∂θj = 0 for all j. This occurs, unsurprisingly, when the
observations are equal to their model expectations, i.e. when all ki = λi , the
simultaneous modes of all the constituent Poisson distributions. This motivates
the idea of a special, artificial dataset24 for a given model, in which all observations
are equal to the model expectations (in practice to be derived via analysis of a very
large Monte Carlo event sample), and hence the maximum-likelihood parameter
estimators return the true values. This has become known as the Asimov dataset, in
homage to the Isaac Asimov short story Franchise, where a single voter is chosen as
sole representative of a political electorate.
Once we do that, the Asimov dataset allows us to estimate the Fisher information,
and hence the θ covariances, then f (tϕ∣ϕ′), and finally the p-value beyond the Wilks
approximation. Examining the second-derivative term in equation (5.143), we see
that it is linear in the observed yields ki. Hence, its expectation value over lots of
datasets is found by just evaluating it with the expectation values of the data, which
is the same as choosing the Asimov dataset. We can thus replace the ki with their
Asimov values:
∂² ln L_A(θ; λ)/∂θj ∂θk = −Σ_i (∂λi/∂θj)(∂λi/∂θk) / λi.  (5.144)
Here, we have defined the Asimov likelihood LA as the likelihood that results from
using the Asimov dataset. Using this construction, the derivatives of the Asimov
likelihood LA(θ; λ ) can be evaluated numerically, and the parameter variances
extracted. Alternatively, we can return to Wald’s result that t(ϕ ) = (ϕ − ϕˆ )2 /σ 2 and
note that with the Asimov dataset the ML estimator ϕˆ = ϕ′, hence
tA (ϕ ) = (ϕ − ϕ′)2 /σ 2 = Λ directly.
A final important summary result can be obtained from the Asimov dataset.
Using the Wald approximation, the distributions of the tϕ and other test statistics
can be derived. The significance of an observed tϕ value can be evaluated from
this distribution for a choice of μ, and is generally of the form Z(μ) ∼ √tϕ.25 As the significance is a monotonic function of the value of the test statistic, the median expected significance to be obtained by an experiment—the key metric for design optimisation of an analysis—is that function applied to the median tϕ(μ), which we can approximate by its Asimov value tA(μ). The Asimov median expected significance of a statistical analysis based on likelihood-ratio test statistics is hence
med[Z(μ)] ≈ √tA(μ).  (5.145)
24 In the sense of a set of binned yields, not a full set of event-sample data.
25 Actually, tϕ has a slightly more awkward form: the q0 and qμ variants have exactly this square-root form.
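For the special case of a single counting bin with signal s and background b and no nuisance parameters, the Asimov dataset is simply k = s + b, and evaluating the discovery (q0) statistic on it gives the closed form ZA = √(2[(s + b) ln(1 + s/b) − s]), quoted in the Asimov-dataset paper listed in the Further Reading. A minimal sketch comparing this to the naive s/√b estimate:

```python
import numpy as np

def asimov_significance(s, b):
    """Median expected discovery significance for a single counting bin,
    evaluated on the Asimov dataset k = s + b (no nuisance parameters)."""
    return np.sqrt(2.0 * ((s + b) * np.log(1.0 + s / b) - s))

for s, b in [(4, 3), (10, 100), (50, 100)]:
    za = asimov_significance(s, b)
    approx = s / np.sqrt(b)        # the naive s/sqrt(b) estimate, for comparison
    print(f"s = {s:3d}, b = {b:3d}:  Z_A = {za:.2f}  (s/sqrt(b) = {approx:.2f})")
```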
Simplified likelihoods
The full dependence of an event-counting likelihood on the model parameters θ is
typically not known in full detail, as every value of θ would involve generating and
processing a high-statistics MC event sample with that model configuration, and
processing it through a full detector simulation, reconstruction, and analysis chain.
This would take days at best, on large-scale computing systems, when what is
required is typically sub-second evaluation times, in order that the likelihood can be
optimised after many evaluations. An approximate approach is hence nearly always
taken, parametrising expected (both signal and background) event yields as a
function of each θi ∈ θ , based on interpolation between MC templates typically
evaluated at values -1, 0, and +1 in the normalised nuisance parameters.
These elementary nuisance responses are in general non-linear, making the like-
lihood a very complex function, and difficult to report as an analysis outcome—
although this is a rapidly developing area. In this context, a useful development has
been the simplified likelihood scheme, which replaces this full complexity with linear
responses of expected (background) event yields to an effective nuisance parameter Δi
for each bin i, with interplay between elementary nuisances absorbed into a covariance
matrix between the new effective nuisances. This permits a much simplified explicit
likelihood form,
L_S(k; λ(θ_S)) = ∏_i P(ki; λi = si + bi + Δi) · [1/((2π)^{n/2} √det{Σ})] exp(−½ Δᵀ Σ⁻¹ Δ),  (5.146)
where the new nuisance parameters directly modify the nominal background event
yield, bi → bi + Δi , and the new Gaussian term imposes a penalty on such
deviations.
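A minimal sketch of evaluating such a simplified likelihood for made-up yields and a made-up covariance matrix (in a real reinterpretation, s, b and Σ would be taken from the experiment's published material, and the Δi would then be profiled or marginalised over, e.g. with a numerical optimiser):

```python
import numpy as np
from scipy.stats import poisson, multivariate_normal

# made-up inputs for a 3-bin signal region
k     = np.array([12, 7, 4])                   # observed yields
s     = np.array([3.0, 2.0, 1.0])              # expected signal yields
b     = np.array([10.0, 6.0, 3.5])             # nominal expected backgrounds
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])             # covariance of the effective nuisances

def log_simplified_likelihood(delta, mu=1.0):
    """ln L_S as a function of the effective nuisance parameters Delta_i."""
    lam = mu * s + b + delta                    # expected yields in each bin
    if np.any(lam <= 0):
        return -np.inf
    log_pois = poisson.logpmf(k, lam).sum()
    log_gaus = multivariate_normal.logpdf(delta, mean=np.zeros_like(delta), cov=Sigma)
    return log_pois + log_gaus

print(log_simplified_likelihood(np.zeros(3)))            # nominal backgrounds
print(log_simplified_likelihood(np.array([1.0, 0.5, 0.0])))
```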
This may appear distressingly close to imposition of Bayesian priors in a
frequentist likelihood, and it is not unique to simplified likelihoods: similar penalty
terms are usually applied to elementary nuisance parameters in full-likelihood fits. It
is hence worth taking a little time here to explain why this is acceptable. These
constraints should be conceived of as encoding the restrictions on the estimators of
the nuisance parameter values imposed by the experiment’s calibration procedures—
which are based on dedicated studies, effectively in control regions well away from
the signal bins26. It could be possible to perform (non-simplified) analyses with the
calibration distributions and elementary parameters included directly in the final
likelihood—but it would be a fantastically complex, computing-intensive, and hard-
work approach. Parametrising the effect of such calibration studies on the likelihood
26 This assumption is also worth noting: as more exotic physics signatures are considered, some, such as long-lived particles or ‘emerging jets’, do indeed overlap with calibration heuristics, mandating a less factorised than usual approach to analysis and calibration.
In the Bayesian approach to comparing two hypotheses H0 and H1, the ratio of their posterior probabilities is the product of the prior ratio and another factor, which is defined to be the Bayes factor: the ratio of the evidences of the two models, B01 = Z0/Z1 = p(D∣H0)/p(D∣H1).
The difference between local and global significance is expressed by a trials factor
which accounts for the multiplicity of comparisons. In many other fields, this is
referred to as ‘studentizing’ the test-statistic distribution, and corresponds to a
scaling by 1/ Ntests . In a case like the mH bump-hunt, however, the set of tests is a
continuous variable: how many tests does that correspond to?
Given a likelihood-ratio test statistic q(θ ) and a significance threshold c in that
test statistic, we define the largest observed test statistic as
q(θ̂) ≡ max_θ [q(θ)].  (5.151)
The trials factor T is then given by the ratio of the p-value for an excess of this size being seen anywhere in the parameter range (the global p-value) to the local p-value at a fixed θ,
T = P(q(θ̂) > c) / P(q(θ) > c).  (5.152)
In the asymptotic limit where q(θ ) is χ 2 -distributed, the maximum q(θ ) is bounded
like
P(q(θ̂) > c) ⩽ P(χ²_k > c) + ⟨N(c)⟩,  (5.153)
where 〈N (c )〉 is the expected number of q(θ ) > c ‘up-crossings’ sampled from the
background-only model and k = ∣θ∣, i.e. the expected global p-value will actually be
higher by 〈N (c )〉 than the local maximum would suggest. Computing the trials factor
hence requires knowing 〈N (c )〉, which is not trivial: it is specific to the details of the
statistical model, and while it can be estimated by Monte Carlo ‘toy’ sampling,
Figure 5.18. Example estimation of the look-elsewhere effect significance correction. Reproduced with
permission from Gross and Vitells 2010 Eur. Phys. J. C 70 525–30, arXiv:1005.1891. (a) An example
background-only toy-MC pseudo-experiment histogram for a hypothetical bump-hunt in mass parameter
m ≡ θ . The blue lines show the best-fit total and background components, and the bottom graph shows the
distribution of q(m ) as compared to the low c0 reference level indicated by the horizontal dashed line. (b) The
resulting trial factor for local-to-global p-value correction as a function of target significance Z. The solid line is
from the toy MC approach (and its yellow error band from its statistical limitations), the dotted black line is the
upper bound from equation (5.153), and the dotted red line the asymptotic approximation of equation (5.154).
doing so to the accuracy required for 5σ discoveries requires O(10⁷) samples, since the rate of up-crossings will be minuscule! Leveraging the asymptotic formulae previously introduced for profile likelihoods, an asymptotic approximation can be
taken in which a lower threshold c0 ≪ c is tested using a relatively small toy-MC
sample, then extrapolated to the more demanding q > c test. This sampling
approach is shown in Figure 5.18(a).
An intuitive feel for what is going on here can be obtained by injecting an
asymptotic expression for 〈N (c )〉 in terms of the chi-squared distribution for k + 1
degrees of freedom, in which case equation (5.153) takes the form
P(q(θ̂) > c) ≈ P(χ²_k > c) + N P(χ²_{k+1} > c),  (5.154)
where N is a normalisation fixed by the up-crossing count measured at the lower reference level c0.
In machine learning, a computer learns directly from the data how to solve a
particular problem, without needing to be given a specific algorithm that tells it
what to do. This approach has become ever more powerful as advances in
computing power have allowed users to make much more complex machine-
learning systems, especially with the development of ‘deep’ learning methods in the
first decades of the 21st century.
It is important at the outset to distinguish two types of machine-learning problem: supervised learning, in which the algorithm is trained on examples whose true categories (or target values) are known in advance; and unsupervised learning, in which no such labels are available, and the algorithm must instead find structure in the data itself.
Figure 5.19. Illustration of a decision tree. Nodes are shown as circles, with decision nodes labelled by question
marks, and leaf nodes filled in brown. Events passed through the tree will end up at one, and only one, leaf
node, which can be used to determine the category of the event. The category (signal or background) is
assigned based on which type of event dominates the subsample of events that reach that leaf.
As a concrete supervised example, imagine that we wish to separate signal from background events in a search, and that we can simulate events from the signal and background, thus making this a supervised problem (i.e. the category of each event we train on is known). We will further
assume that we are working in a particular final state (defined by the multiplicities
and types of the particles we select), and that our analysis can make use of the four-
vectors of the final state particles to compute useful functions expected to provide
signal–background separation. We will assume in the following that we can define a
vector, x, of useful variables for each event. These variables are frequently referred
to as the attributes or features of the event.
The first step towards defining the concept of a boosted decision tree is the
decision tree itself. A decision tree is a binary-tree structure which performs a series
of cuts on attributes to classify the events as either signal- or background-like,
organised much like a conventional flow chart (see Figure 5.19). The aim of the tree
is ultimately to separate the events into sub-samples which are as pure as possible; that is, some are strongly enriched in signal events, whilst others are enriched in
background events. Starting at the top, we first pick an attribute and place a
selection on it. Each event will then pass down one of two branches depending on
whether its value of the attribute is less than or greater than the cut value, and we
end up at another decision node. A selection is then placed on a different attribute,
leading to further branches and further nodes. The process continues until some
stopping criterion is reached, which is typically placed on either the maximum depth
of the decision tree, or the number of events at each node. Once a node cannot be
split further, it is a ‘terminal’ or ‘leaf’ node, and events at that node are classified as
being more signal-like or more background-like depending on the proportion of
signal and background events at that node. This classification27 is performed via an
associated node value c(x ).
We have not yet specified a recipe for picking the attribute used at each decision
node, nor for the cut value placed on the chosen attribute. Based on the logic above,
the aim at each internal node of the tree should be to increase the purity of the
subsamples that follow the selection at that node. There are many possible
procedures, one of which is to use the Gini index, defined as
G = S (1 − S ) (5.157)
where S is the purity of signal in a node, given by the fraction of node events which
are signal. For a node that contains only signal or background events, the Gini index
is equal to zero. At each node, the attribute and cut value are chosen to give the largest decrease in G, i.e. to minimise the sum of the G values of the two daughter nodes produced by the cut, relative to the value of G at the parent node. In the calculation, the G values of the daughter nodes are weighted by their respective fractions of events relative to the parent node.
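A minimal sketch of the split search at a single decision node, using the event-weighted Gini index exactly as described above, on made-up toy features in which only the first attribute separates signal from background:

```python
import numpy as np

def gini(is_signal):
    """Gini index G = S(1 - S) for a subsample, where S is the signal purity."""
    if len(is_signal) == 0:
        return 0.0
    S = np.mean(is_signal)
    return S * (1.0 - S)

def best_split(X, is_signal):
    """Scan all attributes and candidate cut values; return the split that
    minimises the event-weighted sum of the daughter-node Gini indices."""
    n, n_attr = X.shape
    best = (None, None, np.inf)
    for j in range(n_attr):
        for cut in np.unique(X[:, j]):
            left = X[:, j] < cut
            w_left, w_right = left.mean(), 1.0 - left.mean()
            g = w_left * gini(is_signal[left]) + w_right * gini(is_signal[~left])
            if g < best[2]:
                best = (j, cut, g)
    return best

rng = np.random.default_rng(2)
# toy events: signal is shifted in the first attribute only
sig = rng.normal([+1.0, 0.0], 1.0, size=(500, 2))
bkg = rng.normal([-1.0, 0.0], 1.0, size=(500, 2))
X = np.vstack([sig, bkg])
labels = np.concatenate([np.ones(500, bool), np.zeros(500, bool)])
print(best_split(X, labels))    # expect attribute 0 and a cut value near 0
```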
Single decision trees are completely intelligible, since it is obvious why each event
x ends up at a particular node and its associated value. The problem is that they
rarely end up giving optimal signal-background separation, and they are also
typically unstable with respect to statistical fluctuations in the event samples used
to train them. A solution is to use a forest of trees, which combines a large number of
decision trees. The trees may be added one-by-one, allowing information from a
previous tree to be used in the definition of the following one. One can also reweight
the training events for a particular tree, weighting up those which were misclassified
by the previous tree. Each tree is assigned a score wi during the training process, and
the final BDT classifier is constructed from a weighted average over all the trees in
the forest,
C(x) = (1/Σ_i wi) Σ_i wi ci(x).  (5.158)
27 Or regression: BDTs can also be used to parametrise complex functions in the attribute space using the same mechanism. The typical distinction is that classifier trees assign discrete values c = 1 for inferred signal events and c = −1 for background, while regression of a function f uses continuous values c(x) ∼ f(x).
Figure 5.20. Core architectural elements of artificial neural networks. (a) A single perceptron, which forms a
basic computer model of a neuron. Input variables xi are fed with weights wi, and are summed through an
activation function which may include a bias parameter. (b) A multi-layer perceptron network, with an input
layer of variables xi, a hidden layer of perceptrons, and an output layer with a single output. The network is
fully-connected from layer to layer.
A standard way to guard against overfitting is k-fold cross-validation, in which the available training sample is divided into k subsets (‘folds’) and the training is performed k times. At each iteration, a different fold is chosen as a test data set
and is excluded from the training sample. The training is performed using data from
the remaining folds, and the trained algorithm is then applied to the test data set to
assess its performance. Comparing the performance of each training iteration can be
used to evaluate the degree of overfitting.
Figure 5.21. Schematic diagram of an autoencoder network. The numbers give the number of neurons in each
hidden layer, and the architecture shown has proven to be useful in LHC case studies. Reproduced with
permission from van Beekveld et al 2020 https://arxiv.org/abs/2010.07940.
A common choice of loss function for a binary classification problem is the binary cross-entropy,
L = −(1/N) Σ_i [ y_true,i ln y_pred,i + (1 − y_true,i) ln(1 − y_pred,i) ],  (5.159)
where y_true are the true classes of the events (e.g. 1 for signal and 0 for background), and y_pred are the corresponding predictions of the MLP. The higher the quality of our predictions, the smaller the binary cross-entropy. The goal of neural network training now becomes apparent: we must find the values of the weight and bias parameters that
minimise the loss function. Having done that, we obtain a network that, for new
event data, can be expected to provide an excellent prediction of the category for
the events.
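For reference, the standard form of the binary cross-entropy (averaged over events) can be written in a few lines; the variable names here are illustrative:

    import numpy as np

    def binary_cross_entropy(y_true, y_pred, eps=1e-12):
        # binary cross-entropy averaged over events; smaller means better predictions
        p = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
        return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

    y_true = np.array([1, 0, 1, 1, 0])
    print(binary_cross_entropy(y_true, np.array([0.9, 0.1, 0.8, 0.7, 0.2])))  # good predictions
    print(binary_cross_entropy(y_true, np.array([0.5, 0.5, 0.5, 0.5, 0.5])))  # uninformative predictions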
Unfortunately, there is no analytic way of determining the correct settings for the
weight and bias parameters, and one must rely on numerical optimisation proce-
dures. One choice is gradient descent, where the weights are updated as follows:
w'_{ij} \Leftarrow w_{ij} - R \cdot \frac{\partial L}{\partial w_{ij}}.    (5.160)
Here R is a free parameter known as the learning rate, which controls how large the
update steps should be during training. A more refined training strategy is the Adam
optimiser, which uses a learning rate that is adjusted during the training, rather than
being fixed a priori. In both cases, the gradients are calculated using a procedure
known as back-propagation of errors, which expands the gradient and calculates it
for each weight independently, starting from the output layer and propagating
backward to the input layer. This leads to a substantial reduction in computing
costs, which makes the training of large network architectures tractable.
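A toy illustration of the update rule (5.160) is shown below, using finite-difference gradients on a simple stand-in loss function (a real implementation would obtain the gradients via back-propagation, which libraries such as TensorFlow or PyTorch perform automatically):

    import numpy as np

    def numerical_grad(loss, w, h=1e-6):
        # finite-difference gradient of the loss with respect to each weight;
        # back-propagation computes the same derivatives analytically and far more cheaply
        grad = np.zeros_like(w)
        for i in range(w.size):
            dw = np.zeros_like(w)
            dw.flat[i] = h
            grad.flat[i] = (loss(w + dw) - loss(w - dw)) / (2 * h)
        return grad

    # toy quadratic loss standing in for the network loss function
    target = np.array([1.0, -2.0, 0.5])
    loss = lambda w: np.sum((w - target) ** 2)

    w = np.zeros(3)
    R = 0.1                                   # learning rate
    for step in range(100):
        w = w - R * numerical_grad(loss, w)   # the update of equation (5.160)
    print(w)                                  # converges towards the minimum at (1, -2, 0.5)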
As in the case of boosted decision trees, neural networks are prone to overfitting
features in the training data. Cross-validation can again be used to mitigate this,
combined with stopping the network training for each iteration when the perform-
ance on the test data set starts to get worse (the performance on the training data set
will always improve). Another option is to add an extra term of the form
L_{L2} = \lambda \sum_i w_i^2    (5.161)
to the loss function, where λ is a parameter of the algorithm which is smaller than
one. The purpose of this term is to penalise situations where the weights wi become
too large, and the method is called L2-regularisation.
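Both mitigation strategies are available off-the-shelf in standard libraries. A schematic Keras example, with an L2 penalty on the weights and early stopping based on held-out data (the layer sizes and the value of λ are illustrative only), is:

    import numpy as np
    import tensorflow as tf

    # toy training sample: 10 attributes per event, binary labels
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 10)).astype("float32")
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype("float32")

    l2 = tf.keras.regularizers.l2(1e-4)       # the lambda of equation (5.161)
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.Dense(32, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # stop training once the performance on held-out data stops improving
    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                                  restore_best_weights=True)
    model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop], verbose=0)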
A multi-layer perceptron network, though potentially large, is in fact one of the simplest network architectures. Many more can be defined by changing the pattern of connections between layers, and even by adding connections between neuron outputs and earlier layers. A particularly useful innovation for image processing is the use of convolutional neural networks, which are designed to mimic the vision system of humans and animals. The first few convolutional layers extract progressively higher-level features from the images, whilst the later layers perform standard MLP-like classification on these high-level features. It is thus possible to use neural networks to
automatically learn features of data that are not obvious analytically, which explains
the rapidly expanding literature on neural-network-based jet imaging techniques.
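A schematic example of such an architecture in Keras, for a hypothetical 32 × 32 single-channel jet image, might look as follows (the layer sizes are purely illustrative):

    import tensorflow as tf

    # illustrative classifier for 32x32 single-channel 'jet images'
    model = tf.keras.Sequential([
        # convolutional and pooling layers extract local features from the image
        tf.keras.layers.Conv2D(16, (3, 3), activation="relu", input_shape=(32, 32, 1)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        # the flattened feature maps feed a standard MLP-like classification stage
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.summary()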
Before we conclude, it is worth stating that neural networks are not limited to
supervised applications, but can also be highly useful for unsupervised problems,
such as anomaly detection. A particular network architecture known as an
autoencoder is shown in Figure 5.21; this symmetric architecture maps the input
variables to a set of output variables that is designed to match the input variables as
closely as possible. By training the autoencoder on simulations of SM events, one
obtains a system that is able to more-or-less accurately reproduce the input variables
for SM events—a generative network. When the system is run on new SM events that
were not in the training sample, it continues to provide a good prediction of the
input variables. When it is run on other types of event, however, it fails to reproduce
the input variables, and the discrepancy can be used to determine an anomaly score
for the event. Autoencoders can therefore in principle be used to define variables that
are highly sensitive to certain types of new physics, without having to know what
signal one wants to search for a priori—the devil, of course, is in the detail of
understanding the impacts of modelling and other uncertainties on the relatively
opaque analysis machinery.
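A minimal sketch of this idea is given below (Keras, with the mean squared reconstruction error used as the anomaly score; the number of input variables and the bottleneck size are illustrative, and this is not the specific architecture of Figure 5.21):

    import numpy as np
    import tensorflow as tf

    n_vars = 20                                    # number of input event variables (illustrative)
    inputs = tf.keras.Input(shape=(n_vars,))
    encoded = tf.keras.layers.Dense(8, activation="relu")(inputs)         # bottleneck layer
    decoded = tf.keras.layers.Dense(n_vars, activation="linear")(encoded)
    autoencoder = tf.keras.Model(inputs, decoded)
    autoencoder.compile(optimizer="adam", loss="mse")

    # train on (simulated) SM events only: the network learns to reproduce them
    X_sm = np.random.default_rng(0).normal(size=(10000, n_vars)).astype("float32")
    autoencoder.fit(X_sm, X_sm, epochs=10, batch_size=256, verbose=0)

    def anomaly_score(events):
        # reconstruction error per event; large values flag events unlike the training sample
        reco = autoencoder.predict(events, verbose=0)
        return np.mean((events - reco) ** 2, axis=1)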
Further reading
• There are many excellent books on statistics for high energy physics. Two
particular favourites of the authors are ‘Statistical Methods in Experimental
Physics’, F James, World Scientific, and ‘Statistical Data Analysis’,
G Cowan, Oxford Science Publications.
• The statistics chapter of the Particle Data Group ‘Review of Particle Physics’
(http://pdg.lbl.gov/) provides an excellent and concise review of statistics for
HEP applications.
• The original paper on asymptotic formulae for hypothesis testing, formally
introducing the Asimov dataset, (Eur. Phys. J. C 71 1554 (2011)) is highly
recommended.
• The paper on approximate calculations of look-elsewhere-effect trial factors, Eur. Phys. J. C 70 525–30 (2010), arXiv:1005.1891, is a short and enlightening guide to the LEE in general.
• A venerable toolkit for machine learning in high-energy physics is the Toolkit
for Multivariate Data Analysis (TMVA), part of the ROOT framework. Recent
trends however, have increasingly emphasised non-HEP statistics and machine-
learning platforms such as scikit-learn, Keras, TensorFlow and PyTorch. This
software landscape is changing rapidly.
• A comprehensive comparison of global optimisation algorithms for particle physics applications can be found in ‘A comparison of optimisation algorithms for high-dimensional particle and astrophysics applications’, JHEP 05 (2021) 108.
Exercises
5.1 A technical design report for a new proposed detector details the expected
performance for electrons. The detector is known to give a positive result
with a probability of 90% when applied to a particle that is actually an
electron. It has a probability of 10% of giving a false positive result when
applied to a particle that is not an electron. Assume that only 1% of the
particles that will pass through the detector are electrons.
(a) Calculate the probability that the test will give a positive result.
5.8 Verify the set of equations (5.142). Note that, in taking the derivative with
respect to θi , ki is a constant, but λi changes when we change θi .
5.9 A new experiment is designed to search for a new particle with a mass
between 0 and 10 GeV, which should appear as a bump over a smoothly-
falling background in the invariant mass distribution of the particle decay
products. The background is well-described by the function:
m_{\gamma\gamma}^{\mathrm{bkg}} = A\left(a + b\, m_{\gamma\gamma} + c\, m_{\gamma\gamma}^2\right)    (5.161)
Part II
Experimental physics at hadron colliders
Chapter 6
Detecting and reconstructing particles
at hadron colliders
Having described the fundamental constituents of Nature, and how they interact, we
have not yet considered the details of the apparatus required to actually observe
these particles and interactions. Indeed, you may be wondering how we ever worked
out that the complex theory described in the previous chapters provides the correct
description of Nature, and how we can design and build future experiments that will
extend our knowledge of particle theory yet further.
The short answer is that every collider that smashes particles together (or into a
fixed target) has at least one particle detector that detects the results of those
collisions. The detector is composed of various sub-detectors that use a variety of
different technologies to target specific sorts of object. Typically, a particle collider
makes new particles that decay almost instantaneously to Standard Model (SM)
particles, and it is these SM decay products that are observed in the detector,
although we will see some important exceptions to this rule later. By exploiting the
known effects caused by different types of particle as they pass through different
materials, we can design and build detectors that tell us which particles were
produced in a particular collision, and what the energies and momenta of those particles were. For experimentalists, building and testing the operation of detectors
(known as ‘detector commissioning’) is a hugely important part of the job. For
theorists, a basic knowledge of how particle detectors operate is no less essential,
since it tells us what the limits of our measurements are, what precision is achievable
in future experiments for testing new theoretical ideas, and even how to exploit
existing detectors in new ways to discover novel interactions beyond those of
the SM.
Modern circular collider facilities have a variety of detectors located around the
ring. The detectors that we will take most interest in are the ATLAS and CMS
detectors of the Large Hadron Collider, which are general-purpose machines for
measuring SM processes, and searching for particles beyond those of the SM.
ATLAS and CMS are roughly cylindrical, surrounding the beam pipe as shown in
Figure 6.1. The interaction point, which marks the location where the protons or
heavy ions of the LHC actually collide, is in the centre of the cylinder. To
understand how these detectors operate, we will first describe the basic approach
to measuring the outcome of particle collisions, before moving on to examine the
detailed apparatus that allows ATLAS and CMS to perform particle measurements.
Throughout, we will refer to the coordinate system that we introduced in Chapter 1.
We can see this more clearly by cutting a slice through the detector, giving
something like Figure 6.2.
Upon leaving the interaction point, our hypothetical particle first traverses the
tracking detector, before entering the electromagnetic calorimeter, which is designed
to stop electrons, positrons and photons. An electron, positron or photon would thus
normally stop here, although as one might expect the system is not perfect.
Electrons, positrons and photons that escape the electromagnetic calorimeter are said to punch-through, and you will often see ‘punch-through’ used as a noun to describe the phenomenon of particles escaping beyond their expected final destination.
Figure 6.2. Schematic cross-sectional view of a detector in a colliding-beam experiment, showing (moving outwards from the beam pipe) the tracking detector, the electromagnetic calorimeter (ECAL) and the hadronic calorimeter (HCAL).
If our initial particle was a quark or gluon, it initiates a jet of particles, which is
expected to stop in the next layer of the detector, known as the hadronic calorimeter
or HCAL. If the particle is a muon, it passes through both calorimeters without
interacting much, and leaves further signatures in the outer muon chambers.
The different behaviours of electrons, positrons, photons, jets and muons as they leave the interaction point allow us to tell these particles apart. A photon will not
leave a track in the tracker because it is electrically neutral, and it will stop in the
electromagnetic calorimeter. An electron or positron will also stop in the electro-
magnetic calorimeter, but it will also leave a track in the tracking detector. A quark
or gluon will initiate a jet that leaves a bunch of tracks close together in the tracker,
before depositing most of its energy in the hadronic calorimeter. A muon, mean-
while, will leave tracks in the tracker and the muon chambers, and the careful
matching of these tracks allows us to reconstruct the muon four-momentum.
Referring back to the table of SM particles, you may be wondering why we have
not yet described the behaviour of neutrinos or tau leptons in the ATLAS and CMS
detectors. The answer is that both of these give yet further signatures in the detector.
Tau leptons decay promptly to lighter leptons, neutrinos and/or hadrons, and leave
characteristic patterns that one may search for specifically in the detector.
Neutrinos, meanwhile, are weakly-interacting, and pass through the detector
completely unseen. Nevertheless, we are able to at least determine that neutrino
production is likely to have occurred by checking the amount of ‘missing energy’ in
an event, as described in Section 6.4.4.
Clearly a detector that is capable of characterising individual particles escaping
the interaction point is also capable of characterising multiple particles that escape
at the same time. In that case, a key parameter of the detector is the number of
readout channels, which is related to the granularity of the detector. If this is not
large enough, multiple particles will overlap in the detector components, and we will
not be able to identify them as separate particles. As colliders move to higher
luminosity, the average number of particles passing through the detector will
increase, which necessitates new detector designs with a higher number of readout
channels.
produced by the passage of a charged particle, and the electron–hole pairs quickly
recombine. Hence one must find a way to deplete the material of charge carriers
before one can usefully apply the technique to measure the position of a particle.
The solution lies in doping. In an ‘n-type’ semiconductor, donor ions from Group V
of the periodic table are added, which introduce energy levels close to the lower end
of the conduction band, thus creating a surplus of electrons in the material. In a
‘p-type’ semiconductor, acceptor ions of group III are added that introduce energy
levels close to the top of the valence band, which absorb electrons from the valence
band and create a surplus of holes. When these two types of doped semiconductor
are brought together, a gradient of electron and hole densities results in the diffuse
migration of majority carriers across the junction. The ionised donors now have
positive charge, whilst the ionised acceptors acquire negative charge, and the
interface region becomes depleted of carriers. There is a potential difference across
this ‘depletion region’, which can be increased through application of an electric field
(adding a ‘reverse-bias voltage’). This increases the width of the depletion region.
Any electron–hole pairs produced by a charged particle passing through the region
will drift along the field lines to opposite ends of the junction, and if the p–n junction
is made at the surface of a silicon wafer, a prototype silicon detector is obtained.
Charge is collected on the surface of the detector, and is amplified before readout.
The density of tracks in ATLAS and CMS is highest nearest the beampipe, since
all particles that pass through the full radius of the detector must pass through a
much smaller surface area at the first layer of the detector. Accurate hit recon-
struction in this region, which is essential for vertex resolution and track recon-
struction, thus requires a very high granularity of detector modules nearest the beam
pipe. For this reason, both the ATLAS and CMS detectors use silicon pixel detectors
for the first few layers of the tracker. The CMS pixel detector has well over 60
million pixels, distributed between layers of a barrel in the middle of the detector,
and disks of the endcap at either side. ATLAS also has a barrel and endcap
structure, but with far fewer pixels than CMS.
At larger radial distances from the beampipe, the flux of particles drops
sufficiently for strip detectors to be a useful replacement for pixel detectors, which
brings a substantial cost saving. These comprise modules that contain a series of
silicon strips, plus readout chips that each read out a fixed number of strips. One can
therefore localise a track hit to at least the location of the strip that fired. The CMS
strip detector once again has a barrel and two endcaps, and is equipped with special
modules that have two silicon sensors mounted back-to-back, with their strips
misaligned at a 100 mrad relative angle. This allows for a 2D measurement of the
position of a hit in the plane of the module (rather than a 1D measurement based on
which strip fired). The ATLAS strip detector, called the SemiConductor Tracker
(SCT) is also arranged into barrel layers and endcap disks, with modules again
containing two pairs of wafers that are placed back-to-back to allow 2D hit
reconstruction (albeit with a different misalignment angle).
A difference between ATLAS and CMS is that ATLAS has much less silicon. The
reason is that ATLAS has another component of the inner detector, which is not
replicated in CMS, called the Transition Radiation Tracker (TRT). This is a drift
tube system, with a design that incorporates ‘straw’ detectors. Each of these is a
small cylindrical chamber filled with a gas mixture of Xe, CO2 and O2, in which the
aluminium-coated inner wall acts as a cathode whilst a central gold-plated tungsten wire acts as an anode. Charged particles passing through ionise the gas, and the resulting ionisation cluster is amplified by a factor of ≈2.5 × 10⁴ whilst drifting
through the electric field in the straw. The wires are split in half at the centre and
read out at each end, and each channel provides a drift time measurement.
The space between the straws is filled with a polypropylene/polyethylene fibre
radiator which increases the number of transition-radiation photons produced in the
detector. These are produced when relativistic particles cross a boundary between
materials with different dielectric constants, and the threshold above which radiation
is produced is dependent on the relativistic factor γ = (1 − v²/c²)^{−1/2}, where v is the
particle velocity, and c is the speed of light. The xenon in the straw gas presents a
high interaction cross-section to these photons and a signal is produced which has a
higher amplitude than the signal arising from minimally ionising particles. There are
thus two different categories of signal that one wishes to detect in each straw, and for
this reason each channel has two independent thresholds. The lower threshold detects
the tracking hits, and the higher threshold is designed for the transition-radiation
photons. This higher threshold aids particle identification, as, for example, electrons
start producing transition radiation when their momentum is close to 1 GeV, whilst
pions start to radiate only when their momentum is close to 100 GeV. At ATLAS, the
pion rejection factor is expected to be ≈100 for an electron efficiency of 90%.
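Since the onset of transition radiation is controlled by γ, a quick estimate of γ ≈ p/(mc) at the momenta quoted above makes the origin of the electron–pion difference explicit (the masses below are taken from standard tables; the precise onset values depend on the radiator design):

    # gamma = E/(m c^2), approximately p/(m c) for ultra-relativistic particles
    m_e, m_pi = 0.000511, 0.1396          # masses in GeV
    for name, m, p in [("electron", m_e, 1.0), ("pion", m_pi, 1.0), ("pion", m_pi, 100.0)]:
        print(f"{name:8s} at p = {p:6.1f} GeV: gamma ~ {p / m:7.0f}")
    # an electron already reaches gamma of order 10^3 at ~1 GeV, whereas a pion needs
    # roughly two orders of magnitude more momentum before its gamma becomes comparable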
The ATLAS TRT is intrinsically radiation-hard, and provides a large number of
measurements at relatively low cost. A barrel region covers ∣η∣ < 0.7 and the endcap
extends coverage to ∣η∣ = 2.5.
6.2.3 Calorimeters
The primary job of the ATLAS and CMS calorimeters is to stop electrons,
positrons, photons and jets and measure the energy that was released as they
stopped. They also play an important role in determining the position of particles,
measuring the missing transverse momentum per event, identifiying particles and
selecting events at the trigger level.
High-energy particles entering a calorimeter produce a cascade of secondary
particles known as a shower (not to be confused with the parton showers discussed in
Section 3.3). The incoming particle interacts via either the electromagnetic or strong
interaction to produce new particles of lower energy which react in a similar fashion,
producing very large numbers of particles whose energy is deposited and measured.
A calorimeter can either be built only from the material that is used to induce the
shower, in which case it is called a homogeneous calorimeter, or it can use separate
materials for inducing the shower and for detecting the energy emitted by particles,
in which case it is called a sampling calorimeter. A sampling calorimeter thus consists
of plates of dense, passive material, alternating with layers of sensitive material.
The thickness of the passive layers (in units of the radiation length) determines the
number of times the layers can be used, and thus the number of times that a
Electromagnetic calorimeters
The interaction with matter for electrons and photons is different from that of
hadrons. Electrons and photons penetrate much less deeply than hadrons, and
produce narrower showers. Energy loss occurs predominantly via bremsstrahlung1
for high-energy electrons (which for most materials means energies greater than
≈10 MeV ), and high-energy photons lose energy via the related process of e+e− pair
production. The characteristic amount of matter traversed by a particle before
undergoing one of these interactions is described by the radiation length X0, and is
entirely set by the properties of the material being traversed. The expectation value
of the energy of an electron E (x ) as a function of the distance x into the material is
given by
1 Bremsstrahlung (which translates as ‘braking radiation’) is produced by the acceleration of a charged particle after deflection by another charged particle.
\langle E(x) \rangle = E_0 \exp\left(-\frac{x}{X_0}\right),    (6.4)
where E0 is the incident energy of the electron. We can write a similar equation for
the intensity of a photon beam entering the material (where I0 is the incident
intensity):
\langle I(x) \rangle = I_0 \exp\left(-\frac{7x}{9X_0}\right).    (6.5)
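A quick numerical illustration of equations (6.4) and (6.5):

    import numpy as np

    def electron_energy_fraction(x):
        # <E(x)>/E0 = exp(-x/X0), with x in units of the radiation length X0 (eq. 6.4)
        return np.exp(-x)

    def photon_intensity_fraction(x):
        # <I(x)>/I0 = exp(-7x/(9 X0)) (eq. 6.5)
        return np.exp(-7.0 * x / 9.0)

    for x in [1, 5, 10, 20]:
        print(x, electron_energy_fraction(x), photon_intensity_fraction(x))
    # the rapid exponential fall-off is why an EM calorimeter needs only a few tens
    # of radiation lengths of material to absorb electrons and photons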
Hadronic calorimeters
Hadrons interact with the nuclei of the calorimeter material via the strong force, and
the more complex development of hadronic showers relative to EM showers leads to
a number of distinguishing features. Firstly, the fraction of detectable energy from a
hadronic shower is lower than that from an EM shower, which leads to an
intrinsically worse energy resolution for hadronic species relative to electrons and
photons. Secondly, hadronic showers are longer than EM showers, since they are
characterised by a nuclear interaction length λ which is typically an order of
magnitude greater than X0. This length is a function of both the energy and type
of incoming particle, since it depends on the inelastic cross-section for nuclear
scattering. Longitudinal energy deposition profiles have a maximum at:
x ≈ 0.2 ln(E_0/1 GeV) + 0.7,    (6.6)
where x is the depth into the material in units of the interaction length λ and E0 is the
energy of the incident particle. The depth required for containment of a fixed
fraction of the incident particle energy is also logarithmically dependent on E0. This
ultimately means that hadronic calorimeters have to be larger than EM calorimeters,
and the increased width of hadronic showers relative to EM showers means that
their granularity is typically coarser than that of EM calorimeters. The deposited
energy in a hadronic cascade consists of a prompt EM component due to π 0
production, followed by a slower component due to low-energy hadronic activity.
These two different types of energy deposition are usually converted to electrical
signals with different efficiencies, the ratio of which is known as the intrinsic e/h
ratio.
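A quick numerical illustration of equation (6.6):

    import numpy as np

    def shower_max_depth(E0_GeV):
        # approximate depth of the hadronic shower maximum, in units of the
        # nuclear interaction length lambda (eq. 6.6)
        return 0.2 * np.log(E0_GeV) + 0.7

    for E0 in [20, 100, 300, 1000]:
        print(f"E0 = {E0:5d} GeV: shower maximum at ~{shower_max_depth(E0):.2f} interaction lengths")
    # the maximum moves only logarithmically with energy; containing a fixed fraction
    # of the energy requires considerably more depth than the maximum itself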
The CMS hadronic calorimeter is a sampling calorimeter divided into four sub-
components, with most of it composed of alternating layers of brass absorbers and
plastic scintillators. The layers form towers of fixed size in (η , ϕ ) space. The hadronic
calorimeter barrel sits between the electromagnetic calorimeter barrel and the
magnet coil, and is sufficiently small that hadronic showers might not be entirely
contained within it. For this reason, a hadronic calorimeter outer barrel detector is
placed outside the magnet coil, and the two barrel detectors together cover the
region up to ∣η∣ = 1.4. Endcap detectors cover the range 1.3 < ∣η∣ < 3.0, and a fourth
sub-detector extends coverage to large values of ∣η∣. For pions with an energy of
20 GeV (300 GeV), the energy resolution has been measured to be about 27% (10%).
The ATLAS hadronic calorimeters are also sampling calorimeters, and they cover
the range ∣η∣ < 4.9. A tile calorimeter is used for ∣η∣ < 1.7, using iron as the absorber
and scintillating tiles as the active material. The system consists of one barrel
and two extended barrels, and is longitudinally segmented in three layers.
For ≈1.5 < ∣η∣ < 4.9, LAr calorimeters (similar to those in the ECAL) are used,
ramps up. The trigger system also requires a hardware upgrade to meet the
demands of the high-luminosity LHC, and the inner tracker will eventually be
replaced by a new system entirely made of silicon pixel and strip detectors, known as
the ITk.
6.3 Triggers
The number of interactions at the LHC is very large indeed, and the majority of these are of little interest to those working at the frontier of particle physics. Thus, the
ATLAS and CMS detectors do not record every interaction but instead are designed
to trigger on interesting processes. Limitations on the amount of data that can be
stored require an initial bunch-crossing rate of 40 MHz to be reduced to a rate of
selected events of 100 Hz, and the challenge is to do this without missing any of the
rare new physics processes that motivate the entire experiment. As run conditions
evolve at the LHC, further improvements need to be made to the trigger systems to
ensure successful data taking.
to write the full event information, which reduces the total information flow and in turn allows the total trigger rate to be increased.
The CMS trigger utilises two levels. Similar to the ATLAS trigger, the first is a
hardware-based Level-1 trigger that selects events containing candidate objects such
as detector patterns consistent with a muon, or calorimeter clusters consistent with
electrons, photons, taus, jets or total energy (related to ETmiss). Selected events are
passed to a software-based high-level trigger via a programmable menu that
contains algorithms that utilise the Level-1 candidate objects.
6.3.3 Prescaling
The very high rate of events at the LHC necessitates having relatively high
thresholds on the Level-1 trigger objects in order to obtain a suitable reduction in
the rate of events entering the high-level trigger. As the instantaneous luminosity and
centre-of-mass energy of the LHC increase, these thresholds must rise still further to
cope with the increased throughput. This has a major impact on physics analyses,
since the trigger requirements set the minimum transverse momentum of objects of
interest that can be probed by an analysis. For this reason searches for lower mass
objects that decay to objects with a relatively small transverse momentum are often
better covered by the earlier 8 TeV run of the LHC, than by the later, and much
more abundant, 13 TeV data.
For some processes that we wish to trigger on, it is not possible to perform the
required analysis with stringent requirements on the relevant object transverse
momenta. Examples include detector calibration and monitoring, various studies in
the ATLAS and CMS b-physics programmes, and the study of ‘minimum-bias
events’ comprised of both non-diffractive and diffractive inelastic-quantum chro-
modynamics (QCD) processes, and which must be studied using a very loose trigger
selection. To trigger on the relevant events, trigger prescale factors are used to reduce
the event rate, so as not to saturate the allowed bandwidth. For a prescale factor of
N, only one event out of N events which fulfill the trigger requirement will be
accepted. Individual prescale factors can be applied to each level of a trigger chain,
as required.
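Conceptually, a prescale is nothing more than a counter; a toy sketch is shown below (real trigger systems may also use random prescales, and the counters live in the trigger hardware and software rather than in offline code):

    def prescaled_accept(event_counter, prescale):
        # accept one event in every `prescale` events that pass the trigger selection
        return event_counter % prescale == 0

    accepted = sum(prescaled_accept(i, 100) for i in range(1_000_000))
    print(accepted)   # ~1% of the triggered events are recorded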
chain is not the same thing as the electron that we refer to as a particle of the
SM, being instead a defined collection of detector signatures that is designed to most
closely resemble those instigated by a physical electron. This is even clearer in the case
of jets, which do not formally exist as particles of the SM, but rather as a phenomenon that results from the complicated interactions of gluons and quarks. For this reason,
one can obtain fake particles from detector reconstruction, which are particles that
resemble a different object from the one that originated the detector signatures.
Examples include electrons faking jets, and charged pions faking electrons. The
selections used to define each type of object are carefully designed to reduce these fake
contributions as much as possible.
• seeds are classified as electrons if they match a track that points to the
primary vertex in the event;
• seeds are classified as converted photons if they match a track that points to a
secondary vertex;
• seeds are classified as unconverted photons if they do not match any tracks.
Electron and converted photon clusters are rebuilt with a size of 3 × 7 central
layer cells on the barrel, whilst unconverted photons, which typically have narrower
showers, are rebuilt with 3 × 5 central layer cells. In the endcap region, all categories
are built using regions of dimension 5 × 5 cells.
Photons have relatively few processes that can fake them, since the simple
criterion of having a set of electromagnetic calorimeter deposits that do not match
a track from the primary vertex turns out to be very stringent. Electrons can be
faked by more processes, and a particularly prevalent one is a jet faking an
electron. Although it is very rare to have a jet which is collimated enough
to resemble an electron, and entirely contained within the electromagnetic
calorimeter, an enormous number of jets are produced at the LHC, and thus
one obtains a reasonable fake rate once one combines these two facts. It is
therefore typical to define several different categories of electrons called
things like loose, medium and tight, with the exact definitions depending on the
current run period and version of the ATLAS reconstruction software. Moving
from loose to tight, the electron definitions are set to be more stringent to reduce
the fake jet background for electron identification, at the cost of reducing the
overall electron reconstruction efficiency. This trade-off between purity and
efficiency is characteristic of all attempts to reconstruct objects from detector
signatures.
Electron identification in CMS data proceeds slightly differently through the
particle flow algorithm, but the basic approach of matching a track to a set of
electromagnetic calorimeter deposits remains the same as ATLAS. Depending on
the position of an electron within the detector, and its momentum, it is sometimes
better to seed the electron from the tracker rather than from the calorimeter, and it is
sometimes the other way around. Very energetic and isolated electrons, where
isolated indicates that they are not surrounded by other particles, are seeded using
electromagnetic calorimeter information. Clusters with energy deposits greater than
a chosen threshold are combined into a supercluster, with the direction of the
electron set to the average position of the clusters weighted by the energy deposits
(this is called the barycentre of the supercluster). The location of the tracker hits is
then inferred from this direction.
For non-isolated electrons, this approach is clearly unsatisfactory, since extra
particles around the electron will contribute energy to the cells around the electron,
and bias the measured barycentre of the supercluster. It is also problematic for
electrons with a small transverse momentum which are highly bent by the solenoid
magnetic field. Electrons release photons via bremsstrahlung, and for highly curved
tracks, these photons are spread over a wide area in the electromagnetic calorimeter,
which makes measuring the position of a supercluster difficult. For these two cases,
the CMS reconstruction software instead uses all tracks obtained via the tracker as
the seeds of future electrons, which are then subjected to a series of quality cuts to try
and isolate electron tracks from other tracks.
Photons are identified in CMS by using all of the remaining electromagnetic
calorimeter clusters at the end of the particle flow process, at which point none of
them should have associated tracks.
6.4.2 Muons
Muons are, in principle, straightforward to reconstruct, given that the only other SM
particle that can reach the muon subsystem is a neutrino (which leaves no trace).
However, there are subtleties in muon reconstruction which arise from the
combination of signatures from different subsystems, and how to cover known gaps
in the muon spectrometer acceptance.
Muon reconstruction in the ATLAS data relies primarily on the signals found in
the inner detector and the muon subsystem, with occasional assistance from the
calorimeters. For muons within ∣η∣ < 2.5, the inner detector provides high-precision
measurements of muon positions and momenta. For example, muons with ∣η∣ < 1.9
will typically have three hits in the pixel layers, eight hits in the SCT, and 30 hits in
the TRT, which allows one to select muon tracks by placing stringent conditions on
both the number of hits in the different subsystems of the inner detector, and the
number of silicon layers traversed without a hit. These conditions define a high-
quality track which can be accurately extrapolated to the muon spectrometer, in
which the three layers of high-precision monitored drift tube detectors can provide
six to eight η measurements for a single muon passing through the detector within
∣η∣ < 2.7. A ϕ measurement is obtained from the coarser muon trigger chambers.
The hits within each layer of the spectrometer are combined to form local track
segments, then these are combined across layers to form a global muon spectrometer
track.
At this point, one can define four different types of muon depending on the
presence or absence of muon spectrometer and inner detector signatures:
1. Combined muons: in this case, there is both an inner detector and a muon
spectrometer track, and there is a good match between the two. The
combination of the two tracks yields the best possible precision on the
measurement of the muon properties, and these are the only sorts of muon
considered in most physics analyses.
2. Segment-tagged muons: in this case, there is an inner detector track, and
there are segment tracks in layers of the muon spectrometer which match
well with the inner detector track. However, there is no global muon
spectrometer track. This might happen in the case of a muon that has a
small transverse momentum, or is passing through a region of the detector
where only one layer of the muon spectrometer will typically be hit.
3. Standalone muons: in this case, there is no inner detector track, but there
is a muon spectrometer track. This typically occurs in the region
2.5 < ∣η∣ < 2.7, which is covered by the muon spectrometer, but not by the
inner detector.
4. Calorimeter-tagged muons: in this case, there is no muon spectrometer track,
but there is an inner detector track. In addition, there is an energy deposit in
the calorimeter that matches the inner detector track, and which is consistent
with the passage of a minimally-ionising particle. This type of muon has a
low purity (since the observed signatures can easily be caused by something
else), and the only reason to use calorimeter-tagged muons is in cases where
there are known holes in the muon spectrometer.
CMS muon reconstruction occurs first in the particle flow algorithm, and the
basic details are similar to ATLAS.
6.4.3 Jets
We have already seen that when quarks and gluons are produced in the proton–
proton collisions of the LHC, they initiate complex jets, comprised of collimated
sprays of hadrons. We can define jets using the jet algorithms presented in Section 3.6,
but these do not yet supply enough information to understand their reconstruction. In
the following, we will first describe the ATLAS approach to jet reconstruction (up to
the end of Run 2), before turning to CMS.
The main jet-identification algorithm used by the ATLAS reconstruction software
is the anti-kt algorithm with a distance parameter of R = 0.4. One can use various
objects as input to the algorithm, including inner detector tracks, calorimeter energy
deposits, or some combination of the two. Jets constructed from tracks, referred to
sensibly as track-jets are less sensitive to pile up corrections, since one can only count
tracks that point to the primary vertex. However, they have the obvious deficiency
that the jet must have passed through the tracker acceptance of ∣η∣ < 2.5 in order to
be reconstructed. Thus, it is most common to define jets using the calorimeter energy
deposits. The individual cells that contain activity in the calorimeter are first
grouped into clusters of cells with significant energy relative to the expected noise σ,
which is calculated as the quadrature sum of the measured electronic and pile-up
noise. The clustering algorithm starts from seed cells that have energy deposits
greater than 4σ, and iteratively adds neighbouring cells if their energy exceeds 2σ.
This is followed by the addition of all adjacent cells. The resulting object is known as
a topo-cluster. If topo-clusters have multiple local signal maxima with at least four
neighbours whose energy is less than the local maximal cell, the topo-cluster is split
to avoid overlap between clusters.
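A much-simplified sketch of such a seeded clustering, on a toy grid of cell significances |E|/σ and with the thresholds quoted above (no cluster splitting and no calibration), is given below:

    import numpy as np

    def topo_clusters(significance, seed=4.0, grow=2.0):
        # simplified seeded clustering on a 2D grid of cell significances |E|/sigma;
        # returns an integer label per cell, with 0 meaning 'not clustered'
        labels = np.zeros(significance.shape, dtype=int)
        nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]
        next_label = 1
        for si, sj in np.argwhere(significance > seed):          # seed cells above 4 sigma
            if labels[si, sj]:
                continue
            labels[si, sj] = next_label
            stack = [(si, sj)]
            while stack:                                          # grow through cells above 2 sigma
                i, j = stack.pop()
                for di, dj in nbrs:
                    ni, nj = i + di, j + dj
                    if (0 <= ni < labels.shape[0] and 0 <= nj < labels.shape[1]
                            and not labels[ni, nj] and significance[ni, nj] > grow):
                        labels[ni, nj] = next_label
                        stack.append((ni, nj))
            next_label += 1
        final = labels.copy()                                     # finally, attach all adjacent cells
        for i, j in np.argwhere(labels > 0):
            for di, dj in nbrs:
                ni, nj = i + di, j + dj
                if 0 <= ni < labels.shape[0] and 0 <= nj < labels.shape[1] and labels[ni, nj] == 0:
                    final[ni, nj] = labels[i, j]
        return final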
It is important to realise that the energy of the topo-cluster is not in fact equal to
the energy that was deposited by the hadronic shower, since it has already been
stated that the calorimeters respond differently to hadronic and electromagnetic
particles. The energy of calorimeter cells is measured at a scale called the electro-
magnetic scale (or EM scale), established using electrons during test beam runs
during the development of the detectors. Topo-clusters that are classified as hadronic
can have their energy modified by a calibration factor known as the local cell
weighting (LCW), which is derived using Monte Carlo simulation of single pion
events. One can then choose to reconstruct jets from either the EM topo-clusters, or
the LCW topo-clusters. The jets then have their energy scale restored to that of
simulated truth jets using correction factors known as jet energy scale (JES)
calibration factors. Different sets of factors are defined for EM and LCW jets,
and the calibration accounts for various processes that may disturb the measured
energy of a jet. For example, an origin correction is applied to force the four-
momentum of a jet to point to the primary vertex rather than to the centre of the
detector, whilst keeping the jet energy constant. Another component must mitigate
the known effects of pile-up, that we have previously seen may bloat a jet energy
measurement if it is not removed. A separate correction accounts for the difference
between the calorimeter energy response and the ‘true’ jet energy, defined using
Monte Carlo simulation, whilst also correcting for any bias in the reconstructed η of
a jet that results from transitions between different regions of the calorimeter, or the
difference in the granularity of the calorimeter in different locations. One can also
apply a correction to reduce the dependence of the jet properties on the flavour of the
instigating parton, using global jet properties such as the fractions of the jet energy
measured in the first and third calorimeter layers, the number of tracks associated
with the jet, the average pT -weighted distance in η–ϕ space between the jet axis and
the jet tracks, and the number of muon tracks associated with the jet that result from
punch-through. Finally, an in situ correction can be applied to jets in data, which is
calculated as the jet response difference between data and Monte Carlo simulation,
using the transverse momentum balance of a jet and a well-measured reference
object.
The calculation of the jet energy scale comes with various associated systematic
uncertainties, which must be taken account of in all physics analyses. This is obvious
in the case of analyses that require the presence of several jets, in which case the jet
energy scale uncertainty can often be a leading source of the total systematic
uncertainty on the final results. It is also applicable to analyses that veto the presence
of jets, since a jet whose energy is systematically off may end up being below the
transverse momentum threshold of the jets that are vetoed in the analysis.
An important additional quantity in the measurement of jets is the jet energy
resolution, which quantifies, as a function of the detector location, the size of the fluctuations in the measurement of a reconstructed jet’s energy at a fixed particle-level energy. The jet energy resolution can be carefully measured by studying the
asymmetry of dijet events in the detector, where the asymmetry must be generated
by misreconstruction within the jet energy resolution.
Early CMS analyses used calorimeter jets, reconstructed from the energy deposits
in the calorimeter alone. With improvements in the understanding of the detector,
the preferred modern approach is to use particle-flow jets, which are obtained by
clustering the four-momentum vectors of particle-flow candidates. Particle-flow jet
momenta and spatial positions are measured to a much higher precision than for
calorimeter jets, as the use of the tracking information alongside the high-
granularity electromagnetic calorimeter measurements allows the independent
Figure 6.3. Feynman diagram showing the most common decays of the tau lepton, which proceed via a virtual W− boson.
identification, in the same way that they find utility in searches for boosted objects
decaying to b-quarks.
One can also define the ϕ direction of the missing transverse momentum as:
ϕmiss = arctan(pymiss/pxmiss).    (6.8)
Theorists often think about pTmiss in the following way. Essentially, it is minus the
vectorial sum of the transverse momenta of all reconstructed particles in an event. If
this vectorial sum evaluates to zero, there is no missing transverse momentum. If it
does not, then there must have been something invisible that was produced in the
hard scattering process, and the pTmiss vector is the sum of the unseen transverse
momenta in the event. So an event that makes two neutrinos has a pTmiss which is the
sum of their transverse momenta. Of course, we never see this directly, and can only
access it via the sum of visible products, whose transverse vector points in the other
direction to balance the event.
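A minimal sketch of this simple picture (a toy example of ours; arctan2 is used so that ϕmiss lands in the correct quadrant):

    import numpy as np

    def missing_pt(pt, phi):
        # pTmiss as minus the vectorial sum of the visible transverse momenta,
        # together with its azimuthal direction (eq. 6.8); arctan2 keeps the quadrant
        px_miss = -np.sum(pt * np.cos(phi))
        py_miss = -np.sum(pt * np.sin(phi))
        return np.hypot(px_miss, py_miss), np.arctan2(py_miss, px_miss)

    # toy event: two jets and a lepton that do not balance in the transverse plane
    pt = np.array([120.0, 80.0, 45.0])    # GeV
    phi = np.array([0.1, 2.9, -2.0])
    print(missing_pt(pt, phi))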
Unfortunately, pTmiss is much more complicated than this simple picture would
suggest. In reality, measuring pTmiss involves summing signatures from all detector
2 Note that you will often see this erroneously written as ETmiss, which is a popular convention.
where pTmiss,soft results from the unassigned topo-clusters. At this point, we have
unwittingly over-estimated pTmiss,calo by counting electrons, photons and taus that
were also reconstructed as jets. The reconstruction software thus includes an overlap
removal step, by associating each observed cluster with only one object, in the order
in which they appear in equation (6.10). There are some important subtleties in this
procedure, since some objects are reconstructed using a sliding window algorithm in
the calorimeter (e.g. electrons), whilst others are defined using topo-clusters (e.g.
jets), necessitating a careful matching between the two. The individual objects are
also subject to various selections on transverse momenta, quality and/or jet
reconstruction parameters, and these are liable to change over time.
The keen reader will note that there are in fact two muon terms that contribute to
the missing transverse momentum calculation; one is internal to the calorimeter,
whereas the other is external. The external term is set to the sum of the measured
transverse momenta for reconstructed muons, and depending on the current
definition this may include a mixture of combined, segment-tagged and standalone
muons. In this case, it is necessary to be vigilant against biases in the external muon
term that can result from the mis-reconstruction of standalone muons. For the
internal calorimeter muon term, the prescription differs depending on whether a
muon is isolated (within ΔR = 0.3 of a jet) or non-isolated. For isolated muons, a
parametrised average deposition in the calorimeter is used to estimate the
contribution. For non-isolated muons, the contribution is set to zero, since any
deposits left by the muons cannot be identified and will have already been included
in previous terms.
The soft term is a little trickier to estimate, since it is not immediately clear how to
account for objects that did not reach the calorimeter, or which calibration constants
to use for unidentified calorimeter energy deposits. The ATLAS calculation starts by
selecting a good quality set of tracks down to a pT of 400 MeV. Any tracks with a
transverse momentum above 100 GeV are ignored. The tracks are extrapolated to
the middle layer of the electromagnetic calorimeter, which is selected due to its high granularity, and a check is made as to whether there are any calorimeter deposits within a certain ΔR of each track. If the answer is no, the track is added to an ‘eflow’
term which counts the contribution from missed objects. If the answer is yes, the
track is added and the topo-cluster is discarded, since the tracking resolution is better
than the calorimeter resolution at low transverse momentum. A track might in
principle match several topo-clusters, in which case the track is added and only the
highest-energy topo-cluster is discarded. Finally, there are remaining topo-clusters
which have not been discarded by track matching, and these are added to the eflow
term.
The performance of the missing transverse momentum reconstruction must be
carefully studied, and this can be achieved by comparing data and Monte Carlo
simulations for processes such as Z → μμ + jets and Z → ee + jets, where the
selection of events with lepton invariant masses near the Z peak can produce a clean
sample of events with no expected intrinsic pTmiss. Sources of true missing transverse
momentum can be studied in W → μν + jets and W → eν + jets events. It is
important to note that we have described only the current default procedure for the
pTmiss calculation, and this must often be studied carefully and re-worked for a
particular physics analysis.
In the CMS reconstruction, pTmiss is calculated from the output of the particle flow
algorithm, and is computed as the negative of the vectorial sum of the transverse
momenta of all particle flow particles. As in the ATLAS reconstruction, this could
be affected by the minimum energy thresholds in the calorimeter, inefficiencies in the
tracker, and non-linearity of the calorimeter response for hadronic particles. The
bias on the pTmiss measurement is reduced by correcting the transverse momentum of
the jets to the particle level jet pT using jet-energy corrections, and propagating the
correction into pTmiss via
\vec{p}_T^{\,\mathrm{miss,corr}} = \vec{p}_T^{\,\mathrm{miss}} - \sum_{\mathrm{jets}} \left( \vec{p}_{T,\mathrm{corr}}^{\,\mathrm{jet}} - \vec{p}_T^{\,\mathrm{jet}} \right),
where pT,corrjet represents the corrected values. This is called the Type-I correction for
pTmiss , and it uses jet-energy scale corrections for all corrected jets with pT > 15 GeV
that have less than 90% of their energy deposited in the electromagnetic calorimeter.
In addition, if a muon is found within a jet, the muon four-momentum is subtracted
from the jet four-momentum when performing the correction, and is added back in
to the corrected object.
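A schematic sketch of this propagation, with illustrative numbers and applying the eligibility requirements quoted above (the function and variable names are ours, not those of the CMS software), is:

    import numpy as np

    def type1_corrected_met(met_xy, jets_raw, jets_corr, em_fractions):
        # propagate jet-energy corrections into pTmiss by replacing each eligible jet's
        # raw transverse momentum with its corrected value (corrected pT > 15 GeV and
        # EM energy fraction below 0.9, as quoted above)
        met = np.array(met_xy, dtype=float)
        for raw, corr, f_em in zip(jets_raw, jets_corr, em_fractions):
            if np.hypot(*corr) > 15.0 and f_em < 0.9:
                met -= np.array(corr) - np.array(raw)
        return met

    raw_met = (25.0, -10.0)                       # (px, py) of the uncorrected pTmiss, in GeV
    jets_raw = [(95.0, 5.0), (-60.0, -12.0)]      # raw jet (px, py)
    jets_corr = [(102.0, 5.4), (-63.0, -12.6)]    # JES-corrected jet (px, py)
    print(type1_corrected_met(raw_met, jets_raw, jets_corr, em_fractions=[0.3, 0.2]))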
In both of the ATLAS and CMS reconstruction procedures, it is essential to apply
the event cleaning procedures that are used to remove non-collision backgrounds in
order to obtain a reasonable estimate of pTmiss .
3 Excited heavy-flavour states, such as B* or D* vector mesons, de-excite promptly by photon emission, but emission of a much slower W boson is the only way to ‘lose’ the b or c hadron flavour.
\mathrm{sig}(x) = \frac{x}{\sigma_x},    (6.12)
where x is the measured impact factor for a given track, and σx is its uncertainty. The
sign is taken as positive if the point of closest approach of a track to the primary
vertex is in front of the primary vertex with respect to the jet direction, and negative
otherwise. The algorithms then use template functions for the impact factor
significance, which have been previously calculated for different jet flavours, and
these templates can be calculated in the longitudinal direction along the beamline, in
the plane transverse to the beam line or a combination of both. The distributions are
much wider for b-jets than for light quark flavour jets, and they can be used to
determine the likelihood that a particular candidate jet belongs to a particular
category. To obtain the discriminant for each tagger, a ratio of the b-jet and light
quark jet likelihoods is used.
The ATLAS secondary-vertex tagger instead tries to reconstruct a displaced
secondary vertex, starting from all possible two-track vertices that can be identified
from the set of tracks in the event. To improve the performance, the next step is to
remove all vertices that are likely to come from photon conversions, hadronic
interactions with the inner detector material, or long-lived particles (e.g. KS, Λ).
Next, a secondary vertex is found by iteratively removing outlier tracks until a good
candidate is found. Once it has been reconstructed, the properties of the associated
tracks can be used to define a likelihood ratio for the b-jet and light quark jet
hypotheses using quantities such as the energy fraction of the tracks fitted to the
vertex with respect to all tracks in the jet, the number of tracks in the vertex, the
vertex mass, and the angle between the jet direction and the b-hadron flight direction
as estimated from the primary-secondary vertex axis.
A further approach is given by the ATLAS decay chain reconstruction tagger
which fits the decay-chain of b-hadrons, by reconstructing a common b-hadron flight
direction, along with the position of additional vertices that arise along the direction
as the hadron decays to other hadrons. The tracks associated to each reconstructed
vertex along the flight direction can be used to discriminate between the b-jet and
light quark jet hypotheses, and to calculate the final discriminant.
The value of the tagging discriminant (or their combination) for a candidate jet is
known as the b-jet weight. Physics analyses can proceed by cutting on this quantity
directly, but it is much preferred to follow pre-defined working points, which each
define a fixed bound on the discriminant output. These working points come with an
estimate of the efficiency and purity, plus a prediction of the relevant systematic
uncertainties.
The CMS collaboration have developed a variety of b-jet tagging techniques over
the years, including a jet probability tagger that uses only impact factor information,
a combined secondary-vertex tagger that uses a wide range of information on
reconstructed secondary vertices, and a combined multivariate-analysis tagger that
combines the discriminator values of various other taggers.
There are two CMS jet-probability taggers, called the JP and JBP algorithms. The
JP algorithm uses the signed impact-parameter significance of the tracks associated
with a jet to obtain its likelihood of originating from the primary vertex. This
exploits the fact that the negative impact-parameter significance values of tracks
from light flavour jets must arise from within the resolution on the measured track
impact-parameter values. Hence, the distribution of the negative impact-parameter
significance can be used as a resolution function R(s ) that can be integrated from
−∞ to the negative of the absolute track impact parameter significance, −∣IP∣/σ to
get the probability of a track originating from the primary vertex:
P_t = \int_{-\infty}^{-|\mathrm{IP}|/\sigma} R(s)\, \mathrm{d}s.    (6.13)
Note that this resolution function depends strongly on the quality of the
reconstructed track, and the probability it assigns will be lower for a track with
lots of missing hits. Nevertheless, tracks that were created by particles from the
decay of a displaced particle will have a low value of Pt. The probability that all N
tracks in the jet are compatible with the primary vertex is defined as
P_j = \Pi \cdot \sum_{n=0}^{N-1} \frac{(-\ln \Pi)^n}{n!},    (6.14)
where Π is the product of the individual track probabilities, Π = Pt(1) Pt(2) ⋯ Pt(N). One can then define −log(Pj) as a b-tagging discriminator, since
it gives higher values for b-jets. The JBP algorithm adds a requirement that the four
tracks with the highest impact-parameter significance get a higher weight in the
calculation of Pj.
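A minimal sketch of equations (6.13) and (6.14), using a Gaussian cumulative distribution purely as a stand-in for the integrated resolution function R(s):

    import numpy as np
    from math import log, factorial
    from scipy.stats import norm

    def track_probability(sig, resolution_cdf=norm.cdf):
        # P_t: integral of the resolution function R(s) up to -|IP|/sigma (eq. 6.13);
        # a Gaussian cumulative distribution is assumed here for illustration only
        return resolution_cdf(-abs(sig))

    def jet_probability(track_probs):
        # P_j = Pi * sum_{n=0}^{N-1} (-ln Pi)^n / n!, with Pi the product of the
        # per-track probabilities (eq. 6.14)
        Pi = float(np.prod(track_probs))
        N = len(track_probs)
        return Pi * sum((-log(Pi)) ** n / factorial(n) for n in range(N))

    probs = [track_probability(s) for s in (0.5, 3.2, 4.1, 0.8)]
    Pj = jet_probability(probs)
    print(Pj, -np.log(Pj))     # -log(Pj) is the b-tagging discriminator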
The most common heavy-flavour jet definition within the CMS experiment results
from the application of the combined secondary-vertex tagger. This uses a neural
network to classify jets as b-jets, based on almost 20 input variables taken from the
measured jet and track properties. The algorithm is trained on inclusive dijet events,
separated into three categories: jets with a reconstructed secondary vertex; jets with no secondary vertex but a set of at least two tracks with a 2D impact-parameter significance value greater than 2 and a combined invariant mass at least 50 MeV away from the KS0 mass; and events that pass neither of these sets of selections. In the
latter case, only track information can be used in the definition of the tagger.
A typical b-tagging working point chosen for analysis will have an efficiency of 77%,
with c-jet and light-jet rejection factors of 6 and 130, respectively.
In recent years, both the ATLAS and CMS collaborations have been using
refined machine learning methods to improve their taggers. This includes the use of
recurrent neural networks that can be applied to an arbitrary sequence of inputs, and
the use of deep learning techniques with a larger set of tracks as input data.
Once b-tagging algorithms have been defined, they need to be carefully calibrated
by deriving correction factors that make simulated events match the observed data.
For any given simulated jet, these corrections depend on both the jet flavour and the
kinematics, and there are separate corrections for tagging real b-jets correctly, and
for mis-tagging c and light flavour jets as b-jets. In order to derive the correction
factors, one needs to take a sample of data events which are known to be rich in
b-jets, and for which the various contributing processes are well understood. One
example is to use leptonically decaying top-pair events, since these should be pure in
b-jets at the hard-scattering level. The actual event selection applied is to take events
with one electron, one muon and two or three jets. Events with two same-flavour
leptons can be used to derive data-driven corrections to Z + jets and top-pair
samples which dominate the contribution to this final state. The b-tagging efficiency
is then extracted by performing a 2D maximum likelihood fit as a function of the
leading and sub-leading jet pT , which allows the method to account for correlations
between the two jets. Charm jet mis-tag correction factors can be obtained by
reconstructing the decay chain of D* → D0(→ Kπ)πs, since these events are enriched in c-
jets. After applying a b-tagging requirement, the sample of jets will consist of some
fraction of true b-jets which are correctly selected, and some fraction of c-jets which
are incorrectly selected. The contamination from real b-jets can be estimated by
fitting the proper lifetime of the events, which allows one to isolate the c-tag
efficiency. The light jet tagging efficiency is harder to extract, but it can be estimated
by inverting requirements on the signed impact-parameter significance and signed
decay length significance of jets to obtain a sample enriched in light quark jets. This
sample allows the light jet tagging efficiency to be measured once the contribution of
heavy flavour and long-lived particles is accounted for.
Charm tagging is harder to perform than b-tagging, due to the fact that the lower
mass of c-hadrons relative to b-hadrons leads to a less displaced secondary vertex,
with a lower track multiplicity. Nevertheless, charm tagging is an important topic in
its own right within both the ATLAS and CMS collaborations. Within the ATLAS
collaboration, charm tagging techniques have been developed using boosted
decision trees, with the same set of input variables as are used in b-tagging. Two
discriminants are used; one that separates c-jets from light-jets, and one that
separates c-jets from b-jets. An efficiency of ≈ 40% can be achieved for c-jets, for
c-jet and light-jet rejection factors of 4.0 and 20, respectively. The CMS collabo-
ration follows a similar procedure, with a long list of input variables related to
displaced tracks, secondary vertices and soft leptons inside the jets. The c-jet tagging
efficiency can be measured in data by obtaining a sample of events enriched in c
quarks, for example by selecting events with a W boson produced in association with
a c quark. This process occurs at leading order mainly through the processes
s + g → W −+c and s¯ + g → W ++c¯ , in which the c quark and the W boson have
opposite electric charge. The background, meanwhile, comes from W + qq¯ events, in
which the c quark will have the same charge as the W boson half the time, and the
opposite charge the other half of the time. After applying a preselection, a high-
purity sample of W + c events can be obtained in any variable of interest by
subtracting off the same-sign event distribution from the opposite-sign event
distribution in that variable. Another approach is to use single leptonic tt¯ events,
in which one of the W bosons has decayed to quarks. In this case, the decay contains
a c quark in about 50% of cases. Because of the particular decay chain of the top
quark, the energy of up-type quarks produced in W boson decay is larger, on
average, than the energy of the down-type quarks produced. This allows the
identification of samples of jets enriched and depleted in c quarks.
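The same-sign subtraction can be made concrete with a short sketch: histogram the variable of interest separately for opposite-sign and same-sign W + c-candidate events and take the difference, so that the charge-symmetric W + qq̄ background cancels. The arrays below are randomly generated stand-ins for real event data, not an actual measurement.

```python
# Minimal sketch of the opposite-sign minus same-sign (OS-SS) subtraction used to
# extract a W+c-enriched distribution. The inputs are hypothetical: 'var' is any
# observable of interest, and 'rel_charge' encodes the relative charge of the W
# boson and the c-jet candidate in each event.
import numpy as np

rng = np.random.default_rng(42)
n = 10000
var = rng.exponential(30.0, n)                       # some observable, e.g. a jet pT
rel_charge = rng.choice([-1, +1], n, p=[0.6, 0.4])   # -1 = opposite sign, +1 = same sign

bins = np.linspace(0, 150, 31)
h_os, _ = np.histogram(var[rel_charge == -1], bins=bins)
h_ss, _ = np.histogram(var[rel_charge == +1], bins=bins)

# The charge-symmetric background cancels in the difference, leaving the W+c
# signal, which is almost exclusively opposite-sign.
h_wc = h_os - h_ss
err  = np.sqrt(h_os + h_ss)   # statistical uncertainty of the subtracted histogram
print(h_wc[:5], err[:5])
```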
6.5.1 Pile-up
As we saw in Section 1.3, the protons at the LHC collide in bunches, with Ni protons
per bunch. The collision of these bunches not only results in proton–proton
interactions in which a large transfer of four-momentum occurs, but also in
additional collisions within the same bunch-crossing. These latter interactions
primarily consist of low-energy QCD processes, and they are referred to as in-time
pile-up interactions. These are not the whole story, however, since both the ATLAS
and CMS detectors are also sensitive to out-of-time pile-up, which consists of
detector signatures arising from previous and following bunch-crossings with respect
to the current triggered bunch-crossing. These contributions grow when the LHC
runs in a mode to increase the collision rate by decreasing the bunch spacing. The
sum of these distinct contributions is generically referred to simply as pile-up.
Pile-up cannot be directly measured, and its main effects are to make it seem like
jets have more energy in an event than they actually do, and to produce spurious jets.
This in turn will bloat the measurement of the missing transverse-momentum in an
event. Thankfully, pile-up effects can be mitigated by exploiting various low-level
detector measurements. For example, silicon tracking detectors have a very fast time
response, and are thus not typically affected by out-of-time pile-up. They can
therefore provide a safe measurement of the number of primary vertices, NPV, in an
event, which is highly correlated with the amount of in-time pile-up. For other
detectors, such as some calorimeters, the integration time for measured signals is
longer than the bunch spacing, and one must develop a careful strategy for pile-up
subtraction that takes into account the effects of both in-time and out-of-time pile-
up. Here, a helpful strategy is to use the inner detector to identify which charged-
particle tracks match the various energy deposits in the calorimeter. This allows the
identification of calorimeter deposits that mostly result from non-hard scatter
vertices, which can then be rejected as jets.
The actual pile-up correction procedures followed by ATLAS and CMS are
detector-specific, and subject to ongoing research. The first step for ATLAS data is
to build a pile-up suppression into the topological clustering algorithm that identifies
the energy deposits in the calorimeter that are used to build jets. In this procedure,
pile-up is treated as noise, and cell energy thresholds are set based on their energy
significance relative to the total noise. Raising the assumed pile-up noise value in the
reconstruction can be used to suppress the formation of clusters created by pile-up
deposits. However, the use of a fixed assumed pile-up noise value does not account
for pile-up fluctuations due to different luminosity conditions over a run period, or
local and global event pile-up fluctuations. Such effects can be taken account of by
attempting to subtract pile-up corrections to a jet on an event-by-event and jet-by-jet
basis, by defining the corrected jet transverse momentum:
$p_T^{\rm corr} = p_T^{\rm jet} - O^{\rm jet}$,  (6.15)
where $p_T^{\rm jet}$ is the naïve measurement of the jet transverse momentum, and $O^{\rm jet}$ is the
offset factor. By defining and measuring a jet area $A^{\rm jet}$ for each jet in η–ϕ space (see
Section 3.6), along with a pile-up $p_T$ density ρ, a pile-up offset can be determined
dynamically for each jet via:
$O^{\rm jet} = \rho \times A^{\rm jet}$.  (6.16)
For a given event, ρ itself can be estimated as the median of the distribution of the
density of many jets, constructed with no minimum $p_T$ threshold:
$\rho = {\rm median}\left\{ p_{T,i}^{\rm jet} / A_i^{\rm jet} \right\}$,  (6.17)
where each jet i has a transverse momentum $p_{T,i}^{\rm jet}$ and area $A_i^{\rm jet}$. This can be
computed from both data and Monte Carlo events.
After this subtraction, there is still a residual pile-up effect that is proportional to
the number (NPV − 1) of reconstructed pile-up vertices, plus a term proportional to
the average number of interaction vertices per LHC bunch-crossing 〈μ〉 which
accounts for out-of-time pile-up. The final corrected jet transverse momentum is
thus given by:
$p_T^{\rm corr} = p_T^{\rm jet} - \rho \times A^{\rm jet} - \alpha(N_{\rm PV} - 1) - \beta\langle\mu\rangle$,  (6.18)
where α and β are free parameters that can be extracted from simulated dijet events,
but which are carefully validated through studies of the ATLAS data.
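A minimal sketch of the area-based correction of equations (6.15)-(6.18) is given below; the jet kinematics and the residual-correction parameters α and β are invented for illustration, since the real values come from the dedicated calibrations described above.

```python
# Sketch of jet-area pile-up subtraction, equations (6.15)-(6.18). All numbers
# (jet kinematics, alpha, beta) are placeholders; real values come from the
# experiments' calibrations.
import numpy as np

def rho_median(jet_pts, jet_areas):
    """Event pile-up pT density: median of pT/A over jets built with no pT threshold."""
    return float(np.median(np.asarray(jet_pts) / np.asarray(jet_areas)))

def corrected_pt(pt_jet, area_jet, rho, n_pv, mu_avg, alpha=0.2, beta=0.1):
    """Equation (6.18): area-based subtraction plus residual NPV and <mu> terms."""
    return pt_jet - rho * area_jet - alpha * (n_pv - 1) - beta * mu_avg

# Soft jets used only for the rho estimate (hypothetical values, in GeV)
soft_pts   = [3.1, 5.4, 2.2, 8.0, 4.7, 3.9]
soft_areas = [0.45, 0.52, 0.38, 0.60, 0.50, 0.41]
rho = rho_median(soft_pts, soft_areas)

pt_corr = corrected_pt(pt_jet=62.0, area_jet=0.48, rho=rho, n_pv=25, mu_avg=30.0)
print(f"rho = {rho:.2f} GeV, corrected jet pT = {pt_corr:.1f} GeV")
```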
The pile-up subtraction procedure described above removes pile-up contributions
to jets from the hard scattering process, and it is also sufficient to remove some
spurious jets. However, some of the latter type of jets survive the process, and these
are typically a mixture of hard QCD jets arising from a pile-up vertex, and local
fluctuations of pile-up activity. Pile-up QCD jets are genuine jets, and can be tagged
and rejected using information on the charged-particle tracks associated with the jet
(i.e. whether they point to a pile-up vertex rather than the primary vertex). Pile-up
jets arising from local fluctuations contain random combinations of particles from
multiple pile-up vertices, and these can also be tagged and rejected using tracking
information. One such method is to use the jet-vertex-fraction (JVF), defined as the
fraction of the track transverse momentum contributing to a jet that originates from
the hard scatter vertex. More recently, ATLAS has introduced a dedicated jet-
vertex-tagger (JVT) algorithm.
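As an illustration, a minimal JVF calculation might look as follows, assuming each track associated to the jet carries a transverse momentum and a label for the vertex it points back to; the numbers are hypothetical.

```python
# Sketch of the jet-vertex-fraction (JVF): the fraction of the summed track pT
# associated to a jet that comes from the hard-scatter vertex. Inputs are
# hypothetical (track pT in GeV and the index of the vertex each track points to).

def jvf(track_pts, track_vertices, hard_scatter_vertex=0):
    """Return JVF in [0, 1], or None if the jet has no associated tracks."""
    total = sum(track_pts)
    if total == 0:
        return None   # e.g. a jet outside the tracker acceptance
    from_hs = sum(pt for pt, v in zip(track_pts, track_vertices)
                  if v == hard_scatter_vertex)
    return from_hs / total

tracks_pt  = [12.0, 6.5, 3.2, 1.1]
tracks_vtx = [0, 0, 3, 7]      # 0 = hard-scatter vertex, others = pile-up vertices
print(f"JVF = {jvf(tracks_pt, tracks_vtx):.2f}")   # ~0.81: likely a hard-scatter jet
```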
The CMS collaboration follow a qualitatively similar approach to pile-up
subtraction, but with some unique twists. For example, the use of particle flow
reconstruction allows CMS to perform charged hadron subtraction to reduce the in-
time pile-up contribution from charged particles before jet reconstruction. CMS also
applies the jet-area-subtraction method, although the residual correction (the α and
β terms in equation (6.18)) is parameterised differently with a specific dependence on
the jet pT . Finally, spurious pile-up jets can be identified using a boosted decision
tree discriminant.
It is important to note that these techniques fall down outside of the tracker
acceptance, and thus dedicated techniques are needed for pile-up suppression at
higher η. CMS again uses a multivariate discriminant, whilst ATLAS uses tracks in
the central region to indirectly tag and reject forward pile-up jets that are back-to-
back with central pile-up jets. This removes QCD jets, but does not reduce the
number of forward jets that instead arise from local pile-up fluctuations.
molecules inside the vacuum pipe which, despite best efforts, will never contain a
perfect vacuum. This is referred to as the beam gas background, and can be
generated either from inelastic interactions with gas molecules in the vacuum pipe
near the experiments, or from elastic beam-gas scattering around the ring. There is in
fact no clear difference between the beam halo background and the background
resulting from elastic beam gas interactions, since the latter will slightly deviate a
proton from the beam line, causing it to later hit the tertiary collimator. Inelastic
beam gas interactions, meanwhile, will produce a shower of secondary particles that
mostly have local effects, with the exception of high-energy muons which can travel
large distances and reach the detectors. These muons are typically travelling in the
horizontal plane when they arrive at the experiments.
Machine-induced backgrounds cause very funky detector signatures. Results
might include fake jets in the calorimeters, which then generate a high missing
transverse momentum because the spurious activity will not balance activity from
the hard scattering process. Fake tracks might also be produced which, when
matched with a calorimeter deposit, may generate a fake electron or jet. Factors that
affect the prevalence of machine-induced backgrounds include the beam intensity
and energy, the density of gas particles within the beam pipe, the settings of the
collimator and the status of the LHC machine optics. It has been possible to
simulate machine-induced backgrounds in ATLAS and CMS and obtain reasonable
agreement with the observed phenomenology.
The strategy for mitigating machine-induced backgrounds in the LHC detectors
first involves carefully studying their properties by, for example, triggering on non-
colliding bunches using dedicated detectors placed around ATLAS and CMS.
Beam background events can also be identified by the early arrival time of the
particles compared to collision products at the edges of the detector. This reveals
certain features of beam-induced background events that can be confirmed by
detailed simulation. For example, muons from beam–gas interactions near the
ATLAS detector will generate particles that are essentially parallel to the beam-
line, leaving long continuous tracks in the z direction in the pixel detector.
Particles that enter the detector outside of the tracking volume can be studied in
the calorimeter and muon subsystems. Fake jets in the calorimeter from beam-
induced backgrounds typically have two distinguishing features; their azimuthal
distribution is peaked at 0 and π (compared to a flat distribution in ϕ for jets from
proton–proton collisions), and the time of the fake jets is typically earlier than that
of collision jets. The standard approach for reducing beam-induced background in
physics analyses is to apply jet cleaning selections that exploit these differences in
calorimeter activity. Examples of typical discriminating variables include the
electromagnetic energy fraction (defined as the fraction of energy deposited in the
electromagnetic calorimeter with respect to the total jet energy), the maximum
energy fraction within any single calorimeter layer, the ratio of the scalar sum of
the transverse momenta of tracks associated to the jet to the transverse momentum
of the jet (which will be higher for collision jets), and timing properties in the
calorimeters and muon spectrometer.
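A schematic jet-cleaning selection built from such variables is sketched below. The thresholds are illustrative placeholders only, and do not correspond to the official ATLAS or CMS cleaning criteria.

```python
# Illustrative jet-cleaning selection against beam-induced background, using the
# discriminating variables listed above. The threshold values are placeholders,
# not the official ATLAS/CMS cleaning criteria.

def is_clean_jet(em_fraction, max_layer_fraction, charged_fraction, timing_ns):
    """Return True if the jet looks like a collision jet rather than fake activity."""
    if em_fraction < 0.05 or em_fraction > 0.99:   # almost no or almost all EM energy
        return False
    if max_layer_fraction > 0.99:                  # all energy in a single layer
        return False
    if charged_fraction < 0.05:                    # no matching track activity
        return False
    if abs(timing_ns) > 10.0:                      # out of time with the collision
        return False
    return True

print(is_clean_jet(em_fraction=0.6, max_layer_fraction=0.4,
                   charged_fraction=0.5, timing_ns=1.2))    # True
print(is_clean_jet(em_fraction=0.995, max_layer_fraction=0.995,
                   charged_fraction=0.0, timing_ns=-14.0))  # False: background-like
```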
The cosmic-ray muon background has also been studied in detail in both ATLAS
and CMS. If a cosmic ray muon traverses the whole detector, it will give rise to
separate muon tracks reconstructed in the upper and lower hemisphere of the
detector, reconstructed as back-to-back muons that do not pass through the interaction
point. The two tracks are reconstructed with opposite charge, due to the reversed flight
path of the cosmic ray track in the top sectors with respect to the flight path of particles
travelling from the interaction point. If the muon time-of-flight through the detector is
not fully contained within the trigger window, one might only see a track in one of the
hemispheres. Cosmic ray muon tracks can be rejected by placing selections on the
impact parameter of muon tracks (which will be substantially larger for cosmic ray
muon tracks), and by exploiting topological and timing differences between interaction
and cosmic muons in the muon spectrometer. The jet cleaning cuts developed for
machine-induced backgrounds are also efficient for suppressing fake jets from
muons that undergo radiative losses in the calorimeters.
It is worth noting that not all signatures searched for in proton–proton collisions
at the LHC would involve particles that decay promptly at the interaction point.
Some theories give rise to long-lived particles that would themselves give a very
funky pattern of detector signatures, in which standard assumptions about tracks
and calorimeter deposits pointing back to the interaction point break down. In this
case, a dedicated re-tracking and re-vertexing for displaced vertices needs to be
implemented, and any signatures expected in a candidate theory need to be carefully
compared with those expected from non-collision backgrounds.
Further reading
• An excellent short guide to particle detectors can be found in ‘Particle Physics
Instrumentation’ by I Wingerter-Seez, arXiv:1804.11246.
• The ATLAS detector at the time of its installation is described in ‘The
ATLAS Experiment at the Large Hadron Collider’ JINST 3 S08003. The
CMS detector is similarly described in ‘The CMS experiment at the CERN
LHC’ JINST 3 S08004.
Exercises
6.1 Prove that the momentum p of a particle of charge q moving in a magnetic
field B is given by p = Bqr , where r is the radius of curvature of its circular
track.
6.2 Explain why the momentum resolution of an electron in the ATLAS and
CMS detectors can be expected to be worse at very high pT values. Why is it
also worse at very low pT values?
6.3 An electron with an energy of 30 GeV strikes the CMS calorimeter.
Estimate the expectation value of its energy once it has moved a distance
of 4 cm.
6.4 Match the following descriptions of detector activity to the particle types
electron, muon and photon.
(a) Track in the inner detector, no calorimeter deposits, track in the
muon chambers.
(b) Track in the inner detector, most of the energy deposited in the
electromagnetic calorimeter.
(c) No track in the inner detector, most of the energy deposited in the
electromagnetic calorimeter.
6.5 What, if anything, distinguishes an electron from a positron in the inner
detector?
6.6 Give two reasons why the granularity of silicon detector modules must be
increased in the detector region closest to the beam line.
6.7 You are asked to perform an analysis in a final state with two tau leptons.
What fraction of di-tau events can be expected to have two hadronic tau
decays?
6.8 In a certain run period of the LHC, it is expected that there are an average
of 24 interactions per bunch-crossing. Assuming that the number of
interactions is Poisson-distributed:
(a) What is the probability that only one interaction occurs in a bunch-
crossing?
(b) What is the probability that ten interactions occur in a bunch-
crossing?
Chapter 7
Computing and data processing
The first part of this book has described in some detail the theoretical picture, and
corresponding sets of tools and techniques, within which most collider physics
phenomenology takes place. The typical approach taken by physics-analysis
interpretations is to compare simulated event observables to measured ones, and
hence statistically quantify whether a particular model is sufficient or incomplete.
This statistical goal, and the need to understand the detector itself, leads to a great
need for simulated collider events as well as real ones. While not computationally
trivial—in particular, as we shall shortly see, modern high-precision event gener-
ation can be very costly—extending these ideas to data analysis within an experi-
ment implies a computational processing system at vast scale.
The conceptual structure of experiment data processing is shown in Figure 7.1.
The dominant feature of this diagram is the parallel pair of processing chains for real
experimental data as compared to simulated data. Real data is retrieved from the
detector readout hardware (after the hardware trigger, if there is one) via the data-
acquisition system (DAQ), usually passed through a software trigger for more
complex discarding of uninteresting events, and the resulting raw signals are
operated on by reconstruction algorithms. An important distinction is made here
between online processing such as the triggers, which run ‘live’ as the event stream
arrives from the collider and do not store all of the input events, and offline
processing, which is performed later on the set of events accepted and stored on disk
after the online filtering decisions. In parallel with the event stream, the operational
status and alignment information about the beam and subdetector components is
entered in a conditions database, used to inform the behaviour of the equivalent
simulation, and the shared reconstruction algorithms.
A significant caveat to this picture is due for trigger-level analysis, a relatively
recent innovation that blurs the distinction between online and offline data-
processing tasks. At the time of writing, all LHC experiments have initiated work
on trigger-level analysis software, for statistically limited analyses where the rate of
[Figure 7.1 flow diagram: the real-data chain runs Collider → Subdetectors/Triggers → DAQ → RAW → Reconstruction → ESD → Analysis, while the simulated chain runs Event Generator (event record of primary particles) → Detector Simulation → HITS → Digitization → RDO → Reconstruction → Analysis.]
Figure 7.1. Real (LHS) and simulated (RHS) collision data-processing flows in detector experiments.
candidate events is too high for un-prescaled triggering. Much of the pioneering
work in this area has been led by the LHCb collaboration, who have the double
incentive of a reduced instantaneous luminosity with respect to the other experi-
ments due to intentional beam defocusing, and a core physics programme dependent
on rare decays.
Simulated data (sometimes ambiguously known as Monte Carlo (MC) data), by
comparison, obviously has no physical detector as a source of events. Instead, a
chain of algorithms is assembled to mimic the real-world effects on data in as much
detail as is tractable, while taking advantage of opportunities for harmless computa-
tional short-cuts whenever possible. In place of a collider, simulated data uses event
generator programs, as described in Chapter 3; in place of a physical detector with
which the final-state particles interact and deposit energy, a detector simulation1 is
used; and in place of the readout electronics of the real detector, simulated data uses
a set of digitization algorithms. The result of this chain is essentially the same data
format as produced by the real detector, which is thereafter processed with the same
reconstruction and data-analysis tools as for real data. Simulation, of course, has no
online stage: everything is performed at an ‘offline’ software pace, although this is
1 Detector simulation is often referred to simply as ‘simulation’—which can be confusing as everything on the
right-hand chain of Figure 7.1 is simulated. Another potentially confusing informal terminology is ‘Monte
Carlo’ to refer to event generation, when detector simulation also makes heavy use of MC methods, and indeed
in some areas of the field ‘MC’ is more likely to mean detector simulation. When in doubt, use the full name!
not to say that the simulation chain does not discard unwanted events in a manner
equivalent to a trigger.
Were the simulated ‘uninteresting’ physics and detector effects perfectly equiv-
alent to those in the real world, most data analysis would be relatively trivial, but of
course in practice there are significant differences, and a great deal of the ‘art’ and
effort of collider data analysis lies in minimising, offsetting, and quantifying such
biases.
Figure 7.2. Part of a typical HepMC event-record graph for simulation of tt¯ production and decay by the
Pythia 8 MC generator. The beam particles are at the top of the graph, and simulation proceeds with dashed
lines for the pre-hadronisation matrix-element (a single vertex, indicated by the ME mark), PS, and multiple
partonic interactions (MPI) model products, and solid lines at the bottom for the final-state particles and
hadron decay chains. The complexity of the graph reflects a combination of computational modelling methods
and physics, and does not fully represent a classical history of event evolution.
While intuitive and reminiscent of Feynman diagrams, the impression given by LHE
and HepMC of an unambiguous event history is of course false: every event is produced
in a quantum process, with the hard scattering composed of multiple, interfering process
amplitudes, perhaps including an entirely different set of internal ‘mediator’ fields. It is
hence impossible to infer production by a particular Feynman graph: at best, the single
representation written into an event record will represent the amplitude with the largest
non-interference term in the squared ME. The same quantum ambiguity applies to
the PS, which—despite appearances—represents not a physical and time-ordered set of
splittings, but the calculational techniques used to effectively resum multiple emissions
of quantum chromodynamic (QCD)-interacting particles.
A good rule of thumb is to treat the pre-hadronization ‘parton record’ as for debugging
purposes only, unless deeper interpretation is unavoidable and performed with great care;
the post-hadronization particle record, including particle decays, on the other hand, can
be happily interpreted as a semi-classical history of independently evolving, non-
interacting states. Status codes are attached to particles and vertices, so that they can
be unambiguously referred to in event records, and a final unique particle/vertex identifier,
the barcode, is frequently used to label the origin of an event element in the primary
generator or in later processing steps either at the generation or detector-simulation
stages2. In addition, the type of each particle is referred to using a numerical code called a
PDG ID code, or particle ID code. This is defined in a publication maintained by the
Particle Data Group, with specific codes attached to each particle (e.g. 11 and −11 for
electrons and positrons, 13 and −13 for muons and antimuons, and so on).
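As a toy illustration of how these codes are used in practice, the sketch below loops over a hand-written event record, keeps the final-state particles (conventionally status 1) and picks out the charged leptons by PDG ID. The record structure and the non-final-state status codes shown are simplified, generator-specific stand-ins rather than a faithful HepMC event.

```python
# Minimal sketch of reading an event record using PDG ID and status codes. The
# event here is a hand-written list of (pdg_id, status) pairs rather than a real
# HepMC event; status 1 conventionally denotes a final-state particle, while the
# other status codes below are generator-specific examples.

EVENT = [
    (2212, 4),   # beam proton
    (21,  21),   # incoming gluon of the hard process (generator-specific status)
    (6,   22),   # top quark: pre-hadronisation 'parton record' entry, debugging only
    (11,   1),   # final-state electron
    (-12,  1),   # final-state electron antineutrino
    (211,  1),   # final-state pi+
]

# Keep only final-state particles, and identify the charged leptons among them
final_state = [(pid, st) for pid, st in EVENT if st == 1]
leptons = [pid for pid, _ in final_state if abs(pid) in (11, 13, 15)]
print(final_state)
print("charged leptons:", leptons)   # [11]
```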
The kinematics and particle IDs of each event, and the internal structure are the
headline features of an event record, but several pieces of metadata are additionally
2 Note, however, that this convention is gradually being replaced with separate integer codes for the unique
identifier and origin-label facets of MC particle identity.
propagated. Particularly crucial are the MC cross-section estimate, and sets of event
weights—we deal in some detail with the latter in the next section. The former may
seem strange as an event-wise property, as a cross-section is a physical quantity
associated to a process type (and kinematic selection phase-space), rather than to
individual events. Although this is correct, an MC generator’s estimate of the
process cross-section is, like individual events, obtained by sampling from the
kinematic phase-space and evaluating the ME at each sampled point: while there is a
pre-generation sampling (a.k.a. phase-space integration), the statistical precision on
this estimate is improved by the further sampling that occurs during the generation
of events. This ever-improving (although in practice, hopefully, rather stable)
estimate is hence usually written on an event-by-event basis, with the last event’s
value being the best estimate, which should be used for physics purposes.
Imperfect sampling
ME phase-space points, i.e. the kinematic configurations of incoming and outgoing
partons on the ME, represented by the symbol Φ, are chosen from a proposal density
function, pprop (Φ), usually an analytic distribution or sum of distributions amenable
to the inverse cumulative density function (cdf) sampling technique discussed in
Section 5.7.1. The proposal distribution will not exactly match the true, far more
complicated squared ME function —if we knew this in advance there would be little
need for MC techniques—and hence a scaling is required to obtain the physical
distribution. This scale-factor is the sampling event weight, given by the ratio of the
true ME pdf $(1/\sigma)\,{\rm d}\sigma/{\rm d}\Phi$ to the sampling proposal pdf $p_{\rm prop}$, evaluated at the phase-
space point Φ.
Systematics
One can explore the effects of systematic uncertainties such as those mentioned
above, by performing systematic variations of elements of the event-generation
calculation. Each such variation is represented by a weight that departs from the
nominal weight for each event;
Phase-space biasing
Intentional biasing of the phase-space sampling to achieve a more efficient
population of events for the physics application.
The first two of these weight origins are accidental, and in an ideal world of
perfect proposal densities and PS algorithms would be equal to 1. But the latter two
are quite deliberate. We will address these two types of weight in order.
Figure 7.3. Distributions of event weights from the Sherpa event generator in W (→e +νe ) + ⩾2 jets events
simulated at LO (left) and NLO (right) in the QCD coupling. The wider the weight distribution, the less
efficient the unweighting step of the generation. It can be clearly seen that larger numbers of additional jets
make the distribution much wider, due to the increasing complexity of the ME as a function of the larger
phase-space, and that the intrinsically more expensive NLO calculations (shown up to only two additional jets)
amplify this problem. CREDIT: plot by Stefan Hoeche/Sherpa Collaboration
retaining the significant negative weights—and the resulting weight vector divided
by that nominal absolute-weight to create an unweighted event set which (at least for
the nominal stream) converges faster to the physical phase-space distribution. This is
referred to as the unweighting procedure5.
Unweighting is a waste of events by some measure—after all, you went to the
trouble of sampling the ME and maybe generating a couple of extra parton
emissions via ME/PS matching—but it is usually beneficial overall because the
punitive expense of downstream processing (detector simulation and reconstruction)
is avoided for events not worth the expense. It would be much better, of course, to
use a generator with an efficient sampling of the ME and phase-space, but for multi-
leg and beyond-tree-level generation this is a tough challenge.
Figure 7.3 shows how additional jets and the inclusion of one-loop terms in MEs
both broaden the sampling weight distributions for Sherpa W (→e +νe ) + ⩾2 jets
events, due to the unavoidable mismatch of current proposal densities to the true
phase-space pdf. The deviations of these distributions from ideal delta-function spikes
to broad distributions creates two problems. First, low weights lead to poor
unweighting efficiencies and hence multiply the CPU cost for generation of unweighted
events. This, however, is expensive but not physically fatal; a potentially greater second
problem is the tail toward high weights—it is possible for the sampler to only discover
the maximum-probability regions of the true matrix element relatively late in a
generator run. This creates a problem for unweighting: one cannot rejection-sample
events whose sampling weight is outside the [0, 1] range, and so their high weight must
be preserved. In bad cases, either from numerical instabilities in the ME code or from
genuine high-probability phase-space configurations, this can lead to single events with
huge weights which create spikes in observable distributions. There is no magic
solution for this, but some heuristics are discussed in Section 7.2.3.
5 In practice, even ‘unweighted’ events tend to apply a low cutoff so they do not waste time on completely
negligible events.
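The interplay between sampling weights and unweighting can be made concrete with a one-dimensional toy, sketched below: events are drawn from an imperfect proposal density, weighted by the ratio of the 'true' density to the proposal, and then rejection-unweighted against an estimated maximum weight, with overweight events retaining a residual weight greater than one. The densities are simple stand-ins for a real squared ME and phase-space sampler, so this is an illustration of the idea rather than how any particular generator is implemented.

```python
# Toy sketch of sampling weights and unweighting. The 'true' density and the
# proposal density are one-dimensional stand-ins for the squared ME and the
# generator's phase-space sampler.
import numpy as np

rng = np.random.default_rng(1)

def f_true(x):        # stand-in for (1/sigma) dsigma/dPhi, falling steeply
    return 3.0 * (1.0 - x) ** 2

def p_prop(x):        # imperfect proposal density on [0, 1] (here: uniform)
    return np.ones_like(x)

# 1) draw from the proposal and compute sampling weights w = f_true / p_prop
x = rng.uniform(0.0, 1.0, 100000)
w = f_true(x) / p_prop(x)

# 2) unweight by rejection against an estimated maximum weight. Events whose
#    weight exceeds the estimate cannot be fully unweighted: they keep a
#    residual weight > 1 (the 'overweight' spikes discussed above).
w_max = np.quantile(w, 0.999)           # deliberately not the true maximum
accept = rng.uniform(0.0, 1.0, x.size) < np.minimum(w, w_max) / w_max
residual = np.maximum(w / w_max, 1.0)[accept]

print(f"unweighting efficiency: {accept.mean():.2%}")
print(f"fraction of kept events with residual weight > 1: {(residual > 1).mean():.2%}")
```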
Figure 7.4. Illustrations of event-generation biasing via jet slices (left) and smooth enhancement (right). In
both cases, the same physical distribution (the blue line) is obtained, but using different approaches to bias
weighting. The slicing method assigns the same weight to all events in each slice, as indicated by the red
stepped distribution, with the number of events decreasing steeply within each slice (the grey-line sawtooth
pattern). The enhancement method by contrast assigns a smoothly varying weight to each event as a function
of its lead-jet pT (the grey line), with an event population kept relatively flat across most of the spectrum, the
weight corresponding to the ratio of physical/event-count distributions.
smoothly weighted events whose number distribution does not approach the
physical one, i.e.
$p_{\rm prop}(\Phi) = p_{\rm phys}(\Phi) \equiv \frac{1}{\sigma}\frac{{\rm d}\sigma}{{\rm d}\Phi} \;\to\; p_{\rm prop}(\Phi) = p_{\rm enh}(\Phi) \equiv f(\Phi)\,\frac{1}{\sigma}\frac{{\rm d}\sigma}{{\rm d}\Phi}$,  (7.1)
for enhancement function f (Φ). This can be thought of as an intentional version of the
proposal/ME mismatch that causes such problems for high-precision event generation,
with the resulting enhancement event weight being wenh = 1/f (Φ), the inverse of the
intentional bias. This way the population of generated events follows the enhanced
penh (Φ) distribution, while the weighted distribution of events follows the physical pphys (Φ).
The canonical example is again the falling $p_T^{\rm lead}$ spectrum, as shown on the right-
hand side of Figure 7.4: events are sampled and unweighted according to a biased
distribution such as $\tilde{\sigma} = \hat{p}_T^{\,m}\,\hat{\sigma}$ for $m \sim 4$, with each event carrying a weight $1/\hat{p}_T^{\,m}$:
high- pT events are hence smoothly generated more frequently than they ‘should’ be,
with weights reflecting the degree to which they should be devalued to recover physical
spectra. In this case, with well-chosen m, a reasonable event-sample size, with near-
equal event population across the physical spectrum, is possible without the rigmarole
of juggling multiple event-slice samples and slice weights. While technically there is a
loss of statistical power in such weighting (as in the slicing scheme), what really matters
is the shape of the weight distribution within each observable bin: smooth sampling
enhancement achieves this near-optimally, without even the residual sawtooth pattern
of event statistics across the range of simulated scales. For event generators which do
not natively support enhancement functions, the same idea can be used post hoc to
improve sliced generation, by randomly rejecting generated events proportional to a
$(\hat{p}_T/\hat{p}_{T,{\rm max}}^{\,i})^m$ factor: the only difference is that now, rather than weights much less than
unity for high-$p_T$ events, each slice contains weights $\gg 1$ for events close to its $\hat{p}_{T,{\rm min}}^{\,i}$.
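The following toy sketch makes the enhancement idea concrete: a steeply falling stand-in spectrum is sampled from a biased density proportional to f(pT) times the physical one, with f(pT) = pT^m, and each event carries the compensating weight 1/f(pT). All distributions and numbers are invented for illustration, not taken from any generator.

```python
# Toy sketch of smooth sampling enhancement: events are drawn from a biased
# density proportional to f(pT) * p_phys(pT), with f(pT) = pT^m, and each event
# carries the compensating weight w_enh = 1/f(pT).
import numpy as np

rng = np.random.default_rng(7)
lo, hi, m = 20.0, 500.0, 4.0          # pT range in GeV and enhancement power

# physical spectrum (stand-in): p_phys(pT) ~ pT^-5, so the enhanced density
# f * p_phys ~ pT^-1 is log-uniform and easy to sample by inverse cdf
u = rng.uniform(0.0, 1.0, 200000)
pt = lo * (hi / lo) ** u              # samples of the *enhanced* density ~ 1/pT
w_enh = pt ** -m                      # per-event weight 1/f(pT)

bins = np.geomspace(lo, hi, 20)
counts, _ = np.histogram(pt, bins=bins)                    # raw population per bin
physical, _ = np.histogram(pt, bins=bins, weights=w_enh)   # recovers the ~pT^-5 shape
print(counts[:3], counts[-3:])                     # population stays usable at high pT
print(physical[:3] / physical[0], physical[-3:] / physical[0])   # steeply falling shape
```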
typically deposit some energy in the detector, which for sensitive detector elements
stimulates an electronic readout, modelled by digitization algorithms.
The ‘gold standard’ in detector simulation is the use of so-called full simulation
tools, in which every particle is individually, and independently, transported through
an explicit model of the detailed detector material map, with Monte Carlo (i.e.
random-number based) algorithms used to simulate continuous energy loss as well
as discrete stochastic processes like the production of secondary particles through
electromagnetic and nuclear interactions with the material. This requires a detailed
model of the detector geometry and material characteristics, as well as a map of the
detector magnetic field in which charged particles curve. These models can be
extremely large and complex for a modern general-purpose detector, meaning that
‘full-sim’ transport of interacting particles is very computationally expensive.
The main full-sim toolkits in collider physics are the Geant and Fluka libraries (and
FluGG, which is ‘Fluka with Geant geometry code’), both written in C++. As Geant4 is
currently the most used transport library, we base our summary here on it. Most
experiments will implement their detector simulation as a custom C++ application,
building the detector geometry from nested volumes constructed from elementary 3D
shapes such as boxes, cylinders and cones, which can be added and subtracted to define
logical volumes corresponding to coherent detector elements. These logical volumes are
linked to material properties used to parametrise interaction processes, as well as being
linked to custom signal-handling functions for sensitive volumes—those in which
energy deposition can trigger a digitization signal.
Considering the complex shapes in e.g. large silicon tracking systems, and convoluted
layers of calorimetry, the resulting complete geometry map for an LHC experiment like
ATLAS or CMS can occupy O(1 GB) in memory. For maintainability, incorporation
of varying run conditions through the detector lifetime, and because misalignment of
coherent subsets of detector components is an important input to evaluating systematic
uncertainties on physics-object reconstruction, the in-memory assembly of detector
geometries is itself typically a complex application, maintained by the experiment’s full-
simulation expert team and linked to the conditions database.
6 And, in Geant4, a special ‘Geantino’ pseudoparticle which interacts with nothing and exists purely for
geometry debugging and timing tests.
One crucial detail does need to be included in the ‘copying’ operation, and that is
accounting for the spread of primary interaction-vertex positions due to the finite
widths and lengths of the interacting bunches. The former are usually negligible, on
the scale of 5 μm, but at the LHC the latter are substantial and give rise to a
beamline luminous region of approximately Gaussian probability density in primary
interactions, around 20 cm long. This is also called the beam spot, but is somewhat
larger than the name ‘spot’ might imply.
As the separation of primary vertices along the z-axis is key for the suppression of
pile-up overlays on triggered events, modelling of this longitudinal distribution is
essential in simulation, and is achieved easily by sampling from a Gaussian with
width and mean positional parameters appropriate to the beam conditions in the
run. For MC simulation performed in advance of data-taking runs, with the beam
parameters not yet fully known, it may be tempting to simulate with a larger beam-spot
than in data, to permit reweighting to match the actual conditions; however, the statistical
wastage through the non-trivial weight distribution means this is not a preferred
mode. As MC generator events are all created with primary vertices at the origin, the
sampled z position of each event’s location can simply be added as an offset to all
vertex positions in the generator event record. Secondary modelling can be added for
the benefit of precision analyses such as W and Z masses and widths, and forward
proton tagging, to account for the fact that the beams have a slight crossing angle at
the interaction point, and that there is a ΔE /E ∼ 1 × 10−5 uncertainty in beam
energy: both these small effects are propagated into the generator event record via
sampling of small transverse and longitudinal momentum boosts, respectively,
applied to all event particles.
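A minimal sketch of the z-offset sampling is given below; the Gaussian width is a typical LHC-like placeholder rather than the beam-spot size of any particular run.

```python
# Sketch of beam-spot smearing: each generated event, produced with its primary
# vertex at the origin, is shifted along z by a value sampled from a Gaussian
# describing the luminous region. The width used here is a placeholder.
import numpy as np

rng = np.random.default_rng(0)
BEAMSPOT_Z_MEAN_MM, BEAMSPOT_Z_SIGMA_MM = 0.0, 45.0   # few-cm-scale luminous region

def smear_event_vertices(vertex_positions_mm):
    """Shift all vertex positions of one event by a common sampled z offset."""
    dz = rng.normal(BEAMSPOT_Z_MEAN_MM, BEAMSPOT_Z_SIGMA_MM)
    return [(x, y, z + dz) for (x, y, z) in vertex_positions_mm]

event_vertices = [(0.0, 0.0, 0.0), (0.1, -0.2, 3.5)]   # e.g. primary + a decay vertex
print(smear_event_vertices(event_vertices))
```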
In practice, the selection of particle inputs to simulation is a little more complex,
and it is illustrative to spell this out as the exact treatment of different particle species is
of practical importance—particularly in precision measurements where the detector
effects are to be corrected for (see Section 11.3). The usual inputs to detector simulation
are not all final-state particles, but the subset of primary final-state particles with
macroscopic proper lifetimes τ0. This subset is most usually chosen to match the
condition cτ0 > 10 mm (i.e. τ0 ≳ 33 ps). This definition treats the vast majority of hadrons as
decaying ‘promptly’ within the event generator, without concern about potential
detector interaction, while in particular treating the KS0 & π ± mesons and the Λ0 & Σ±
baryons as ‘generator stable’ and in need of decay treatment in the detector simulation
library. Their decay particles, while reasonably considered ‘final-state’, are identified as
produced by the transport library and are hence difficult to practically distinguish from
secondary particles produced by material interactions. Depending on the definition
required by each analysis, this distinct treatment may require some care by the analyser.
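The cτ0 criterion can be made concrete with a short sketch that classifies particle species using an abbreviated, approximate table of proper decay lengths; real generators apply equivalent logic internally when deciding which particles to hand to the transport library.

```python
# Sketch of the 'generator-stable' selection: only final-state particles whose
# proper decay length satisfies c*tau0 > 10 mm are passed to the detector
# simulation; everything else is decayed by the event generator. The ctau table
# here is abbreviated and approximate.

CTAU_MM = {          # approximate proper decay lengths, c*tau0, in mm
    211:  7800.0,    # pi+-
    310:  26.8,      # K0_S
    3122: 78.9,      # Lambda0
    111:  2.5e-5,    # pi0 -> decays promptly in the generator
    511:  0.46,      # B0  -> likewise treated as prompt by this criterion
}

def is_generator_stable(pdg_id, threshold_mm=10.0):
    ctau = CTAU_MM.get(abs(pdg_id), 0.0)   # unknown species treated as prompt here
    return ctau > threshold_mm

for pid in (211, 310, 3122, 111, 511):
    print(pid, is_generator_stable(pid))
```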
As experimental precision and process scales have increased, it has more recently
become necessary to also include more promptly decaying particle species, such as
b- and c-hadrons in the transport: these may be treated as ‘forced decays’, in which
the more precise simulation of the decay-chain timings and kinematics from the ‘in
vacuum’ event generator are interleaved with B-field bending and material inter-
action from the detector transport. Special custom treatments are increasingly used
for transport of BSM long-lived particles, which may be either electrically charged
or neutral, and may interact with detector material for macroscopic distances before
decaying within the detector volume and leaving a displaced vertex or emerging jet
experimental signature—a challenge for reconstruction algorithms.
The nature of these processes is to emit additional electrons and photons, which are
appended to the particle stack and increase the number of particles to be trans-
ported. In dense material, particularly electromagnetic calorimeter volumes which
are specifically designed to encourage such interactions, the resulting ballooning of
the stack and short step-lengths account for a very large fraction of processing time.
This can necessitate computational optimizations, such as aggregating hits to reduce
output event size, use of fast calorimeter simulation approximations, and use of
approximate but more performant electromagnetic modelling.
Electromagnetic multiple scattering is also applied as a correction to the post-step
position by the calculation of step corrections from tabulated material properties.
Optical photons
Non-ionising radiation is unimportant in many detectors and hence optical photon
handling is omitted from default electromagnetic (EM) transport. Optically sensitive
systems such as Cherenkov ring detectors do require this additional treatment, to
simulate the effects of Rayleigh scattering, refraction, reflection, photon absorption
and the Cherenkov effect.
Hadronic interactions
These are the least well-constrained interaction processes, as the plethora of
potential nuclear collisions and difficulty of their experimental measurement means
they are less empirically determined than EM interactions, and the complexity of the
underlying theory precludes full a priori calculation. Hadronic interactions are also
very sensitive to energy scale, and in detector simulations designed to operate from
cosmic ray energies down to nuclear and medical engineering applications, the
modelling may need to cover O(15) orders of magnitude in interaction scale. As a
result, a patchwork of hadronic interaction models is used at different energy scales.
Hadronic models are a mixture of data-driven parametrisations and theoretically
motivated models. The dominant models in Geant4 are based on nuclear parton-
string models, not so different in motivation from the Lund string model we saw
used for event generator hadronization modelling in Section 3.4.1.
The main two high-energy hadronic interaction models are the Quark–Gluon String and
Fritiof String models, denoted respectively as QGS and FTF in Geant4. These are valid at
Figure 7.5. The schematic relationship between event-generation ‘truth’, the increased distortions of that truth
induced by detector event-triggering, material interactions, and electronics readout, and the attempt by
reconstruction algorithms to regain the true, particle-level picture of what happened. Fast methods for
emulation of detector effects + reconstruction algorithms based on transfer functions, known as ‘smearing’, are
feasible because of the small—but crucially important—gap between the truth and reconstructed views of each
event.
extremely fast, and in fact acts as a replacement for not simply the simulation but
also digitization and reconstruction stages. The compromise, of course, is that these
parametrised responses are typically limited to one or two key variables, hence fast-
sims lack detailed sensitivity to the specifics of each event and its objects. A
prominent example is that unless specific whole-event variables are included in the
parametrisation, the modelled physics-object reconstruction performance will lack
sensitivity to the overall detector occupancy.
In between these extremes lie a variety of semi-fast simulation approaches, in
which the full simulation is replaced by an approximation for certain detector
systems or particle types. Particular emphasis has been placed on calorimeter fast
simulation, given the large fraction of full-simulation CPU spent on high-multi-
plicity calorimeter showers. A first hybrid approach to solving this problem is to use
a shower library of pre-simulated calorimeter showers (or ‘frozen showers’), in which
particles that fall below a certain energy threshold within calorimeter volumes are
immediately replaced with a randomly chosen shower ‘completion’ from the library.
This approach is used, for example, in the ATLAS and CMS forward calorimeters,
where the high occupancy of the detector leads to a disproportionate CPU cost
compared to the physics benefit of that detector system. Even with this replacement,
EM calorimetry dominates the CPU cost of simulation, but a more dynamic
approach is needed in the physics-critical central region of the detector. This is
achieved in ATLAS by the use of parametrised calorimeter-cell responses to the
incidence of different particle species over a wide range of energies: longitudinal and
lateral shower profiles can be reproduced and tuned to data using a fine grid in
η − ϕ, at the cost of losing fluctuations (by use of average responses) and simulation
of punch-through of incompletely contained hadronic showers into the muon system
(where they can register a fake-muon signal). The former, and details such as
shower-splitting by wide-angle pion decays, are incorporated by refinements to the
naïve algorithm. This approach reduces the calorimeter simulation cost by a factor
of 20, and total simulation cost by a factor of 8. More recently, much effort has also
been expended on the use of generative neural networks for fast calorimeter
simulation, which have the potential to further reduce CPU cost and with better
QCD samples on top of those from the signal process, and necessarily includes the
in-time pile-up and out-of-time pile-up described in Section 6.5.1. The two classes of
pile-up influence the detector occupancy and response in different ways: out-of-time
pile-up can be reduced by fast detector systems, but others such as calorimeter read-
outs may have significant ‘dead periods’ due to activation by particles from previous
beam-crossings. A correct distribution of pile-up primary vertex positions (partic-
ularly for the in-time overlays) is obtained automatically through treatment of the
beam-spot distributions in every independent event simulation.
An algorithm to implement pile-up overlay needs to model the bunch structure of
the collider by randomly assigning the signal event to a particular bunch-crossing
identifier (BCID) S, giving each BCID i a mean number of in-time inelastic
interactions μi from the run distribution with mean 〈μ〉, and finally randomly
sampling an actual number of interactions from the Poisson distributions to assign a
number of pile-up overlays to each BCID,
$N_{\rm PU}^{\,i} \sim P(\mu_i) - \delta_{iS}$,  (7.2)
where the Kronecker-delta term subtracts one pile-up overlay for the bunch-crossing
containing the signal interaction (which ‘used up’ one beam-particle collision). The
digitization takes the resulting sequence of timed hit collections surrounding the
signal BCID to generate the set of RDOs corresponding to the triggered event. This
procedure, of course, produces the pile-up distribution expected before the collider
run: pile-up reweighting of this to match the actual pile-up 〈μ〉 distribution for each
data period is standard practice.
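A minimal sketch of the bunch-crossing assignment of equation (7.2) is given below; the per-BCID μ values and the size of the pile-up window are placeholders chosen for illustration, not the configuration of any real run.

```python
# Sketch of the bunch-crossing assignment in equation (7.2): each BCID in the
# pile-up window gets a Poisson-sampled number of overlaid minimum-bias events,
# with one interaction removed from the signal BCID itself.
import numpy as np

rng = np.random.default_rng(3)

def sample_pileup_overlays(mu_per_bcid, signal_bcid):
    """Return the number of pile-up overlays to add for each BCID."""
    n_pu = rng.poisson(mu_per_bcid)
    n_pu[signal_bcid] = max(n_pu[signal_bcid] - 1, 0)   # the signal used one collision
    return n_pu

window = 40                                  # BCIDs within the detector's time window
mu_per_bcid = np.full(window, 52.0)          # e.g. <mu> ~ 52 for every filled bunch
signal_bcid = rng.integers(0, window)        # randomly assign the signal event

n_overlays = sample_pileup_overlays(mu_per_bcid, signal_bcid)
print(signal_bcid, n_overlays[:10], int(n_overlays.sum()), "pile-up events to overlay")
```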
A complexity is added by the need for a library of pre-simulated QCD events to
be used for the pile-up overlay. With typical 〈μ〉 > 50, and the pile-up time window
for some subdetectors containing O(40) bunch-crossings, it is easy to require 2000 or
more pile-up events to digitize a single signal event. In addition, these must have
been simulated with the same beam and detector conditions as the signal event.
While minimum-bias QCD interactions have lower particle multiplicities than most
high-scale signal events, and can be processed using the fast-simulation techniques
described earlier, ‘wasting’ thousands of them per signal event is not tenable. The
approach taken is hence to sample soft-QCD events with replacement from a large
pre-generated library, as the re-use of generic and relatively uninteresting events in a
virtually boundless sequence of random combinations should not bias results. But
not all ‘minimum bias’ events are soft: realistic pile-up simulation also needs to
include secondary scatterings with significantly hard activity—from a generator
hard-QCD process rather than collective soft-inelastic and diffractive modelling—
and these events are sufficiently distinctive that their repetition (albeit at a
proportionately low rate) can cause noticeable spikes in simulated-event analyses:
such events are hence mixed into the pile-up event sampling, but are never replaced (i.e. each such hard event is used at most once).
An alternative approach to pile-up modelling, little used at the LHC (except in
heavy-ion events for which minimum-bias modelling is both much poorer and more
computationally expensive) but more common at previous colliders is data overlay: use
of real zero-bias data events rather than simulation to populate the overlay libraries.
This guarantees good physical modelling of the pile-up events and their μ distribu-
tions, as the ‘events’ here are really the full detector activity from a whole zero-bias
bunch chain containing thousands of primary interactions, but introduces significant
other technical challenges, such as matching of detector alignments between data and
simulation. As the outputs from data are by definition already digitized, aspects of the
digitization must effectively be inverted to obtain an estimated sum of hits from the
pile-up backgrounds, on to which the simulated signal event is added before (re-)
digitizing (although in practice the combination is performed more directly). This
requires a special detector run-mode for collecting the zero-bias inputs, as calorimeters
in particular suppress below-threshold signals in their normal output, but the sum of
below-threshold hits from signal and pile-up can be above threshold.
The potentially intimate interaction between pile-up overlay methods and
electronic response is illustrated in Figure 7.6, which shows the timing response of
cells in the ATLAS LAr calorimeters to cosmic-ray incidences as a function of time
from the first hit. The rapid rise of the electronic signal at around 100 ns is followed
by a negative sensitivity period around four times longer; the intention of this is that
the integral is zero, suppressing out-of-time pile-up effects. The structure of the
overlaid pile-up hits hence needs to account not just for spatial distribution and a
representative distribution of scattering kinematics, but also the temporal structure
in a rather complex way. Systems such as this cause additional complexity for data-
overlay, and for fast digitization strategies which attempt to perform the pile-up
overlay after single-scatter digitization.
Links between the digitization and the MC truth are propagated through
digitization from the simulated hits, but now introducing ambiguities as the summed
energy deposit in a sensitive detector may be due to several (primary or secondary)
particles, including those from pile-up. This ambiguity is maintained through to
reconstruction, and in general it is impossible to say that a reconstructed MC physics
object ‘is’ a particular MC truth object, although the higher-energy and better-
isolated a direct truth-object is, the greater chance of a clean reconstruction with
only a small fraction of its digit inputs traceable to contaminating activity.
Figure 7.6. Typical timing responses of ATLAS LAr calorimeter cells to cosmic-ray signals, in the hadronic
end-cap (HEC, left) and forward calorimeter (FCal, right), showing the use of pulse shaping for pile-up
suppression. Plot reproduced with permission from Atlas LAr Collaboration 2009 J. Phys. Conf. Ser. 160
012050. Copyright IOP Publishing. All rights reserved.
7.4 Reconstruction
We have already dealt with the key ideas of physics-object calibration and recon-
struction techniques in Chapter 6. The computational implementation of reconstruc-
tion repeats the structure and issues of the generation, simulation and digitization steps
discussed in the previous sections, with the distinction that it must be run on both
collider data and MC-simulated inputs. For MC, reconstruction is typically run in
concert with the digitization step, since the intermediate ‘raw’ data is not of sufficient
physics interest to justify the storage requirements. Reconstruction is also re-run more
frequently than the earlier simulation steps, to perform so-called data reprocessing that
accounts for differences in collider and detector conditions compared to expectations,
and the continual development of new and improved object calibrations.
Given the complexity of the detector system and the physics objects in need of
reconstruction, and the insatiable scientific demand for accuracy and robust uncer-
tainty estimates, computational resources of both speed and storage are critical
limitations. Developments in reconstruction are hence focused on improving data
throughput, e.g. by making use of novel computational architectures, and in the case
of MC simulations, determining whether truth-based shortcuts can be taken without
biasing the physics outcomes.
The basic output from the reconstruction step is a very detailed form of event data
including both the reconstructed physics objects, and full information of which
digitization-level information was used to identify them. This event summary data
(ESD) is essential for development of reconstruction algorithms, but is more detailed
than required by physics analyses. A partially reduced format for physics analysis,
the analysis object data (AOD) is hence the main output from reconstruction,
providing analysis-suitable views of all reconstructed physics objects.
Skimming
Using a combination of trigger states and reconstructed-object cuts to discard
unpromising events;
Slimming
Dropping of unwanted computed quantities at either detector-hit or reconstruc-
tion level, should the target analyses have no use for them; and
Thinning
Dropping of containers of physics objects in their entirety.
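As a schematic illustration of skimming and slimming, the sketch below filters a list of toy events on a hypothetical trigger flag and a jet-multiplicity requirement, and then drops all branches not needed by the target analyses. Real derivations operate on AOD containers within the experiments' frameworks rather than on plain dictionaries, and the trigger and branch names here are invented.

```python
# Schematic sketch of skimming and slimming during a derivation step. The event
# model is a plain dictionary per event, and the trigger/object names are invented.

def skim(event):
    """Keep only events passing a trigger and a simple reconstructed-object cut."""
    passed_trigger = event["triggers"].get("HLT_hypothetical_single_mu", False)
    hard_jets = [pt for pt in event["jet_pt"] if pt > 25.0]
    return passed_trigger and len(hard_jets) >= 2

KEEP_BRANCHES = {"triggers", "jet_pt", "jet_eta", "mu_pt"}   # slimming list

def slim(event):
    """Drop all quantities the target analyses do not need."""
    return {k: v for k, v in event.items() if k in KEEP_BRANCHES}

events = [
    {"triggers": {"HLT_hypothetical_single_mu": True}, "jet_pt": [80.0, 40.0],
     "jet_eta": [0.1, -1.2], "mu_pt": [30.0], "cell_energies": [0.1] * 1000},
    {"triggers": {}, "jet_pt": [15.0], "jet_eta": [2.0], "mu_pt": [], "cell_energies": []},
]
derived = [slim(e) for e in events if skim(e)]
print(len(derived), list(derived[0].keys()))
```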
The data-derivation process can take place in several layers, e.g. construction first
of a ‘common DAOD’ from which physics-group DAODs are derived, and finally
are split apart into ‘subgroup’ DAODs, e.g. those focused on different jet- and
lepton-multiplicity samples in W , Z , t , and Higgs studies, or with different MET or
HT requirements for BSM searches. Very focused ‘mini-AOD’7 formats
have become increasingly used and published by LHC experiments.
Contrary to the impression of strict reduction of information from layer to layer,
DAOD production may also compute new variables that will be widely used in their
target analyses, e.g. reclustering of large-R jets from small-R ones, whose compu-
tation may require inputs from other variable branches that can then be slimmed
away. The overall aim of this data reduction chain is to minimise CPU-costly
repetition, while retaining an agility within physics groups to provide analysers with
the evolving set of variables needed: many physicists in experiment working groups
spend time and effort on designing and managing these processes.
7 Or micro-, or nano-AODs. We still await pico- and femto-AODs, to current knowledge.
will need to be combined. But even in a single-person analysis, it is guaranteed that the
future-you will need to re-run elements of the analysis as bugs are found, collaboration-
review processes ask for extra plots and validations, and plot restylings are requested all
the way through to final journal approval. In short, plan ahead and take the extra
minutes to document and polish your interfaces: it will likely pay off in the long run.
As well as coding style and quality itself, the way in which you stage your data
processing is important in balancing fast reprocessing against the flexibility to change
definitions. Just as the centralised data-processing systems in experiments sequentially
reduce data volumes by the combination of skimming and slimming (and swapping of
raw variables for more refined, analysis-specific ones), this process should continue
within each analysis. Staged data-reduction not only uses resources more efficiently,
but also makes life far easier in the late stages of paper authorship: no-one wants to
have to re-run Grid processing of the full event set in order to tweak binning details, or
to have to re-run slow parameter fits or samplings to change a colour or line-style in a
final histogram: well-designed staging isolates the subsequent, more-frequently iterated,
analysis stages from the heavy-duty number crunching of earlier steps.
A typical data-staging scheme at analysis level will look like:
Pre-selection
Grid-based processing of MC and data DAODs resulting in pre-selected event
ntuples. ‘Ntuple’ is the HEP software name for a set of n variables (i.e. a mathematical
n-tuple) per event: effectively a large spreadsheet of n features in columns, against Nevent
rows. Like earlier event data formats, ntuples are rather large files due to their linear
scaling8 with Nevent —this stage of reduction is likely to reduce many terabytes of
DAOD data to 100 GB–1 TB. While modern software chains tend to use ‘object
persistence’ to map composite in-memory code objects to file, in the analysis stage it is
more likely that ‘flat’ ntuples will be used, i.e. each of the n ‘branches’ in the data tree
will be treated as independent. A concrete example is that each component of a four-
vector will typically be an independent branch, rather than being combined coherently
into a single object: in many cases the first action is to build standard four-vector
objects from the less flexible stored components! (A minimal sketch of this step is given after this list.)
Analysis
Running of final analysis-cut optimisation and application, reducing to a smaller set
of focused ntuples (most useful for assessing correlations or performing unbinned fits)
and/or histograms (in which the per-event information has been lost). This processing is
typically achievable using an institutional computing cluster rather than the distributed
Grid, and will reduce data volumes to the O(GB) level.
Numerical results
Use of the final-analysis data in fits or other manipulations achievable on a single or
few machines via interactive sessions. This may include optimisations such as rebinning
(combination of narrow bins into wider ones with more statistically stable
8 Ntuple formats are usually compressed, using a dynamic compression algorithm such as gzip’s DEFLATE, so
more predictable (lower-entropy) branches may in fact scale better than linear in Nevent . Branches with the
same value through a file essentially disappear in terms of on-disk memory footprint.
7-26
Practical Collider Physics
populations), correlation extraction, fits, and detector unfolding. The outputs will be
typically 1 MB to 100 MB in size, but ideally unstyled at this point.
Presentation
Transformation of the numerical results data into visually parseable form: sets of
1D or 2D histograms, heatmaps, fit-quality or statistical-limit contours, formatted
summary tables, and data formats for long-term preservation, e.g. in the HepData
database. This step is likely to be iterated many times both by the analyser alone as
they attempt to find the best form, and in response to team and review comments: the
choices of colours, line and uncertainty styles, label and legend text, and theory model
comparisons involve a mix of physics and aesthetic tastes. Separating it from the vast
majority of number-crunching means that this step is quick to run—a sanity-saving
analysis life-hack, given how often you’re likely to do it.
Unlike the ROOT-based central processing, analysis teams are typically free to
use whatever tools and formats suit them best for these internal stages, e.g. text-
based CSV or binary HDF5 as favoured by many modern Python-based analysis
toolkits. All these points are as valid for phenomenological studies as in exper-
imental collaborations, and indeed as a phenomenologist you will be far more
responsible for choosing your tools, documentation, and analysis practices.
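To make the staging concrete, here is a minimal Python sketch (not tied to any real experiment framework: the file names, cuts and NumPy-generated toy data are all invented) in which each stage writes an intermediate file, so that the cheap presentation step can be re-run endlessly without touching the expensive earlier ones:

```python
import numpy as np

def preselect(out_file="preselected.npz"):
    """Expensive stage (Grid-scale in real life): run rarely, write a reduced ntuple-like file."""
    rng = np.random.default_rng(42)
    pt = rng.exponential(scale=40.0, size=1_000_000)   # stand-in for reading a large DAOD
    np.savez_compressed(out_file, pt=pt[pt > 25.0])    # apply the pre-selection cut

def analyse(in_file="preselected.npz", out_file="results.npz"):
    """Intermediate stage (cluster-scale): build numerical results, here a histogram."""
    pt = np.load(in_file)["pt"]
    counts, edges = np.histogram(pt, bins=40, range=(25.0, 225.0))
    np.savez(out_file, counts=counts, edges=edges)

def present(in_file="results.npz"):
    """Cheap stage (laptop-scale): restyle and replot as often as needed."""
    h = np.load(in_file)
    print("yield in first bins:", h["counts"][:5], "...")

if __name__ == "__main__":
    preselect()   # run rarely
    analyse()     # run occasionally
    present()     # run constantly, without re-touching the stages above
```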
Practical tips
While analysis is a myriad-faceted and creative endeavour, and we can no more give
a comprehensive guide to performing it than one could explain how to ‘do’ music or
fine art, we end here with a few practical tips for managing analysis code and data
processing, to reduce time and repetitive strain:
1. Manage all your code via a version-control system, and commit often. Being
able to roll back to previous versions is a great reassurance, and allows you
to clean up old scripts rather than archiving them under ad hoc names9;
2. Use meaningful, systematic and consistent naming: of scripts, data files,
directories and subdirectory trees. It really helps if your different classes of
data can be identified (both by human and by code) just from the names of
the files or ntuple branches, rather than needing a look-up table. Systematic
naming and structuring also makes it easier to write secondary code to access
them all equivalently, without having to manage a raft of special-case
exceptions to your general rule. This includes tedious things like capital-
isation: do you really want to have to write out all your branch names by
hand because you used pT rather than pt in one of them?
3. Use relative rather than absolute paths in all your scripts, so you can relocate
your processing to different systems, e.g. from a departmental cluster to a
laptop, or between departments, with minimal pain. At worst, specify the
absolute path to your project data in a single place;
9 Enthusiasts can take this a step further: the rise of featureful code-hosting platforms like GitLab makes it now easy to set up continuous integration tests to ensure that changes do not break other parts of the framework, and containerize your code so you can revert not just to a previous collection of source code, but to an immediately runnable previous setup.
Further reading
• A guide to recent MC generator theory and implementations can be found in
the MCnet 2011 ‘General-purpose event generators for LHC physics’ review,
Phys. Rept. 504 145–233, arXiv:1101.2599.
• The voxelisation process for detector simulation is described in ‘Recent
Developments and Upgrades to the Geant4 Geometry Modeller’, available
at https://cds.cern.ch/record/1065741. The Geant4 web pages and Physics Reference Manual, available from https://geant4.web.cern.ch/, are an excellent resource.
• Surveys of the practical challenges in scaling MC event generation to higher
precisions and luminosities can be found in CERN-LPCC-2020-002, arXiv:2004.13687, and J. Phys. Conf. Ser. 1525 012023 (2020), arXiv:1908.00167, and the computational challenges in scaling detector simulation in Comput. Softw. Big Sci. 5 3, arXiv:2005.00949 (2021).
Exercises
7.1 A single top-quark is often represented in an event as a chain of 1 → 1
‘interactions’, reflecting the modification of the ME event first by partonic
recoils against the PS and intrinsic kT of the generator, and by momentum
reshuffling as required to put the event on-shell. Eventually, a decay vertex
will be reached, for the top decaying to Wb. Which top is the ‘right’ one to
pick for physics, and does it depend on the application? How would you
design an algorithm to walk the event graph to find your preferred top quark?
7.2 A b-quark may be similarly represented in an event record as having several
stages of recoil absorption, before entering a hadronisation vertex attached
to all colour-charged parts of the event, and emerging as a b-hadron. This
hadron then undergoes a series of decay steps, e.g. B* → Bγ, before under-
going a weak decay that loses its b-flavour. Which object would you be
most interested in from the perspective of calibrating b-tagging perform-
ance? Which are you most interested in from the perspective of a physics
analyser looking for H → bb¯ events? Are they the same object?
7.3 Photons can be produced at many points in an MC event: in the ME or via
ME corrections, in the PS, and in charged-lepton and hadron decays. How
would you write an algorithm to determine if a final-state photon was not
produced in a hadronic decay? How could you distinguish a ‘direct photon’
from the ME from one produced by the PS? Is that a good idea? Is there a
physical difference between ME and PS emissions?
7.4 Calculate the mean transverse displacement of a B0 meson with mean
lifetime τ0 = 1.52 ps, if it has pT = 100 GeV . Remember to include its
Lorentz boost and hence time-dilation factor. How does this displacement
compare to the nominal cτ0 factor? What fraction of such particles
produced perpendicular to the beam will decay further than an inner
tracker layer at 3 cm?
7.5 How many CPU hours would be needed to generate and simulate detector
interactions equivalent to 300 fb⁻¹ for (a) LO 100 mb inelastic minimum-bias
events, at 0.01 s and 70 s per event for generation and simulation,
respectively? (b) 20 nb NLO W+b events at 200 s and 300 s, respectively?
How do these compare to the Grid CPU capacity?
7.6 Simulated pile-up events are generated, simulated, and stored for random
overlay on signal MC events. The chosen MC-generator model has too
hard a track-pT spectrum at low scales: for pT > 5 GeV, the MC model shape is roughly p(pT) ∼ (pT − 1 GeV) exp(−1.5 pT/GeV) while the data looks more like p(pT) ∼ pT exp(−1.5 pT/GeV). How could rejection sampling
be used to create an MC pile-up event store with the data-like distribution
in this range? Estimate the efficiency of the sampling.
7.7 To minimize data transfer, analysis jobs for the LHC are dispatched to the
Grid sites that store the data, rather than sending the data to the site on
which the job is planned to run. Site problems can make data inaccessible if
only one copy exists, and popular data can cause bottlenecks at a site, but
keeping multiple copies is expensive. How might you distribute the data to
the sites to minimize the analysis pain?
Chapter 8
Data analysis basics
1 Life is rather different at specialist collider experiments, in particular flavour-physics experiments.
8.1 Data-taking
Fundamentally, without the collider and experiment running, there can be no
physics analysis. The experimental data-taking process at a modern collider is
operated by teams of specialists, coordinating larger groups of general collaboration
members on various categories of operations shifts (from control-room activities to
oversight of data-quality and computing). Smooth operation of these aspects is key
to a high data-taking efficiency, as indicated in Figure 8.1, and hence more statistical
power in eventual physics interpretations.
Data-taking operations typically take place in multi-year runs, such as Run 1 of
the LHC from 2009–2012, and Run 2 from 2015–2018. The collider is run for
approximately half of each year within a run, both to allow for short detector-
maintenance periods and to minimise electricity costs, with 24-hour operations
during the data-taking periods. Multi-year ‘long’ shutdowns between the runs permit
more invasive maintenance and upgrading, including lengthy processes such as
warming and re-cooling the accelerator magnet systems.
As runs naturally divide into data-taking periods between shutdowns, it is
common to label these distinctly within the experiment data-processing, particularly
when e.g. distinct beam or detector conditions have been used.
Figure 8.1. Left: integrated luminosity delivered by the LHC, recorded by ATLAS, and passing physics data-quality requirements during LHC Run 2. Right: mean pile-up rate distributions delivered to ATLAS during LHC Run 2. Credit: ATLAS luminosity public plots: https://twiki.cern.ch/twiki/bin/view/AtlasPublic/LuminosityPublicResultsRun2.
The mean pile-up
rate 〈μ〉, i.e. the number of simultaneous pp collisions per bunch-crossing, has been
the main driver of distinctions between data periods at the LHC—the reason is clear
from the right-hand plot in Figure 8.1, showing the overlapping (and sometimes not
so overlapping) pile-up rates in ATLAS Run 2: to avoid major losses of statistical
power through MC-event reweighting, several different 〈μ〉 distributions were
explicitly simulated to match the real data conditions. A frequently used cross-
check within data analyses is to subdivide the total dataset into different data-period
subsamples with distinct beam conditions, to check for systematic bias from pile-up,
occupancy etc.
Within each run, further subdivisions of events are made. The smallest-scale
division (other than individual events or bunch-crossings) is the luminosity block (or
more commonly ‘lumiblock’) of constant luminosity and stable detector conditions.
Typically, these correspond to approximately one minute of data-taking, and hence
O(1 × 10¹⁰) bunch-crossings, each of which gets a unique event number within the
lumiblock from the collider/trigger clock system. The lumiblock is the smallest set of
events that can have a unique set of experimental conditions in the conditions
database, and be hence declared as suitable or not for various types of data analysis
via a good-run list—this encodes, for example, whether particular subdetectors were
in operation, allowing analyses with less stringent requirements to analyse events for
which the detector was in an ‘imperfect’ configuration.
Between the lumiblock and the run, lie the data period (and sometimes sub-
period) divisions. Each data period corresponds to a distinct setup of the beam (e.g. the instantaneous luminosity and hence mean number of pile-up interactions per bunch-crossing) or of the detector. Within each data period, the continual processes of triggering and offline reconstruction (generally known as reprocessing, even for the
first pass) are performed via on-site computing farms, then distributed to the world-
wide computing Grid. It is from here that analysis—either for object calibration or
for physics measurement—begins in earnest.
of energy and quantum number isolated into a few, well-defined and well-separated
partons (including leptons and photons). Notable exceptions to this tend to include
studies of non-perturbative effects such as soft-quantum chromodynamics (QCD),
collective flows in heavy-ion collisions, or hadron spectroscopy—all cases where
single-parton field excitations are not obviously the most appropriate degrees of
freedom. On the whole, this picture is an effective cartoon lens through which to
approach high-scale interactions, where QCD asymptotic freedom makes the
assumption of object independence a workable approximation.
It remains the case, however, that stepping from this hard-process view with a
handful of particles in the final state, to realistic collider events with the added
complexities of underlying event, hadronisation, and particle decays, introduces a
number of unwanted complexities. To reduce realistic events back toward the
Feynman cartoon, we can motivate a series of processing techniques on particle-level
Monte Carlo (MC), which are analogously applied to data. These are what are often
called truth-object definitions, as compared to physics-object definitions in general.
This definition can naturally be iterated, starting from a set of final-state particles.
For each particle we start from an assumption that it is direct, and attempt to prove
otherwise: if it is an SM hadron it can immediately be labelled as indirect, otherwise
its parent particle(s) are asked the same question. Should any of the parents (perhaps
through several more layers of recursive questioning) answer that they are indirect,
the original particle must also be considered indirect.
The algorithm has to be refined somewhat to reflect non-standardisation in MC
event records: while decay chains are semi-classical and can be represented as an
unambiguous history of decay branchings of hadrons, leptons, and photons2, the
partonic and hadronization aspects may take many forms depending on the
generator code and its hadronization model: string and cluster hadronization are
naturally represented in very different ways, and multiple-parton interactions can
easily create connections between all colour-charged event structures, meaning that
every QCD particle is connected to every other at the hadronization stage. The
directness-labelling algorithm naturally fails when attempting to handle such
structures, and hence it is standard to block recursion into parton (quark or gluon)
parents. This fits with our physical picture where e.g. photon emission from quarks
in the hard scattering or its quantum-interfering parton-shower extension is to be
considered direct, and distinct from photon emission in hadron decays: in principle, a
perfect detector could resolve the latter’s origin3. Photon emissions from direct or
indirect leptons also naturally inherit their parent’s status.
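A minimal sketch of this recursive labelling, assuming a hypothetical Particle class with a PDG ID and parent links (real event records such as HepMC differ in detail), might look as follows: hadrons are immediately labelled indirect, recursion is blocked at parton parents, and any indirect ancestor makes the particle indirect.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Particle:
    pid: int                                  # PDG ID code
    parents: List["Particle"] = field(default_factory=list)

def is_parton(p):
    """Quarks and gluons: recursion is blocked at these."""
    return abs(p.pid) in range(1, 7) or p.pid == 21

def is_hadron(p):
    """Very crude PDG-ID test, sufficient for this toy example."""
    return abs(p.pid) > 100

def is_direct(p):
    """A particle is direct unless it is, or descends from, a hadron."""
    if is_hadron(p):
        return False
    for parent in p.parents:
        if is_parton(parent):       # don't recurse into the parton/MPI tangle
            continue
        if not is_direct(parent):   # any indirect ancestor makes us indirect
            return False
    return True

# Toy usage: a photon from a pi0 decay versus a photon radiated off a quark
pi0 = Particle(111)
decay_photon = Particle(22, parents=[pi0])
quark = Particle(2)
fsr_photon = Particle(22, parents=[quark])
print(is_direct(decay_photon), is_direct(fsr_photon))   # False True
```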
Jet clustering
Having established a robust set of direct/indirect labels for final-state particles, we
are ready to construct definitions of each of our physics objects. We will start with
the definition of jets, for which the clustering together of many angularly collimated
particles is essential, before moving on to tackle the remaining direct leptons and
photons.
The details of jet algorithms have already been visited both in our overviews of
theory and of reconstruction methods, illustrating how central jet definitions are to
collider analysis at all levels. Note that we ultimately require both a definition of
reconstructed jets (that are obtained from detector-level quantities) and truth-level
jets, that are somehow to be defined from the event record of our Monte Carlo
generators. These definitions are required so that we can answer questions about jet
calibration, amongst other quantities. In terms of truth-level definitions, the
parameters of the clustering algorithm itself, and of any refinements, should match
those used in data. The aspect requiring special care for particle-level MC is which
particles to consider as jet constituents, i.e. the inputs to the clustering. The key
considerations here are directness and visibility, and the exact definition depends on
the experiment and jet reconstruction methods.
2 An awkwardness here is partonically modelled decays, which some event generators write into the hadron-decay stage of the event record: this is discouraged, and in the cases where it remains extant has to be 'fixed' in event-generation frameworks as described in Section 7.2.
3 We will return to the more contentious issue of direct versus 'fragmentation' photons in Chapter 11.
4 Explicitly identifying and using truth-record neutrinos for physics is a distasteful business, but sometimes pragmatic: see Chapter 11.
experiment from the beginning has been able to use its full set of subdetectors
for particle-flow object reconstruction.
The reverse mis-identification is also possible, where a hadron decays
electromagnetically or semileptonically to produce real photons or electrons
which then leave ECAL signatures. The most important examples of this are the
dominant (BR ∼ 98% ) neutral pion π 0 → γγ decays, and their internal-con-
version Dalitz decay relatives, π 0 → γe +e− and π 0 → 2e +2e−. As for the prompt
electron/photon discrimination discussed above, these are considered in e/γ
reconstruction, but methods are not perfect and fake electrons and photons may
be reconstructed where there is significant hadronic activity.
Jet/muon ambiguities: direct ambiguities between jets and muons are naturally
less prevalent than between jets and e/γ, because muons are minimally-ionising
particles and pass through the calorimeters with little energy loss: they are
unlikely to be misreconstructed as jets. But jets faking muons is still a significant
issue, due to leptonic decays of hadrons and punch-through. The latter term
refers to incomplete containment of a hadronic shower in the calorimeters, such
that charged particles ‘leak’ through the hadronic calorimeter (HCAL) into the
muon system and leave a signal used to seed muon reconstruction.
The leptonic decay issue is similar to that for non-prompt electron produc-
tion, but for muon production there is no contribution from π 0 decay as any
muons in that decay would need to be pair-produced, and the pion mass lies
below the 2mμ ∼ 210 MeV threshold. The main route is hence via the charged
pion and kaon weak decays {π +, K +} → μ+ νμ.5 While these decays are relatively
rare in the detector due to the particles’ long lifetimes—both species are usually
treated as stable at generator level—there are a lot of pions and kaons in a
hadron collider environment, and hence the rate of non-prompt muons is
significant. Once produced in a hadronic decay, a decay muon will pass through
the calorimeter and leave a track/hit in the outer muon system.
5 The electron variants of these decays are subject to helicity suppression, a consequence of angular momentum conservation, the perfect left-handedness of the neutrino, and the chirality of weak interactions.
The standard techniques for controlling these errors are isolation and overlap
removal. These are similar, and hence often confused for each other, but they each
use a different set of objects. In the case of isolation, the motivation is to insist that
non-QCD hard objects be well separated from even diffuse QCD activity to reduce
fake rates, while overlap removal is about avoidance of double-counting where two
reconstructed hard objects correspond to the same truth object.
Isolation
Isolation is most usually conducted using a ΔR cone around a hard object such as a
charged lepton or photon, in which hadronic activity is counted. The cone radius
may either be fixed (typically to a moderate size e.g. ΔR ∼ 0.3) or be designed to
shrink with higher candidate momentum between two fixed limits, reflecting the
higher collimation of radiation patterns typical to high-energy particles. The use of
cones is not universal: notably ATLAS measurements of hard-photon final states
have historically used a square isolation region based on a sum over calorimeter cells
in η-ϕ, with a square 4 × 4 cell region around the photon candidate excluded from
the sum.
6 Helicity suppression again means that the τ mode has the largest of the D+ → ℓ+νℓ branching fractions, at BR(D+ → τ+ντ) = 1.2 × 10⁻³, while the order is reversed in three-body semileptonic decays such as B− → D0τ−ν̄τ.
The latter two sets of objects will typically be restricted to those associated with the
event’s primary vertex; this requirement is not possible for raw calorimeter deposits.
A typical isolation algorithm will hence involve looking in turn at each lepton/
photon candidate and summing the (usually transverse) energies or momenta of
clusters or tracks within the given cone radius, excluding those directly associated to
the candidate. A decision is then made to accept or reject the candidate according to
a maximum threshold of surrounding activity: this may be an absolute or relative
threshold for isolation of candidate i from activity in the surrounding patch Xi, e.g. Σ_{j∈Xi} E_T,j < E_T^max (absolute) or Σ_{j∈Xi} E_T,j / E_T,i < f^max (relative). The thresholds E_T^max and f^max may be functions of the lepton kinematics, most
usually parametrised in {pT , η}. As all combinations of track and calorimeter
isolation variables and absolute and relative thresholds contain some orthogonal
information, it is not unusual for several such variables to be used concurrently.
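A sketch of such an algorithm in Python, with invented cone size and thresholds and simple dictionary-based candidates and tracks, might combine an absolute and a relative requirement as follows:

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    dphi = math.remainder(phi1 - phi2, 2 * math.pi)   # wrap the angle difference
    return math.hypot(eta1 - eta2, dphi)

def is_isolated(cand, tracks, cone=0.3, et_max=5.0, f_max=0.1):
    """cand and tracks are dicts with pt [GeV], eta, phi.
    Accept the candidate if the summed track pT in the cone (excluding the
    candidate's own track) passes BOTH an absolute and a relative threshold."""
    sum_pt = sum(t["pt"] for t in tracks
                 if t is not cand
                 and delta_r(cand["eta"], cand["phi"], t["eta"], t["phi"]) < cone)
    return sum_pt < et_max and sum_pt / cand["pt"] < f_max

# Toy usage: a 60 GeV electron with one nearby 2 GeV track and one far-away track
ele = {"pt": 60.0, "eta": 0.5, "phi": 1.0}
tracks = [ele, {"pt": 2.0, "eta": 0.6, "phi": 1.1}, {"pt": 20.0, "eta": 2.0, "phi": -2.0}]
print(is_isolated(ele, tracks))   # True: only 2 GeV of activity inside the cone
```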
A remark is due on these isolation sums, because effects such as pile-up (final-state particles from multiple coincident primary collisions) can necessitate recalibration: even leptons and photons which are well isolated from all other activity in their own event may be overlaid with minimum-bias activity from the other interactions in the bunch crossing. These effects are exacerbated at hadron
colliders by the presence of in-event QCD initial-state processes such as underlying
event and initial-state jet production: the former has an effect largely uncorrelated
with the hard-scatter objects, much like an extra pile-up collision, while soft initial-
state radiation is genuinely part of the hard-scattering process.
An obvious approach to correct for these—in jet energy and mass separately—is
to subtract the average amount of such background activity that one would expect in
a collision event of this sort. This could be a fixed energy-offset correction in the jet
calibration, corresponding to a fixed average p T density in η–ϕ or y–ϕ space, e.g.
ρ = pT /ΔηΔϕ, multiplied by the geometric area of the jet cone. But this naïve
approach immediately hits two problems:
1. Pile-up and underlying event activity can fluctuate greatly from event-to-
event: a fixed density will frequently, perhaps usually, be significantly wrong
for any given event;
2. The effective jet area in terms of acceptance of soft, background activity is not really the geometric A = πR², but something more complicated and jet-specific.
i.e. the fraction of the total ghost-populated area that is pulled into the jet by the
clustering. Such areas can either be computed actively or passively, the labels
distinguishing between whether the ghosts are added en masse at the beginning of the
clustering, or added one at a time at the end: the two definitions are equivalent in the
limit of large Nghost , and also equal to the Voronoi area based on real jet-constituent
areas in the limit of dense events with large numbers of non-ghost jet constituents.
As the computational cost of jet clustering grows with the number of jet constituents
N, scaling at best as N ln N using the optimisations available in FastJet, the Voronoi
area is faster to compute at the cost of less predictable convergence to the asymptotic
value.
Having obtained an area A_j^jet for each jet, the jet densities ρ_j = p_T,j / A_j^jet can be computed, and the median density ρ_med^jet = median{ρ_j} obtained. Studies show that in LHC-style events with high pile-up rate μ, this is a good, event-specific representative of the characteristically lower p_T density of minimum-bias jets from pile-up and the underlying event. A jet-specific p_T offset Δp_T,j = ρ_med^jet A_j^jet can then be
used to calibrate the isolation-sum offset. Alternative methods such as CMS’ PUPPI
pile-up mitigation scheme are rooted in the same ideas, but attempt to infer per-
constituent pile-up labels by use of particle-flow reconstruction and primary-vertex
association.
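The median-ρ correction itself is only a few lines once jet pTs and catchment areas are available (in real use the areas come from ghost-assisted FastJet clustering; here they are simply supplied as toy inputs):

```python
import numpy as np

def median_rho(jet_pts, jet_areas):
    """Event-specific pT-density estimate from the median of per-jet densities."""
    return np.median(np.asarray(jet_pts) / np.asarray(jet_areas))

def subtract_pileup(jet_pts, jet_areas):
    """Return area-based corrected jet pTs, clipped at zero, plus rho_med."""
    rho = median_rho(jet_pts, jet_areas)
    corrected = np.asarray(jet_pts) - rho * np.asarray(jet_areas)
    return np.clip(corrected, 0.0, None), rho

# Toy event: two hard jets plus many soft pile-up-like jets of area ~0.5
pts = [150.0, 90.0] + [8.0] * 20
areas = [0.48, 0.52] + [0.5] * 20
corr, rho = subtract_pileup(pts, areas)
print(f"rho_med = {rho:.1f} GeV per unit area; leading jet {pts[0]:.0f} -> {corr[0]:.0f} GeV")
```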
Both photon and electron radiation patterns (and to a lesser extent muon ones)
are more nuanced than this picture suggests. Photon isolation in particular is a
theoretically thorny area. We have spoken of using cone isolation methods for both
leptons and photons, but in fact the latter are more troublesome than they at first
appear. Events can be expected to contain both direct photons (produced in the
perturbative partonic process, including QED radiation in parton showers) and
where Riγ is the angle between the photon γ and the ith hadron or QCD parton. The
radius-dependent energy scale X(R ) can be any function that goes to zero as R does:
this ensures that the perfectly collinear region where the fragmentation function
contributes is excluded from the sum, while the smooth turn-on of the isolation
tolerance preserves the KLN cancellations. In the original paper, it is argued that the
form
X(R) ≡ Eγ (1 − cos R)/(1 − cos R0)   (8.5)
     ≈ Eγ (R/R0)²   (8.6)
7 This critique can be applied also to leptons from hadron decays, but the critical difference is the sheer rate of fragmentation-photon production: they are the dominant final-state particle at a hadron collider, largely due to the π0 → γγ decay.
Overlap removal
Overlap removal (OR) is concerned with eliminating not just misreconstructed
prompt physics objects but duplicated ones, due to the sources of ambiguity above,
in particular jets masquerading as charged leptons (through ECAL showers and
leptons from hadron decays) and vice versa. The fact that misidentifications can
occur in both directions, and the desire for an unambiguous identity assignment to a
given hard object leads to a typical overlap-removal algorithm looking something
like the following:
1. For every isolated photon candidate (already disambiguated from isolated
electrons by reconstruction), compute the ΔR distance to every jet. If within
a radius ΔR < R γ , remove the jet.
2. For every electron candidate, compute the ΔR distance to every jet. If within
a radius ΔR < R e and the jet has no b-tag, and the electron accounts for
greater than a defined fraction of the jet pT , pTe /pTJ > feOR , remove the jet.
3. For every muon candidate, compute the ΔR distance to every jet. If within a
radius ΔR < Rμ and the jet has no b-tag, and either the jet has fewer than
NOR tracks or the muon accounts for greater than a defined fraction of the jet
pT , pTμ /pTJ > fμOR, remove the jet.
4. For every tau candidate, compute the ΔR distance to every jet. If within a
radius ΔR < R τ and the jet has no b-tag, and the tau has greater than a
defined fraction of the jet pT , pTτ /pTJ > fτ OR, remove the jet.
5. Finally, for every remaining jet candidate, compute the ΔR distance to every
non-jet object (photon, electron, muon and tau). If any is within ΔR < RJ ,
remove the non-jet.
This example is rather extreme: in practice it’s unusual to have an analysis requiring
all of high-energy photons, charged leptons, taus, and jets! And for the non-jet
objects, the rate of direct backgrounds containing extra electroweak objects is
usually sufficiently low that e.g. checks for unexpected hard taus are unnecessary.
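As a rough illustration of how such a cascade looks in code, the following Python sketch implements just two of the steps above (electron–jet removal and the final jet-priority step), with invented radii and pT-fraction thresholds:

```python
import math

def delta_r(a, b):
    dphi = math.remainder(a["phi"] - b["phi"], 2 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)

def overlap_removal(electrons, jets, r_e=0.2, f_e=0.8, r_j=0.4):
    """Simplified two-step OR: (i) drop untagged jets that are really electron
    showers; (ii) then drop electrons that sit inside surviving jets."""
    jets = [j for j in jets
            if not any(delta_r(e, j) < r_e and not j["btag"]
                       and e["pt"] / j["pt"] > f_e for e in electrons)]
    electrons = [e for e in electrons
                 if not any(delta_r(e, j) < r_j for j in jets)]
    return electrons, jets

ele = [{"pt": 50.0, "eta": 0.1, "phi": 0.0}]
jets = [{"pt": 55.0, "eta": 0.12, "phi": 0.02, "btag": False},   # same object as the electron
        {"pt": 80.0, "eta": -1.5, "phi": 2.5, "btag": True}]
print(overlap_removal(ele, jets))   # electron kept, first jet removed
```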
logistical motivation is to reduce the volume of data on which to perform the most
complex elements of the proposed data analysis.
The number of events of process i selected by an analysis is known as the event
yield. Its expected value, i.e. the mean of nature’s Poisson process of event
generation combined with reconstruction and selection algorithms, is given by the
product
N_i^exp = σ_i ϵ_i A_i L_int,   (8.7)
where σi is the particle-level cross-section, ϵi is the efficiency due to ‘accidental’ event
losses (e.g. from trigger and reconstruction errors), and Ai is the acceptance
efficiency for ‘intentional’ losses due to selection requirements. All these are specific
to process i, which may be either considered as signal or background. L int is the
integrated luminosity of the collider, i.e. the combination of how intense and how
long the relevant data run was. This equation shows the competition between cross-
section and luminosity against ϵ and A selection inefficiencies, determining the yields
of signal and background processes into the analysis event selection. As an analyser,
the cross-sections are given by nature, the available luminosity by the only slightly
less omnipotent collider-operations team, and the efficiency by current trigger and
object-reconstruction calibrations: the art of event selection is then in engineering
the acceptance A to find the best balance between accepting signal and rejecting
background event weights.
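A quick numerical illustration of equation (8.7), with invented cross-section, efficiency, acceptance and luminosity values:

```python
def expected_yield(sigma_pb, efficiency, acceptance, lumi_fb):
    """Equation (8.7): N_exp = sigma * epsilon * A * L_int.
    sigma is given in pb and the integrated luminosity in fb^-1 (1 pb = 1000 fb)."""
    return sigma_pb * 1e3 * efficiency * acceptance * lumi_fb

# Illustrative numbers (not from a real analysis): a 0.5 pb signal process,
# 70% reconstruction efficiency, 5% selection acceptance, 139 fb^-1 of data
print(expected_yield(0.5, 0.7, 0.05, 139.0))   # ~2400 expected events
```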
8 Note that in general one cannot tell exactly which process produced a particular event, and the question is not even necessarily well-posed: sufficiently similar final states are irreducible and may even quantum mechanically interfere, such that the event was genuinely produced by a combination of processes. Such is quantum physics.
8.4 Observables
In the following sections we summarise some very common collider physics
observables, which are of importance not just in final physics analyses, but also
reconstruction and calibration tasks.
where the first line is the general definition in terms of system properties, the second
line is a specialisation to a two-object system, and the third a further specialisation to
a lepton–neutrino system (as from a leptonic W decay) in which both objects are
treated as massless. In the second line, mi, E T,i , and p T,i are respectively the mass,
transverse energy, and transverse momentum vector of object i. In the third line the
energies are replaced with the lepton pT and the missing energy, and the dot product
reduces to the term involving the azimuthal angles of the lepton and E Tmiss vectors, ϕℓ
and ϕMET . While clearly less informative than a fully reconstructed dilepton or dijet
mass, mT is often a useful variable for signal/background discrimination, partic-
ularly if the underlying kinematic distributions lend themselves to a clear Jacobian
peak around mT ≈ mW—see Section 11.2 for more detail.
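For the common massless lepton-plus-missing-ET case, the third line reduces to m_T = √(2 p_T^ℓ E_T^miss (1 − cos Δϕ)), which is easily sketched in code (toy numbers only):

```python
import math

def transverse_mass(pt_lep, phi_lep, met, phi_met):
    """m_T for a massless lepton + missing-ET system:
    m_T^2 = 2 pT^lep ET^miss (1 - cos(dphi))."""
    dphi = math.remainder(phi_lep - phi_met, 2 * math.pi)
    return math.sqrt(2.0 * pt_lep * met * (1.0 - math.cos(dphi)))

# A typical W -> l nu candidate: back-to-back lepton and MET of ~40 GeV each
print(transverse_mass(40.0, 0.3, 40.0, 0.3 + math.pi))   # ~80 GeV, near m_W
```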
A useful generalisation of the transverse mass is the stransverse mass variable,
commonly denoted mT2 . This is designed to target final state events with a pair-
produced particle that decays semi-invisibly (i.e. leading to two weakly-interacting
particles that both contribute to the missing transverse momentum). Imagine that
two of the same new particle with mass mp are produced in an event, and both decay
to a massless visible particle V and an invisible particle χ of mass mχ . The parent
particle mass is bounded from below by the transverse mass:
m_P² = m_χ² + 2[E_T^V E_T^χ cosh(Δη) − p_T^V · p_T^χ] ⩾ m_T².   (8.11)
The transverse momentum of the single invisible object from the decay of a given
parent particle is of course unknown. Instead, we can construct an observable based
on a minimisation over the under-constrained kinematic degrees of freedom
associated with the two weakly-interacting particles as
m_T2²(m_χ) = min_{p_T^χ1 + p_T^χ2 = p_T^miss} { max [ m_T²(1), m_T²(2) ] }.   (8.12)
This dense formula requires some explanation, particularly since it is a rather ugly-looking minimisation over a maximisation. m_T²(1) is the transverse mass squared formed using an assumed value of the mass of the invisible particle m_χ, the measured transverse momentum of one of the visible particles p_T^V1, and an assumed value for the transverse momentum of one of the invisible particles p_T^χ1; m_T²(2) is defined analogously,
where we now take the measured transverse momentum of the other visible particle,
and assume some value for the transverse momentum of the other invisible
particle. For an assumed value of mχ , called the test mass, and an assumed set
of components for the two invisible transverse momenta, we can calculate these
two invariant masses and work out which is bigger. We can then vary the assumed
values of the components of the invisible transverse momenta (subject to the
constraint that the two transverse momentum vectors sum to the observed missing
tranverse momentum), and find the values of the components that minimise the
maximum of the two transverse masses. This is typically done by numerical
optimisation. This then gives us a value of mT2 which is unique for that event,
based on the assumed value of mχ . Different choices of mχ lead to different
definitions of the mT2 variable.
It can be shown that, with the correct choice of test mass, the mT2 distribution
over all events is bounded from above at the parent mass m_p. In addition, if we instead choose a test mass of zero, the m_T2 distribution is bounded from above by (m_p² − m_χ²)/m_p, where m_χ is the true invisible-particle mass. Note
that different types of mT2 variable have been defined in the literature for complex
decay processes, based on which of the visible particles are included in the transverse
mass calculations that enter the formula.
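In practice dedicated m_T2 calculators (analytic or bisection-based) are used, but the definition can be sketched directly as a numerical minimisation; the following toy implementation assumes massless visible objects, two-component transverse vectors, and the availability of SciPy:

```python
import numpy as np
from scipy.optimize import minimize

def mt_squared_side(pt_vis, pt_chi, m_chi):
    """Transverse mass^2 of one (massless) visible + invisible pair."""
    et_vis = np.linalg.norm(pt_vis)
    et_chi = np.hypot(m_chi, np.linalg.norm(pt_chi))   # sqrt(m^2 + pT^2)
    return m_chi**2 + 2.0 * (et_vis * et_chi - np.dot(pt_vis, pt_chi))

def mt2(pt_vis1, pt_vis2, pt_miss, m_chi=0.0):
    """Brute-force m_T2: minimise max(mT^2(1), mT^2(2)) over the splitting
    of the missing pT between the two invisible particles."""
    def objective(chi1):
        chi1 = np.asarray(chi1)
        chi2 = np.asarray(pt_miss) - chi1
        return max(mt_squared_side(pt_vis1, chi1, m_chi),
                   mt_squared_side(pt_vis2, chi2, m_chi))
    # start from an even split of the missing momentum
    res = minimize(objective, x0=0.5 * np.asarray(pt_miss), method="Nelder-Mead")
    return np.sqrt(res.fun)

# Toy example with invented visible momenta and MET (all in GeV)
v1 = np.array([50.0, 0.0]); v2 = np.array([-30.0, 20.0])
met = np.array([-20.0, -20.0])
print(f"mT2 = {mt2(v1, v2, met):.1f} GeV")
```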
Composite objects built from a subset of the event are not limited to two-object
systems: obvious counterexamples are top-quark decays t → bW (→ {ℓ +νℓ , qq¯}),
three-body decays, or multiboson decays such as H → ZZ → {4ℓ, qq¯ + 2ℓ ,…}. In
these cases, three- or four-body composite systems are natural proxies for the original
decaying object (and indeed we will return to the interesting challenges of top-quark
reconstruction in Sections 11.2.2 and 11.2.3). Note, however, that neither of these
examples is a fundamental 1 → 2 process, but instead compositions of 1 → 2 Feynman
rules. This is a consequence of renormalizability in the SM: one cannot have more
than two fermions connected to a vertex while maintaining dimensionless coupling
constants. The only renormalizable four-point vertices in the SM are g → 3g (but the
massless gluon does not set a natural scale for a three-jet mass) and triboson
production: the latter of course are themselves reconstructed from final-state dilepton
pairs. While off-shell effects break this cartoon picture, it retains enough truth for
placing mass constraints recursively on pairings to be a useful reduction strategy for
better selecting such signals from their combinatoric backgrounds.
Other properties of the few-object composite may also be useful, such as its pT : in the
lowest-order Feynman amplitudes for simple processes such as pp → V (for any decaying
resonance V), the resonance pT is zero: it is only through (mostly QCD) radiative
corrections that a finite transverse momentum appears. If wanting to either select or
eliminate such processes, the momentum properties of the composite may be very useful.
The final set of composite scale variables is those which represent all, or nearly all,
of the event. As such, they are built from all or most of the elementary physics
objects. The most common such object is HT , the scalar sum of pT over all visible
‘hard’ physics objects:
H_T = Σ_objects |p_T|.   (8.15)
A closely related composite scale variable augments this sum with the magnitude of the missing transverse momentum, the so-called effective mass m_eff = H_T + |p_T^miss|.
The meff symbol suggests that this variable is the ‘effective mass’ of the event, but it
only represents a mass in an illustrative sense—it is more an upper limit on the scale
of momentum transfer to have taken place in the hard process.
Requiring that the transverse MET vector point away from any single jet via a ∣Δϕ∣ cut is an effective
‘cleaning’ strategy to improve sample purity.
A subtle issue when working with Δϕ (or raw azimuthal angle ϕ) values is that
they are naturally bounded in [0, π ) or [0, 2π ) (or [−π /2, π /2) or [−π , π )), which
involve irrational numbers. One cannot book a computer histogram with range ends
exactly on irrational values, and close-enough approximations to avoid artefacts or
incomplete coverage9 are inconvenient to report. A good solution is to report such
angles in rationally-bounded forms like Δϕ /π , or e.g. cos ϕ where that is well-
motivated by the physics.
The periodic nature of azimuthal angles can also be troublesome when e.g.
supplying kinematics to a multivariate ‘machine learning’ toolkit, where it is often
implicit that distances between values are computed linearly, but the points at
opposite ends of ϕ and signed-Δϕ ranges are infinitesimally close.
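One common workaround, sketched below with toy numbers, is to feed such tools the pair (cos ϕ, sin ϕ) instead of ϕ itself, so that numerically distant values of ϕ which are physically adjacent become adjacent again:

```python
import numpy as np

def encode_phi(phi):
    """Map an azimuthal angle onto the unit circle so that nearly identical
    directions on opposite sides of the phi range are also nearby numerically."""
    phi = np.asarray(phi)
    return np.stack([np.cos(phi), np.sin(phi)], axis=-1)

phi_a, phi_b = -3.14, 3.14            # physically almost the same direction
print(abs(phi_a - phi_b))             # 6.28: misleading linear distance
print(np.linalg.norm(encode_phi(phi_a) - encode_phi(phi_b)))   # ~0.003: small, as it should be
```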
In addition to the transverse angular variable ϕ, it is also useful to consider
longitudinal angular information, in the form of the rapidity or pseudorapidity.
Despite the usual problem of the unknown hadron collider beam-parton boost, use
of the longitudinal component is important in event selection for two reasons.
Firstly, the angular acceptance of a real detector relative to the beam-line is never
perfect: the acceptance of tracking detectors for charged-lepton identification and jet
flavour-tagging, and the high-resolution parts of calorimeters, are typically limited
to ∣η∣ < 3. This imperfection of the detection apparatus needs to be acknowledged in
both search and measurement analysis types. Secondly, asymmetric event rapidity
distributions tend to be associated with low-scale event types without very high-
energy or high-mass objects, as they indicate a combination of large and small
incoming parton x1 and x2 values: one cannot easily produce events with character-
istic scales greater than a few tens of GeV if only one incoming parton is supplying
the energy. This is the reason that the LHCb detector, primarily for the detailed
study of few-GeV hadron production and decay, is a one-armed spectrometer at
high-η, while the general-purpose detectors, more optimised for high-scale measure-
ments and searches, are symmetric and concentrate most subdetector precision
around η = 0.
In typical collider analyses, therefore, a cut to require key objects to have a small
∣η∣ or ∣y∣ value ∼1 or 2 is very common as a proxy for event scale, as well as ensuring
availability of the highest-resolution detector components. Rapidity differences are
also meaningful, particularly as collider pseudorapidities and rapidities are by design
invariant under changes of the unknown beam boost. A particularly striking
motivation for rapidity-difference cuts in event selection is the study of colour-
singlet fusion processes, in which there is no exchange of colour quantum numbers
between the primary incoming partons. In this case, as there is no colour dipole to
radiate, the central region (low-∣η∣) of the detector is expected to be relatively devoid
of activity, the leading-order cartoon picture being spoiled only by subleading
radiative and non-perturbative (e.g. multiple partonic interaction) physics. Searches
9 It is not uncommon to see a ϕ plot that finishes at 3.2, with a systematically low final bin as the [π, 3.2) part cannot be filled but still contributes to the normalisation.
10 Unfortunately there is no standard notation to discriminate between the two definitions, and despite arguments for the primacy of 'true' rapidity y, the pseudorapidity's direct identification with the angular position of detector elements means that ΔR_ij^η ≡ √(Δη_ij² + Δϕ_ij²) is widely found in use under the name ΔR.
Precision measurements of IR-safe event shapes were, and remain, key to the
measurement of the running of the strong coupling αS.
Thinking again of event structure by analogy with terms in a multipole expansion,
the zeroth term in the expansion is the average ‘displacement’ of the entire event, i.e.
the momentum balance (or pT balance at a hadron collider), i.e. the vector sum Σ_i p_T,i over the final-state particles i.
If the sum here runs over all particles in a perfectly measured event devoid of
invisibles, the balance is trivially zero by momentum conservation; in reality this
delta-function will acquire some width due to acceptance, energy scale calibration,
and resolution effects. This corresponds to the missing energy/missing pT physics
object, which plays a key role in distinguishing between fully resolved events such as
QCD scattering or Z → qq¯ , ℓℓ , against SM or BSM sources of missing energy
ranging from direct neutrinos to WIMPs, dark-sector particles etc.
In analyses, it is also not infrequent to use a hard-object version of event balance,
e.g. the sum of jets and/or leptons to distinguish between processes in which particle
combinations should be symmetric and those expected to be asymmetric. An
example of this is the use of event hemispheres in lepton colliders, or azimuthal
event partitionings at hadron colliders.
The next set of variables characterise the degree of collimation or spherical
symmetry of the energy flows. We describe these as defined at lepton colliders, where
the unambiguous centre-of-mass frame allows all three spatial directions to be
considered equivalent (up to acceptance effects around the beam direction, to be
corrected): hadron-collider versions may be defined easily by setting all z compo-
nents to zero and hence reducing the dimensionality into the transverse plane. The
‘classic’ event shapes of this type are the thrust and sphericity (or spherocity11). We
begin with the thrust:
T = max_{|n|=1} ( Σ_i |n · p_i| / Σ_i |p_i| ),   (8.18)
in which the unit three-vector n is oriented to maximise the sum of momentum-
projection magnitudes over all particles i. A neat algorithm exists to find exact
solutions from combinations of event-constituent vectors but this is nonetheless
expensive, scaling as O(n³) in the number of constituents. T is constrained to lie
between 1/2 and 1: a large value corresponds to ‘pencil-like’ events in which a large
fraction of energy flow goes forward and backward along the thrust axis, and the 1/2
limit to perfectly isotropic events. As QCD jet production is dominated by a lowest-
order two-parton final state with the two partons perfectly balanced at fixed order,
with three-jet configurations naïvely suppressed by a factor of αS, thrust distributions
11 Some authors draw a nomenclature distinction between the full and transverse forms, but the literature is not unified in this respect.
dσ/dT tend to rise from low values at T ∼ 1/2 to a peak close to unity.12 In collider
physics we feel most comfortable with distributions that have a peak on the left-hand
side of the plot, and hence 1 − T is often used in place of the thrust, frequently with
the same name even though large values of 1 − T are not intuitively ‘thrusty’. The
plane transverse to the thrust vector n may be analysed in the same way, giving an
orthogonal thrust major axis and finally the thrust minor vector perpendicular to the
other two. The thrust projections along these axes, respectively denoted M and m,
are also useful event-shape variables, often combined into the single oblateness
variable, O = M − m.
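The transverse (hadron-collider) version of the thrust is simple enough to estimate by brute force; the following sketch, with invented toy events, simply scans candidate axis directions in azimuth rather than using the exact algorithm mentioned above:

```python
import numpy as np

def transverse_thrust(px, py, n_steps=720):
    """Brute-force transverse thrust: scan candidate axis directions in phi
    and keep the one maximising sum |n.pT| / sum |pT|."""
    px, py = np.asarray(px), np.asarray(py)
    denom = np.sum(np.hypot(px, py))
    phis = np.linspace(0.0, np.pi, n_steps, endpoint=False)   # the axis is direction-less
    nums = [np.sum(np.abs(px * np.cos(p) + py * np.sin(p))) for p in phis]
    best = int(np.argmax(nums))
    return nums[best] / denom, phis[best]

# Pencil-like two-jet event versus a roughly isotropic one
T_dijet, _ = transverse_thrust([100, -98, 3], [2, -1, -4])
T_iso, _ = transverse_thrust(100 * np.cos(np.arange(12) * np.pi / 6),
                             100 * np.sin(np.arange(12) * np.pi / 6))
print(f"dijet-like T = {T_dijet:.2f}, isotropic T = {T_iso:.2f}")
# ~0.99 versus ~0.64: the 2D isotropic lower bound is 2/pi rather than 1/2
```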
Our second such variable characterises more or less the same shape information,
but in a different and more convenient way. The generalised sphericity tensor is
defined in terms of the three-momentum vector components pa (for a = 1, 2, 3), as
12 In practice, resummation of multiple emissions gives enhancements by kinematic-logarithm factors which complicate this simple picture of power-suppressed contributions. They also shift the peak from a divergence at exactly 1 to a finite peak slightly below. These corrections notwithstanding, the fixed-order expansion is a good conceptual starting point.
with larger values of sphericities are more spherical, and those with smaller values of
aplanarity have their activity closer to exclusively occurring in a plane. The
aplanarity variable in particular is frequently used in full three-dimensional form
in LHC BSM direct search analyses, particularly computed from jets, as a
distinguishing variable between QCD-like and BSM decay-chain kinematics.
This brings us to the final set of event-shape variables, which arose as multi-jet
physics and jet substructure techniques rose in prominence during the LHC era.
Many variables can be found in the literature, but one of particular prominence is
the N-jettiness variable—as a precursor to its more famous direct relative, the
N-subjettiness (which we will encounter shortly). N-jettiness, originally motivated by
QCD resummation calculations for classifying Higgs-candidate events by their jet
multiplicity, is defined as
τ_N = Σ_i min{d_a(p_i), d_b(p_i), d_1(p_i), ⋯, d_N(p_i)},   (8.24)
where the i sum runs over a set of elementary event constituents with four-momenta
pi, and the da,b(pi ) and dk (pi ) functions are distance measures between pi and four-
vectors representing the beams (a, b) and a set of k ∈ 1…N jet axes, respectively.
These jet axes may be defined using several schemes and the distances may be any
IR-safe measure (i.e. linear in ∣pi ∣), but a not inaccurate characterisation of the
normal behaviour is to pick the set of N jet axes that minimise τN . The astute reader
will note a joint echo of hadron-collider inclusive jet clustering (in the unique
assignment of particles to jets via a distance measure, including beam distances) and
of the thrust event shape (in the choice of event axes by minimising the shape
measure). The same properties that made thrust an IR-safe shape variable with low-
values corresponding to two-jet events make the N-jettiness a powerful variable for
identifying events with N distinct jets, although the nomenclature here takes a sadly
unintuitive turn, with highly n-jetty events acquiring a low τN score.
We will return to and extend the concept of dynamic N-jet measures in the
following section. In conclusion regarding whole-event shapes, we should also
mention that there are many more such ways to characterise event energy flows,
from literal Fourier/wavelet expansions to machine-learning methods such as
energy-flow and particle-flow networks (the latter based on an again familiar-looking
distance measure). And again, these ideas can largely be applied to the internal
structure of jets as well as to whole events—a topic which in recent years has seen
prodigious activity on both the phenomenological and experimental sides, and
which we now address.
radius parameters R ∼ 1.0—historically called ‘fat jets’—are typically used for such
studies, applying a boosting heuristic that decay products from a heavy resonance of
mass M undergoing two-body decay will be captured within a single large-R jet cone
if its (transverse) momentum is greater than twice the resonance mass:
p_T^jet ≳ p_T^boost = 2M/R.   (8.25)
Jet structure observables can also be important for the labelling of normal-width
light jets (without b- or c-tags) as being dominated by quark or gluon amplitudes.
These are usually referred to as quark jets or gluon jets, respectively, although of
course there is no perfect one-to-one correspondence: if nothing else, hadronic jets
are colour singlets and quarks and gluons obviously are not. In fact, it is the
difference in non-zero colour charge of quarks and gluons that enables experimental
distinction between their jets: initiating gluons carry the adjoint colour charge CA = 3
while quarks carry CF = 4/3, and for negligible quark masses it is the magnitude of
this charge that dominates the amount of QCD radiation. Gluon jets hence radiate
more than quark jets, and tend to have more constituents and to be ‘wider’ in η–ϕ.
We start, therefore, by defining a set of simple jet-structure properties: the
constituent multiplicity, width and mass. The multiplicity and mass are familiar.
The former is the number of clustered final-state particles (or frequently, for better
accuracy via tracking, the number of charged particles) in the jet, and the latter is the
invariant mass of the four-vector sum of constituent momenta,
M² = (Σ_i E_i)² − (Σ_i p_i)²,   (8.26)
where the i index as usual runs over the jet constituents. The jet width is less familiar:
it is defined as the pT -weighted mean ΔR distance of jet constituents from the jet
centroid:
W = Σ_i ΔR_i p_T,i / Σ_i p_T,i.   (8.27)
As the width is linear in the constituent momenta, it is an IR-safe observable, with
the intuitive interpretation that larger widths correspond to jets whose energy flow is
spread more widely around the centroid.
In fact, all these measures are examples of a more general class of observables, jet
angularities. The generalised angularity is defined as
λ_β^κ = Σ_i z_i^κ θ_i^β   (8.28)
     = Σ_i (p_T,i / p_T^jet)^κ (ΔR_i / R)^β,   (8.29)
where as usual the sum is over jet constituents. Here the zi variable is a fraction of a
jet energy measure and θi is an angular variable: various definitions exist in the
literature, but here they are made concrete in the second line as the ith constituent’s
pT fraction with respect to the containing jet, and the ratio of the constituent ΔR
angle from the jet centroid to the jet radius parameter R. These angularities are
parametrised by the κ and β parameters, whose variation defines the whole 2D
family of angularities, shown in Figure 8.2. The κ = 1 line is particularly important,
as it defines the IRC-safe subfamily of angularities: on this line, β = 1 is the jet width
and β = 2 is monotonic to jet mass; off it, (κ , β ) = (0, 0) is the IR-unsafe constituent
multiplicity. κ values greater than unity place more emphasis on high- pT constitu-
ents, much like the generalised exponent in the usual jet-clustering measure, and
large β values place more emphasis on wide-angle radiation: angularities are hence a
natural basis for characterising QCD radiation around its soft and collinear
singularities.
In fact, angularities are so closely associated with QCD phenomenology that a
leading technique in jet physics is directly motivated by them. The problem with fat
jets is that they contain extra gunk from pile-up and soft radiation that pollutes the
clean substructure that we wish to find. Before we can use fat jets, we must therefore
subject them to a process known as jet grooming, which aims to remove as much of
the gunk as possible. The study of angularities has led to the soft-drop method, one
of many approaches for removing ‘inconvenient’ QCD radiation from a jet and
exposing its hardest components, either from hard QCD splittings or heavy-particle
decays (see also ‘filtering’, ‘trimming’, ‘(modified) mass-drop’, and various other
methods you will find in the literature and the FastJet plugin collection).
Figure 8.2. Identification of 'classic' jet structure variables with points in the generalised-angularity (κ, β) plane, highlighting the IR-safe subset of variables along the κ = 1 line. Diagram reproduced with permission from Larkoski A, Thaler J and Waalewijn W 2014 JHEP 11 129, arXiv:1408.3122.
In soft-
drop, as with other methods, the key idea is to step backward through the jet cluster
sequence, first undoing the last clustering, then the second-last, and so on. At each
declustering step, the two branches with transverse momenta pT,1 & pT,2 , and
angular separation ΔR12 are used to compute the inequality
min{p_T,1, p_T,2} / (p_T,1 + p_T,2) > z_cut (ΔR_12 / R)^β.   (8.30)
This is the soft-drop condition. If it evaluates as true, the softer branch is sufficiently
hard and narrow and is retained; if it evaluates false, the softer branch is dropped
from the cluster sequence and the grooming procedure continues to the next
declustering. Note the role of the zcut variable, which sets the minimum pT fraction
needed for a branch to be retained, and the β exponent (cf. angularities) which makes
it easier for collinear branches to survive the cut: this is why it is a soft-drop
procedure: it removes low-energy contaminants while protecting the collinear QCD
singularity. Soft-drop is a powerful (and from a theory perspective analytically
amenable) technique for cleaning residual pile-up, underlying event, and soft QCD
radiation from jets, to more clearly expose their key structures.
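In practice soft drop is applied to the Cambridge/Aachen cluster sequence via FastJet; the declustering logic of equation (8.30) can nevertheless be sketched on a toy, hand-built jet tree (dictionaries holding summed pt/eta/phi, with optional child branches):

```python
import math

def delta_r(a, b):
    dphi = math.remainder(a["phi"] - b["phi"], 2 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)

def soft_drop(node, z_cut=0.1, beta=0.0, R0=1.0):
    """Toy recursive soft drop on a hand-built declustering tree: each node
    carries its summed (pt, eta, phi) and optionally two children."""
    while "children" in node:
        b1, b2 = node["children"]
        if b1["pt"] < b2["pt"]:
            b1, b2 = b2, b1                       # b1 = harder, b2 = softer branch
        z = b2["pt"] / (b1["pt"] + b2["pt"])
        if z > z_cut * (delta_r(b1, b2) / R0) ** beta:
            return node                           # both branches pass: groomed jet found
        node = b1                                 # drop the soft branch, keep declustering
    return node

# A hard 200 GeV core plus a wide, soft 8 GeV branch that soft drop removes
soft = {"pt": 8.0, "eta": 0.8, "phi": 0.1}
hard = {"pt": 200.0, "eta": 0.0, "phi": 0.0}
fat_jet = {"pt": 208.0, "eta": 0.03, "phi": 0.0, "children": [hard, soft]}
print(soft_drop(fat_jet)["pt"])                   # 200.0: the soft wide branch is dropped
```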
We now proceed to jet-shape variables designed for sensitivity to the ‘pronginess’
of a jet’s internal energy flows. Here we find two classes of observable, which as it
happens bear close relation to the N-jettiness and sphericity event shapes. The first
really is N-jettiness, but limited to the subset of final-state particles clustered into the
jet: the N-subjettiness, denoted τN(β ). The β term again represents an angular
exponent, and is often dropped, assuming β = 1. As before, it is a variable which
takes values which asymptotically approach zero as the energy flows within the jet
approach perfect coincidence with a set of N subjet axes. Previously we did not
specify the IR-safe definition of distance to be used, as different measures are
possible, but for N-subjettiness we do not have the complication of beam distances,
and for the distance of constituent four-vector pi to in-jet axis n̂k it is rare to use a
definition other than
d_k^(β)(p_i) = p_T,i ΔR(p_i, n̂_k)^β / (Σ_i p_T,i R^β),   (8.31)
where R is the original jet-clustering radius, giving an explicit form for first a fixed-
axis N-subjettiness τ̃N(β ) and finally the operational form in which the axes themselves
are optimised to minimise the measure:
τ̃_N^(β) = (1 / Σ_i p_T,i R^β) Σ_i p_T,i min{ΔR(p_i, n̂_1)^β, ⋯, ΔR(p_i, n̂_N)^β},   (8.32)
τ_N^(β) = min_{n̂_1, …, n̂_N} τ̃_N^(β).   (8.33)
While directly useful in their own right, these observables acquire additional power
when used—in an echo of the likelihood-ratio discriminant between hypotheses—in
the form of an N-subjettiness ratio between different hypothesised N,
τ_NM^(β) = τ_N^(β) / τ_M^(β).   (8.34)
Most commonly, the distinction sought is between adjacent N and M, i.e. τ_21^(β) or τ_32^(β).
These are, respectively, used in attempts to distinguish truly two-prong jets with
small τ2(β ) from less structured QCD jets which only achieve small values by chance,
and to discriminate three-prong decays like those of hadronic top-quarks from QCD
and hadronic W and Z decays. Use of soft-drop jet grooming can further improve
this resolving power. The N-subjettiness ratio is a powerful technique but problem-
atic phenomenologically, as fixed-order perturbative QCD calculations diverge
when the τ_{N−1}^{(β)} denominator approaches zero: they are hence not IR-safe. They
are, however, safely calculable with all-orders resummation, an important practical
relaxation of the IR-safety concept called Sudakov safety.
Our final class of jet substructure variables is energy correlation functions (ECFs),
which as it happens capture some of the features of sphericity to complement
N-subjettiness’s extension of the thrust concept. The core definition is that of the
generalised N-point ECF,
ECF(N, β) = Σ_{i1<i2<⋯<iN ∈ J} ( ∏_{a=1}^{N} p_T,ia ) ( ∏_{b=1}^{N−1} ∏_{c=b+1}^{N} ΔR(i_b, i_c) )^β,   (8.35)
where once again all the i terms are jet constituent indices, and β plays a role in
emphasising or de-emphasising wide-angle emissions. ECF(N , β ) = 0 if the jet has
fewer than N particles. Being linear in the pT s of the constituents, ECFs are IRC-
safe, and by comparison with τN(β ), there are now no axes to optimise: the angular
sensitivity is incorporated through the angular differences between every ordered
pair of constituents.
This form, while general, is rather difficult to parse. As with N-subjettinesses, we
are mainly interested in rather low values of N, to probe sensitivity to few-particle
splittings and decays. The more comprehensible explicit forms for N = 1…4 are
ECF(1, β) = Σ_i E_i,   (8.36)
ECF(2, β) = Σ_{i<j ∈ J} p_T,i p_T,j (ΔR_ij)^β,   (8.37)
ECF(3, β) = Σ_{i<j<k ∈ J} p_T,i p_T,j p_T,k (ΔR_ij ΔR_ik ΔR_jk)^β,   (8.38)
ECF(4, β) = Σ_{i<j<k<l ∈ J} p_T,i p_T,j p_T,k p_T,l (ΔR_ij ΔR_ik ΔR_il ΔR_jk ΔR_jl ΔR_kl)^β.   (8.39)
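Equation (8.35) translates directly into a brute-force loop over constituent combinations, which is practical for the modest multiplicities of groomed jets; a sketch using dictionary-based toy constituents:

```python
import math
from itertools import combinations

def delta_r(a, b):
    dphi = math.remainder(a["phi"] - b["phi"], 2 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)

def ecf(constituents, N, beta=1.0):
    """Generalised N-point energy correlation function, cf. equation (8.35)."""
    total = 0.0
    for combo in combinations(constituents, N):
        pt_product = math.prod(c["pt"] for c in combo)
        angle_product = math.prod(delta_r(a, b) for a, b in combinations(combo, 2))
        total += pt_product * angle_product ** beta
    return total

# A clearly two-pronged toy jet: ECF(3) comes out much smaller than ECF(2)
jet = [{"pt": 100.0, "eta": 0.0, "phi": 0.0},
       {"pt": 90.0, "eta": 0.6, "phi": 0.1},
       {"pt": 1.0, "eta": 0.05, "phi": 0.0}]
e2, e3 = ecf(jet, 2), ecf(jet, 3)
print(f"ECF2 = {e2:.1f}, ECF3 = {e3:.1f}, r2 = {e3/e2:.3f}")
```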
The key feature of ECFs is that if there are only N significant energy flows (i.e.
subjets), ECF(N + 1, β ) ≪ ECF(N , β ), and this rapid dependence on the order of
the correlator means that again we can use adjacent-order ratios as discriminants:
r_N^(β) = ECF(N+1, β) / ECF(N, β)   (8.40)
behaves much like N-subjettiness τN(β ), going to small values for jets with N subjets.
Having drawn this analogy between rN(β ) and τN(β ), it is natural to ask whether we can
go further and also use ECFs to build discriminants like τ_{N,N−1}^{(β)}: indeed we can, via
the double-ratio
C_N^(β) = r_N^(β) / r_{N−1}^(β) = ECF(N+1, β) ECF(N−1, β) / ECF(N, β)².   (8.41)
The C notation is a nod to the fact that these ECF double-ratios are not just
superficially similar to sphericity (a two-point correlator of three-momentum
components), but are an exact generalisation of the C-parameter defined earlier in
terms of sphericity eigenvalues. As with τ_{N,N−1}^{(β)}, small values of C_N^(β) indicate that a jet
contains N subjets. C2(1) in particular is a provably optimal discriminant of boosted
two-prong decays against QCD backgrounds and is used in boosted-jet searches, as
is the related D2 variable for three-prong versus two-prong discrimination:
D_2^{(\beta)} = \frac{r_2^{(\beta)}}{r_1^{(\beta)}} \cdot \frac{\mathrm{ECF}(1, \beta)^2}{\mathrm{ECF}(2, \beta)}   (8.42)
= \frac{\mathrm{ECF}(3, \beta)\, \mathrm{ECF}(1, \beta)^3}{\mathrm{ECF}(2, \beta)^3} .   (8.43)
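Assuming ECF values have already been computed (for example with a sketch like the one above), the ratio observables of equations (8.41)–(8.43) are then one-liners; the numerical values in the example call are purely illustrative.

```python
def c_n(ecf_nm1, ecf_n, ecf_np1):
    """Double ratio C_N of equation (8.41) from three consecutive ECF values."""
    return ecf_np1 * ecf_nm1 / ecf_n**2

def d_2(ecf1, ecf2, ecf3):
    """D_2 of equation (8.43)."""
    return ecf3 * ecf1**3 / ecf2**3

print(c_n(200.0, 4000.0, 60000.0), d_2(200.0, 4000.0, 60000.0))   # illustrative values
```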
Further generalisations introduce an extended family of N, M and U correlator
combinations, but these are less used. Energy correlators are amenable to analytic
QCD calculations, and have shown themselves to be a powerful general family of
observables for extracting analysing power from the inner structure of hadronic jets.
13
But beware over-optimizing for features of unrealistic straw-man models, such as BSM simplified models:
see Section 10.1.2.
• In a more generic model, the metric could be maximisation of the expected limits
across the broad family of such models—perhaps, but not necessarily,
expressible as the volume within a higher-dimensional set of exclusion contours;
• In a single-number measurement, the aim is typically to minimise the total
measurement uncertainty;
• In a multi-bin differential measurement, the metric may fall into a nebulous
combination of maximising resolution (i.e. minimising bin sizes), and minimis-
ing uncertainty (perhaps by increasing bin sizes). The competition between
these two can motivate use of different bin-sizes, using smaller ones where event
yields and observable resolutions permit. Measurements may also be motivated
by discriminating between models, in which case a search-like metric of
optimising discrimination/limit-setting power may be appropriate.
The exact balance of expected signal and background yields to maximise analysis
performance will depend on these various priorities, and in turn on the signal- and
background-process cross-sections, the integrated luminosity (since absolute yield
values will enter the Poisson likelihood and discriminant measures presented in
Chapter 5), and on the nature of the systematic uncertainties which (through
nuisance parameters) dilute the statistical power of the results. Unfortunately, in
most situations one cannot reasonably put the whole analysis chain—running over
perhaps hundreds of millions of events, estimating hundreds of uncertainties, and
performing expensive likelihood fits—into a numerical optimisation code to
‘automatically’ find the optimal event-selection cuts: we must usually work instead
using heuristics and judgement.
before assessing the trickier effects of systematic mismeasurement (or in the fortunate
limit that they are negligible). Considering also the statistical uncertainty in S, and
combining them in quadrature, the statistic S/√(S + B) is similarly useful.
If the statistics are expected to be large enough that the background yield should
be stable and easily allow statistical resolution of S, we move instead to the
systematics-dominated regime. Here a full evaluation of the a priori uncertainty
(before likelihood profiling), and hence expected significance, will require very
expensive runs over systematic variations of the object reconstruction, and over the
set of theory uncertainties associated with the MC event-generation. But as
systematic uncertainties are fixed relative errors, which unlike statistical relative
errors do not reduce with accumulating luminosity, a shortcut is to look at the ratios
S/(ϵB) or S/(ϵ_S S + ϵ_B B) ∼ S/[ϵ(S + B)], where ϵ is a representative total relative
systematic uncertainty, and ϵ_S, ϵ_B a pair of such uncertainties specific to signal or
background event characteristics.
A less simple approach is to use the result given in equation (5.145), that for most
variants of the log likelihood ratio qμ, the median expected significance of a
statistical test based on that LLR is the square-root of its Asimov-dataset value,
i.e. the value obtained when the data are exactly equal to the expected values. For a
single Poisson-distributed event count k, the Asimov LLR is
t_A = 2 \ln\!\left( \frac{P(k = s+b\,;\, \lambda = s+b)}{P(k = s+b\,;\, \lambda = b)} \right)   (8.44)
= 2 \left[ (s+b) \ln\!\left( \frac{s+b}{b} \right) - s \right] .   (8.45)
Given the explicit probabilistic picture used here, these expressions can be gener-
alised to multi-bin likelihood-ratio tests by summing the LLR contributions from
each bin to form a composite LLR before taking the square-root.
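As a minimal illustration of this shortcut, the following Python sketch evaluates the Asimov significance √t_A of equation (8.45) for a single counting experiment, and the composite multi-bin version by summing the per-bin LLR contributions; the yields are invented for illustration.

```python
import numpy as np

def asimov_significance(s, b):
    """Median expected significance sqrt(t_A), with t_A from equation (8.45)."""
    s, b = np.asarray(s, dtype=float), np.asarray(b, dtype=float)
    t_A = 2.0 * ((s + b) * np.log((s + b) / b) - s)
    return np.sqrt(t_A)

print(asimov_significance(10, 100))     # single bin: ~0.98, close to S/sqrt(B) = 1.0

# Multi-bin composite LLR: sum the per-bin t_A contributions before the square-root
s_bins = np.array([2.0, 5.0, 3.0])
b_bins = np.array([50.0, 20.0, 5.0])
t_A = 2.0 * ((s_bins + b_bins) * np.log((s_bins + b_bins) / b_bins) - s_bins)
print(np.sqrt(t_A.sum()))
```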
The same situation is also natural in object reconstruction. The space of these
indefinite values hence contains a set of working points for the algorithm, many of
which might be reasonable. Again, we need a quantitative metric for how to pick the
optimal working point for the analysis’ purposes. When developing a new algorithm,
it may also be interesting to consider how the overall performance of the algorithm
compares to others, across the whole space of reasonable working points.
The key quantities to summarise selection performance are again functions of
event counts from signal and background, but now including the pre-selection
numbers from signal and background as well as the post-selection ones: we denote
these as Stot, Btot and Ssel, Bsel , respectively.
A first pair of important quantities are already quite familiar: the signal and
background efficiencies. The signal efficiency, also known as the true-positive rate
(TPR) is
ϵ_S = S_sel / S_tot,   (8.48)
and the background efficiency, also known as the false-positive rate (FPR) or fake-rate, is
ϵ_B = B_sel / B_tot.   (8.49)
‘Positive’ in this nomenclature means that an event was selected, and ‘true’ or ‘false’
whether it should have been.
The second kind of quantity important for characterising selection performance is
the purity,
π = S_sel / (S_sel + B_sel),   (8.50)
i.e. a measure of the extent to which the selected sample is contaminated by
background events which we would rather have discarded. There is a natural tension
between efficiency and purity: one can trivially achieve 100% efficiency by simply
accepting every event; equally, full purity can be approached by tightening cuts until
virtually no events pass. The space of interesting selection working points lies
between these two extremes.
A useful tool for visualising the space of working points that produce these signal
and background yields, and hence efficiency and purity metrics, is the ROC curve—
the name derived from the quasi-military jargon ‘receiver operating characteristic’
and reflecting its origins in wartime operations research. ROC curves come in
several formulations, but the classic (and arguably easiest to remember) form is as a
scatter plot of working points in the 2D space of TPR versus FPR. Very often, the
set of working points plotted in the ROC are drawn from variation of a single
operating parameter and can hence be joined by lines to give the classic ‘curve’. If
the TPR is plotted on the horizontal axis, and the FPR on the vertical, a typical
curve will tend to curve from bottom-left to top-right below the diagonal, reflecting
the trade-offs between high acceptance and high purity. Examples of such ROC
curves are shown in Figure 8.3. The general aim in optimising the choice of working
point on this curve is to pick a point with large TPR and small FPR, i.e. as far to the
Figure 8.3. Example ROC curves, both in the TPR versus FPR format described in the text. Left: ATLAS ‘tagging’ of quark-like and gluon-like jet categories using charged-particle multiplicity cuts in various jet-pT bins. Credit: ATL-PHYS-PUB-2017-009. Right: Discrimination of colour-singlet boosted H → bb¯ decay from the dominant QCD background using a variety of jet substructure observables. Reproduced with permission from Buckley et al 2020 SciPost Phys. 9 026, arXiv:2006.10480.
right-hand side and toward the bottom of the plot as possible. As it is unlikely that
both aims can be met simultaneously, the exact ‘best’ working point within this
region requires a quantitative metric beyond informal ‘eyeballing’. Such a metric
will depend on the requirements of the analysis, and the relative rates of its signal
and background event types.
An oft-cited measure of ROC curve performance is the area under the curve or
AUC. This is defined as you might expect, for the TPR/FPR axis layout we have
described. A larger area under this curve indicates a more performant selection
algorithm in general. But note that AUC does not characterise a single working
point, but the whole method: as such, it often does not answer the question of most
interest, namely ‘What specific settings are best for my analysis?’ What is needed is
usually one operating point well-suited to the application, at which point the rest of
the curve is irrelevant.
And indeed the ROC curve cannot fully answer that question, for the simple
reason that it deals entirely in efficiencies for signal and background selection
individually, losing the crucial context of the relative sizes of pre-selection signal and
background, S_tot and B_tot. As it happens, plotting the signal efficiency against purity
does give a handy visual indication of a relevant metric, since the area of the rectangle between
the origin and any working point on such an efficiency–purity curve is
π × ϵ = \frac{S}{S+B} \times \frac{S}{S_{tot}}   (8.51)
= \frac{S^2}{S+B} \times \frac{1}{S_{tot}}   (8.52)
\propto \left( S/\sqrt{S+B} \right)^2 ,   (8.53)
our summary significance from earlier, in the case of dominant statistical uncer-
tainties. This gives a useful rule of thumb for picking a statistics-dominated working
point from an efficiency–purity curve: the point which maximises the area of the
subtended rectangle is a priori interesting. Of course, the effects of systematic
uncertainties must also be considered in a final optimisation.
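A minimal Python sketch of this rule of thumb, scanning a single hypothetical cut value with invented signal- and background-efficiency curves (the functional forms and yields are purely illustrative, not taken from any real analysis):

```python
import numpy as np

S_tot, B_tot = 1_000.0, 50_000.0                  # assumed pre-selection yields
cuts  = np.linspace(0.0, 0.99, 100)               # scanned cut positions
S_sel = S_tot * (1.0 - cuts) ** 0.5               # invented signal-efficiency curve
B_sel = B_tot * (1.0 - cuts) ** 3                 # invented, faster-falling background curve

tpr    = S_sel / S_tot                            # signal efficiency (TPR)
fpr    = B_sel / B_tot                            # background efficiency (FPR)
purity = S_sel / (S_sel + B_sel)

fom  = tpr * purity                               # rectangle area ~ (S/sqrt(S+B))^2 / S_tot
best = int(np.argmax(fom))
print(f"best cut = {cuts[best]:.2f}: TPR = {tpr[best]:.2f}, FPR = {fpr[best]:.3f}, "
      f"S/sqrt(S+B) = {S_sel[best] / np.sqrt(S_sel[best] + B_sel[best]):.2f}")
```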
8.6.1 Backgrounds
The simplest way to estimate backgrounds is to simulate them using MC event
generation and simulation. The MC generator, or perhaps a state-of-the-art parton-
level calculation, will provide an estimate of the total cross-section, and the event
kinematics, detector simulation, reconstruction, and analysis algorithms will furnish
the efficiency and acceptance terms.
But of course this presupposes that the calculations, the phenomenological parts
of event generation, the simulation geometry and material interactions, digitization
and pile-up are all modelled accurately for the phase-space being selected: while the
technology is very impressive, this is a huge demand. It is also an unacceptable
assumption for an experimental measurement: other than processes whose expected
upper cross-section estimate is orders of magnitude below the experimental
resolution, MC estimates must be challenged and refined using in situ data as
much as possible. Even then, a combination of theory and experimental uncertain-
ties will be associated with the background contributions: we will consider these in
the next section.
A more complex type of fake is the combinatoric background, which occurs in the
reconstruction of multi-body decays such as top quarks or exclusive hadron decays.
For example, a hadronic top-quark will decay to a b-jet and at least two other,
usually light-quark, jets, but hadron collider events are awash with QCD (also
mostly-light) jets: it is easy to choose a wrong combination of jets which looks
reasonable, but does not correspond to the set of jets involved in the hard interaction
and decay. As hard-process scales increase, the potential for such errors can also
increase as the phase-space for extra QCD jet activity opens up. Combinatoric
backgrounds also occur between non-jet elements of events, such as pairing leptons
with the correct b-jets in dileptonic tt¯ production, or finding correct lepton pairs in
diboson ZZ → 4ℓ events. These problems are reducible in principle by use of further
cuts in the assembly of the parent system, for example:
• invariant-mass requirements in the object selection: an SM top-quark decay
passes through a W resonance, and as such the majority of correct light-jet
pairs will have an invariant mass close to the W pole mass ∼80.4 GeV.
Similar restrictions on lepton-pair invariant masses are standard in Z
reconstruction, and may be either a double-sided window cut requiring
mlow < m inv < mhigh , or a single-sided one: the choice depends on the
dominant source of contamination, and the usual trade-off between purity
and efficiency. The size of the window is limited by the experimental
resolution on m inv .
Note that these cuts bias the selection, perhaps unacceptably. In diboson
events in particular, for example H → ZZ* → 4ℓ , the fully resonant mode
with both Z bosons on-shell may be only part of the desired signal process.
• lepton flavour and charge requirements: if attempting reconstruction of a
rather degenerate system such as dileptonic tt¯ , ZZ → 4ℓ , WW → 2ℓ 2ν etc,
and unable to make tight invariant mass cuts, there is in principle power in
the choice of lepton flavours, and use of charge identification. The former
necessarily involves a loss of events, but with the benefit of great purity: in tt¯
the same-flavour {ee, μμ}+jets+MET events are discarded to give less
ambiguous eμ+jets+MET configurations; ZZ → 4ℓ is knocked down to the
2e2μ channel, and so on.
If losing half your events isn’t an option, identifying the lepton charges can
help in highly degenerate circumstances: there are far fewer pairings in an
event with known 2e⁺ + 2e⁻ than in a 4e view of the same. But the kicker is that
charge identification has its own inefficiency and mistag problems: another
source of uncertainty to be investigated and propagated. The charge-
identification error rate increases with lepton energy, as it is reliant on track
curvature in the detector B-field, and high- pT tracks asymptotically approach
a charge-agnostic straight line.
• MET cuts to reject backgrounds from e.g. W.
• angular cuts between physics objects, e.g. to select back-to-back systems
compatible with a two-body production process, and reject combinations of
kinematically uncorrelated objects.
In this case, the top and W channels are reducible to some extent by MET cuts (the
missing momentum being low in the signal process, as compared to the backgrounds
with direct neutrinos), and the others are amenable to some extent by use of tighter
b-tag and lepton working points. But the combination of several tight cuts can
destroy the signal efficiency, usually below what is acceptable: given large incoming
luminosity, many analyses would rather populate more bins or explore further
pT -distribution tails than discard signal events. So backgrounds are to some extent a
fact of life: how, then, to improve on their MC estimates?
Figure 8.4. Matrix-method sidebands for background estimation in the 2D ‘ABCD’ configuration, with ‘buffer’ zones between regions for continuous variables (top), and the 16-region A–S configuration used by ATLAS’ all-hadronic boosted top-quark measurement (bottom). Bottom reproduced with permission from ATLAS Collaboration 2018 Phys. Rev. D 98 012003, arXiv:1801.02052. Copyright IOP Publishing. All rights reserved.
or D ≈ (B × C)/A ,   (8.55)
where in the last term we switch to a simpler notation NI → I . Note that here, the
‘path’ taken from the fully-light A region to the D signal region can either be
A → B → D ⇒ f1 f2 or A → C → D ⇒ f2 f1, which are multiplicatively equivalent.
This method of extrapolating background estimates from fully background-domi-
nated selections into signal-sensitive ones can be extended further by use of more
binary cuts. An interesting, if extreme, example can be found in ATLAS’ Run 2 all-
hadronic boosted tt¯ measurement, in which two large-R jets are used as top-quark
candidates. The signal region requires both such jets to possess both a b-tag and a
top-tag—the latter being a label derived from jet-substructure observables and
designed to favour jets with a clear SM-top-like hadronic decay structure. With two
tags on each jet, there are now a total of 2⁴ = 16 regions, 15 of which are sidebands
to the 4-tag signal region. Labelling the regions (and their estimated background-
event counts) from no-tag A to 4-tag S, as shown in Figure 8.4, gives a naïve ABCD-
like estimator between the first and fourth matrix indices of S ≈ (J × O )/A, but this
assumes an absence of correlation between tagging rates on leading and subleading
jets. These correlations can also be estimated, via alternative paths through the first
three rows and columns of the matrix (containing only untagged or incompletely
tagged jets, and hence background-dominated), resulting in a refined four-dimen-
sional matrix estimator of the background in region S,
S = \frac{J \times O}{A} \cdot \frac{D/B}{C/A} \cdot \frac{G/I}{E/A} \cdot \frac{F/E}{C/A} \cdot \frac{H/I}{B/A}   (8.56)
= \frac{J \times O}{A} \cdot \frac{H \times F \times D \times G \times A^4}{(B \times E \times C \times I)^2} .   (8.57)
Note that the correlation-term numerator consists of the incompletely-tagged
regions in the 2 × 2 submatrix of the second and third matrix columns, and the
denominator from those regions in the first column or row, i.e. with at least one fully
untagged candidate jet.
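The arithmetic of these estimators is simple enough to sketch directly; in the snippet below the region yields are hypothetical placeholder counts, and only the naïve ABCD estimate of equation (8.55) and the refined estimator of equation (8.57) are implemented.

```python
def abcd_estimate(A, B, C):
    """Naive ABCD background estimate for the signal region D, equation (8.55)."""
    return B * C / A

def matrix16_estimate(A, B, C, D, E, F, G, H, I, J, O):
    """Refined 16-region estimate for the 4-tag region S, equation (8.57),
    including the correction for correlated leading/subleading tag rates."""
    naive = J * O / A
    correlation = (H * F * D * G * A**4) / (B * E * C * I) ** 2
    return naive * correlation

# Hypothetical sideband yields, purely for illustration
print(abcd_estimate(A=10_000, B=500, C=800))
print(matrix16_estimate(A=10_000, B=500, C=800, D=40, E=450, F=35,
                        G=30, H=25, I=700, J=60, O=55))
```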
8.6.2 Uncertainties
The experimentalist Ken Peach noted that ‘the value you measure is given by
Nature, but the error bar is all yours’. As this suggests, the place where an
experimentalist gets to show their skill and ingenuity is in assessing and reducing
the impact of uncertainties. This is a balance: the smaller the error bar (or more
generally, the smaller the covariance entries, or the more tightly localised the
likelihood or posterior pdf), the more accurate and powerful the measurement, but
above all it must be a fair representation of the measurement uncertainty.
Underestimation of error is a worse crime than conservative overestimation, as
the former can actively mislead the whole field while the latter simply reduces the
impact of your measurement.
As ingenuity is key to exploration and constraint of uncertainties, we certainly
cannot give a comprehensive guide here. But in the following we will illustrate a few
key techniques for handling both statistical and systematic uncertainties. As for
backgrounds, which are themselves a source of uncertainty, central elements will be
the use of:
• nuisance parameters (such as the background rates) to encode a space of
reasonable statistical and systematic variations;
• and sidebands or similar control regions (CRs) to act as a constraint on the
otherwise unfettered nuisances.
Taking the Poisson likelihood from equations (5.131) and (5.132) as a canonical
measure for data compatibility over a set of signal-region or histogram-bin
populations and model expectations, we can extend it easily to include a set of
control-region measurements:
L(k; θ = \{ϕ, ν\}) = \prod_i P(k_i\,;\, b_i(ν) + s_i(ϕ, ν)) \cdot \prod_j P(m_j\,;\, b_j(ν))   (8.58)
= \prod_i \frac{e^{-(b_i+s_i)} (b_i + s_i)^{k_i}}{k_i!} \cdot \prod_j \frac{e^{-b_j}\, b_j^{m_j}}{m_j!} .   (8.59)
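As a minimal numerical sketch of equation (8.58), the function below evaluates the joint signal-region plus control-region negative log-likelihood for a toy model in which one signal-strength parameter scales the nominal signal and one nuisance parameter scales the nominal backgrounds in both regions; the parameterisation and yields are illustrative assumptions, not a general prescription.

```python
import numpy as np
from scipy.stats import poisson

def joint_nll(mu, nu, k, m, s_nominal, b_nominal, b_cr_nominal):
    """-ln L for equation (8.58): k = observed SR counts, m = observed CR counts,
    mu = signal-strength parameter, nu = background-normalisation nuisance."""
    sr_expectation = nu * np.asarray(b_nominal) + mu * np.asarray(s_nominal)
    cr_expectation = nu * np.asarray(b_cr_nominal)
    return -(poisson.logpmf(k, sr_expectation).sum()
             + poisson.logpmf(m, cr_expectation).sum())

# Illustrative one-SR-bin, one-CR-bin evaluation
print(joint_nll(mu=1.0, nu=1.0, k=[58], m=[210], s_nominal=[10.0],
                b_nominal=[50.0], b_cr_nominal=[200.0]))
```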
Statistical uncertainties
Statistical uncertainties are in fact rather easily handled, either by use of discrete
probability distributions, or by estimators derived in their continuum limit.
If passing the measurements to a Poisson-based likelihood calculation or similar,
the pdf (or rather probability mass function (pmf)) ‘automatically’ contains the
statistical variance. You should not have any problem with discreteness in the
observed count: the number of observed events passing cuts and falling into bins or
signal/control regions is always an integer. The same is not generally true of MC
predictions: these contain weights from MC biasing, from scaling the effective
integrated luminosity of their sample to match the observed luminosity, and from
pile-up reweighting and other calibration scale-factors. In most cases, this is fine:
there is no discreteness requirement on the Poisson mean, λ, and that is what MC
estimates are mostly used for. When using weighted MC samples as pseudodata (e.g.
to create an Asimov dataset), it can become necessary to round their continuous
values to integers, creating a small, but unconcerning non-closure in checks to test
that the MC parameters can be re-obtained in a circular fit.
If converting instead to a continuous variable, for example by normalising, a
Gaussian estimator for the statistical error can be obtained via the variance,
standard deviation and standard error on the mean given earlier in Section 5.3.
Statistical uncertainties, unlike most systematic ones, are also amenable to
improvements by widening or merging bins: the statistical jitter of bin values (or
profile means) and random empty bins visible in overbinned plots are a consequence
of a bin's relative statistical error scaling like 1/√N_fills, and this can be ameliorated by
making underpopulated bins wider. While it is simplest with available tools to ‘book’
histograms with uniform bin widths, and various heuristics exist to pick optimal
uniform bin widths, a better strategy is to apply a binning motivated by the
expected distribution density—this way, each bin will be roughly equally populated,
with an expected relative uncertainty matched to the analysis requirements. A simple
and common approach to variable binning, in particular for rapidly falling
distributions such as the jet pT spectra shown in Figure 8.5 is log-spaced binning,
with Nbin + 1 bin edges uniformly spaced in the logarithm of the x-axis variable.
Such binning optimisations should be tempered by sanity: it is far easier to report
and interpret ‘round-numbered’ bin-edge positions in tables and numeric data
publications, so some human rounding to e.g. the nearest 10 GeV is usually wise
following the numerical optimisation. The same imperative applies to binnings in
irrational-number ranges, most notably azimuthal angles: if reporting e.g. binned Δϕ
values, expressing the angle either in degrees or in fractions of π is far preferable to
having to record cumbersome and inaccurate bin edges like 0, 0.31415, 0.62832, …!
Figure 8.5. Illustration of uniform versus log-spaced binning for a leading-jet pT distribution, filled from the
same MC events. The increasing bin widths in the log-spaced version help to compensate for the decreasing
statistical populations (while binning as small as detector resolution permits at low- pT , with fewer bins), while
statistical jitter and empty-bin gaps are evident in the uniformly binned variant.
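A log-spaced binning like that in Figure 8.5 takes only a couple of lines; the edge values and the rounding granularity below are illustrative choices rather than recommendations.

```python
import numpy as np

pt_min, pt_max, n_bins = 30.0, 3000.0, 20                       # GeV; illustrative
raw_edges = np.logspace(np.log10(pt_min), np.log10(pt_max), n_bins + 1)
edges = np.round(raw_edges, -1)                                 # tidy to the nearest 10 GeV
print(edges)
```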
Bootstrapping: Estimating means and variances is one thing, obtainable from the
set of events populating the analysis bins, but a tougher challenge is estimating the
correlations between those bins. The reason is that the bin values in any measure-
ment are the best estimate of physics quantities, and to be that best estimate the
maximum available number of events needs to be used. But correlations are
computed by comparing multiple datasets and seeing how their elements co-vary:
if we only have a single, all-events dataset, we have no ensemble to compute the
correlations from. How can we get an impression of the statistical distribution from
which our universe drew our single observed dataset, without wasting our precious
number of recorded events? The answer is found in a neat technique called
bootstrapping14.
In the simplest form of bootstrapping, we compute our observable Nreplica times
using sampling of values from our available dataset with replacement, i.e. it is
allowed to sample the same event or entry more than once. This results in several
replica datasets, one from each sampling, which fluctuate relative to one another
and provide the ensemble from which we can compute covariances and so on. Note
here that there is no assumption that the bin values are event sums: they can have
been transformed, scaled, divided, or normalised, and will still statistically fluctuate
in a representative way. The power of the bootstrap method is that it can accurately
compute the correlated statistical uncertainties between these complex quantities,
such as the negative correlations that must be introduced by a normalisation
procedure—if the integral is fixed and one bin fluctuates upwards, all others must
on average fluctuate down to compensate. The bootstrap technique also accurately
14
From the idiomatic phrase ‘pull yourself up by your bootstraps’, nowadays meaning to improve oneself
without external help. There is also a degree of cynicism embedded: try as you might, pulling up on your shoes
will not actually help to lift you across any obstacle, much as the accuracy of bootstrapping is limited by the
actual observation of only one dataset.
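A minimal Python illustration of the replica procedure, using an invented event-level observable and showing the negative bin-to-bin correlations induced by normalisation:

```python
import numpy as np

rng = np.random.default_rng(42)
values = rng.exponential(scale=50.0, size=5000)      # stand-in for per-event observable values
edges = np.array([0.0, 20.0, 40.0, 60.0, 100.0, 200.0])

def normalised_hist(vals):
    counts, _ = np.histogram(vals, bins=edges)
    return counts / counts.sum()                     # normalisation couples the bins

# Bootstrap replicas: resample events with replacement, refill, renormalise
replicas = np.array([normalised_hist(rng.choice(values, size=values.size, replace=True))
                     for _ in range(500)])

corr = np.corrcoef(replicas, rowvar=False)           # note the negative off-diagonal entries
print(np.round(corr, 2))
```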
Systematic uncertainties
Systematic uncertainties propagate into an analysis from every ingredient that is
imperfectly known, other than the statistical limitations of the data. Sources of
systematic uncertainty occur all the way down the processing chain from event
generation, to collider and detector simulation, to reconstruction and calibration.
This definition is necessarily vague, because some such uncertainties may result
from statistical limitations either in calibration data or in MC samples: nevertheless,
if it can be rendered into the form of a pdf on a nuisance parameter, it can be treated
as a systematic effect. For ease of application, it is usual for these sources of
uncertainty to each be decomposed into a maximally orthogonal set of elementary
nuisances, for example by the covariance-diagonalisation procedure used to derive
parton density Hessian error-sets. Each elementary nuisance is hence a priori
uncorrelated, but as they are likely to influence bin populations and hence data-
compatibility in structured ways, correlations between systematic nuisances and
across measured bin-values are likely to be induced by any fitting procedure: the
corresponding pre-fit and post-fit distributions and errors are often seen in analysis
documentation, with the post-fit version being the best estimate of the true
calibration values and uncertainties specific to the analysis phase-space.
Examples of sources of systematic uncertainty include (but are certainly not
limited to):
Physics modelling: despite best efforts by many theorists and experimentalists,
MC event generation remains imperfect, particularly for processes subject to
large radiative QCD corrections (notably Higgs-production), loop effects (the
tt¯ pT distribution and related quantities have been found to be difficult to model
Lepton, photon, tau, and jet-flavour identification: all these physics-object types
are identified through relatively complex algorithms—often based on machine-
learning methods—operating on a multi-variable set of reconstruction inputs.
These include ECAL shower-shape discriminators for e/γ discrimination,
counting of charged-particle ‘prongs’ and other jet-shape variables for tau
discrimination from hadronic jets, and a large variety of impact-parameter and
decay-chain characteristics in the case of heavy-flavour tagging. The cuts
implicit in these algorithms are dependent on the modelling used in their
training (for example, the calorimeter response and heavy-hadron production
and decays), and on other reconstruction uncertainties feeding into their data-
driven calibration. These lead to effective systematic uncertainties on the
efficiency and mis-ID rates assumed in the interpretation of the reconstructed
events.
Practically, systematic variations of this sort tend to enter the final stages of a
physics analysis in the form of templates—alternative histograms obtained by varying
the simulation and reconstruction behaviour by ‘1σ’ shifts away from the nominal
configuration, one elementary nuisance at a time. The
effects of changes are often decomposed into independent shape uncertainty and
normalisation uncertainty components, especially for systematics from MC model-
ling, where the expected shapes and normalisations of background processes often
come from different sources.
It is usual for these variations to be provided as a pair of templates h± for each
systematic nuisance νi , corresponding to νi = ±1 deviations from the nominal
configuration ν = {0, 0, … , 0} with its corresponding template histogram h0.
Provided the deviations are small enough, the three template copies of each
histogram can then be interpolated as a function of each νi , permitting a histogram
estimator h̃ to be constructed from the templates for any point in the nuisance-
parameter space, e.g. the piecewise linear
\tilde{h}(ν) = h_0 + \sum_i \begin{cases} ν_i (h_+ - h_0) & \text{for } ν_i \geqslant 0 \\ -ν_i (h_- - h_0) & \text{for } ν_i < 0 \end{cases} ,   (8.60)
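A minimal sketch of this piecewise-linear morphing, with invented three-bin templates standing in for the nominal and ±1σ variation histograms:

```python
import numpy as np

def morph_templates(h0, variations, nus):
    """Piecewise-linear template interpolation of equation (8.60).
    'variations' is a list of (h_plus, h_minus) template pairs, one per nuisance,
    and 'nus' the corresponding nuisance-parameter values."""
    h = h0.astype(float).copy()
    for (h_plus, h_minus), nu in zip(variations, nus):
        h += nu * (h_plus - h0) if nu >= 0 else -nu * (h_minus - h0)
    return h

h0 = np.array([100.0, 80.0, 40.0])
jes_like = (np.array([110.0, 82.0, 41.0]), np.array([92.0, 78.0, 39.0]))  # invented +-1 sigma
print(morph_templates(h0, [jes_like], nus=[0.5]))
```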
considering decays that go through both W and Z decays: not only are there various
e and μ combinations, but also the number of charged leptons can be different,
inducing different sets of systematic uncertainties in each channel. Separating and
comparing the channels can flag problems and permit uncertainty constraints;
Cut loosening/inversion: as for background estimation, techniques such as tag-
requirement inversion (cf. the ABCD/matrix method), and cut and working-
point loosening provide control regions for uncertainty constraints;
Studying isolation variables: jet backgrounds are often key, and are naïvely
poorly constrained due to the relative unreliability of QCD-multijet event MC
modelling. Explicit study of lepton/photon isolation spectra without the
isolation cut, combined with loosening of various cuts, can provide information
on how much residual contamination can enter from misidentification of non-jet
physics objects.
Standard candles: if the analysis itself is not aiming to measure a fundamental SM
parameter, such as an EW coupling or Higgs/W/Z/t mass, the prior knowledge
from earlier experiments of what these values ‘should’ be (assuming they are
known to better precision than available via the current analysis) permits
constraint of systematics biasing the observables away from their standard values.
Pruning systematic uncertainties: practically, fits and scans on systematics suffer
from the usual curse of dimensionality, and may be extremely slow to process,
or fail to converge at all: it is trivially easy to acquire a parameter space
dominated by tens or hundreds of systematic nuisance parameters, but
comprehensiveness can render an analysis computationally impractical.
Ironically, in statistical fits, the largest problems can arise from the least
important parameters, as they create flat directions in the parameter space
which confuse optimization and sampling algorithms.
The solution to both problems is to prune the large list of naïve nuisance
parameters to focus only on those which significantly impact the final results.
This can be done approximately, but usually sufficiently, directly on the
reconstruction-level histogram templates, e.g. retaining only those nuisance
parameters whose 1-σ variations make at least one bin value shift by more than
some threshold relative or absolute amount (or a conservative combination of both).
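A minimal sketch of such template-level pruning, where the threshold values and the ‘keep if either criterion fires in any bin’ convention are illustrative choices:

```python
import numpy as np

def prune_nuisances(h0, variations, rel_threshold=0.005, abs_threshold=0.5):
    """Return the names of nuisance parameters worth keeping: those whose +-1 sigma
    templates shift at least one bin of the nominal histogram h0 by more than the
    relative or absolute threshold."""
    kept = []
    for name, (h_up, h_down) in variations.items():
        shift = np.maximum(np.abs(h_up - h0), np.abs(h_down - h0))
        relative = shift / np.maximum(h0, 1e-9)
        if np.any((shift > abs_threshold) | (relative > rel_threshold)):
            kept.append(name)
    return kept

h0 = np.array([100.0, 80.0, 40.0])
variations = {
    "jes_like":  (np.array([103.0, 81.0, 41.0]), np.array([97.0, 79.0, 39.0])),   # kept
    "tiny_syst": (np.array([100.1, 80.0, 40.0]), np.array([99.9, 80.0, 40.0])),   # pruned
}
print(prune_nuisances(h0, variations))
```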
Further reading
• Jet clustering algorithms and the pros and cons of various distance measures are
discussed in the classic pre-LHC Towards Jetography paper,
Salam G P Eur. Phys. J. C 67 637–86 (2010), arXiv:0906.1833.
• Some interesting discussions of the distinctions between rapidity and pseudorapidity,
and between missing momentum and missing energy, can be found in the short
preprint paper Gallicchio J and Chien Y-T Quit using pseudorapidity,
transverse energy, and massless constituents (2018), arXiv:1802.05356.
Exercises
8.1 Show that at η = 0, the pseudorapidity variable behaves as a longitudinal
equivalent to the azimuthal angle ϕ.
8.2 Given a conical jet of radius R and a smooth background activity of pT
density ρ, what are the biases ΔpT and Δm on the jet’s pT and mass,
respectively?
8.3 Derive the phase-space of the ΔR observable, i.e. for pairs of points
randomly distributed on the η–ϕ plane, what is the functional form of the
resulting probability density p(ΔR ), and why does it peak at ΔR = π ? (As
an extension, how would you expect finite η-acceptance to modify the
naïve distribution?)
8.4 An analysis pre-selection yields a pair of 1D signal and background
histograms in an observable x, given in the table to the right. The final
selection is to be defined by a requirement x > xcut , where xcut needs to be
optimised.
x S B
0 0 0
1 2 2
2 4 3
3 3 8
4 2 6
5 2 2
8.5 In a results histogram with two bins, the error bars for each bin can be
constructed either by finding a confidence (or credibility) interval on the
marginal likelihood (or posterior) distribution for each bin, or by
projecting the equivalent 2D interval’s contour on to each bin-value, bi.
If the likelihood function L(b1, b2 ; D ) is a two-dimensional Gaussian
distribution, will the marginal or global construction give larger error
bars? [Hint: think about the role of the tails in the 2D distribution when
integrating or projecting.]
8.6 Inclusively simulated W production has a 1 × 10⁻⁵ selection efficiency as
the dominant background process in a BSM search requiring high- pT
leptons. If the W cross-section is 44 nb, what is the expected background
yield from 300/fb? Roughly, what is the lowest cross-section for a 20%
efficient signal process that can be excluded at 95% CL? How many
background events would need to be naïvely simulated to perform this
analysis, and how can this be made more tractable than naïvely simulating
all W-boson production and decay events?
8.7 Show that equation (8.47) reduces to equation (8.46) in the case of small
background uncertainties.
8.8 A signal process is calculated to produce an expected yield of 55 events
after analysis cuts, and a mean background yield of 500 events. If there are
no uncertainties on the background prediction, what is the expected
significance of the measurement? If the only uncertainty on the back-
ground-yield estimate comes from MC statistics, how small does the
relative statistical uncertainty need to be to achieve 95% CL significance?
(You may want to make a plot.) How many background events need to be
generated to achieve this, compared with the central expectation of 500?
8.9 Given the suggestion of equal-population as an optimal binning strategy,
how many bins can be used on a 10 000-event sample if 2% statistical
precision is desired in each bin? How will the position of the free bin edges
be related to the cumulative density function (cdf) of the expected
distribution, and hence how is the problem of binning related to inverse-
cdf sampling?
8.10 Suggest good binning schemes in a) lepton azimuthal angle ϕ, b) Δϕ
between the two leading jets in dijet events, c) leading-jet pT in inclusive-jet
events, d) dilepton mass from 70 GeV to 110 GeV, and e) H → bb¯ di-b-jet
mass in the range m_bb̄ ∈ [100, 150] GeV.
Chapter 9
Resonance searches
If we square both sides, we simply get the rest mass of the particle A on the left-hand
side (in natural units), and we get the invariant mass of the two daughter four-
momenta on the right-hand side. Since this is a Lorentz scalar, this is true in any
frame, including that of the LHC detectors, and this gives us a convenient recipe for
searching for A particles at the LHC.
Imagine that our theory of the A particle predicts that it can decay to two photons.
In that case, we can simply take all of the proton collision events that produced pairs of
photons, add the two photon four-momenta together and square the result to get the
invariant mass. Events in which the two photons were produced by an A particle will
have an invariant mass close to the A mass. The reason that it is close to and not
precisely equal to the A mass is due to the fact that quantum mechanics allows A
particles to be produced with a four-momentum that slightly violates the relativistic
invariant mass formula, which broadens the invariant mass distribution of the decay
products. In addition, the four-momenta of the decay products are not reconstructed
with 100% accuracy, which leads to a further broadening of the invariant mass peak.
Both of these effects will be dealt with in more detail below.
By selecting all events with two photons in, we do not only select events that contain
an A particle decaying to two photons. The SM can give us pairs of photons by other
processes (including those with two real photons, and with one or two fake photons),
and these give a smooth continuum of invariant masses with no peaking structure.
Thus, discovering the A particle requires seeing a bump in the diphoton invariant mass
spectrum that is large enough to be clearly visible above the background. The height of
the A bump is related to the product of the A production cross-section and its
branching ratio to photons, and the shape of the bump may also be distorted by
interference between the SM background and the A signal. The definition of clearly
visible must also take account of the statistical and systematic uncertainties on the SM
background, which might conspire to make normal SM processes appear bump-like in
a limited region of the invariant mass spectrum. As with most topics in physics, hunting
for bumps starts off simply, but rapidly becomes a very complicated business indeed.
1
By ‘visible’, we mean either that the particle is directly reconstructed, or it decays to particles that are directly
reconstructed. We therefore do not count direct production of neutrinos.
also have more than two SM particles in, and could be arbitrarily complex
depending on the nature of the BSM physics involved.
In the following, we focus on the first two of these options, on the basis that the
techniques we cover could be generalised to the more complicated third case.
Typically, LHC searches for two-body resonances target cases where the new particle
decays to pairs of the same sort of object, for example a pair of muons (which may have
opposite sign), an electron–positron pair, a pair of photons, a pair of b-jets, and so on.
Examples of theoretical models that would produce this behaviour include models
with extra Z-like bosons (commonly called Z′ bosons), and models with extra Higgs
bosons. However, there is no reason not to search for all possible pairs of SM particles
in the final state, since at least one well-motivated extension of the SM can be found for
each case. Different flavour lepton pairs can be produced by the decay of super-
symmetric particles in the case that R-parity is violated. Exotic particles called
leptoquarks are expected to decay to a quark or gluon along with a lepton. If there
are extra W-like bosons in Nature, these might decay to different flavours of quark.
Excited leptons might decay to a photon, W boson, Z boson or Higgs
boson, plus a lepton, whilst excited quarks would decay to a quark or gluon along with
a boson. Vector-like top and bottom quarks can decay to a top quark plus gauge
boson, Higgs boson, quark or gluon. Extra Higgs bosons, meanwhile, can produce a
variety of final states, including a Z or W boson plus a photon and another Higgs
boson. For researchers on ATLAS and CMS, finding options that remain uncovered
by existing searches is an excellent way to build and sustain a career.
It might be the case that a resonance is produced that is moving at high speed in the
frame of the ATLAS or CMS detector (frequently called the lab frame). For example,
the resonance might recoil from a jet that is radiated off an incoming parton in the LHC
collision process. In this case, there is a sizable Lorentz boost between the lab frame and
the rest frame of the particle. Thus, the four-momenta of the decay products, which are
well separated in the particle rest frame, may appear collimated in the lab frame, to the
extent that they cease to be reconstructed as distinct objects. We then say that the
resonance is boosted, and we can perform a special type of analysis to try and find
the resonance, using the jet substructure techniques introduced in Section 8.4.5.
reduces the background. Nevertheless, there are many details to get right, the first of
which concerns the event selection. It is typical to choose photons that satisfy tight
selection criteria on the shower shapes in the electromagnetic calorimeter, which helps to
reduce backgrounds that produce fake photons. In addition, the analysis can be restricted
to the region where the calorimeter behaviour is best, which in ATLAS means focussing
on the region ∣η∣ < 2.37, excluding the transition region 1.37 < ∣η∣ < 1.52 between the
barrel and endcap calorimeters. Finally, one can impose isolation criteria on the photons,
using combinations of tracker and calorimeter information.
We should note again at this point that we have passed an important milestone in
the journey from theoretical to experimental physics. What a theorist calls a photon
is a gauge boson of the SM. What an experimentalist refers to as a photon carries the
same name as that particle of the SM, but is in fact something quite different; it is an
object that results from applying a set of careful definitions to a series of detector
signals. Changing those definitions would change the observed number of events at
the LHC, the properties of those events, and the relative proportion of different
contributions to the set of events with two photons.
Having decided what to call a photon, we must now decide how to choose the
events that have two photons in. As we learnt in Chapter 6, the only events that are
recorded by the LHC detectors are those that pass some trigger condition, and there
is an extensive trigger menu that contains various conditions on the recorded events.
The ATLAS search utilised a trigger that required events to have one photon with a
transverse energy E_T > 35 GeV, plus another photon with E_T > 25 GeV, where the
photon definition at the trigger level differs from the analysis in requiring a relatively
loose selection on the photon shower shapes. In theory, a trigger should pass no
events that do not satisfy this criterion, and pass all events that do satisfy the
criterion. In practice, however, the efficiency of events passing a trigger turns on
somewhere near the E_T threshold, and then plateaus at some final efficiency which is
lower than 100%. The selection of the events for analysis is usually required to be
more stringent than the trigger condition, to ensure that the trigger is maximally
efficient for the final selected events. Measuring the efficiency of the trigger then
becomes an important part of the experimental analysis.
In addition to the diphoton selections, there are standard data quality selections
that are designed to reject data from run periods where a sizable proportion of the
detector was malfunctioning. It is also necessary to apply a minimum selection on
the diphoton invariant mass (m γγ > 150 GeV ), since the effect of the photon ET
selections is to sculpt the low m γγ region, and distort the expected shape. This leads
to an important general consideration for resonance searches: more stringent trigger
requirements will lead to a greater region of the invariant mass spectrum that can no longer be
searched for resonances unless alternative techniques are used.
distribution of the ATLAS or CMS events that pass those selections, and comparing
to the distribution expected if there are no new particles in Nature beyond those of
the SM. Searching for a signal of any kind involves first developing a detailed model
of the background that is as accurate and precise as possible, and in our present case
that means predicting the shape of the diphoton mass distribution that arises from
each particular background contribution, and also determining the relative con-
tribution of each background source. This model will eventually be fitted to the LHC
data along with a model of the resonant signal we are searching for, allowing us to
extract the relative normalisation of the signal and the background. If the resulting
signal normalisation is a small enough number, we can conclude that there is no
evidence for a signal. If it is large, we have a potential discovery on our hands.
For the newcomer, it is by no means obvious how to model the background, and
the following questions may have already occurred to you:
• How can we identify which processes in the SM can produce two real photons?
A general answer is that we can work out which Feynman diagrams involve
incoming gluons and quarks and end in two photons, and we can then try and
work out which diagrams will contribute most (crudely based on the number
of vertex factors in the diagram, and some rough knowledge of the relative
size of the parton distribution function for the incoming state). Whilst
possible, this is largely unnecessary, since the large number of analyses at
ATLAS and CMS means that almost all final states have already been
investigated. Thus, looking at papers that cover similar final states to that of
interest will already tell you what the dominant background contributions
will be (particularly if you are careful to refer to papers in which the particles
in the final state have similar typical energy and momenta as those you are
interested in). Those who have spent decades analysing collider data can
usually rattle off the dominant SM background processes in a given final state
off the top of their head, a feat which never ceases to terrify students.
• Is there any contribution from backgrounds that have a slightly different final state
than the one of interest? The answer here is always ‘yes’, and you can again refer
to previous work to get a quick estimate of the processes of interest. For
diphoton searches, you need to consider backgrounds that can produce jets that
fake photons, since the small probability of a jet faking a photon is compensated
for by the vastly higher production cross-section for dijet events at the LHC.
For other final states, you might have to consider either fake particles, or real
particles that accidentally fall out of the acceptance for the analysis (by either
passing through a detector region that is outside of the accepted volume, or by
having transverse momenta that fall below the threshold for selection).
Neglecting these possible contributions is a frequent source of error for
theoretical papers proposing new searches at the LHC.
• Having identified a background, how do we predict the invariant mass distribution
expected from that background? There are several possible answers to this
problem, and techniques for doing this are constantly evolving. As we saw in
Chapter 8, one answer is to find the best Monte Carlo generator available for
your particular SM process, and run events from that through a detector
simulation to get an invariant mass distribution. This may require a separate
calculation of the production cross-section for the overall normalisation (for
example, if the latter results from a higher-order calculation than that available
in the Monte Carlo generator). We also saw in Chapter 8 that there are a variety
of data-driven techniques for estimating backgrounds directly from the LHC
data, and these are usually preferred when they are available. We will spend a
great deal of this section exploring these techniques and presenting concrete
examples that might prove useful in future work. It is common for different
members of an analysis team to try different methods in order to check their
consistency. The model with the smallest systematic uncertainty should then be
selected as the nominal method, with the others remaining as cross-checks.
If we examine recent searches for diphoton resonances, we find out quickly that
there is a reducible background that arises from SM events in which one or both of
the reconstructed photon candidates is a fake (e.g. a jet that mimics a photon), and
an irreducible background in which an SM process has produced two real photons.
The irreducible background is comprised of three main processes:
• The Born process: qq̄ → γγ, which is of O(e²) in the electromagnetic coupling
constant e.
• The box process: gg → γγ, which is of O(e²g_s²) in the electromagnetic and
strong coupling constants e and g_s. Note that, although this suggests that the
process is suppressed relative to the Born process, this is more than
compensated for by the gluon parton distribution function dominating over
that of the quarks.
• The bremsstrahlung process: qg → qγγ, which is of O(e²g_s).
Photons can also be produced by jet fragmentation, in which case they are still
deemed to be part of the irreducible background (note that this is distinct from a jet
faking a photon). Thus, the production of a photon plus a jet, or the production of
multiple jets with one or more jets producing photons, are both considered to be part
of the irreducible background.
Two distinct approaches to modelling these backgrounds can be pursued, with
different strategies required depending on how far into the tail of the diphoton invariant
mass distribution the search aims to delve. For a search that aims to reach very high
invariant mass values, where few events remain, the background can be modelled using
a combination of Monte Carlo simulation and data-driven methods. For a search that
does not aim as high in the invariant mass, it is possible to fit a functional form to the
background distribution to describe its shape, then use that in the final comparison with
the observed LHC data. We will deal with each of these approaches in turn.
where x = m_γγ/√s and the p_i are free parameters. This may appear to be a very random
choice of function, but typically functions of varying complexity will be tried, before
the simplest function that gives a good answer is selected. The uncertainty in the shapes
determined from the control samples can be investigated by varying the loose photon
definitions that are used to select the photon candidates that fail the tight criteria.
Having identified the shapes of the different contributions to the background, it
still remains to determine their relative proportion. The composition of the total
diphoton background can be determined using a variety of data-driven techniques,
and we here give two recent examples for inspiration. It is important to realise that
there is no strictly correct answer, but instead different members of an analysis team
will typically try different methods, leaving the one that ended up having the
smallest systematic uncertainty to be selected as the preferred option. The lesser
methods can then be used to assess the accuracy of the nominal method.
• The 2 × 2 sidebands method: In this method, control regions of the data are
defined by taking the normal signal selection, but playing with part of the
definition of a photon, similar to the techniques that we first described in
Section 8.6.1. For each photon candidate, one can define different photon
selections as: a) the regular signal selection; b) the isolation requirement is not
met instead of being met; c) part of the tight identification requirement is not
met instead of being met (relating to electromagnetic (EM) shower shapes in
the calorimeter); and d) both the isolation and tight requirements are not met.
This gives rise to 16 different regions of the data in which two photons
appear, but there are differing selections on the photon candidates (e.g. one
photon matches (a), the other matches (a); one photon matches (a), the other
matches (b), and so on). Now assume that there is some number of diphoton
events Nγγ with two true photons; some number of events Nγj with one photon
and a jet, where the photon has the highest transverse momentum; some
number of events Njγ with one photon and a jet, where the jet has the highest
transverse momentum; and some number of events Njj in which the photon
candidates both result from jets. In addition, we can write the proportion of
true photons that pass the isolation requirement as ϵγisol , which is usually
referred to as the (isolation) selection efficiency, and we can write a similar
efficiency for passing the tight selection requirement. We can also define
quantities that give the proportion of jets that pass the isolation and tight
selection requirements, and these are conventionally referred to as fake rates.
It is assumed that the isolation and tight selection requirements are uncorre-
lated for background events. The number of observed events in each region
can then be written as a series of functions of Nγγ , Nγj , Njγ , Njj, the efficiencies,
the fake rates, and a correlation factor for the isolation of jet pairs. This gives
a system of 16 equations, each of which gives the observed yield in a
particular region. Assuming that the efficiencies for true photons have been
measured previously (which is usually true), the system of equations can be
solved numerically using a χ 2 minimisation method with the observed yields
and true efficiencies supplied as inputs. The method allows the simultaneous
extraction of Nγγ , Nγj , Njγ , Njj, plus the correlation factor and the fake
efficiencies, which ultimately determines the background composition in the
signal region. The method can be applied in bins of the diphoton invariant
mass, which allows the shape of the diphoton invariant mass distribution
from each contribution to be estimated separately.
• The matrix method: In this method, diphoton events where both photons
satisfy the tight selection requirement are classified into four categories
depending on which photons satisfy the isolation requirement: both photons
(PP), only the photon with the highest transverse momentum (PF), only the
photon with the lowest transverse momentum (FP), or neither photon (FF).
One can then write a matrix equation that relates the number of observed
events in data in each of these regions to Nγγ , Nγj , Njγ and Njj, where the
elements of the matrix all involve factors of the isolation efficiencies for
photons and jets. Once these have been measured in a separate analysis, the
sample composition can then be obtained by inverting the matrix equation.
As with the 2 × 2 sideband method, the matrix method can be applied in bins
of kinematic variables, and thus can be used to determine the shape of the
invariant mass distribution arising from each contribution to the data. A
minimal numerical sketch of the matrix inversion is given below.
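The sketch assumes a single isolation efficiency for true photons and a single fake rate for jets, plus hypothetical observed yields in the (PP, PF, FP, FF) categories; a real analysis would measure these inputs in bins of the kinematic variables of interest.

```python
import numpy as np

eff_photon, eff_jet = 0.95, 0.30            # assumed isolation efficiencies (photon, fake jet)
observed = np.array([9000.0, 600.0, 550.0, 150.0])   # hypothetical (PP, PF, FP, FF) yields

def pass_fail(eff):
    return np.array([eff, 1.0 - eff])

# Response matrix: rows = (PP, PF, FP, FF) categories, columns = (gg, gj, jg, jj) truth
M = np.zeros((4, 4))
truth_effs = [(eff_photon, eff_photon), (eff_photon, eff_jet),
              (eff_jet, eff_photon), (eff_jet, eff_jet)]
for col, (e_lead, e_sub) in enumerate(truth_effs):
    M[:, col] = np.outer(pass_fail(e_lead), pass_fail(e_sub)).ravel()

N_gg, N_gj, N_jg, N_jj = np.linalg.solve(M, observed)
print(N_gg, N_gj, N_jg, N_jj)
```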
Once the background composition has been measured, the template shapes for
the reducible and irreducible backgrounds can be summed with their relevant
proportions to determine the shape of the total diphoton background. The overall
normalisation of this total background is left as a free parameter when the shape is
compared to the LHC data using the statistical test described in Section 9.2.4.
Uncertainties in the measured proportion of each background contribution can be
considered as systematic uncertainties on the shape of the total background, in
addition to the shape uncertainties that are unique to each contribution.
Detector resolution
Previous experience with diphoton resonance searches had shown that the ATLAS
detector resolution could be described well by a double-sided crystal ball (DSCB)
function, given by
N \times \begin{cases}
e^{-t^2/2} & \text{if } -\alpha_{low} \leqslant t \leqslant \alpha_{high} \\
\dfrac{e^{-\alpha_{low}^2/2}}{\left[ \dfrac{\alpha_{low}}{n_{low}} \left( \dfrac{n_{low}}{\alpha_{low}} - \alpha_{low} - t \right) \right]^{n_{low}}} & \text{if } t < -\alpha_{low} \\
\dfrac{e^{-\alpha_{high}^2/2}}{\left[ \dfrac{\alpha_{high}}{n_{high}} \left( \dfrac{n_{high}}{\alpha_{high}} - \alpha_{high} + t \right) \right]^{n_{high}}} & \text{if } t > \alpha_{high}
\end{cases}   (9.4)
This function is Gaussian within the range −α_low ⩽ t ⩽ α_high, outside of which it follows
a power law. The Gaussian is given in terms of t = (m_γγ − μ_CB)/σ_CB (with a peak
position μ_CB and width σ_CB), and α_low and α_high are given in units of t. n_low and n_high
are the exponents of the power law on the low and high mass sides of the Gaussian,
respectively. Finally, N is a parameter that sets the overall normalisation. Note that
this function describes the effects of the finite detector resolution on the diphoton
invariant mass, which is really an effective description of what results from the slight
mismeasurement of each photon four momentum.
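A minimal Python sketch of equation (9.4), useful for plotting or fitting the resolution model once parameter values are available; the parameter values in the example call are invented purely to show the shape.

```python
import numpy as np

def dscb(m_gg, mu, sigma, a_low, a_high, n_low, n_high, norm=1.0):
    """Double-sided crystal ball of equation (9.4), evaluated at diphoton mass m_gg."""
    t = (m_gg - mu) / sigma
    if -a_low <= t <= a_high:
        return norm * np.exp(-0.5 * t * t)
    if t < -a_low:
        body = (a_low / n_low) * (n_low / a_low - a_low - t)
        return norm * np.exp(-0.5 * a_low**2) / body**n_low
    body = (a_high / n_high) * (n_high / a_high - a_high + t)
    return norm * np.exp(-0.5 * a_high**2) / body**n_high

# Invented parameter values, just to show the lineshape around a 750 GeV peak
masses = np.linspace(700.0, 800.0, 11)
print([round(dscb(m, mu=750.0, sigma=5.0, a_low=1.5, a_high=2.0, n_low=5.0, n_high=10.0), 4)
       for m in masses])
```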
There is unfortunately no way to know the parameters of this function a priori.
Furthermore, the parameters of the DSCB function vary with the mass of the
Signal lineshape
The signal lineshape for a given model can be calculated by multiplying the analytic
differential cross-section for resonance production and decay to photons by a function
that parameterises the incoming parton luminosity. We will develop one example in
this section, but the basic techniques would easily generalise to other processes.
Consider a spin zero resonance being produced by incoming gluons, which then
decays to photons, as shown in Figure 9.1. Such a resonance is called an s-channel
resonance. Assuming some new physics creates couplings between the scalar, gluons
and photons at energies much higher than the LHC, we can use the effective field
theory techniques of Chapter 4 to write a Lagrangian that is valid at LHC energies:
$$
\mathcal{L}_S = c_3\,\frac{g_S^2}{\Lambda}\,G^a_{\mu\nu}G^{\mu\nu a}\,S \;+\; c_2\,\frac{g^2}{\Lambda}\,W^i_{\mu\nu}W^{\mu\nu i}\,S \;+\; c_1\,\frac{g'^2}{\Lambda}\,B_{\mu\nu}B^{\mu\nu}\,S, \qquad (9.5)
$$
where S is the new scalar field, and G^a_μν, W^i_μν and B_μν are the gluon and electroweak field strength tensors. These terms are the gauge-invariant terms of lowest mass dimension that can couple the SM gauge fields to the new scalar. The g_i factors are the various coupling constants of the SM, and the {c_i} are free parameters that adjust the normalisation of each term. Λ is an additional constant (see the exercises). Note that the Lagrangian is written before electroweak symmetry breaking, so we see the W^i_μν and B_μν fields rather than the gauge boson fields; the coupling to photons will emerge from the second two terms after electroweak symmetry breaking.
² Note that this is given up to a normalisation factor that can be neglected in the present discussion.
³ At this point, theorists might be confident that this is a relatively straightforward calculation in quantum field theory, but experimentalists may feel the onset of terror. In fact, such calculations can now be performed by software codes for matrix element calculations, which substantially lowers the bar for experimentalists, whilst ensuring that theorists can correct mistakes in their algebra.
For a given set of parton distribution functions, the gluon luminosity can be plotted
as a function of q2 (the momentum exchange in the scattering process), and a
functional form can be fitted to the distribution. Past experience has shown that the
function is well described by
$$
L_{gg}(q^2) \propto \left(1 - \left(\frac{q}{13\,000}\right)^{1/3}\right)^{10.334} \left(\frac{q}{13\,000}\right)^{-2.8}, \qquad (9.12)
$$
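As a sketch, the fitted form of equation (9.12) can be coded up directly; here q is assumed to be in GeV (so that 13 000 corresponds to √s at the 13 TeV LHC) and the overall normalisation is arbitrary.

```python
import numpy as np

def lumi_gg(q, sqrt_s=13000.0):
    """Fitted gluon-luminosity shape of equation (9.12), up to normalisation.

    q is the momentum transfer in GeV; the exponents are those quoted in the
    text for sqrt(s) = 13 TeV."""
    x = q / sqrt_s
    return (1.0 - x**(1.0 / 3.0))**10.334 * x**(-2.8)

# Relative gluon luminosity at two hypothetical resonance masses:
print(lumi_gg(750.0) / lumi_gg(1500.0))
```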
There is a lot to unpack in this equation. σ is the signal yield, and ν are the
nuisance parameters. Parameters with a hat are chosen to unconditionally maximise
the likelihood L, whilst parameters with a double hat are chosen to maximise the
likelihood in a background-only fit. Hence, the numerator of this quantity expresses
the likelihood of the background-only hypothesis, in which the nuisance parameters
are chosen to maximise the likelihood. The denominator is the likelihood of the
signal hypothesis where both the signal yield and the nuisance parameters are chosen
to maximise the likelihood.
One method for presenting the results of the search is that presented in Figure 9.2,
in which the local p-value for the compatibility of the background-only hypothesis is
shown in the plane of the assumed mass and width of the resonance. As explained in
Chapter 5, local p-values are susceptible to the look-elsewhere effect, in which the
fact that we have looked in many regions of the data can be expected to have
produced a locally significant p value at least somewhere. One way around this is to
estimate a global significance by generating a large number of pseudo-experiments
that assume the background-only hypothesis. For each experiment, you perform a
maximum-likelihood fit with the width, mass and normalisation of the resonance
signal model as free parameters (constrained to be within the search range). The
local p-value of each pseudo-experiment is computed, and the global significance is
then estimated by comparing the minimum local p-value observed in data to the
distribution derived from the pseudo-experiments.
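The pseudo-experiment procedure can be illustrated with a deliberately simplified toy, sketched below. Instead of the full maximum-likelihood fit over the resonance mass, width and normalisation, each pseudo-experiment is scanned for its smallest per-bin Poisson p-value; the background spectrum and the data p-value are hypothetical. The logic of comparing the minimum local p-value in data to the pseudo-experiment distribution is the same.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical background expectation in 50 diphoton-mass bins.
bkg = np.linspace(400.0, 40.0, 50)

def min_local_p(counts, expected):
    """Smallest per-bin Poisson p-value in a spectrum: a crude stand-in for
    the maximum-likelihood scan over mass, width and normalisation."""
    p = stats.poisson.sf(counts - 1, expected)   # P(n >= observed)
    return p.min()

# Background-only pseudo-experiments.
p_min = np.array([min_local_p(rng.poisson(bkg), bkg) for _ in range(5000)])

# Suppose the data gave a smallest local p-value of 1e-4 (about 3.7 sigma locally).
p_local_data = 1e-4
p_global = np.mean(p_min <= p_local_data)
print(f"global p-value ~ {p_global:.3f}")
```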
An alternative approach to presenting results is to fix the assumed width of the
resonance to some assumed value, then display the upper limit, at 95% CL, on the
resonance production cross-section times branching ratio to photons. An example is
shown in Figure 9.3, where the limit has been calculated using the CLs technique
introduced in Section 5.7.1. Evidence for a signal in this presentation manifests itself as a difference between the expected and observed CLs limits.
Figure 9.2. An example presentation of resonance search results, consisting of the local p-values for the compatibility of the data with the background-only hypothesis, in the plane of the assumed mass and width of the resonance. Reproduced with permission from JHEP 09 001 (2016).
Figure 9.3. An example of an upper limit on the resonance production cross-section times branching ratio to
photons. Reproduced with permission from JHEP 09 001 (2016).
cross-section times branching ratio limits, which are not well-defined in the case of
interference.
A model-specific way to deal with interference effects in resonance searches is to
perform a full Monte Carlo simulation for specific model parameter points that
include interference, and state whether these points are excluded or not by the
statistical analysis in the presentation of the results. This could be presented in the
form of an exclusion limit in a plane of model parameters, similar to the semi-
invisible searches for new physics presented in Chapter 10. A model-independent
approach has recently been proposed in which the shape of the distribution in data is
modelled by the usual background template functional form, plus a suitable sum of
even and odd functions that can span the space of possible interference effects, thus
obviating the need for a precise model of the new physics. A positive result could
then be cast into different models by reproducing the approach of the template
function analysis.
• If a resonance decays to two jets that are reconstructed as separate objects, then
the search involves looking for a bump-like feature in the dijet invariant mass
distribution of two jet events. This is directly analogous to the diphoton case.
However, it might be the case that a resonance decays to SM particles that
themselves decay to produce jets. The boost of the final state partons in the
laboratory frame causes the resulting jets to overlap in the detector, and one must
apply the jet substructure techniques of Section 8.4.5 to search for the resonance.
• Diphoton resonance searches are assisted by the relatively low cross-section
of diphoton production at the LHC, which helps in two ways. Firstly, it
means that the background to the resonance signal is quite small, which
means that the cross-section times branching ratio to photons that can be
probed is relatively small. Secondly, it means that triggering on, and storing,
diphoton events is straightforward, since there are not too many of them to
record even if the transverse momentum of the photons is relatively low. For
dijet events, both of these points change dramatically for the worse.
Practically every event at the LHC is a dijet event (or more generally an
inclusive jet event), with everything else being a tiny correction. This makes
the background for resonance searches huge. It also means, of course, that a
jet trigger is only feasible if we raise the transverse momentum requirement of
the jet (or jets) which, as we learned above, sculpts the low mass region of the
dijet invariant mass distribution in such a way that we cannot safely perform a resonance search at low invariant masses.
In the following, we sketch a recent CMS dijet resonance search to give a sense of
the techniques that are applicable to dijet searches in general. The first step is to
decide on a trigger strategy, and in our chosen example this involved using two
triggers, and accepting events if they passed either one of them. This amounts to
taking a logical OR of the two sets of trigger conditions. The first trigger simply
required a single jet with pT > 550 GeV , where this threshold can be expected to
change with the operating conditions of the LHC over its history (whilst remaining
fairly stable within a given run period). The second trigger required HT > 1050 GeV ,
where HT is the scalar sum of the jet pT for all jets in an event with ∣η∣ < 3.0. HT is a
useful concept for event analysis at the LHC, although its definition often changes
depending on what one is searching for in an analysis. Having passed the trigger
requirement, events were selected for further analysis if the dijet invariant mass mjj
was greater than 1.5 TeV for jets with an ∣η∣ separation of less than 1.1, or if
mjj > 1.5 TeV for 1.1 < ∣Δη∣ < 2.6. The jets used for this selection were specially-
defined ‘wide jets’, in which jets reconstructed with the anti-kT clustering algorithm
with a distance parameter of 0.4 were merged based on their spatial separation. In
the wide-jet algorithm, the two leading jets in the event are used as seeds, and the
four-vectors of all other jets are added to the nearest leading jet if ΔR = √((Δη)² + (Δϕ)²) < 1.1, eventually forming two wide jets which are used to define the dijet system for the calculation of mjj. This procedure mops up gluon radiation emitted near the final state partons, which improves the dijet mass resolution. If one were to instead increase the distance parameter used in the original definition of the jets, this would add extra pile-up and initial state radiation contributions that are less welcome. The signal region is defined by requiring the two wide jets to satisfy ∣Δη∣ < 1.1. This is because the background contribution arising from t-channel dijet events has an angular distribution which is roughly proportional to 1/[1 − tanh(∣Δη∣/2)]², which peaks at large values of ∣Δη∣.⁴
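A schematic implementation of the wide-jet merging step might look as follows. Jets are represented as simple (pT, η, ϕ, m) tuples, the four-vector helper is our own construction, and no attempt is made to reproduce the details of the CMS code.

```python
import numpy as np

def delta_r(eta1, phi1, eta2, phi2):
    dphi = np.arctan2(np.sin(phi1 - phi2), np.cos(phi1 - phi2))
    return np.hypot(eta1 - eta2, dphi)

def to_p4(pt, eta, phi, m):
    """(E, px, py, pz) from pt, eta, phi, mass."""
    px, py, pz = pt * np.cos(phi), pt * np.sin(phi), pt * np.sinh(eta)
    e = np.sqrt(px**2 + py**2 + pz**2 + m**2)
    return np.array([e, px, py, pz])

def wide_jets(jets, r_max=1.1):
    """Merge anti-kT jets into two wide jets seeded by the two leading jets.

    `jets` is a list of (pt, eta, phi, m) sorted by decreasing pt; each
    further jet is added to the nearest seed if Delta R < r_max."""
    seeds = [to_p4(*jets[0]), to_p4(*jets[1])]
    seed_dirs = [jets[0], jets[1]]
    for jet in jets[2:]:
        dr = [delta_r(jet[1], jet[2], s[1], s[2]) for s in seed_dirs]
        i = int(np.argmin(dr))
        if dr[i] < r_max:
            seeds[i] = seeds[i] + to_p4(*jet)
    return seeds

def inv_mass(p1, p2):
    p = p1 + p2
    return np.sqrt(max(p[0]**2 - p[1]**2 - p[2]**2 - p[3]**2, 0.0))

# Toy event: two hard jets plus two softer jets from nearby radiation.
jets = [(900.0, 0.3, 0.1, 20.0), (850.0, -0.5, 3.0, 18.0),
        (80.0, 0.5, 0.3, 8.0), (40.0, -0.6, 2.7, 5.0)]
w1, w2 = wide_jets(jets)
print(f"m_jj(wide) = {inv_mass(w1, w2):.0f} GeV")
```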
The dominant background for the dijet search is the production of two or more
jets via QCD processes, which is dominated by the t-channel parton exchange
referred to above. The CMS analysis utilised two methods for modelling the dijet
background, the first of which was to fit the invariant mass spectrum with the
functional form
$$
\frac{d\sigma}{dm_{jj}} = \frac{a_0\,(1 - x)^{a_1}}{x^{\,a_2 + a_3 \ln(x)}}, \qquad (9.15)
$$
where the a_i are free parameters, and x = m_jj/√s. The second method, which was
found to reduce the systematic uncertainties, involved deriving the background
⁴ Scattering enthusiasts will recognise this as the same angular distribution as Rutherford scattering!
model for the signal region using two control regions in the data. It is thus another
example of a data-driven method. The first control region CRmiddle requires the
difference in η between the two wide jets to satisfy 1.1 < ∣Δη∣ < 1.5, whilst the second
control region CRhigh requires 1.5 < ∣Δη∣ < 2.6.
To get the shape of the dijet invariant mass distribution in the signal region, the
number of events observed in data in each bin of the mjj distribution in CRhigh is
multiplied by an appropriate ‘transfer factor’. For a given bin, the transfer factor is
defined using a leading-order Monte Carlo simulation of QCD dijet production,
coupled to a suitable simulation of the CMS detector. The predicted number of
events in a given bin i of the mjj distribution in the signal region, N_i^pred, is given by:
$$
N_i^{\rm pred} = K \times \frac{N_i^{\rm SR,sim}}{N_i^{\rm CR_{high},sim}}, \qquad (9.16)
$$
where N_i^{SR,sim} and N_i^{CR_high,sim} are the simulated yields in the corresponding bins of the
mjj distributions in the SR and CRhigh region. K is a factor which corrects for the
absence of higher-order QCD and EW effects in the Monte Carlo generator,
in addition to experimental systematic effects. This itself can be derived from
data using the second control region CRmiddle , in which the wide jets have a more
similar η separation to the signal region than those in CRhigh. First, we define R = N_i^{CR_middle}/N_i^{CR_high}, which can be evaluated using either data events or simulated background
events, and which expresses the ratio of the number of events in a given bin i of the
mjj distribution in the CRmiddle region with the number in the corresponding bin in
the CRhigh region. We then define K as
$$
K = \frac{R_{\rm data}}{R_{\rm sim}} = a + b \times \left(m_{jj}/\sqrt{s}\right)^4, \qquad (9.17)
$$
where a and b are free parameters. In other words, we assume that the factor which
corrects the simulation to the data can be parameterised by the expression on the
right, and we must extract a and b from a fit to the data. a and b are in fact included
as nuisance parameters in the final comparison of the resonance signal and
background models to the observed CMS data.
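The two background strategies can be sketched in a few lines of Python. The first function is the parametric form of equation (9.15), here fitted to a hypothetical noisy spectrum; the second is the simulation-to-data correction of equation (9.17). All yields, bin edges and parameter values are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

SQRT_S = 13000.0  # GeV, assumed collision energy

def dijet_bkg(mjj, a0, a1, a2, a3):
    """Parametric QCD dijet background shape of equation (9.15)."""
    x = mjj / SQRT_S
    return a0 * (1.0 - x)**a1 / x**(a2 + a3 * np.log(x))

def k_factor(mjj, a, b):
    """Simulation-to-data correction of equation (9.17)."""
    return a + b * (mjj / SQRT_S)**4

# Hypothetical binned spectrum: bin centres in GeV and 'observed' values
# generated from the same functional form with a little noise added.
mjj = np.linspace(1600.0, 8000.0, 33)
rng = np.random.default_rng(1)
observed = dijet_bkg(mjj, 2.0e-5, 9.0, 5.0, 0.4) * (1.0 + 0.03 * rng.normal(size=mjj.size))

popt, _ = curve_fit(dijet_bkg, mjj, observed, p0=[2.0e-5, 9.0, 5.0, 0.4])
print("fitted (a0, a1, a2, a3):", popt)
```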
Signal models are harder to generate than for the diphoton case, since the width
of the expected lineshape depends on whether the new resonance decays to a pair of
quarks (or a quark and an anti-quark), a quark and a gluon, or a pair of gluons. The
simplest way to obtain signal predictions for the dijet invariant mass distribution is
to perform a Monte Carlo simulation of suitable physics models, from which one
finds that the predicted lineshapes for narrow resonances have Gaussian cores from
the jet energy resolution, and tails towards lower mass values from QCD radiation.
Having defined the signal and background models, the extraction of limits on the
cross-section times branching ratio to jets as a function of the resonance mass can
follow a similar treatment to that defined in the diphoton case. Note that, in the case
of the data-driven background estimate, the fit of the signal and background model
to data can be performed simultaneously in both the signal and CRhigh regions, thus
accounting for possible signal contamination of the CRhigh region.
Further reading
• An excellent recent summary of resonance searches at the LHC is given in
arXiv:1907.06659, which includes a summary of the theoretical origin of
different resonances, plus the results of searches at the second run of the
LHC.
• The ATLAS diphoton search that we worked through is based on JHEP 09
001 (2016), arXiv:1606.03833. An excellent resource for understanding this
search in detail is Yee Chinn Yapʼs PhD thesis, available at http://cds.cern.ch/
record/2252531/files/?ln=el.
• A model-independent approach for incorporating interference effects in resonance searches can be found in Frixione et al, Model-independent approach for incorporating interference effects in collider searches for new resonances, Eur. Phys. J. C 80 1174 (2020).
• Our dijet resonance search example is based on CMS-PAS-EXO-17-026.
Exercises
9.1 Draw the Feynman diagrams for the three main SM processes that produce
two photons in the final state.
9.2 Write the matrix equation that expresses the number of events in each
region defined by the matrix method in terms of the quantities Nγγ , Nγj , Njγ ,
and Njj.
9.3 Explain what the normalisation constant Λ in equation (9.5) represents, and
why it is necessary.
9.4 By browsing the literature or otherwise, write down the main backgrounds
for a WW resonance search in the fully-hadronic final state.
9.5 As explained in the text, a common deficiency of dijet resonance searches is
that jet triggers often require a large jet transverse momentum for the
leading jet in each event.
(a) Explain why this makes it impossible to search for resonances at low
invariant mass.
(b) How might initial state radiation be used to circumvent this
problem?
9.6 Plots such as that shown in Figure 9.3 often lead to confusion in the high
energy physics community.
(a) Which feature of Figure 9.3 indicates the possible existence of a
signal in the data?
(b) What is the likely mass of the resonance if this signal were true?
(c) Would the same conclusion automatically apply to a resonance with
a relative width of 10%? Give a reason for your answer.
Chapter 10
Semi-invisible particle searches
We have now described how searches for new particles at the LHC can be performed
if we see all of the decay products of the particles. Things become more complicated
if the new particles decay semi-invisibly, which will generically occur if one or more
of the decay products are weakly-interacting, and hence exit the detector without
trace. This will happen if the new particles decay to one or more neutrinos of the
Standard Model (SM), or if they decay to one or more new particles that are
themselves unable to interact with the detector. It is important to note that we are
using ‘weakly-interacting’ in a colloquial sense, meaning that the particle might
interact through a new force that has a small coupling constant, rather than
necessarily interacting through the weak interaction of the SM.
In this chapter, we will describe techniques for dealing with semi-invisible particle
decays in LHC searches, and we will see that a variety of strategies can be employed
to infer the presence of new invisible particles, and even to measure their masses and
couplings in some cases. Semi-invisible particle decays are a generic feature of any
model that predicts a dark matter candidate in the form of a weakly-interacting
massive particle, and are a particularly important feature of supersymmetry
searches.
¹ An exception would occur if we wanted to unambiguously model interference between the hypothetical signal and the background, which is rarely done.
Faced with this mass of models, we should immediately ask if there is anything
that can be said in general about models that include semi-invisible particle decays.
Thankfully, there is: all events with invisible particles in will contain missing
transverse energy from the escaping particles. We can therefore frame semi-invisible
particle searches from the outset as searches in the final state ‘missing transverse
energy plus something else’. We may then start to classify what the ‘something else’
might be, as follows.
It might be that our model predicts a single new weakly-interacting particle that
interacts with quarks or gluons such that it can be made directly in proton–proton
collisions. For example, the simplest approach to dark matter is to hypothesise that
there is a single new particle responsible for it, and that its interactions with SM
fields can be described either by an effective field theory with a contact interaction,
or by a theory with an explicit mediator between the SM fields and the new dark
matter particle. In such models, the new dark matter particle will typically be
produced in pairs, and if only a pair of dark matter particles is produced, there is
nothing for the detector to trigger on in the event. Instead, one must search for
events in which a parton was radiated from the initial state, which gives rise to events
with a single jet, plus missing transverse energy. Other possibilities include events
with a single photon, Higgs boson or weak gauge boson plus missing transverse
energy.
It might instead be the case that there is a plethora of new particles in Nature, and
the particles that are produced by the proton–proton collisions are not the invisible
particles themselves. A classic example is that of supersymmetry: here one would
expect to produce the coloured sparticles at a much higher rate than the weakly-
coupled sparticles (if their masses are similar), and hence one should generically
produce squarks and gluinos. These could then decay to supersymmetric particles
which can themselves decay further, producing what is known as a cascade decay,
culminating in the release of various SM decay products, plus one lightest neutralino
for each sparticle that was produced. The lightest neutralinos generate anomalous
missing transverse energy, whilst the SM decay products can give rise to high
multiplicity final states that are very rare in the SM.
There is a very important difference between these two types of scenario. In the
second case, as we shall see below, it turns out that we can form various invariant
masses from the SM decay products, and these allow us to infer something about the
masses of the particles involved in the cascade decay chain. In the case where we
directly produce the weakly-interacting particle, it is much harder to access the
properties of the particle. Nevertheless, how we actually go about discovering either
of these scenarios looks remarkably similar, and we can break the process down into
the following steps:
Each of these short descriptions hides myriad complexities, and it should not
surprise you to learn that the precise details of how to implement these steps depend
on the model being searched for. We will expand our knowledge of each of these
steps in turn, using examples from past ATLAS and CMS searches.
easier to organise efforts across hundreds of researchers if their work can be grouped
into similar signatures, allowing methods to be easily shared amongst search teams
that are looking for similar final states. Although there are techniques on the horizon
that would be more ambitious in their combination of information across final states
(e.g. using unsupervised machine learning), we will assume at this point that the
dominant paradigm will persist for some years.
Having decided that we need to select a particular final state, how do we go about
choosing it? In models with multiple particles, we can proceed by identifying the
particles that have the highest production cross-section, and then choosing the final
state that results from the highest decay branching ratio for those particular
particles. A classic example is given by supersymmetry, in which the LHC
production cross-section is dominated by squark and gluino production if these
sparticles are light in mass. The simplest squark decay produces a jet and a lightest
neutralino, with more complex decays featuring more jets and/or leptons. The
simplest gluino decay produces two jets and a lightest neutralino, with more complex
decays also producing more jets and/or leptons. On average, we would expect to see
jets more often than leptons in these events. One can therefore assume that there will
be a high combined branching ratio for producing events with a few jets plus missing
transverse energy, and there will be reasonably high branching ratios for events
containing several jets plus missing transverse energy. The flagship searches for
supersymmetry at the ATLAS and CMS experiments thus focus on final states with
two jets plus missing energy, three jets plus missing energy, and so on.
Having identified our final state, we must also consider the potential SM
backgrounds to see if our search is really going to be viable. Given that the LHC
search programme has been developed for well over a decade now, the simplest way
to identify the backgrounds in a given final state is to read existing LHC search
papers that target a similar signature. Nevertheless, we can provide some general
principles for identifying backgrounds if you are either going beyond the existing
literature, or do not want to assume existing knowledge.
First, we can look up the production cross-sections of a large range of SM
processes, such as those shown in Figure 4.1 of Chapter 4. Next, we have to work
out which of those processes can possibly mimic the final state of our hypothetical
signal process, and their relative production cross-sections multiplied by the relevant
branching ratios for producing particular final states then give us a first idea of the
relative proportion of each of these backgrounds. It must be stressed, however, that
this is by no means the final answer. The imposition of kinematic selections will have
different effects on each of these background contributions, so that the dominant
background in the final analysis may not be that with the highest production cross-
section. It can also happen that SM processes that do not produce the final state of
interest at the level of their lowest-order Feynman diagram can still form part of the
background for a search if some of their particles fall out of the acceptance of the
final state (e.g. due to some particles having a low transverse momentum), or if
initial state radiation increases the object multiplicity.
In Figure 4.1, the total LHC production cross-section is many orders of
magnitude higher than the highest individual cross-section shown, which is that of
Figure 10.1. An example of a simplified model used for the optimisation of ATLAS and CMS supersymmetry
searches. It is assumed that neutralinos and charginos are pair-produced, each decaying only to an electroweak
gauge boson and a lightest neutralino. This results in final states with leptons, jets and missing energy.
over-optimisation on such models removes sensitivity to any models that differ from
the underlying assumptions, and we must always strive to invent better search
techniques that are less model-dependent.
ETmiss: Models with extra invisible particles, or even anomalous production of SM neutrinos, would be expected to give a different distribution of ETmiss than that generated by the SM backgrounds within a specific final state. For heavy supersymmetric particles decaying to lightest neutralinos, the ETmiss distribution is considerably broader than that of the backgrounds, making ETmiss a very effective discriminant.
There is an entire industry devoted to inventing additional new variables for semi-
invisible particle searches, and a few of the most popular are:
• ETmiss/√HT and ETmiss/meff: Both of these variables are attempts to normalise the ETmiss to some meaningful scale of the total hadronic activity in the event (assuming that HT and meff are defined using jets only). They often provide extra discrimination between multijet backgrounds and signal processes, beyond the use of ETmiss, HT and meff alone. When to use either is usually a case of trial-and-error, but it has been found that ETmiss/√HT outperforms ETmiss/meff for final states with a low number of jets.
• Δϕ(obj, pTmiss): This gives the azimuthal angle between the direction of an object and pTmiss. In SM multijet events, pTmiss often arises from jet mismeasurements, in which case the jets with the highest transverse momenta in the event are often closely aligned with the direction of pTmiss. In searches for new physics, it is thus common to place a lower bound on Δϕ(jet, pTmiss) for some number of the highest-pT jets.
• αT: The αT variable is designed to characterise how close an event is to being a dijet event. This is particularly useful in semi-invisible searches in purely hadronic final states, since the dijet background is so large. First, all jets in the event are combined into two pseudo-jets in such a way that the difference in the ET of the two pseudo-jets is minimised. Then αT is defined as:
$$
\alpha_T = \frac{E_{T_2}}{\sqrt{2\,p_{T_1} p_{T_2}\,(1 - \cos\phi_{12})}}, \qquad (10.1)
$$
where E_Ti is the transverse energy of the i'th pseudo-jet, p_Ti is the transverse momentum of the i'th pseudo-jet, and ϕ12 is the azimuthal angle between the two pseudo-jets. For a perfectly measured dijet event, αT = 0.5, and it is less than 0.5 if the jets are mismeasured. Events with αT > 0.5 either have genuine missing transverse energy (and are thus likely to be signal-like), or they are multijet events where some jets have fallen below the pT threshold for the analysis. In practice, the αT distribution for multijet events is very steeply-falling at αT = 0.5 (a schematic computation of αT is sketched after this list).
• mll: For events with two opposite-sign, same-flavour leptons, it is straightforward to reduce the SM Z boson background by placing a selection on the dilepton invariant mass that excludes the Z peak.
• mCT : This is another example of a variable whose distribution has a well-
defined endpoint that is a function of the masses of the particles that we are
observing. Imagine that a particular new particle δ can decay to two particles
α and ν, where α is invisible, and ν is visible (i.e. a SM particle that interacts
with the ATLAS and CMS detectors). The visible particles ν1 and ν2 are
assumed to have masses m(ν1) and m(ν2 ), and we measure their four-
momentum in some frame to be p(ν1) and p(ν2 ). Their invariant mass,
formed from p(ν1) + p(ν2 ) is invariant under any Lorentz boost that is
applied to the two visible particle four-momenta simultaneously. However,
we can also construct quantities which are invariant under equal and opposite
boosts of the two visible particle four-momenta. That is, when we boost one
particle by a particular boost, and we apply the equal and opposite boost to
the other particle. It turns out that the following variable is invariant under
these ‘back-to-back’ or ‘contra-linear’ boosts:
$$
m_C^2(\nu_1, \nu_2) = m^2(\nu_1) + m^2(\nu_2) + 2\left[E(\nu_1)E(\nu_2) + \mathbf{p}(\nu_1)\cdot\mathbf{p}(\nu_2)\right] \qquad (10.2)
$$
This is very similar to the normal invariant mass formula, but with a plus sign
instead of a minus sign on the three-momenta term. To give an example of
where a contra-linear boost might be useful, imagine that the two particles ν1
and ν2 are produced by separate decays of δ particles that are pair-produced in
an event. We will call these two particles δ1 and δ2 , where the subscript is
merely intended to distinguish the fact that two particles were produced. Then
if we start in the δ1δ2 centre-of-mass frame, contra-linear boosts would be the
boosts that would be required to put us in the δ1 mass frame and the δ2 mass
frame. Then this formula would be applied after the boosts by taking the ν1
four-momentum from the δ1 rest frame, and taking the ν2 four-momentum
from the δ2 rest frame. mC would be useful if the δ1δ2 rest frame was the same
In the special case that m(ν1) = m(ν2 ) = 0 and there is no upstream momen-
tum boost of the δ particles (from e.g. initial state radiation), the maximum
value of mCT is given by:
$$
m_{CT}^{\rm max} = \frac{m^2(\delta) - m^2(\alpha)}{m(\delta)} \qquad (10.4)
$$
Various formulations exist in the literature for correcting mCT to be more robust under the effects of initial state radiation (a schematic computation of mCT is included in the sketch after this list).
• In models such as supersymmetry, particles can undergo very complicated
cascade decay chains, such as the squark decay
$$
\tilde{q} \to q\,\tilde{\chi}_2^0 \to q\,\ell^{\pm}\tilde{\ell}^{\mp} \to q\,\ell^{\pm}\ell^{\mp}\tilde{\chi}_1^0, \qquad (10.5)
$$
$$
(m_{\ell\ell q}^{\rm edge})^2 =
\begin{cases}
\max\left[\dfrac{(\tilde{q}-\tilde{\psi})(\tilde{\psi}-\tilde{\chi})}{\tilde{\psi}},\;
\dfrac{(\tilde{q}-\tilde{\ell})(\tilde{\ell}-\tilde{\chi})}{\tilde{\ell}},\;
\dfrac{(\tilde{q}\tilde{\ell}-\tilde{\psi}\tilde{\chi})(\tilde{\psi}-\tilde{\ell})}{\tilde{\psi}\tilde{\ell}}\right]; \\[10pt]
\text{except when } \tilde{\ell}^{\,2} < \tilde{q}\tilde{\chi} < \tilde{\psi}^{\,2} \text{ and } \tilde{\psi}^{\,2}\tilde{\chi} < \tilde{q}\tilde{\ell}^{\,2},
\text{ where one must use } (m_{\tilde{q}} - m_{\tilde{\chi}_1^0})^2
\end{cases} \qquad (10.7)
$$
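A schematic computation of two of the variables above is sketched below. For αT, the pseudo-jet partition is found by brute force, jets are treated as massless (so ET = pT), and the softer pseudo-jet's scalar-summed ET is taken for the numerator; for mCT, the standard contransverse-mass expression (the transverse analogue of equation (10.2)) is used. Inputs are plain arrays and four-vectors rather than any experiment-specific event format.

```python
import numpy as np
from itertools import combinations

def alpha_t(jets_pt, jets_phi):
    """alpha_T following equation (10.1), from jet pT and phi values.

    Jets are split into two pseudo-jets such that the difference of the
    pseudo-jet E_T values (scalar sums, massless approximation) is minimised;
    the partition is found by brute force over all splittings."""
    idx = range(len(jets_pt))
    best = None
    for r in range(1, len(jets_pt)):
        for group in combinations(idx, r):
            a = list(group)
            b = [i for i in idx if i not in group]
            et_a = sum(jets_pt[i] for i in a)
            et_b = sum(jets_pt[i] for i in b)
            if best is None or abs(et_a - et_b) < best[0]:
                best = (abs(et_a - et_b), a, b)
    _, a, b = best
    px = lambda grp: sum(jets_pt[i] * np.cos(jets_phi[i]) for i in grp)
    py = lambda grp: sum(jets_pt[i] * np.sin(jets_phi[i]) for i in grp)
    pt1, pt2 = np.hypot(px(a), py(a)), np.hypot(px(b), py(b))
    et2 = min(sum(jets_pt[i] for i in a), sum(jets_pt[i] for i in b))
    cos_phi12 = (px(a) * px(b) + py(a) * py(b)) / (pt1 * pt2)
    return et2 / np.sqrt(2.0 * pt1 * pt2 * (1.0 - cos_phi12))

def m_ct(p1, p2):
    """Contransverse mass from two visible four-vectors (E, px, py, pz):
    m_CT^2 = (E_T1 + E_T2)^2 - |pT1 - pT2|^2, with E_T = sqrt(E^2 - pz^2)."""
    et1 = np.sqrt(max(p1[0]**2 - p1[3]**2, 0.0))
    et2 = np.sqrt(max(p2[0]**2 - p2[3]**2, 0.0))
    dptx, dpty = p1[1] - p2[1], p1[2] - p2[2]
    return np.sqrt(max((et1 + et2)**2 - dptx**2 - dpty**2, 0.0))

# A back-to-back dijet toy gives alpha_T = 0.5, as expected.
print(alpha_t([300.0, 300.0], [0.0, np.pi]))
```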
Figure 10.2. Examples of decay trees from a recursive-jigsaw analysis of sparticle production. Reproduced
with permission from Phys. Rev. D 97 112001 (2018).
four-momenta in the V frame, $p_{V_i} = \sum_{j=1}^{N_i} p_j$, is calculated for each group, where i = a, b, and there are N_i jets in group i. The combination that maximises the sum of the momentum of the two groups, ∣p_{V_a} + p_{V_b}∣, is chosen. This choice implicitly defines an axis in the V frame which is equivalent to the thrust axis of the jets, and the masses of each four-momentum sum, $M_{V_i} = \sqrt{p_{V_i}^2}$, are simultaneously minimised.
The remaining unknowns in each event are associated with the two invisible
systems, each of which represents a four-vector sum of the invisible particles
produced by the decay of each new particle that was produced in the hard-
scattering process. We do not know the masses of these particles, nor their
longitudinal momenta, and we do not know exactly how they sum to form the
single E Tmiss vector that was reconstructed for the event. The RJR algorithm
determines these unknowns by iteratively minimising the intermediate particle
masses that appear in the decay tree. This allows a guess of the rest frame to be
chosen for each particle in the tree, and one can construct variables in each of these
rest frames simply by boosting the observed momenta in the event to the frame, and
constructing variables as functions of the boosted four-vectors. Generically, one
can construct variables that have units of mass (equivalent to energy and to
momentum in natural units where c = 1), and variables that are dimensionless, such
as angles between objects. This gives a rich basis of largely uncorrelated variables
that have proven to be very powerful in searches for semi-invisible particles. RJR
can get complicated very quickly, but there is a standard library called RestFrames
that can be used to define decay trees and their associated variables.
Table 10.1. Definition of the signal regions used in the 2014 flagship ATLAS supersymmetric search for squark and gluino pair production. Credit: JHEP 09 176 (2014).

Requirement                          Signal Region
                                     2jl      2jm      2jt      3j
ETmiss [GeV] >                       160      160      160      160
pT(j1) [GeV] >                       130      130      130      130
pT(j2) [GeV] >                       60       60       60       60
pT(j3) [GeV] >                       —        —        —        60
Δϕ(jet1,2,(3), pTmiss)min >          0.4      0.4      0.4      0.4
ETmiss/√HT [GeV^1/2] >               8        15       15       —
ETmiss/meff(Nj) >                    —        —        —        0.3
meff(incl.) [GeV] >                  800      1200     1600     2200
‘m’ means medium and ‘t’ means tight). In all cases, events are vetoed if they contain
electrons or muons with pT > 10 GeV .
We can understand the logic of the analysis by working through the table row-by-
row, and rationalising where each specific selection comes from. The selections in the
first two rows are necessary to ensure that the events are taken from the region of the
data where the trigger is fully-efficient. A detailed reading of the original analysis
shows that events were required to have passed a ‘jet-met’ trigger, which requires a
trigger-level jet with pT > 80 GeV together with a trigger-level ETmiss requirement. The trigger is fully efficient for offline selections of ETmiss > 160 GeV and pT(j1) > 130 GeV, where it is assumed in the table that the jets are arranged in order of decreasing pT. Note that the use of this trigger has already introduced some model-dependence to the analysis, since the analysis is already blind to any signal that would generate a smaller typical value of the ETmiss, or leading jets with lower pT. Next, we see various selections on the pT
values of the other jets in the event, depending on how many are selected in the final
state for each signal region. These will have been tuned carefully on the simulated
benchmark signal and background models, to provide extra background rejection
without substantially reducing the number of expected signal events.
The fifth selection in table 10.1 is on the azimuthal angle between the leading jets and
the pTmiss vector. This significantly reduces the multijet background, for which the
jets are frequently well-aligned with pTmiss , since the pTmiss arises from jet mismeasure-
ment. Each of the jets in the signal regions must have Δϕ( jet1,2,(3), pTmiss )min > 0.4.
Finally, we see selections on various quantities that provide most of the
discrimination between squark and gluino production and the SM backgrounds.
ETmiss/√HT is the main discriminant variable for squark pair production, which is
expected to populate the two-jet signal regions. For gluino pair or squark-gluino
production, E Tmiss /meff (Nj ) was found to be more effective in Monte Carlo studies,
where the meff definition for each signal region depends on the number of jets
$$
N({\rm SR, scaled}) = N({\rm SR, unscaled}) \times \frac{N({\rm CR, obs})}{N({\rm CR, unscaled})}, \qquad (10.10)
$$
where N(CR, obs) is the observed number of events in the control region,
N(SR, unscaled) is the Monte Carlo yield in the signal region, and N(CR, unscaled)
is the Monte Carlo yield in the control region. The normalisation factor can also be
applied to histograms of kinematic variables, by scaling the whole histogram by the
constant normalisation factor. This formula makes it clear that we are correcting the
Monte Carlo yield for the background in the signal region by the ratio of the observed
and Monte Carlo yields in the control region, and is the simplest way to picture what
is going on. The dominant alternative in the literature, however, is to regroup the terms
to give
$$
N({\rm SR, scaled}) = N({\rm CR, obs}) \times \left[\frac{N({\rm SR, unscaled})}{N({\rm CR, unscaled})}\right], \qquad (10.11)
$$
where the quantity in square brackets is called the transfer factor, and it is common
to refer to it as a quantity that extrapolates the observed control region yield to the
signal region. In reality, transfer factors are not usually defined by hand, but are
floated in a combined likelihood fit that compares the Monte Carlo background
yields with the observed yields in all signal and control regions simultaneously. This
allows for a consistent normalisation of each background process across all defined
regions, since the likelihood fit must balance the contribution of all processes in all
regions to match the observed data. The fit derives both the central value for each
transfer factor, and an associated uncertainty.
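Numerically, equations (10.10) and (10.11) amount to the following (all yields hypothetical); in a real analysis the normalisation would instead float in the combined likelihood fit described below.

```python
# Hypothetical Monte Carlo and observed yields for one background process.
n_sr_mc = 12.4      # MC background yield in the signal region
n_cr_mc = 310.0     # MC background yield in the control region
n_cr_obs = 287      # observed events in the control region

# Equation (10.10): correct the MC SR yield by the observed/MC CR ratio.
n_sr_scaled = n_sr_mc * n_cr_obs / n_cr_mc

# Equation (10.11): the same number, written via the transfer factor.
transfer_factor = n_sr_mc / n_cr_mc
assert abs(n_sr_scaled - n_cr_obs * transfer_factor) < 1e-9
print(f"scaled SR background prediction: {n_sr_scaled:.2f} events")
```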
Choosing good control regions for SM background processes is an art in itself,
and arguably constitutes most of the effort in performing an analysis within an
experimental collaboration. As a general rule, one should aim to make a control
region for each dedicated signal region, also trying to ensure that the kinematic
selections in each control region are as similar as possible to their corresponding
signal region. This minimises systematic uncertainties on the transfer factor that
result from extrapolating from one region in ‘kinematic variable space’ to a wildly
different region. However, if the kinematic selections for a control region hardly
differ from those of its signal region, then we should expect it to be dominated by the
signal rather than the background process we are targeting. This is called signal
contamination, and it is very dangerous for the following reason. If a control region
had an excess related to the presence of a signal, our data-driven background
procedure would increase the normalisation of the relevant background, and
‘calibrate away’ any observed excess in the signal region. To guard against this, it
is common to test all control region definitions with the generated signal Monte
² The fact that we do not know exactly what any potential signal at the LHC will look like, making any attempt to test signal contamination doomed to failure, is usually politely ignored. At least we can say that the particular signal we are optimising on does not contaminate the control regions. Any more general solution would require a model-independent analysis technique.
Table 10.2. Examples of control regions used in the 2014 flagship ATLAS supersymmetric search for squark and gluino pair production. Credit: JHEP 09 176 (2014).

CR     SR background targeted    CR process           CR selection
CRW    W(→lν)+jets               W(→lν)+jets          1 e± or μ±, 30 GeV < mT(l, ETmiss) < 100 GeV, b-veto
CRT    tt̄ and single-t           tt̄ → bb̄qq′lν         1 e± or μ±, 30 GeV < mT(l, ETmiss) < 100 GeV, b-tag
Carlo samples to assess the level of signal contamination, tweaking the control
region selections as necessary until the contamination is reduced.²
To illustrate effective control region design, table 10.2 contains two of the control
region definitions from the supersymmetry search that we referred to above. These
regions were defined for each of the signal regions in the analysis, and in each case the
selection on the inclusive meff variable was retained from the signal region definition.
The other selections were modified to define regions dominated in particular back-
grounds. The first row gives details of a control region for the W+jets background,
which involves selecting events with 1 electron or muon in, imposing a b-tag veto (since
b-jets are expected only rarely in W+jets events), plus restricting the transverse mass in
events to lie between 30 GeV and 100 GeV . This selects a sample that is over-
whelmingly dominated by W+jets events where the W boson has decayed leptonically,
creating real missing transverse energy from the escaping neutrino. Furthermore, the
transverse mass is well known to be concentrated in this range for leptonic W boson
events. The dominant W+jets contribution in the signal regions instead arises from
events where the W boson decays to a tau neutrino and a hadronically-decaying tau
lepton, which gives events with jets plus missing transverse energy. To use the control
region events to model this, we can simply pretend that the electron or muon in our
control sample is a jet with the same four-momentum. This is a very neat trick for
defining the control region, since the 1 lepton selection makes it very distinct from the
signal region, but with similar kinematics for the final state objects. The second row of
table 10.2 contains selections for a control region that selects both top pair and single
top events. Once again, single lepton events are selected, and are used to model the fully
hadronic top decay background by pretending that the lepton is a jet with the same
four-momentum. The b-veto requirement of the W control region is changed to a b-tag,
which preferentially selects events with a top quark (since top quarks will decay to a b
quark and a W boson).
Control regions give us the means of refining our Monte Carlo estimates of each
background contribution, but ideally we should have a way of testing that our
background models accurately reproduce the LHC data before we look in the signal
region. Otherwise, we will remain unsure that any discrepancy observed in the signal
region actually results from new physics. The trick is to use validation regions which
are distinct from both the signal and control regions, whilst being carefully chosen to
reduce signal contamination. Defining validation regions is often difficult, since any
insight we had on how to enrich a given region with particular background events
has usually already been used in defining the relevant control region. For the control
regions in table 10.2, various validation regions were defined. The first took events
from the W and top control regions and reapplied the Δϕ( jet1,2,(3), pTmiss )min and
ETmiss/√HT or ETmiss/meff(Nj) selections that were missing from the control region
definitions. This makes the validation region much more similar to the signal region,
whilst still remaining different due to the 1 lepton selection. The lepton was then
either treated as a jet with the same four-momentum or, in a separate set of
validation regions, treated as contributing to the E Tmiss . Extra validation regions were
obtained by taking regions in which at least one hadronically decaying τ lepton was
reconstructed, with a separate b-tag requirement used to separate the W from the top
validation region. When designing an analysis, it is worth doing a thorough
literature review of analyses in a similar final state in order to learn strategies for
effective validation region design.
So far, we have assumed that the Monte Carlo samples we obtained at the start of
our analysis process provide an accurate description of the shape of each background
contribution, and the only correction we need to apply is to the overall normalisation.
For some backgrounds, however, this is woefully inadequate. The classic example is
that of the multijet background, for pretty much any analysis that requires a moderate
missing transverse energy cut. It requires a vast amount of computing time to generate
enough multijet events to populate the typical signal and control regions we encounter
in these searches, to the extent that reliable Monte Carlo background models are not
available. Thankfully, we already have a much better generator of multijet events,
which is the Large Hadron Collider itself! Since, to first order, every event at the LHC
is a multijet event, with everything else being a tiny correction, we can instead define a
purely data-driven approach to modelling the multijet background that removes the
need for Monte Carlo generators at all. For the ATLAS supersymmetry analysis, a jet-
smearing method was used, defined as follows (a schematic sketch is given after the list):
(1) ‘Seed’ jet events are first defined by taking events that pass a variety of single jet
triggers with different pT thresholds, and the selection ETmiss/√ETsum < 0.6 GeV^1/2
is applied, where E Tsum is the scalar sum of the transverse energy measured in the
calorimeters. This selection ensures that the events contain well-measured jets.
(2) A jet response function is defined using Monte Carlo simulations of multijet
events, by comparing the reconstructed energy of jets with the true energy.
This function quantifies the probability of a fluctuation of the measured pT
of a jet, and takes into account both the effects of jet mismeasurement and
the contributions from neutrinos and muons that are produced from the
decay of jet constituents of heavy flavour. To do this, ‘truth’ jets recon-
structed from generator-level particles are matched to detector-level jets
within ΔR < 0.1 in multi-jet samples. The four-momenta of any generator-
level neutrinos in the truth jet cone are added to the four-momentum of the
truth jet. The response is then given as the ratio of the reconstructed jet
energy to the generator-level jet energy.
(3) Jets in the seed events are convoluted with the response function to generate
pseudo-data events. These are compared to real multijet events in a
dedicated analysis, and the response function is adjusted until the agreement
between the pseudo-data and real data is optimised.
(4) The seed jet events are convoluted with the final jet response function that
emerged from step (3). This provides a final sample of pseudo-multijet
events.
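A heavily simplified sketch of steps (3)-(4) is given below: the jets of a well-measured seed event are convolved with a jet response function to build pseudo-data events. A Gaussian response of fixed relative width stands in for the full, non-Gaussian response map derived from simulation and tuned to data.

```python
import numpy as np

rng = np.random.default_rng(7)

def smear_event(jet_pts, jet_phis, n_smears=25, sigma_rel=0.12):
    """Generate pseudo-data events by convolving seed-event jets with a jet
    response function (here a simple Gaussian of relative width sigma_rel).

    Returns the smeared jet pT values and the resulting E_T^miss for each
    pseudo-event."""
    jet_pts = np.asarray(jet_pts, dtype=float)
    jet_phis = np.asarray(jet_phis, dtype=float)
    # One multiplicative response factor per jet per pseudo-event.
    response = rng.normal(1.0, sigma_rel, size=(n_smears, jet_pts.size))
    smeared_pt = response * jet_pts
    # Missing transverse momentum from the imbalance of the smeared jets.
    px = -(smeared_pt * np.cos(jet_phis)).sum(axis=1)
    py = -(smeared_pt * np.sin(jet_phis)).sum(axis=1)
    return smeared_pt, np.hypot(px, py)

# A well-measured, balanced seed dijet event:
pts, met = smear_event([420.0, 415.0], [0.1, 0.1 + np.pi])
print("pseudo-event E_T^miss values:", np.round(met, 1))
```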
Once this method is complete, the pseudo-multijet events can be treated the same
as a Monte Carlo sample. A dedicated multijet control region was defined for the
ATLAS supersymmetry analysis, in which the signal region requirements on
Δϕ(jet1,2,(3), pTmiss)min and ETmiss/√HT or ETmiss/meff(Nj) were inverted. The second
of these requirements could then be reinstated to define a corresponding validation
region.
In other analyses, fully data-driven background estimation methods are typically
used to measure backgrounds that contain fake objects, such as fake leptons or fake
photons. Both ABCD and matrix method approaches are common in the literature.
We are now at the point where we can state the number of events expected in the
signal region for each of the SM background processes. However, it is equally
important to attach an uncertainty to each contribution, which combines both the
statistical uncertainty related to the finite number of Monte Carlo events generated,
and a long list of systematic uncertainties, a detailed inventory of which was
provided in Section 8.6.2. As we saw there, for systematic uncertainties, an
experimentalist typically has to run their analysis code several times, each time
using a particular variation of systematic quantities within their allowed ranges such
as the jet energy, resolution and mass scales, the resolution and energy scale for
leptons and photons (plus their reconstruction efficiencies), the scale of missing
energy contributions, the efficiency for tagging b jets, the trigger efficiencies and the
overall luminosity. Theoretical systematic uncertainties can be defined by, for
example, varying cross-sections within their allowed uncertainties, comparing
different Monte Carlo generator yields, varying the QCD renormalisation and
factorisation scales for a Monte Carlo generator, and varying the parton distribution
functions. These variations define a series of predicted yields in each control,
validation and signal region, which can be used as inputs to likelihood fits. One
can also define bin-by-bin uncertainties for kinematic distributions in each region,
before using these to define systematic uncertainty bands on histograms of the
variables that were used for the particle search.
comparison of our background models with the observed data, before maximising that
likelihood and defining confidence intervals on its free parameters.
A typical likelihood for an LHC semi-invisible particle search includes some
parameters of interest (e.g. the assumed supersymmetric particle masses, or the rate of
a generic signal process), the normalisation factors for the background processes, and
a nuisance parameter θi for each systematic uncertainty i. The systematic parameters
can be defined such that θi = ±1 corresponds to the ±1σ variations in the systematic
uncertainties, whilst θi = 0 corresponds to the nominal yield. One can then construct a
total likelihood that is a product of Poisson distributions of event counts in the signal
and control regions, and of additional distributions that implement the systematic
uncertainties. This will be given by a function L(n, θ 0∣μ, b, θ ), where n is a vector
containing the observed event yields in all signal and control regions, b contains the
predicted total background yields in all signal and control regions, and μ contains the
parameters of interest. θ 0 contains the central values of the systematic parameters
(which, like n should be considered as observed quantities), whilst θ contains the
parameters of the systematic uncertainty distributions. Assuming for simplicity that
we have only one signal region, a suitable choice for L is
$$
L(\mathbf{n}, \theta^0 \mid \mu, \mathbf{b}, \theta) = P\!\left(n_S \mid \lambda_S(\mu, \mathbf{b}, \theta)\right) \times \prod_{i \in {\rm CR}} P\!\left(n_i \mid \lambda_i(\mu, \mathbf{b}, \theta)\right) \times C_{\rm syst}(\theta^0, \theta), \qquad (10.12)
$$
where nS is the observed yield of events in the signal region, ni is the yield of events in
the i’th control region, and Csyst(θ 0 , θ ) is an assumed distribution for the systematic
uncertainties. It is common to take this to be a multidimensional Gaussian. The
expected number of events for each Poisson distribution is given by a function
λj ( μ, b, θ ), which contains the details of the transfer factors between the control
regions and the signal region, and between control regions. For example, imagine
that we had a single parameter of interest μ which gave the strength of a hypothetical
signal in units of the nominal model prediction. We could then write
$$
\lambda_S(\mu, \mathbf{b}, \theta) = \mu \cdot C_{\rm SR\to SR}(\theta) \cdot s + \sum_j C_{{\rm CR}_j \to {\rm SR}}(\theta) \cdot b_j, \qquad (10.13)
$$
and
$$
\lambda_i(\mu, \mathbf{b}, \theta) = \mu \cdot C_{{\rm SR}\to {\rm CR}_i}(\theta) \cdot s + \sum_j C_{{\rm CR}_j \to {\rm CR}_i}(\theta) \cdot b_j, \qquad (10.14)
$$
where the index j runs over the control regions. The predicted number of signal events for
our nominal model is given by s, and the predicted yields in each control region are given
by bj. C is the matrix of transfer factors, whose diagonal terms CSR→SR are equal to unity
by construction, and whose off-diagonal elements contain the transfer functions
between the different regions (control regions and signal region) defined in the analysis.
Hypothesis testing can then be performed as in Chapter 5 by defining the profile
log-likelihood test statistic,
$$
q_\mu = -2 \log\left(\frac{L(\mu, \hat{\hat{\theta}})}{L(\hat{\mu}, \hat{\theta})}\right). \qquad (10.15)
$$
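A numerical toy of equations (10.12)-(10.15) is sketched below for one signal region and one control region, with a single background whose normalisation is the only nuisance parameter and with the constraint term Csyst omitted for brevity. All yields are hypothetical; real analyses build such likelihoods with dedicated frameworks such as HistFitter (see the Further reading of this chapter).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

# Hypothetical observed yields and nominal predictions (one SR, one CR).
n_obs     = {"SR": 14,  "CR": 210}
s_nominal = {"SR": 6.0, "CR": 0.3}    # signal prediction per region
b_nominal = {"SR": 9.0, "CR": 200.0}  # single-background prediction per region

def nll(params):
    """Negative log of a likelihood built as a product of Poisson terms, with
    mu the signal strength and theta_b a floating background normalisation
    (the systematic constraint term is omitted in this toy)."""
    mu, theta_b = params
    total = 0.0
    for region in n_obs:
        lam = mu * s_nominal[region] + theta_b * b_nominal[region]
        total -= poisson.logpmf(n_obs[region], lam)
    return total

# Unconditional fit (mu, theta_b free) and conditional fit at mu = 0, from
# which a profile log-likelihood ratio analogous to equation (10.15) follows.
free = minimize(nll, x0=[1.0, 1.0], bounds=[(0.0, 5.0), (0.1, 5.0)])
cond = minimize(lambda t: nll([0.0, t[0]]), x0=[1.0], bounds=[(0.1, 5.0)])
q0 = 2.0 * (cond.fun - free.fun)
print(f"q_0 = {q0:.2f}")
```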
Figure 10.3. An example of a simplified model limit for a supersymmetry search, generated by performing
separate hypothesis tests on benchmark Monte Carlo models in the parameter plane. Reproduced with
permission from JHEP 09 176 (2014).
contamination of the control regions, and take into account the experimental and
theoretical uncertainties on the supersymmetric production cross-section and kine-
matic distributions, and the effect of correlations between the signal and background
systematic uncertainties. Repeating this for different benchmark models leads to
exclusion plots such as that shown in Figure 10.3, in which model points have been
generated for a simplified model of gluino pair production, with all gluinos decaying
to two quarks and a lightest neutralino. Different models are then given by different
choices of the gluino mass and the lightest neutralino mass. The observed limit is
calculated from the observed signal region yields for the nominal signal cross-section, and the red line delineates the region for which points are excluded at the
95% confidence level (points below the line are excluded). This line can be
determined by, for example, interpolating CLs values in the plane. The uncertainty
band on the red line is obtained by varying the signal cross-sections by the
renormalisation and factorisation scale and parton distribution function uncertain-
ties. The expected limit is calculated by setting the nominal event yield in each signal
region to the corresponding mean expected background, whilst the yellow uncer-
tainty band on it shows how the expected limit would change due to ±1σ variations
in the experimental uncertainties. A major discrepancy between the observed and
expected limits (which has not occurred in this case) might indicate evidence for an
excess consistent with the region of the parameter space where the deviation is
observed. In plots like these, it is common to use the signal region with the best
expected significance to determine the exclusion at each point in the plane, rather
³ Note that this tells us that we can never use the LHC alone to distinguish a completely stable weakly-interacting particle from a meta-stable one, which ultimately means that the LHC alone cannot unambiguously discover a dark matter candidate. This is independent of the need to correlate LHC observations with astrophysical measurements, which is the only means of testing whether a particle produced at the LHC can provide the dark matter that we see in our universe.
Further reading
• The ATLAS supersymmetry search that we used as an example is Search for
squarks and gluinos with the ATLAS detector in final states with jets and
missing transverse momentum using √s = 8 TeV proton-proton collision
data JHEP 09 176 (2014). We have also made use of an updated analysis
that used recursive jigsaw reconstruction, detailed in Phys. Rev. D 97 112001
(2018).
• The RestFrames package can be found at restframes.com, and more
information on recursive jigsaw reconstruction can be found in Phys. Rev.
D 96 11 112007 (2017).
• A useful resource on the likelihood treatment for ATLAS semi-invisible
particle searches is Baak M et al, HistFitter software framework for statistical
data analysis Eur. Phys. J. C 75 153 (2015).
• We have barely had space to scratch the surface of long-lived particle searches.
A comprehensive recent review can be found in arXiv:1903.04497.
Exercises
10.1 A search is performed for smuon pair production at the LHC, in which
each smuon is expected to decay to a muon and a lightest neutralino.
(a) If the lightest neutralino is assumed to be massless, what is the
endpoint of the mT2 distribution for the supersymmetric signal
process?
Chapter 11
High-precision measurements
Now that we have explored the two basic types of direct particle search at the LHC,
let us continue on to consider the topic of measurements. We will primarily
distinguish such measurements from the analyses considered in the previous chapters
by their attempts to correct for biases introduced by the detection process, and/or to
fully reconstruct decayed resonances. These are not absolute conditions, and there
are undoubtedly searches which contain these elements, as there are measurements
which do not—but for our purposes here, it is a useful line to draw.
Analyses as described so far in this book operate in terms of detector-level
observables: fundamentally, the numbers of collider events found to have recon-
structed properties that fall into binned ranges or categories. Statistically there is no
uncertainty on these event counts: a fixed integer number of events will be found in
each reconstruction-level bin, and the observed yields can (via the use of Monte
Carlo (MC) and detector simulation tools) be used to constrain the masses,
couplings or signal-strengths of new-physics models. For such analyses, what
matters most is that the expected significance of the analysis to a particular choice
of new physics model is maximised. The limiting factors on such analyses arise from
the statistical uncertainties due to the finite sample size, and systematic uncertainties
from defects both in the model predictivity and the detector’s resolutions and biases.
By contrast, a ‘precision measurement’ analysis is motivated to make the best
possible estimation of how the observed events were really distributed, usually
without hypothesis testing of any particular model in mind. Most notably, the
imperfect detector has no place in such estimates: we want to know what would have
been observed not at ATLAS or at CMS specifically, but by anyone in possession of
a detector with known and correctable reconstruction inefficiencies and biases. In a
sense, what we want to know is what would have been observed by an experiment
with a perfect detector. As the integrated luminosity is also specific to the experi-
ment, we typically also divide our best estimate of this out of the inferred event
yields (and propagate its uncertainty) so that the analysis target is now a set of total
perfect coverage where the real one had effectively none. This picture corresponds to
a closed region of phase-space highly overlapping with the performant regions of the
real detector, within which physics object performance is perfect: this is the fiducial
volume. Its power is that it defines a detector-independent class of measurement in
which the assigned probabilities are dominated by the observations and under-
standing of the contributing experiment, rather than any specific physics model. By
contrast, any extrapolation outside the fiducial volume is necessarily dependent on
assumptions about things that could not be directly observed and in most cases will
either be highly model-dependent or highly uncertain. The motivating principle of
fiducial measurements, above all others, is not to extrapolate, and hence to make the
measurement as precise and model-independent as possible, based on what the
experiment was actually able to observe1. Theory predictions made more inclusively
than the detector could have seen, or at parton level, require the application of
separately estimated cut efficiencies and/or non-perturbative corrections for compar-
ison to a fiducial measurement.
Use of the fiducial volume in model interpretations from detector-independent
observables hence minimises the risk of contamination by such problematic regions.
Fiducial volumes restricted beyond the essential level imposed by detector limita-
tions can also be useful, as they permit comparison (and combination) of several
experiments’ measurements of the same fiducial observables within a commonly
accessible phase-space.
The exact fiducial volume is analysis-specific: an analysis measuring only inclusive
jets naturally has access to a larger fiducial volume than one using the same detector
which needs restricted-acceptance lepton or jet flavour-tag reconstruction. In
practice, fiducial observables are implemented in terms of event analysis on
simulated collider events via an MC event generator, such that the fiducial volume
is expressed via the fraction of inclusive events to pass the analysis selection cuts. As
these particle-level events can be passed through the detector simulation and
reconstruction chain, and a closely related reconstruction-level analysis performed,
a one-to-one mapping of simulated events can be obtained, allowing at least in
principle the construction of an arbitrarily complete mapping between fiducial and
reconstruction-level observables. Practical attempts to obtain and apply such a
mapping, and hence fiducial observables as seen in real collider data, lie at the heart
of the detector-correction methods described in Section 11.3.
1 As the host of the venerable UK game show Catchphrase used to urge his contestants, ‘Say what you see’.
\[ \sigma_{\rm fid} = \sigma_{\rm tot}\,\frac{\sum_{i\,\in\,\rm acc} w_i}{\sum_{i\,\in\,\rm all} w_i}\,, \qquad (11.1) \]
where the ∑i wi terms are the sums of event weights in the showered MC sample,
either in total or those accepted by the analysis for the denominator or numerator,
respectively. The same applies to differential fiducial cross-sections dσfid /dX by
considering the acceptance restricted to a single bin. As a histogram bin already by
definition contains the sum of weights accepted into its fill-range, the differential
cross-section is usually obtained by simply dividing by the incoming sum of MC
weights ∑i∈all wi (giving a unit normalisation for the histogram), and multiplying by
the high-precision total cross-section σtot of choice.
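To make equation (11.1) concrete, the following minimal Python sketch computes a fiducial cross-section and a differential fiducial cross-section from hypothetical per-event MC weights, a fiducial-acceptance mask and an externally supplied σtot; all numbers are invented for illustration.

```python
import numpy as np

# Hypothetical per-event MC weights, a boolean fiducial-acceptance mask, one
# observable value per event, and an externally supplied total cross-section.
weights = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 0.8])
accepted = np.array([True, False, True, True, False, True])
observable = np.array([20.0, 55.0, 35.0, 80.0, 10.0, 42.0])   # e.g. a pT in GeV
sigma_tot = 100.0                                             # pb, from elsewhere

# Equation (11.1): accepted weight-sum over total weight-sum, times sigma_tot.
sigma_fid = sigma_tot * weights[accepted].sum() / weights.sum()

# Differential version: histogram the accepted weights, normalise by the
# *incoming* weight sum, multiply by sigma_tot and divide by the bin widths.
edges = np.array([0.0, 25.0, 50.0, 100.0])
yields, _ = np.histogram(observable[accepted], bins=edges, weights=weights[accepted])
dsigma_dX = sigma_tot * yields / weights.sum() / np.diff(edges)

print(sigma_fid, dsigma_dX)
```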
single, generic ‘lepton’ measurement. At the partonic level, this is trivial, since
for high-energy collisions both the electron and muon can be safely treated as
effectively massless, but in fiducial definitions some extra work is needed to
account for the higher-mass muon’s suppression of QED radiation relative to
the electron. Without doing so, ‘bare’ final-state muons would have more
energy than equivalently produced electrons, since the muons radiate less. The
optimal balance is one that minimises model-sensitive extrapolations from
reconstruction to truth levels, and is achieved using the lepton-dressing recipe
described in Section 8.2.1.
• We earlier discussed the reconstruction-level design of isolation methods, to
separate direct non-QCD particles from their indirect cousins produced in
non-perturbative fragmentation processes. These methods can be more or less
directly encoded in a fiducial definition as well, for example summing the
hadronic transverse energy in a cone around a bare or dressed lepton and
applying the absolute or relative isolation criteria. But fiducial definitions do
not have to exactly match the reconstruction procedure: they just have to
achieve a similar enough result. As we already discussed, a particle’s direct-
ness is a property which in principle could be obtained from forensic final-
state reconstruction of hadron decays. Depending on the isolation procedure
used, it may even be acceptable to shortcut the isolation method at least
partially, by use of the directness property (a minimal particle-level sketch of
lepton dressing and isolation follows this list).
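As a deliberately simplified illustration of the particle-level dressing and isolation ideas above, the sketch below represents final-state particles as plain (pT, η, φ) tuples; the cone sizes, the isolation threshold, and the scalar-sum shortcut for the photon addition are illustrative choices and not the recipe of any particular analysis.

```python
import numpy as np

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi]."""
    dphi = (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi
    return np.hypot(eta1 - eta2, dphi)

def dress_lepton(lepton, photons, dr_max=0.1):
    """Add photons within a dR cone to the bare lepton. For brevity this sums
    scalar pT only; a real dressing recipe sums the four-vectors."""
    pt, eta, phi = lepton
    for g_pt, g_eta, g_phi in photons:
        if delta_r(eta, phi, g_eta, g_phi) < dr_max:
            pt += g_pt
    return (pt, eta, phi)

def is_isolated(lepton, hadrons, dr_cone=0.2, rel_iso_max=0.1):
    """Relative isolation: hadronic sum-pT in a cone, divided by the lepton pT."""
    pt, eta, phi = lepton
    cone_sum = sum(h_pt for h_pt, h_eta, h_phi in hadrons
                   if delta_r(eta, phi, h_eta, h_phi) < dr_cone)
    return cone_sum / pt < rel_iso_max

# Toy usage with invented (pT [GeV], eta, phi) tuples:
lepton = (40.0, 1.10, 0.30)
photons = [(2.0, 1.12, 0.28), (5.0, -2.0, 2.0)]
hadrons = [(1.5, 1.20, 0.40), (30.0, -0.5, -2.0)]
dressed = dress_lepton(lepton, photons)
print(dressed, is_isolated(dressed, hadrons))
```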
cannot be seen directly: the missing momentum must be used as a proxy. This
cunning plan has several deficiencies:
1. As the initial-state colliding-parton momenta are not known (cf. parton
distribution functions (PDFs)), the longitudinal component of the missing
momentum cannot be known even in principle with respect to the partonic
scattering rest frame. Even more crucially, since the missing momentum
components are estimated via negative sums of visible momenta, and the
dominant longitudinal momentum flows are in the uninstrumented area close
to the beam-pipe, even the lab-frame missing longitudinal momentum cannot
be measured with any accuracy. The best-available proxy is the purely
transverse missing-momentum vector, pTmiss .
2. The neutrino cannot be distinguished from any other invisible particle. While
reconstruction of the system would be almost entirely unaffected by a
different near-massless invisible object (or objects) in its place,2 physics
processes with massive invisibles contributing to pTmiss + jets event signatures
will be misreconstructed.
If one does not need to reconstruct the W completely, these limitations are not a
major problem as much relevant information is captured in differential distributions
such as those of the charged-lepton transverse momentum, pTℓ , and the transverse
mass,
\[ m_T = \sqrt{2\, p_T^\ell\, p_T^{\rm miss}\,\big(1 - \cos\Delta\phi(\ell,\,\mathbf{p}_T^{\rm miss})\big)}\,, \qquad (11.2) \]
(which uses nearly all the measured kinematic information, other than the charged-
lepton's longitudinal component). Neglecting spin effects and the motion of the W,
and assuming a two-body decay, the transverse momentum of the lepton is
\[ p_T^\ell = \tfrac{1}{2} M_W \sin\theta\,, \qquad (11.3) \]
allowing the differential cross-section in the lepton pT to be written in terms of the
polar angle θ as
\[ \frac{\mathrm{d}\sigma}{\mathrm{d}p_T^\ell} = \frac{\mathrm{d}\sigma}{\mathrm{d}\cos\theta}\,\frac{\mathrm{d}\cos\theta}{\mathrm{d}p_T^\ell} \qquad (11.4) \]
\[ \phantom{\frac{\mathrm{d}\sigma}{\mathrm{d}p_T^\ell}} = \frac{\mathrm{d}\sigma}{\mathrm{d}\cos\theta}\,\frac{\mathrm{d}\cos\theta}{\mathrm{d}\sin\theta}\,\frac{2}{M_W}. \qquad (11.5) \]
The Jacobian factor dcosθ /dsinθ is easily computed using the usual trigonometric
identities, and re-substituting equation (11.3) gives the resulting rest-frame pTℓ
lineshape
2 Only ‘almost’ entirely, since non-SM spin effects could also manifest in the final state, via angular distributions.
\[ \frac{\mathrm{d}\sigma}{\mathrm{d}p_T^\ell} = \frac{\mathrm{d}\sigma}{\mathrm{d}\cos\theta}\,\left(\frac{2p_T^\ell}{M_W}\right)\frac{1}{\sqrt{(M_W/2)^2 - (p_T^\ell)^2}}\,. \qquad (11.6) \]
This expression clearly has a singularity at pTℓ = MW /2, which due to resolution
effects, the motion of the W, and off-shell effects (cf. the W’s Lorentzian mass
lineshape) is rendered as a finite peak with a sharp cutoff to the distribution. Similar
kinematic endpoints appear in other distributions such as mT , and can be used to
constrain the system subject to different systematic uncertainties: examples are
shown in Figure 11.1. Note that we have here considered the W mass, but this
analysis is relevant for any two-body decay of a heavy object to a charged lepton and
a light invisible particle, and hence the model assumptions can be kept minimal.
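The Jacobian-peak structure of equation (11.6) is easy to see numerically. The sketch below evaluates the rest-frame lineshape for an assumed flat dσ/dcosθ (a simplification: the true angular distribution carries spin information), showing the rise towards the pTℓ = MW/2 endpoint; the mass value and binning are illustrative.

```python
import numpy as np

MW = 80.4                                        # GeV, approximate W mass
pT = np.linspace(1.0, 0.999 * MW / 2, 500)       # stop just short of the endpoint

# Equation (11.6) with dsigma/dcostheta set to a constant: the distribution
# rises towards pT = MW/2, where the Jacobian factor diverges (the peak is
# softened in reality by the W width, its pT, and detector resolution).
lineshape = (2 * pT / MW) / np.sqrt((MW / 2) ** 2 - pT ** 2)
lineshape /= np.trapz(lineshape, pT)             # normalise to unit area

print(pT[np.argmax(lineshape)])                  # peak sits at ~ MW/2
```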
For complete W reconstruction, however, stronger assumptions must be made in
order to solve for the unknown missing-object momentum components: its longi-
tudinal momentum, pzmiss, and mass, m. The obvious assumption to make, given the
depth to which the electroweak model was tested at LEP and other lepton colliders,
is to assume SM W decay kinematics. These manifest via the missing-object
constraints (px, py) = (pxmiss, pymiss), i.e. a single invisible object, and m = 0 as for a
neutrino (within the ATLAS and CMS experimental resolution). The single
remaining unknown is hence pzmiss. To fix this, we again assume the SM and use
the W mass as a standard candle: to the extent the W can be treated as on-shell, the
sum of the fully-reconstructed lepton and the partially-reconstructed neutrino four-
vectors should have an invariant mass equal to the W. We can hence express a first-
order estimate of the neutrino pz, assuming massless leptons, via the constraint
\[ M_W^2 \approx (E_\nu + E_\ell)^2 - (\mathbf{p}_\nu + \mathbf{p}_\ell)^2 \qquad (11.7) \]
Figure 11.1. The ℓ + MET transverse mass spectrum (left) and electron pT distribution (right) in W → eν
decay. The black and red lines show the distribution with pT (W ) = 0 and a realistic W pT spectrum,
respectively, and the shaded region shows the distributions after including detector and reconstruction effects.
Reproduced from Sarah Malik, Precision measurement of the mass and width of the W Boson at CDF
FERMILAB-THESIS-2009-59TRN: US1004113 (2009).
This form contains two dependences on pzν : one explicit, and the other hidden inside
∣pν ∣. Expanding, we can rearrange into a quadratic formula with the resulting
neutrino longitudinal-momentum estimate
\[ \tilde{p}_{z,\nu} = \frac{p_{z,\ell}\, Q^2 \pm \sqrt{p_{z,\ell}^2\, Q^4 - p_{T,\ell}^2\left(|\mathbf{p}_\ell|^2\,|\mathbf{p}_{T,\nu}|^2 - Q^4\right)}}{p_{T,\ell}^2}\,, \qquad (11.10) \]
where \(Q^2 = \mathbf{p}_{T,\nu}\cdot\mathbf{p}_{T,\ell} + M_W^2/2\), and we use the experimental pTmiss vector in place of
\(\mathbf{p}_{T,\nu}\) and \(|\mathbf{p}_{T,\nu}|\).
It is important to note that the quadratic leads to an ambiguity in the pz,ν
assignment, as there are in general two values of the longitudinal momentum that
will satisfy the mass constraint. Nevertheless, it is a very useful technique for
re-acquiring an almost unambiguous event reconstruction, under the assumption of
SM kinematics. Of course, this approach is unviable in analyses wishing to measure
the W mass!
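A minimal numerical sketch of the solution (11.10) is given below, assuming massless leptons and using the measured missing-momentum components in place of the neutrino transverse momentum. Complex roots are simply truncated to their real parts, one of the common heuristics discussed next; the input momenta are invented.

```python
import numpy as np

MW = 80.4  # GeV

def neutrino_pz(lep_px, lep_py, lep_pz, met_x, met_y):
    """Both roots of eq. (11.10) for the neutrino pz under the W-mass constraint,
    with massless leptons and pTmiss standing in for the neutrino pT.  Complex
    roots are truncated to their real parts as a simple heuristic."""
    pt_l2 = lep_px**2 + lep_py**2
    p_l2 = pt_l2 + lep_pz**2
    pt_nu2 = met_x**2 + met_y**2
    Q2 = lep_px * met_x + lep_py * met_y + 0.5 * MW**2
    disc = lep_pz**2 * Q2**2 - pt_l2 * (p_l2 * pt_nu2 - Q2**2)
    sqrt_disc = np.sqrt(disc + 0j)               # keep going if disc < 0
    return [((lep_pz * Q2 + s * sqrt_disc) / pt_l2).real for s in (+1.0, -1.0)]

# Invented lepton momentum and missing-momentum components (GeV):
print(neutrino_pz(30.0, 10.0, 25.0, -28.0, -12.0))
```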
This picture is rather idealised, though. In practice, the fact that pTmiss includes
contributions other than the direct neutrino, imperfect reconstruction resolutions, and
the finite W resonance width spoil the exactness of the solution, and result in imaginary
components appearing in the quadratic pz,ν estimates. These can in fact be used as
heuristics to break the quadratic degeneracy, such as choosing the solution with the
smallest imaginary components, or which requires the smallest modification of lepton
and jet kinematics (allowed to vary on the scale of their resolutions) to achieve a real-
valued solution. Such fits are typically performed via a kinematic likelihood expression,
with resolution-scaled Gaussian penalty terms on the kinematic variations. For variety,
we will introduce these concepts in the following section, where the same ambiguities
enter the reconstruction of single-leptonic top-quark events.
CKM coupling: ∣Vtb∣ ∼ 1, and the large phase-space available for its relatively
light decay products, means that the top decays before hadronising: our closest
possible view of a bare quark. But both QCD and EW corrections to top-quark
kinematics have been found to be critically important for the accuracy of MC/
data model comparisons.
These factors introduce problems for our preferred precision-analysis method, the
fiducial definition: the field has still not worked out a standard approach which
addresses all these issues. In this section, we will explore the methods available for
precision top-quark reconstruction with a single leptonic top, and in the next section
the even more awkward case with two. These methods provide useful techniques at
the reconstruction-level regardless of the level to be corrected to.
Our workhorses for top-quark physics analysis will be leptonic tops—those whose
decay-daughter W decays leptonically, as shown in Figure 11.2(a). Naïvely, fully
hadronic top-quark decays are more attractive as they have a fully reconstructible
final-state without missing energy. But two factors put paid to this hope: firstly,
either singly or in a tt¯ pair, hadronic tops have few distinguishing signatures from
the overwhelming QCD multi-jet background; and secondly, the resolutions on
masses and other composite observables obtained from jets are much poorer than
are required for e.g. a top-quark property measurement. Some published studies
have been made using jet-substructure techniques to reconstruct high-momentum
tops, reducing the jet background, but for now the momentum resolution remains
restrictive. By contrast, events with at least one leptonic top have a direct charged
lepton (immediately reducing the background by orders of magnitude, since a high-
scale weak process must have been involved somewhere), significant pTmiss , and
better resolution with which composite-object reconstruction techniques can be
attempted. The downside is the ambiguities introduced by the invisible neutrinos: at
least one per leptonic top, and two in the case of tau leptons (one from the tau
production and one from its leptonic decay to e or μ).
Figure 11.2. (a) Leading-order Feynman diagram for leptonic top-quark decay, via the 99.8%-dominant t → b
CKM channel. (b) Top-quark pair branching fractions. Credit: diagram courtesy of the DØ Collaboration.
The classes of top-pair decays producing a charged lepton (including taus, which
themselves decay hadronically ∼65% of the time, and leptonically 35% of the time)
are shown in Figure 11.2(b). In addition to these we can add leptonic decays from
single-top production, although these have much smaller cross-sections due to being
induced via weak interactions; both the tt¯ and single-top production mechanisms are
shown in Figure 11.3. From these fractions we see that roughly 38% of top-pair
events have exactly one direct electron or muon in their final state, the semileptonic
tt¯ topology; and ∼6% in the dileptonic tt¯ mode with two direct e/μ. Given the natural
trade-offs between resolutions, yields, and the ambiguities of missing momentum,
semileptonic tt̄ (→ bb̄jjℓν) is the most commonly used top-pair process for precision
measurements.
Figure 11.3. (a) Leading-order Feynman diagrams for QCD top-quark pair-production. (b) Leading-order
Feynman diagrams for single-top-quark production: a) s-channel, b) t-channel, and c) Wt modes.
The nice thing about single-leptonic top reconstruction is that we can largely recycle
the principles used in the previous section for leptonic W reconstruction—after all, the
SM t → bℓ +νℓ decay is a leptonic W + decay with an additional b quark. This additional
quark is not trivial—as can be seen from Figure 11.3, the leptonic top will either be in a
tt¯ event, in which case there will certainly be another b-jet (from the other, hadronic,
top-decay), or will be in a single-top event where a b-quark is almost certainly co-
produced with the top due to the dominant charged-current production vertex. Even in
the diagrams without an explicit b-quark in the final state, the initial-state b has to be
produced from a QCD g → bb¯ splitting, with the additional b often entering the
analysis phase-space rather than just being ‘absorbed’ into the PDF. The mix of
implicit versus explicit g → bb¯ splitting in such processes is the subject of active theory
discussion about the best mixture of calculational four- and five-flavour schemes. The
extra b-jet, via either production mechanism, leads to a combinatoric ambiguity which
again requires an extra reconstruction heuristic.
As for W reconstruction, if the top mass is not being measured by the analysis, the
invariant mass m(b, ℓ, ν ) = mt ∼ 172.5 GeV can be used as a constraint in addition
to the W mass standard-candle. In principle this gives a perfect solution up to (now
four-fold) quadratic ambiguities, and we could stop the description here. However,
this more complex system is a good place to discuss the more practical realities of
reconstruction with unknown pzmiss. These are:
• experimental resolutions;
• truly off-shell tops and W’s;
• unresolvable neutrino pairs from leptonic tau-decay feed-down (electrons and
muons from direct-tau decays are necessarily treated as if themselves direct).
These effects spoil the perfect numerics of solving the quadratic systems, and hence
realistic top reconstruction attempts to take them into account. The simplest
approach is rather lazy: perform the W and top reconstruction sequentially, first
fixing the neutrino pz via the W mass, then using the (mostly known) top mass for
ambiguity resolution. The first step will not in general produce real-valued solutions
for pzmiss, in which case some heuristic is needed: one common approach is to permit
directional variations to the pTmiss vector, setting pzmiss = 0 and setting the transverse
mass mT equal to the W mass. An alternative common heuristic is to obtain the
complex solutions and just set the imaginary parts to zero—with a maximum
tolerance beyond which the event is declared un-reconstructible. Both are simple,
but improvable.
A more holistic approach is that of kinematic likelihood fitting. In its simplest
form this defines a simple, χ 2 -like metric of reconstruction quality based on how
closely the W and top components of the system match their pole-mass values:
\[ \chi^2 \sim \frac{\big(\tilde{m}_{\ell\nu b} - m_t^{\rm pole}\big)^2}{\sigma(\tilde{m}_{\ell\nu b})^2} + \frac{\big(\tilde{m}_{\ell\nu} - m_W^{\rm pole}\big)^2}{\sigma(\tilde{m}_{\ell\nu})^2}\,, \qquad (11.11) \]
where m̃ℓν and m̃ℓνb are the leptonic W and top-quark mass estimators from the
combination of the charged lepton ℓ, the neutrino candidate ν (with free/
reconstructed z momentum), and the b-jet b. The σ(m̃i) terms encode Gaussian
uncertainties around the pole masses, combining the intrinsic Γt ∼ 1.3 GeV and
ΓW ∼ 2.1 GeV widths with typically larger reconstruction resolutions Δm̃i, combined
according to e.g. σ(m̃i)² = Γi² + Δm̃i². It is then a question of numerical optimisation
and exploration to identify the best-fit configuration, classify whether it is acceptably
good, and to perhaps propagate some measure of the resulting uncertainty. An
example application of this technique, at truth-particle level for high- pT ‘boosted’ tt¯
events, is shown in Figure 11.4.
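A stripped-down version of such a kinematic fit can be written in a few lines: the sketch below minimises equation (11.11) over the free neutrino pz with SciPy, using invented lepton and b-jet four-vectors and purely illustrative Gaussian widths (in a real analysis these would combine the natural widths with event-by-event resolutions).

```python
import numpy as np
from scipy.optimize import minimize_scalar

MT_POLE, MW_POLE = 172.5, 80.4      # GeV
SIG_T, SIG_W = 15.0, 8.0            # illustrative resolution-dominated widths

def inv_mass(*fourvecs):
    """Invariant mass of a sum of (E, px, py, pz) four-vectors."""
    E, px, py, pz = np.sum(fourvecs, axis=0)
    return np.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0))

def chi2(pz_nu, lep, bjet, met_x, met_y):
    """Eq. (11.11) as a function of the free neutrino pz."""
    E_nu = np.sqrt(met_x**2 + met_y**2 + pz_nu**2)   # massless neutrino
    nu = np.array([E_nu, met_x, met_y, pz_nu])
    m_lnu = inv_mass(lep, nu)
    m_lnub = inv_mass(lep, nu, bjet)
    return ((m_lnub - MT_POLE) / SIG_T)**2 + ((m_lnu - MW_POLE) / SIG_W)**2

# Hypothetical reconstructed lepton and b-jet four-vectors (E, px, py, pz):
lep = np.array([45.0, 30.0, 10.0, 32.0])
bjet = np.array([90.0, -40.0, 50.0, 60.0])
best = minimize_scalar(chi2, bounds=(-500, 500), method="bounded",
                       args=(lep, bjet, -25.0, -15.0))
print(best.x, best.fun)   # best-fit neutrino pz and its chi2
```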
Obviously this approach can be refined into a more careful (log-)likelihood metric
in which the Lorentzian resonance is not approximated as Gaussian, but rather
convolved into a Voigt profile (see Section 5.1.6). Other extensions include putting
the hadronic top reconstruction into the fit (in the case of semi-leptonic tt¯ , and with
a correspondingly larger experimental resolution term), and potentially even
accounting for systematic correlations between the W and top-quark masses, given
their common constituents.
A further recent refinement in semi-leptonic tt̄, at the cost of some clarity about
the mechanism of operation, is to use modern neural network technology in addition
to this leptonic reconstruction to assist with reducing or resolving the light-jet
combinatoric backgrounds for the accompanying hadronic top: these can be avoided
by requiring that there be only two b-jets and two light-jets in the event, but doing so
greatly reduces the number of available events, particularly for high-mtt¯ events where
there is a large phase-space for significant extra QCD radiation. Deep neural network
classifiers can help to improve the efficiency and purity of semi-leptonic tt¯ recon-
struction in the presence of multiple extra jets, but there is no free lunch: if using such
methods, validations and systematic error propagation will likely need to be performed
to capture any model-dependences implicit in the neural network training.
Figure 11.4. Illustration of invariant-mass kinematic fitting reconstruction of high- pT semileptonic tt events,
and their main W + jets background process. Both the leptonic W and the leptonic top-quark reconstructed
mass spectra are shown, with the extra suppression of the background clear in the latter case. Credit: MC
analysis and plots by Jack Araz.
So the system can be solved—but only just. This counting assumes many things:
1. Perfect charged-lepton, b-jet, and pTmiss identification and resolution;
2. Absence of any extra QCD radiation (the degree of inaccuracy of this
assumption effectively further degrading momentum and mass resolutions);
3. Correct assignment of b-jet/lepton pairings (in practice the b-quark charge
cannot be identified from the reconstructed b-jet);
4. SM decay kinematics via the intermediate W;
5. Perfectly on-shell W and top-(anti)quark resonances, with perfectly known
pole-mass values.
Even with these assumptions, the solution can only be obtained up to discrete
polynomial root-finding ambiguities, which in practice are multiplied by the
assignment ambiguities between leptons and b-jets. To minimise the impact of
Z + jets backgrounds in further confusing the resolution of these ambiguities, it is
common for experimental studies of the dileptonic tt¯ system to focus on the mixed
lepton-flavour eμ channel, rather than the same-flavour ee and μμ ones, or to use
explicit pTmiss and m ℓℓ cuts to reduce the impact of Z → ℓℓ backgrounds.
Analytic reconstruction
Given the equal numbers of variables and constraints on the system, analytic
solution is an obvious approach to take: the first such approach in 1992 was solved
numerically, and subsequent attempts reduced the sensitivity to singularities in the
constraint equations. The current de facto analytic approach is the ‘Sonnenschein
method’, a 2006 refinement to minimise the number of solution steps3.
This method separately considers the top-quark and antiquark systems, entangled
by the ambiguity between their (anti)neutrino decay products, assuming that the
b-jet/lepton pairing can be made correctly. By the usual arguments, the neutrino
energy on either side of the top-pair decay can be expressed via both the W and top
mass-constraints, as
\[ E_\nu = \frac{m_W^2 - m_\ell^2 - m_\nu^2 + 2\,\mathbf{p}_\ell\cdot\mathbf{p}_\nu}{2E_\ell} \qquad (11.12) \]
3 See arXiv:hep-ph/0603011, but note there is an important sign-error in the original formulae! Using a debugged pre-written implementation of the method is recommended.
whereupon the two right-hand sides can be equated to eliminate the dependence on
Eν. Doing so for each of the top and anti-top sides gives rise to linear equations in
the (anti)neutrino three-momentum components,
\[ a_1 + a_2\, p_{\nu,x} + a_3\, p_{\nu,y} + a_4\, p_{\nu,z} = 0 \qquad (11.14) \]
\[ b_1 + b_2\, p_{\bar\nu,x} + b_3\, p_{\bar\nu,y} + b_4\, p_{\bar\nu,z} = 0, \qquad (11.15) \]
where the an and bn coefficients are functions of the other kinematic variables for
that decay. These can be used to express the neutrino z-components in terms of their
x and y counterparts, and together with the neutrino dispersion relation
\(E_\nu^2 = m_\nu^2 + p_{\nu,x}^2 + p_{\nu,y}^2 + p_{\nu,z}^2\) and the complementarity of the neutrino and anti-
neutrino x and y momenta in producing the total pTmiss, can be used to obtain a pair
of two-variable quadratic equations,
\[ c_{22} + c_{21}\, p_{\nu,x} + c_{11}\, p_{\nu,y} + c_{20}\, p_{\nu,x}^2 + c_{10}\, p_{\nu,x} p_{\nu,y} + c_{00}\, p_{\nu,y}^2 = 0 \qquad (11.16) \]
\[ d_{22} + d_{21}\, p_{\nu,x} + d_{11}\, p_{\nu,y} + d_{20}\, p_{\nu,x}^2 + d_{10}\, p_{\nu,x} p_{\nu,y} + d_{00}\, p_{\nu,y}^2 = 0. \qquad (11.17) \]
As these two quadratics are expressed in the same variables, solutions for pν,x and
pν,y can be found where the two surfaces intersect. This is done by computing the
resultant with respect to pν,y , which is a quartic polynomial in pν,x and hence in
general has four solutions. In fact, with perfect resolution and particle-pair assign-
ment only two would usually be distinct, but in practical applications all four need to
be considered. For each pν,x solution, a single corresponding pν,y can be computed
with good numerical stability.
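For orientation, the elimination step can be prototyped symbolically: the sketch below takes the cij and dij coefficients of equations (11.16)-(11.17) as given (their construction from the event kinematics follows the Sonnenschein reference and is not reproduced here; the values used are invented simply to exercise the machinery), forms the resultant with SymPy, and extracts the real roots in pν,x.

```python
import numpy as np
import sympy as sp

def dilepton_pnux_roots(c, d):
    """Real roots in p_nu_x of the system (11.16)-(11.17), obtained by
    eliminating p_nu_y with a resultant.  The coefficient dictionaries c and d
    must be built from the event kinematics beforehand (see the reference)."""
    x, y = sp.symbols("pnux pnuy")
    f = c["22"] + c["21"]*x + c["11"]*y + c["20"]*x**2 + c["10"]*x*y + c["00"]*y**2
    g = d["22"] + d["21"]*x + d["11"]*y + d["20"]*x**2 + d["10"]*x*y + d["00"]*y**2
    quartic = sp.Poly(sp.resultant(f, g, y), x)   # polynomial of degree <= 4 in x
    roots = np.roots([float(a) for a in quartic.all_coeffs()])
    return sorted(r.real for r in roots if abs(r.imag) < 1e-6)

# Invented coefficients, purely to exercise the machinery:
c = {"22": 1.0, "21": -2.0, "11": 0.5, "20": 1.0, "10": 0.3, "00": 1.0}
d = {"22": -4.0, "21": 1.0, "11": -0.2, "20": 0.5, "10": 0.1, "00": 1.5}
print(dilepton_pnux_roots(c, d))
```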
The critical issue with the Sonnenschein method is its assumptions of perfectly on-
shell decays, perfect experimental resolutions, and absence of assignment ambigu-
ities. The last of these is not such a problem, merely ensuring that the fourfold
solution ambiguity is always relevant; the former are bigger issues, especially given
significant jet and pTmiss systematic uncertainties, and may lead to no real solutions
being found. A similarly analytic, but distinct ‘ellipses method’ has been developed
in which solutions are identified as the intersections of ellipses along which valid
solutions of the dilepton-tt¯ system’s kinematic constraints can be found. The
advantage of this more geometric approach is that it provides a clear heuristic for
recovering a solution in the case of ‘near-miss’ events with no intersections, instead
using the points of closest approach between the two ellipses.
Kinematic likelihoods
Given the relative algebraic complexity of analytic reconstruction, and especially since
it eventually runs aground on issues of finite resolution and resonance widths, a more
popular method in LHC dileptonic tt¯ studies has been to obtain ‘best fit’ estimates of
the tt¯ system kinematics by an explicit kinematic likelihood fit, bypassing the analytic
solution in favour of a numerical optimiser in the space of neutrino momentum
parameters—perhaps with analytically informed starting values. The W and top mass
constraints then naturally enter the target loss function not as delta-functions as
assumed in the Sonnenschein approach, but as broad pdfs incorporating knowledge of
both the Breit–Wigner resonance shapes and known widths, and experimental
uncertainties encoded to in-principle arbitrary levels of detail. Priors derived from
SM MC simulations can potentially also be included—via the usual dance between
Bayesian and frequentist semantics, and concerns about inadvertent model-depend-
ence. This overall approach is simply the higher-dimensional equivalent of kinematic
likelihood fits for semileptonic tt¯ , and can either be used to map the whole pdf (on an
event-by-event basis) or to find the single best-fit point within it.
Neutrino weighting
A similar approach goes by the name of neutrino weighting, which is most often used in
top-mass measurements with dileptonic tt¯ : in this situation, the mt and mt¯
constraints cannot be used, leaving the system underconstrained even if the typical
mt = mt¯ assumption is made. The neutrino weighting method extracts an mt
measurement from a set of events by assuming top mass values in a set of special
MC simulations propagated through a comprehensive detector simulation. In
general, the predictions from each hypothesised top mass can be compared to a
whole set of observables {O}, and the likelihood or posterior density computed
accordingly. The best-fit probability score in each event can be used as an event
weight, and the distributions of these weights—typically their first two moments—
used to place a confidence interval on mt.
So much for the general concept. However, a reduced version of this has become so
common as to be synonymous with ‘neutrino weighting’, in which {O} = {pTmiss}, i.e.
the weight is computed purely based on the missing-momentum components. To do this,
of course, the missing momentum cannot be used as a constraint, otherwise there would
be a circular-logic problem. The approach taken is to replace those two constraints (in
pxmiss and pymiss) with a scan over hypothesised values for the unmeasurable neutrino and
antineutrino (pseudo)rapidities ην and ην̄ ; each (ην , ην̄ ) combination is then translated
into a prediction for pTmiss via the other constraints, which can be compared to the
measured value to compute a likelihood weight,
\[ w_\nu(\eta_\nu, \eta_{\bar\nu};\, m_t) = \exp\!\left(-\frac{\big(p_x^{\rm miss} - p_x^{\,\nu} - p_x^{\,\bar\nu}\big)^2}{2\sigma_T^2}\right)\cdot \exp\!\left(-\frac{\big(p_y^{\rm miss} - p_y^{\,\nu} - p_y^{\,\bar\nu}\big)^2}{2\sigma_T^2}\right), \qquad (11.18) \]
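In code, the per-point weight of equation (11.18) is a one-liner; the sketch below assumes that some constraint solver (not shown) has already supplied the neutrino and antineutrino transverse momenta for a given (ην, ην̄) hypothesis, and uses an invented value for the resolution σT together with invented momenta.

```python
import numpy as np

SIGMA_T = 15.0  # GeV, an illustrative missing-momentum resolution

def nu_weight(met_x, met_y, pnu_x, pnu_y, pnubar_x, pnubar_y):
    """Eq. (11.18): Gaussian compatibility between the measured missing-momentum
    components and the neutrino-pair transverse momenta predicted at a given
    (eta_nu, eta_nubar) scan point."""
    dx = met_x - pnu_x - pnubar_x
    dy = met_y - pnu_y - pnubar_y
    return np.exp(-dx**2 / (2 * SIGMA_T**2)) * np.exp(-dy**2 / (2 * SIGMA_T**2))

# Toy demonstration: a measured (met_x, met_y) compared with the neutrino
# transverse momenta that a constraint solver (not shown) would return for two
# different (eta_nu, eta_nubar) hypotheses.
print(nu_weight(40.0, -10.0, 25.0, -5.0, 14.0, -6.0))   # good agreement -> weight ~ 1
print(nu_weight(40.0, -10.0, 60.0, 30.0, 20.0, 25.0))   # poor agreement -> small weight
```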
may be included, such as the number of tracks associated to a vertex for vertexing or
tagging performance: it is important that such efficiency tabulations do include all key
variables rather than assuming that (pT , ∣η∣) is always sufficient, and that the calibration
distributions in these variables are similar enough to those in the signal phase-space4.
While these tabulated efficiency numbers are only averages—per-event sampling
effects will be considered in the coming sections—their reciprocals can be treated as
per-object weights to offset some average biases,
\[ w_i^{\rm eff} = 1/\epsilon_i \qquad (11.19) \]
for object i with efficiency ϵi . By construction, these weights are always >1.
The effect is that e.g. a low- pT or high-∣η∣ reconstructed track (with typically lower
reconstruction efficiency) will receive a higher weight than one with higher-efficiency
kinematics, and observable contributions containing such objects will be increased in
importance. Note that this is a per-object, per-event correction to be derived in an
event-analysis loop, not something that can be performed as a post hoc correction on
the observable distributions alone.
Mis-identifications, e.g. accidental labelling of an electron as a prompt photon, or
contributions from objects outside the fiducial acceptance, may be handled similarly,
but now with correction factor
\[ w_i^{\rm mis} = (1 - f_i), \qquad (11.20) \]
where fi is now the expected mistag rate for an object i’s kinematics, i.e. the fake
fraction of the reconstructed objects. By construction, this weight is always <1. A
good example of such corrections in real-world use is the ATLAS track-based
minimum-bias analysis from early LHC operation.
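A minimal sketch of how such per-object weights might be applied inside an event loop is shown below. The (pT, |η|)-binned efficiency and mistag tables are invented, and the two corrections of equations (11.19) and (11.20) are combined multiplicatively here as one possible convention.

```python
import numpy as np

# Invented efficiency and mistag tables binned in (pT [GeV], |eta|):
pt_edges, eta_edges = np.array([0, 10, 30, 1e9]), np.array([0.0, 1.5, 2.5])
eff_table = np.array([[0.70, 0.60],    # rows: pT bins, cols: |eta| bins
                      [0.85, 0.78],
                      [0.92, 0.88]])
fake_table = np.array([[0.10, 0.15],
                       [0.05, 0.08],
                       [0.02, 0.04]])

def object_weight(pt, abseta):
    """Per-object correction weight, w = (1 - f)/eff, combining eqs (11.19)
    and (11.20) for an object of given kinematics."""
    i = np.searchsorted(pt_edges, pt, side="right") - 1
    j = np.searchsorted(eta_edges, abseta, side="right") - 1
    return (1.0 - fake_table[i, j]) / eff_table[i, j]

# A low-pT, high-|eta| track gets a larger weight than a central high-pT one:
print(object_weight(5.0, 2.0), object_weight(50.0, 0.5))
```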
It is important that such efficiency weights not be misunderstood as up-scalings of
momenta or similar: an efficiency tells us nothing about per-object mismatches
between particle and reco levels, and often this correction is made under the
assumption that they coincide, with shortcomings in that assumption to be offset
by the black-box unfolding step to follow. Efficiency weights rather indicate that if
one a priori low-efficiency object is present at reconstruction level, there were
probably more, or more events containing such objects, at particle level. The exact
propagation of these weights into observables depends on how the observable is
constructed from the inefficiently constructed objects: of course, inefficiencies or
impurities in objects irrelevant to the analysis should not affect the results at all.
We now proceed to the more complicated issue of handling the residual biases,
known as unfolding. Most such methods are based on MC-driven comparisons between
particle-level and reco-level observables, but in the interests of continuity we note here
that one—the ‘HBOM’ method—is a natural extension of the (in principle) MC-
independent pre-unfolding corrections described here. Those interested in following this
logic immediately should turn to the HBOM description later in this chapter.
4 An excellent, forensic, and very instructive examination of errors caused by subtleties in efficiency-correction
estimation for LHCb D0 meson differential cross-section measurements may be found in Chapter 8 of CERN-
THESIS-2018-089.
11.3.2 Unfolding
In common use, ‘unfolding’ is both an umbrella term used to refer to the entire
detector-correction process, and a more specific reference to the ‘magic’ bit of that
inference process. We will use it in the second form.
But first, to understand why correction is in general a difficult problem to solve, and
hence why complicated numerical machinery needs to be involved at all, it is useful to
take a step back and consider the inverse process: the forward folding of particle-level
observables (cf. the fiducial definitions discussed earlier) into reconstruction-level ones
as would be seen by an analyser accessing LHC-experiment data files. This process
includes the effects of finite detector resolutions, systematic calibration offsets (both
additive and multiplicative), and efficiency and mistag rates as invoked in the previous
section. Other than the systematic offsets, these processes are stochastic: repeated
passes of the exact same particle-level objects through the detector would not produce
exactly the same reconstructed observables. Folding is hence a broadening of particle-
level delta functions into finitely wide reco-level probability distributions, which are
then convolved with the particle-level input distributions. We should note immediately
that, as the original folding process was stochastic, the unfolding logically cannot
be deterministic: whether it is explicitly stated or not, unfolding is a probabilistic
inference process, and has to work in terms of indefinite probability distributions.
On a per-event basis, smearing is not a deterministic process, but aggregated over
a statistically large number of events, the expected particle-to-reco mapping is well
defined. The problem is that this map may be many-to-one: more than one particle-
level distribution may produce the same reco-level one, and hence—even in the
infinite-statistics limit—a simple inversion of the ‘folding function’ is not in general
possible. Technically, this makes unfolding an ill-posed problem (as are many inverse
problems). As we shall see, such issues are typically dealt with by regularizing the
inversion—modifying the algebra such that a stable and unique solution can be
obtained with minimal bias. In the real world, we have not just this algebraic
singularity to deal with, but the additional complications of finite statistics, and
dependences of our detector-behaviour estimates on unknown nuisance parameters.
Before we consider these difficulties, it is useful to establish some common concepts
and terminology with which to frame the problem and its attempted solutions.
Unfolding formalism
Unfolding is always an attempt to find a map, or more generally a probabilistic
family of maps, between two data spaces: the reco observable space R, correspond-
ing to the measured data we want to correct, and the particle-level ‘truth’ space T we
are targeting in order to make a fiducial measurement. Barring some unforeseen
development, the values in each space are real, so \(R \in \mathbb{R}^N\) and \(T \in \mathbb{R}^M\), where N and
M are the number of measurements, i.e. number of distribution bins, in each space.
Folding corresponds to T → R maps, and unfolding to R → T . This relationship is
illustrated in Figure 11.5, with the ‘detector effects’ map from T → R labelled D , and
therefore its inverse D −1 providing the ideal map from R → T . These ideal maps not
being available, we instead typically progress with an approximation D̃ and its
schematic inversion, obtained from e.g. MC simulation. Note that in this setup we
are not making any requirement that the ‘physical meaning’ of the values be the
same in both spaces: we will return to this, as it provides a powerful framing with
which to explore the limits of what can and cannot be unfolded.
Figure 11.5. Relationship between ‘truth’ and ‘reconstruction’ views of collision processes, in both real and
simulated events. MC simulations may be used to obtain an estimate D̃ of the real detector-effect transform D,
and effectively (not always literally) invert it to find an estimate of the true physics parameters.
The natural way to encode such a map is through linear algebra, that is by
expression of forward folding as the transformation of a tuple of truth-space bin
values tj to a tuple of reco-space ones ri:
\[ r_i = \sum_j R_{ij}\, t_j\,, \qquad (11.21) \]
where R ij is the response matrix. It is good practice for the ri and ti tuples to include
bins for migrations from and into regions outside the analysis acceptance, so we
have an algebraic handle on such migrations and can include what information we
have about them when inverting the folding map.
While written deterministically here, we can view this response equation as the
infinite-statistics limit (and maximum-likelihood estimator) of the stochastic folding
process, with R encoding the fractional contributions that each truth-level bin will
make on average to each reco-level one. Expressed as a conditional probability,
\[ R_{ij} = P(i\,|\,j), \qquad (11.22) \]
and hence it is normalised as ∑i R ij = 1 ∀ j . The intention is, of course, that the response
matrix should have a localised structure, ideally with each truth-level bin mapping to one
and only one reco-level one; in practice there will always be some width to the
distribution, with each truth bin populating several reco bins to different finite degrees.
Note that we have avoided saying that R should be diagonal: while this is often seen,
the concept introduces an assumption of equal bin numbers at truth and reco levels,
N = M, and also the idea that bins with adjacent i or j indices are necessarily adjacent
or at least close in the observable space. There is no such requirement for unfolding to
work: as framed here, there is no longer a concept of the observables themselves, just
the discrete bin indices that they induce. This is an important realisation for e.g.
unfolding observables with more than one binning dimension, since it is impossible to
create a linear index that captures the multidimensional nature of proximity.
It is freeing to realise that proximity is overrated in this situation: the dimensionality
of histograms is a presentational detail, and our interest is purely in the bin indices. As
long as bins in any dimensionality can be ordered—in any deterministic way—and
hence assigned unique indices, the mapping between their indices in two distinct spaces
can be defined and assigned a probabilistic structure. The response matrix for such
observables will certainly not be ‘diagonal’ as usually understood,
but this is no indication that they cannot be unfolded. Even in the one-dimensional
case, diagonality is too strict a condition: as an extreme example, any response matrix
with exactly one entry per row (i.e. for each reco bin) can be exactly unfolded, even
though the layout may appear far from diagonal. The logical conclusion of this
thinking is that there is no need for the truth and reco observables to even be computed
using related definitions: an unfolding’s power is determined purely by the degree of
correlation or mutual information as expressed through the response matrix.
So the response matrix is a key quantity for assessing the degree to which detector
effects migrate observable contributions between bins. How can we obtain an
estimate of it? R ij can be estimated within an analysis by use of MC simulation event
samples, as these provide us with event-by-event pairs of (i , j ), i.e. the mappings
between bin-fills at truth and reco levels. Running over an MC event sample, we can
assemble a 2D histogram in (i, j) known as the migration matrix:
\[ M_{ij} = \sum_n w_n\, \Theta_i\big(O_n^{\rm reco}\big)\, \Theta_j\big(O_n^{\rm truth}\big), \qquad (11.23) \]
where the Θ functions indicate whether the truth and reco observable values for event n
fall into the jth and ith bins in their respective space, and wn is the event weight
(possibly including pre-unfolding efficiency-correction factors). The migration matrix is
proportional to the joint probability p(i , j ) of populating reco-bin i and truth-bin j, and
indeed we will assume from here that it has been normalised from its raw form such
that ∑i,j Mij = 1. A crucial feature is that M encodes not just the way in which detector
effects split a given truth-level contribution into multiple reco bins, but also the truth-
level distribution across the bins. We cannot allow this assumption, dependent on the
event generator physics, to bias fiducial observable extractions from data, which is why
we instead work in terms of the response matrix, calculated via
\[ R_{ij} = M_{ij} \Big/ \sum_i M_{ij}\,. \qquad (11.24) \]
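The construction of the migration and response matrices from an MC sample is straightforward in practice. The toy sketch below uses an invented exponential truth spectrum and Gaussian smearing as a stand-in for the full simulation chain, and performs the column normalisation of equation (11.24) with deliberately different truth and reco binnings.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy MC: a steeply falling truth observable, smeared by a Gaussian 'detector'.
truth_vals = rng.exponential(scale=30.0, size=100_000)
reco_vals = truth_vals * rng.normal(1.0, 0.12, size=truth_vals.size)
weights = np.ones_like(truth_vals)

# Binnings need not be identical at truth and reco level.
truth_edges = np.linspace(0, 120, 13)
reco_edges = np.linspace(0, 120, 25)

# Migration matrix: joint (reco-bin i, truth-bin j) weighted counts.
M, _, _ = np.histogram2d(reco_vals, truth_vals,
                         bins=[reco_edges, truth_edges], weights=weights)
M /= M.sum()   # normalise to a joint probability

# Response matrix, eq (11.24): condition on the truth bin, R_ij = M_ij / sum_i M_ij.
col_sums = M.sum(axis=0, keepdims=True)
R = np.divide(M, col_sums, out=np.zeros_like(M), where=col_sums > 0)
```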
This should not be taken too literally as an algebraic statement. In particular, our
insistence that the numbers of bins N and M do not necessarily need to match
precludes a literal matrix inversion, although there is a simple extension to deal with
this situation. However, it is reasonable to note from an information perspective that
we cannot reliably ‘invert’ from a small number of reco bins to a large number of
truth-level ones—at least not without introducing a significant model dependence to
interpolate the fine-resolution details, which would be a betrayal of our no-
extrapolation fiducial principles! While rarely followed up in practice, one could
potentially unfold from one observable—possibly multi-dimensional—at reco level
to a different but correlated observable at truth level provided there is sufficient
information in the reco-level input.
In the following sections we explore, in roughly increasing complexity, how
various real-world unfolding schemes attempt to generalise this inversion concept
into a form able to cope better with numerical instabilities and to acknowledge
explicitly the statistical limitations and probabilistic nature of the inference.
dividing one by the other. The inversion then becomes a matter of multiplying each
data value by the reciprocal of its corresponding bin-response factor,
\[ \tilde{f}_i = d_i / \tilde{R}_i\,. \qquad (11.28) \]
This approximation is clearly simplistic, but is also undeniably easy to perform and
debug, and as such has frequently been used in practice for ‘quick and dirty’
unfoldings where the aim is less a precision result than to publish some indication of
what is going on in a certain important phase-space in lieu of a longer-timescale
detailed study: two examples are the time-sensitive first measurements of minimum-
bias QCD distributions at LHC start-up and in particular from the first pp runs
above the Tevatron’s 1960 GeV energy frontier, and the relatively new development
of roughly unfolded distributions in control regions as a valuable side-effect of reco-
level BSM search analyses. The main requirement for using bin-by-bin unfolding is
that bin-migrations are minimal, or at least below the statistical and systematic
uncertainties of the measurement: this tends to mean that regrettably large bins have
to be used, but this is better than no measurement at all.
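The corresponding code is about as short as unfolding ever gets: the sketch below applies equation (11.28) to invented MC and data yields, and is valid only under the small-migration caveat just described.

```python
import numpy as np

# MC-derived per-bin yields at truth and reco level, and the observed data counts.
mc_truth = np.array([1000., 800., 600., 300.])
mc_reco = np.array([900., 760., 540., 240.])
data = np.array([880., 700., 610., 260.])

# Bin-by-bin factors and corrected spectrum, eq (11.28): valid only when
# bin-to-bin migrations are small compared with the other uncertainties.
R_factor = mc_reco / mc_truth
unfolded = data / R_factor
print(unfolded)
```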
The other approach is to attempt to numerically regularise the matrix inversion.
Many methods exist for matrix-inverse regularisation in general, of which the most
famous is Tikhonov regularisation, adding positive numbers to the matrix diagonal
to reduce its condition number. In HEP unfolding, the most well-known regularisa-
tion scheme is the SVD unfolding method, again making use of the singular-value
decomposition, but in a different way. The approach used is rather involved, but its
key elements are
• the numerical instability in algebraic inversion of the response matrix can be
traced directly to the appearance of large off-diagonal response terms (or
their N ≠ M equivalents);
• the folding equation (11.21) can be rewritten as a least-squares problem, in
which the quantity to be minimised (by finding values t˜j ) is
y = ∑i (∑j R ijt˜j − ri )2 /Δri , where Δri is the uncertainty on ri;
• the numerical inversion instability can be regularised by adding a constraint
term y → y + τ Δy with Δy ∼ τ ∑i (∑j Cijt˜j )2 . Here C is a matrix,
\[ C = \begin{pmatrix} -1 & 1 & 0 & 0 & \cdots \\ 1 & -2 & 1 & 0 & \cdots \\ 0 & 1 & -2 & 1 & \cdots \\ \vdots & \vdots & & \ddots & \end{pmatrix} \qquad (11.29) \]
which ensures that Δy is the sum of squares of numerical second-derivatives
across the unfolded t˜i spectrum: its inclusion in the minimisation term y hence
suppresses high-curvature oscillating solutions t˜i , corresponding to a physical
requirement of smoothness in the extracted spectrum;
• minimisation of this least-squares system requires calculation of C −1: this can
itself be regularised by the Tikhonov method;
• finally, the size of the regularisation term is governed by the τ parameter, and
the optimal value of this can be obtained by study of the singular values
(the diagonal terms in the SVD Σ matrix), which approach zero to suppress
the high-frequency solutions: for a rank-k system, in which only the first k
contributions are significant, the optimal \(\tau = \Sigma_{kk}^2\).
This method also permits full propagation of the effective covariance matrix due to
the unfolding process. The SVD unfolding has, for reasons that are not entirely
clear, been largely eclipsed by the ‘iterative Bayes’ method to be described in the next
section; however, it is more explicitly about numerical well-behavedness, as
compared to the other’s more probabilistic motivation, and it is worth being aware
of its existence and the approaches used within it, should other methods prove
numerically troublesome in practice.
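While not the full SVD prescription of the original reference, the following sketch captures its key regularisation idea: a least-squares inversion of the response matrix with a curvature penalty built from the second-difference matrix (11.29), solved in closed form for an illustrative (hand-picked) τ; the per-bin uncertainties enter as inverse-variance weights, and all inputs are invented.

```python
import numpy as np

def curvature_matrix(m):
    """Second-difference matrix in the spirit of eq (11.29), with -1 end-points."""
    C = -2.0 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)
    C[0, 0] = C[-1, -1] = -1.0
    return C

def regularised_unfold(R, data, dr, tau):
    """Closed-form minimum of sum_i ((R t - r)_i / dr_i)^2 + tau * |C t|^2,
    i.e. an inverse-variance-weighted least-squares inversion with a curvature
    (smoothness) penalty -- a simplified Tikhonov-style stand-in for full SVD."""
    W = np.diag(1.0 / dr**2)
    C = curvature_matrix(R.shape[1])
    A = R.T @ W @ R + tau * C.T @ C
    return np.linalg.solve(A, R.T @ W @ data)

# Toy usage: 6 truth/reco bins, mild migrations, Poisson-fluctuated pseudo-data.
rng = np.random.default_rng(1)
truth = np.array([100., 90., 70., 50., 30., 15.])
R = 0.7 * np.eye(6) + 0.15 * (np.eye(6, k=1) + np.eye(6, k=-1))
data = rng.poisson(R @ truth).astype(float)
print(regularised_unfold(R, data, dr=np.sqrt(data + 1.0), tau=0.1))
```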
Iterative Bayes
An alternative approach to the regularisation problem is that of iterative Bayes
unfolding (IBU), for some time the most popular approach to detector-effect
correction at particle colliders. The core motivation for this method is complemen-
tary to that of SVD unfolding: its concern is less the numerical instabilities and more
that the linear algebra approach to building the map from the space of reco-level
observables to truth-level observables focuses on the asymptotic statistical limit in
which the mean bin-migration effects encoded in the response matrix deterministi-
cally connect truth-level fill values to their reco-level counterparts.
The IBU picture—unsurprisingly—uses Bayes’ theorem to express the probability of
the fiducial value vector t as a function of the reco values r and the response matrix R ,
\[ P(t\,|\,r, R) = \frac{P(r\,|\,t, R)\cdot P(t)}{\sum_t P(r\,|\,t, R)\cdot P(t)}\,. \qquad (11.30) \]
The left-hand side here is not quite what we want, as it holds a dependence on the
response matrix, which in general is not perfectly known—a conceptual refinement
of some importance from here on. To eliminate this dependence, much as the space
of t configurations is eliminated in the denominator of equation (11.30), this result
must be integrated over the space of response matrix configurations with its own
prior probability density,
\[ P(t\,|\,r) = \int P(t\,|\,r, R)\, P(R)\, \mathrm{d}R\,. \qquad (11.31) \]
Limiting to the case of a binned differential distribution, and unit ‘fills’ of its bins
from the events being processed (including out-of-range overflow bins), the truth-to-reco
probabilistic map for a given truth bin containing tj fills is not a continuous Poisson but
a discrete multinomial distribution, with j → i index-mapping probabilities R ij ,
\[ P(r\,|\,t_j, R) = \frac{t_j!}{\prod_i r_i!}\, \prod_i R_{ij}^{\,r_i}\,. \qquad (11.32) \]
The desired likelihood term (the RHS numerator in equation (11.30)) is the sum of
such multinomial distributions over all truth bins j. Were we in the statistical limit
where smooth Poisson distributions could be used, the sum of multiple Poissons
would result in yet another Poisson, amenable to further analytic simplification, but
this is not the case for the discrete multinomial and so analytic attempts to define the
unfolding for finite fill-statistics run aground. The currently used version of the IBU
algorithm (as opposed to the first version) uses sampling from these multinomial
distributions via a Dirichlet distribution (their unbiased Jeffreys prior) to go beyond
the infinite-statistics assumptions of previous methods.
The other defining feature of the IBU method is of course its introduction of the
prior over truth spectrum values. Setting this term to unity produces the maximum-
likelihood estimators of the fiducial values, but implicitly assumes a flat spectrum—a
bias which leaves some residual effect in the final results. The iterative algorithm
essentially implements an intuitive procedure to reduce this bias: if one were to make
a likelihood-based inversion by hand, with a flat prior on one set of data, the
resulting t˜ would be the new best estimate of the real spectrum. One could then
(ideally on a new and independent dataset, to avoid reinforcing statistical fluctua-
tions), use that estimated t˜ as the prior in a second unfolding, and so on. In practice,
the algorithm recycles the same data several times to iterate its prior estimate, which
leads to instabilities: it is important in IBU to check the dependence of the result on
the number of iterations used, a rather unsatisfying feature and one of many cross-
checks required by the method (see Section 11.3.2).
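The core of the iteration is compact when written in its simplest maximum-likelihood form, without the multinomial/Dirichlet sampling or smoothing of the full implementations. The sketch below applies Bayes' theorem with the current prior, updates the truth estimate, and feeds it back in, for an invented 3×3 response matrix with unit efficiencies.

```python
import numpy as np

def ibu(data, R, prior, n_iter=4):
    """Simplest form of the iterative Bayesian update: at each step, build the
    'inverse' map with Bayes' theorem using the current prior, apply it to the
    data, and feed the result back in as the next prior.  No multinomial
    sampling, smoothing, or efficiency correction is included here."""
    t = prior.astype(float).copy()
    for _ in range(n_iter):
        # P(truth j | reco i) from Bayes' theorem with the current prior:
        joint = R * t[np.newaxis, :]                  # shape (n_reco, n_truth)
        post = joint / joint.sum(axis=1, keepdims=True)
        t = post.T @ data                             # updated truth estimate
    return t

# Toy example: 3 reco bins, 3 truth bins, modest migrations; each column of R
# sums to 1, i.e. unit efficiencies.
R = np.array([[0.8, 0.15, 0.0],
              [0.2, 0.70, 0.2],
              [0.0, 0.15, 0.8]])
data = np.array([120., 300., 180.])
print(ibu(data, R, prior=np.ones(3)))
```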
The final step in IBU is the integral over response matrix uncertainties, which is
performed by sampling. Systematic uncertainties are fed in, incoherently with respect
to the response matrix variations which are fed by the same sources, by explicit use
of ‘1σ variation’ r inputs.
IBU is a very popular method, in large part due to its readily available
implementation in software packages such as RooUnfold and pyunfold, and has
been used for the majority of LHC unfolded analyses in Runs 1 and 2. But it is not
beyond criticism: as the iteration process is statistically ad hoc, it is more ‘Bayes-
inspired’ than fully Bayesian, and its regularising effect is semi-incidental by contrast
with the explicit treatment in the SVD method. (The original code applied a further
ad hoc regularisation via polynomial-fit smoothing in the prior iteration, but this is
not enabled by default in the more commonly used RooUnfold implementation.)
There are also instabilities inherent to recycling the input data through multiple
iterations, hence in practice it is not uncommon to see very small numbers of
iterations, Niter, in use, and without a clear, probabilistic way to treat the a priori
Niter as a nuisance parameter with its own probability mass function. Finally,
current implementations propagate systematics point-wise rather than holistically
mapping the full likelihood function—a limitation for re-interpretations as analysis
data becomes more precise, and correlations become limiting factors. In the
following section we move to unfolding methods which embrace modern computing
power to more completely and probabilistically map the full, high-dimensional pdf
of fiducial cross-sections and their systematic uncertainties.
5 In particular, more detailed likelihood information permits more coherent combination of many analyses into
a composite likelihood for post hoc ‘re-interpretation’ studies—see Chapter 12.
Now we introduce the t → r map, using as before the response matrix R specific to
the analysis phase-space and the observable’s truth and reco binnings. We are only
bothered about making this map for the signal process, however, so can separate
contributions to λ(θ ) into the integrated luminosity (and its systematic uncertainty)
L(ν ) and the reconstruction-level background and signal cross-sections bi and si, and
further deconstruct the signal portion into the set of signal-level bin cross-sections
σ ∈ ϕ and the response matrix:
\[ \lambda_i(\theta) = L(\nu)\cdot\big(b_i(\nu) + s_i(\theta)\big) \qquad (11.35) \]
\[ \phantom{\lambda_i(\theta)} = L(\nu)\cdot\Big(b_i(\nu) + \sum_j R_{ij}(\nu)\cdot\sigma_j\Big). \qquad (11.36) \]
An important feature of this encoding is that the response matrix is made dependent
on the nuisance parameters ν, i.e. the uncertainty in the response structure can be
smoothly encoded as a function of the nuisances, usually by use of interpolated MC
templates as for the background modelling b(ν ). In some circumstances, this may be
overkill and the nominal response matrix can be used instead, with just a ‘diagonal’
modification of signal responses to nuisance-parameter variations,
\[ \lambda_i(\theta) \approx L(\nu)\cdot\bigg(b_i(\nu) + \Big[1 + \sum_{\nu_i \in \nu} (\nu_i - \nu_{0,i})\,\delta_i\Big]\Big[\sum_j R_{ij}(\nu_0)\cdot\sigma_j\Big]\bigg), \qquad (11.37) \]
where ν0 are the nominal nuisance parameter values (usually encoded as ν = 0), and
the δi are the (MC-estimated) reco-level multiplicative modifications to bin values
induced by the usual 1σ systematic variations corresponding to νi = ±1.
Now we have a posterior function, and for fixed data D this expression can be
evaluated for a given parameter set θ . The prior π(θ ) (and its penalty-term
equivalent in likelihood-based unfolding) is most usually set to a collection of unit
Gaussians in the nuisance parameters ν,6 and a uniform or Jeffreys prior7 in the case
of the POIs ϕ, where biasing is to be avoided.
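A toy log-posterior in the spirit of equations (11.35)-(11.37) can be written down directly, as in the sketch below: Poisson bin counts, a 'diagonal' linear response of the signal to the nuisances, unit-Gaussian nuisance priors and a flat (positivity-constrained) prior on the signal cross-sections. The binning, response matrix, δ shifts and data are all invented, the exact encoding of the nuisance response is one possible choice rather than the standard one, the luminosity is treated as fixed, and a real FBU analysis would hand such a function to an MCMC or nested sampler rather than evaluate it at a single point.

```python
import numpy as np
from scipy.stats import poisson, norm

def log_posterior(sigma, nu, data, R, bkg, lumi, delta):
    """Log of (likelihood x prior) for an FBU-style model: Poisson counts per
    reco bin, a 'diagonal' linear signal response to the nuisance parameters nu,
    unit-Gaussian nuisance priors and a flat prior on the signal sigma."""
    if np.any(sigma < 0):
        return -np.inf                            # flat prior on physical values only
    signal = (1.0 + delta @ nu) * (R @ sigma)     # nuisance-modified signal term
    lam = lumi * (bkg + signal)                   # expected reco-level yields
    return poisson.logpmf(data, lam).sum() + norm.logpdf(nu).sum()

# Toy dimensions: 3 reco bins, 2 truth bins, 2 nuisance parameters.
R = np.array([[0.70, 0.10], [0.25, 0.70], [0.05, 0.20]])
bkg = np.array([5.0, 8.0, 4.0])
delta = np.array([[0.02, 0.05], [0.03, 0.01], [0.04, 0.02]])  # 1-sigma reco-level shifts
data = np.array([30, 46, 17])
print(log_posterior(np.array([20.0, 15.0]), np.zeros(2), data, R, bkg,
                    lumi=1.0, delta=delta))
```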
You may be wondering about the difference between the FBU approach, and the
frequentist likelihood-based unfolding approach. In the frequentist approach too,
standard Gaussian penalty terms are added to the likelihood to stop the nuisances
moving too far from their nominal values. As with their equivalents in profile fits for
Higgs and BSM searches, these occupy a vague space between frequentist and
Bayesian approaches; the remaining distinction between FBU and likelihood-fit
unfolding is hence largely in whether the nuisances are to be marginalised over
(FBU) or profiled out (likelihood), and whether CoIs or CrIs are used to construct the
resulting unfolded nominal bin-values and error bands. These treatments are standard
and follow the procedures described in Chapter 5. The practical distinction, for many
purposes, is the convergence time of the fit or scan, which tends to be much faster for
the profiling approach: it is easier, especially given a reasonable sampling proposal
model, to find a global maximum and map its functional vicinity, than to compre-
hensively integrate over the entire typical set of the high-dimensional pdf.
6 This is a major assumption about the statistical role of the template variations from object-calibration
prescriptions, particularly for ad hoc two-point uncertainties such as alternative MC generator choices, but a
common one nonetheless.
7 The maximally uninformative prior for a Poisson statistical model is \(p(\lambda) \sim 1/\sqrt{\lambda}\).
where the ‘(n)’ index indicates how many times the HBOM D function has been applied8.
The track-jet finding is then run, giving a new set of track-jets {ji(1) }; we can view this as a
collective—and non-linear—track-jet detector function Dj. From here, a new set of ‘reco-
squared’ binned observables can be computed, ri(1) to complement the standard reco-level
ri(0). If we wanted we could linearly extrapolate this effect backward to the ri(−1)
configuration, i.e. before the real detector effect and hence an estimate of the fiducial values:
\[ r_i^{(-1)} \approx r_i^{(0)} - \big(r_i^{(1)} - r_i^{(0)}\big) \qquad (11.39) \]
But the non-linearity gives pause for thought, and so we apply the D functions again
on the whole event to give the second, third, nth etc. HBOM iterations, denoted by
the (n ) index:
8 t̃(1) can be ‘null’, due to the sampling deciding that track does not get reconstructed this time: this is more easily handled in source code than in mathematical formalism.
\[ j_i^{(n)} = (D_j)^n \cdot j_i^{(0)}\,. \qquad (11.41) \]
In practice this should be done with full statistical independence for each j (n), rather
than naïve iteration j (m+1) = Dj · j (m). The resulting set of ri(n), for every bin i, is then
a non-linear series of iterated detector effects on the observable, which can easily be
fit with a polynomial and extrapolated back to the n = −1 iteration. This is the core
of the HBOM method, which has been applied with success and minimal simulation-
dependence in several LHC analyses, especially for non-perturbative and low- pT
physics where the MC modelling is not expected to be robust, such as in the
minimum-bias particle-correlation analyses shown in Figure 11.6.
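The mechanics of the HBOM extrapolation are easy to prototype. The sketch below uses a toy 'detector function' (random inefficiency plus Gaussian smearing) in place of the real per-object sampling, re-applies it n further times to a stand-in data sample with independent random draws for each n, and fits each bin's yield with a low-order polynomial to extrapolate back to n = −1; all spectra and parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

def detector(vals, eff=0.9, res=0.08):
    """Toy detector function D: random per-object inefficiency plus Gaussian
    smearing; every call draws fresh random numbers."""
    keep = rng.random(vals.size) < eff
    return vals[keep] * rng.normal(1.0, res, size=keep.sum())

# Stand-in for the measured (reco-level) sample r^(0): in a real analysis this
# is the data itself; here it is one toy detector pass over a truth-like spectrum.
data_sample = detector(rng.exponential(20.0, size=200_000))
edges = np.linspace(0, 80, 17)
n_iter = 4

# For each HBOM order n, apply D a further n times to r^(0), with independent
# random draws for each n, and histogram the result.
ns = np.arange(n_iter + 1)
yields = []
for n in ns:
    sample = data_sample
    for _ in range(n):
        sample = detector(sample)
    yields.append(np.histogram(sample, bins=edges)[0].astype(float))
yields = np.array(yields)                    # shape (n_iter+1, n_bins)

# Fit each bin's yield as a low-order polynomial in n and extrapolate to n = -1,
# the estimate of the pre-detector (fiducial) value for that bin.
unfolded = np.array([np.polyval(np.polyfit(ns, yields[:, b], deg=2), -1.0)
                     for b in range(len(edges) - 1)])
print(unfolded)
```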
Figure 11.6. Use of the HBOM method for unfolding of 2-particle correlation data (the original HBOM
application, top), and azimuthal ordering of hadron chains (bottom). Both analyses were studies of
phenomena unimplemented or incompletely implemented in MC event generators, mandating a data-driven
unfolding method. Reproduced with permission from Monk J and Oropeza-Barrera C Nucl. Instrum. Meth. A
701 17–24 (2013), arXiv:1111.4896; and ATLAS Collaboration Phys. Rev. D 96 092008 (2017),
arXiv:1709.07384.
correlations, most methods have to extract this by the ‘toy MC’ method of sampling
from the nuisance parameters (e.g. again using ad hoc Gaussian distributions) and
using a template-interpolation between nominal and 1σ unfolding results. For the
pdf methods, this is again ‘built in’ to the resulting distribution, with the covariance
obtainable directly from the pdf samples: this can be cut down to the ϕ covariance
sub-matrix to represent overall bin-to-bin correlations.
Unfolding uncertainties are the remaining issue to deal with. These are of course
dependent on the method, with more complex methods providing more explicit
handles for evaluating uncertainty, e.g.
• SVD unfolding: dependence on the number of singular values considered statistically significant when choosing the regularisation scale;
• Iterative Bayes: dependence on the number of iterations used to obtain the final prior, MC inputs to the prior determination, and dependence on the regularisation technique;
• FBU: dependence on choices of priors, particularly on the parameters of interest;
• HBOM: ambiguities in both the maximum number of detector-function iterations, and the fitting method used for extrapolation. Use of ensembles of HBOM ‘histories’ can help to assess the stability and uncertainty of the result extraction;
• All but HBOM: uncertainties from MC statistics, particularly in stably populating all bins of the migration matrix—this is likely to be impossible in the lowest-probability bins, but fortunately these by construction contribute to very subleading migration modes: the most important limitations are in the leading migrations.
These require detailed checking in an analysis, which can be a difficult and time-
consuming process, especially as the number of cross-checks requested in both
internal and external review can be large. In addition, it is normal to perform further
standard MC-based ‘stress tests’ and ‘closure tests’ to verify that the method should
be performing adequately for the chosen observable. These typically include:
• showing that particle-level predictions from MC model A will be reobtained to within statistical precision when unfolding reco-level A with MC prior A (a toy version of this check is sketched after this list);
• showing that closure remains when unfolding reco-level A using prior model B;
• reweighting the prior model such that its forward-folding approximately
matches the reco-level data.
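The first of these closure tests can be illustrated with a deliberately simple toy. The numbers, the 3-bin migration matrix and the use of plain matrix inversion are all invented stand-ins for the real (regularised) unfolding method under study; the point is only the structure of the check, namely fold, fluctuate, unfold and compare pulls.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical response matrix R[reco bin, truth bin] and truth spectrum for MC model A
R = np.array([[0.80, 0.15, 0.02],
              [0.18, 0.75, 0.20],
              [0.02, 0.10, 0.78]])
truth_A = np.array([1000.0, 600.0, 250.0])

# Forward-fold and fluctuate to emulate a reco-level pseudo-dataset
reco_A = rng.poisson(R @ truth_A).astype(float)

# 'Unfold' with the same response; simple inversion stands in for the real method
Rinv = np.linalg.inv(R)
unfolded = Rinv @ reco_A
cov_unf = Rinv @ np.diag(reco_A) @ Rinv.T   # Poisson covariance propagated through the inversion

pulls = (unfolded - truth_A) / np.sqrt(np.diag(cov_unf))
print("unfolded:", np.round(unfolded, 1))
print("pulls   :", np.round(pulls, 2))

Closure holds if the pulls are consistent with a unit normal distribution over many pseudo-experiments; the model-B and reweighted-prior variants repeat the same comparison with a different prior in the unfolding step.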
Developments in unfolding
Unfolding techniques are an ever-evolving area of technical development, spurred by
the open-endedness of the topic: the incompleteness of the information means there
are myriad approaches that can be taken; observables can always be made
Further reading
• Truth definitions for fiducial analysis at the LHC are publicly discussed in
ATL-PHYS-PUB-2015-013, https://cds.cern.ch/record/2022743/.
• Not many public resources exist for review of top-quark reconstruction
methods. One of the few is Kvita J, Nucl. Instrum. Meth. A 900 84–100 (2018),
arXiv:1806.05463, which presents a comparison of optimisation targets in
kinematic fits.
• The SVD unfolding method is described in Hoecker A and Kartvelishvili V, SVD approach to data unfolding, Nucl. Instrum. Meth. A 372 469–81 (1996), arXiv:hep-ph/9509307. The updated form of the IBU unfolding method is described and discussed in D’Agostini G, Improved iterative Bayesian unfolding, arXiv:1010.0632 (2010), and the fully Bayesian unfolding (FBU) method in Choudalakis G, Fully Bayesian unfolding, arXiv:1201.4612 (2012).
Exercises
11.1 Which of the following need to be included in the list of fiducial cuts for an SM
analysis? a) primary vertex position, b) displaced vertex cuts, if any, c) b-hadron
secondary-vertex positions in flavour-tagging, d) identification of jet constitu-
ents, e) requirements on electron hits on inner-tracker silicon, f) photon
promptness, g) which visible objects are used to define the MET vector.
11.2 Let’s imagine a BSM physics effect manifests in top-quark events,
modifying the kinematics such that reconstruction methods tuned for
the SM top are less efficient than expected. If placing statistical limits on
Chapter 12
Analysis preservation and reinterpretation
We have now completed our review of the searches and measurements that are
performed at hadron colliders, along with the complicated theoretical, experimental
and computational knowledge that underpins the Large Hadron Collider (LHC)
programme. We have seen that precision measurements of Standard Model (SM)
quantities allow us to search indirectly for physics beyond the Standard Model
(BSM), whilst resonance and semi-invisible searches allow us to search directly for
new particles and interactions.
What we have not yet considered is how to take the results of these various LHC
analyses, and use them to learn something about the theories we introduced in
Chapter 4. To take a concrete example, we have not yet uncovered any evidence for the
existence of supersymmetry at the LHC. This must therefore tell us something about
which particular combinations of masses and couplings of the superpartners are now
disfavoured. To take a second example, we have also not seen any new resonances
beyond the Standard Model Higgs boson, which must place constraints on any theory
that involves new resonances within a suitable mass range, such as composite Higgs
theories, or two Higgs doublet models. For a third example, consider the fact that we
have also not made any measurements of SM quantities that differ significantly from
their predicted values. This imposes non-trivial constraints on the parameters of BSM
theories that are able to provide sizable loop-corrections to SM processes.
In all of these cases, we need to understand how to take LHC results that were obtained in a very specific context, such as a search optimised for a particular assumed theory, and reinterpret them in the context of a generic theory of interest. Note that any LHC result might in principle be relevant to a theory we are interested in, and we should take care to consider every result that could plausibly constrain it. In particular, this
means that precision measurements can be as useful as searches for new particles for
constraining the parameter spaces of new theories.
Accurate reinterpretation is a crucial topic for both theorists and experimental-
ists. The former must learn how to rigorously make use of published LHC data,
whilst the latter must learn how to present their data in a way that is most useful for
theorists, and how to design future experimental searches that use a full knowledge
of which areas of a theory are no longer viable.
1
Other parameters exist, but will not typically affect measurement and search results at the LHC, so we can
forget about them.
Figure 12.1. Two examples of simplified model presentations of null results in supersymmetry searches. CMS
Collaboration, https://twiki.cern.ch/twiki/bin/view/CMSPublic/SUSYSummary2017, Copyright CERN, reused
with permission.
or a log-normal form,
P(ξ ∣ σ_ξ) = (1 / (√(2π) σ_ξ ξ)) exp[ −(1/2) (ln ξ / σ_ξ)² ].  (12.3)
If the search has a number of signal regions rather than a single region, one can
either define the likelihood using the signal region with the best expected exclusion at
each point or, if covariance information is available, one can define a likelihood
which correctly accounts for signal region correlations. Alternatively, one can use
the likelihood formation supplied by the ATLAS and CMS experiments where
available. In cases where signal regions are obviously not overlapping (and one can
neglect correlated systematic uncertainties), one can combine the likelihoods for
each signal region simply by multiplying them together. This then gives the
composite likelihood that will be our measure of the model viability at each
parameter point. Although we have discussed semi-invisible particle searches in
the above example, broadly similar principles apply in the case of resonance searches
and precision measurements. One must simply add the relevant likelihood terms to
the composite likelihood.
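For the simplest case of manifestly non-overlapping signal regions with negligible correlated systematics, the combination is literally a product of Poisson terms. A minimal sketch, with invented yields and the nuisance-parameter terms of forms such as equation (12.3) omitted for brevity:

import numpy as np
from scipy.stats import poisson

# Hypothetical observed counts, expected backgrounds and signal predictions
# for three non-overlapping signal regions
n_obs = np.array([12, 5, 3])
b_exp = np.array([10.2, 4.1, 1.8])
s_exp = np.array([3.5, 2.0, 1.2])

def log_likelihood(mu):
    # Independent regions: the combined log-likelihood is the sum of the per-region Poisson terms
    return poisson.logpmf(n_obs, mu * s_exp + b_exp).sum()

for mu in (0.0, 0.5, 1.0):
    print(f"mu = {mu:.1f}: ln L = {log_likelihood(mu):.3f}")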
Given this likelihood for a single model point, we must now consider how to
determine which regions of the parameter space as a whole remain viable. The short
2
It is approximate because it does not contain all of the nuisance parameters of the experiment, nor does it
contain the details of the data-driven background estimates over the various control regions. It also uses a
single rescaling parameter to account for the signal and background systematics, rather than performing a 2D
integration.
answer is that we must use a valid statistical framework to find the regions of the
parameter space that lie within a specific confidence region. For frequentists, this
means that we must find the parameters that maximise our composite likelihood,
and define confidence intervals around that maximum. If we want to plot the shapes
of these intervals for each single parameter, or for 2D parameter planes, we can
make use of the profile likelihood construction. Bayesians must instead define a
suitable prior on the model parameters, then use the likelihood to define a posterior
that is mapped via sampling. The set of posterior samples can then be used to plot
1D and 2D marginalised posterior distributions. Both of these approaches were
covered in Chapter 5.
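The difference between the two constructions can be seen in a toy example: given a likelihood evaluated on a grid in one parameter of interest and one nuisance parameter, profiling maximises over the nuisance direction, while marginalisation integrates over it (with a flat prior assumed here). The correlated-Gaussian likelihood below is invented purely for illustration.

import numpy as np

theta = np.linspace(-3, 3, 201)     # parameter of interest
nu = np.linspace(-3, 3, 201)        # nuisance parameter
T, N = np.meshgrid(theta, nu, indexing="ij")
rho = 0.6
loglike = -0.5 * (T**2 + N**2 - 2 * rho * T * N) / (1 - rho**2)
like = np.exp(loglike - loglike.max())

# Frequentist: profile likelihood in theta (maximise over the nuisance axis)
profile = like.max(axis=1)

# Bayesian: marginal posterior in theta (integrate over the nuisance, flat prior)
dnu, dtheta = nu[1] - nu[0], theta[1] - theta[0]
marginal = like.sum(axis=1) * dnu
marginal /= marginal.sum() * dtheta

print("theta at profile maximum:", theta[np.argmax(profile)])
print("posterior mean of theta :", (theta * marginal).sum() * dtheta)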
• Files that describe the physics models used for the results in the analysis
paper. These are usually released in a standard format called SLHA, and are
made available through the HepData database. The advantage of releasing
SLHA files is that theorists can debug their Monte Carlo simulations on the
precise models utilised by the experiments; any attempt at describing the
models in the analysis papers is typically sufficiently ambiguous to leave open
questions.
• Tables of the number of simulated signal events passing each successive
kinematic selection of the analysis, derived using the actual ATLAS and
CMS generation, simulation and reconstruction chain. These cutflow tables
allow theorists to check that they get similar numbers using their own
simulations, and to hunt for bugs until they see closer agreement.
Typically, a theorist can expect to see an agreement of the final signal region
yields at the level of 20% or so.
• Smearing functions and efficiencies, so that theorists can ensure that the
parameters of their detector simulation are up-to-date. Although generic
trigger and reconstruction efficiencies can be found for both ATLAS and
CMS in a variety of detector performance papers, they are often superseded
by the time an analysis comes out, and the original references are rarely
available in electronic form. Useful public information currently includes
analysis-specific object reconstruction efficiencies as a function of pT and η,
official configuration files for public detector-simulation packages such as
Delphes, and parametrisations of resolution function parameters for the
invariant-mass variables used in resonance searches. Many long-lived particle
searches also provide analysis-specific efficiencies, without which external
reinterpretation would be very difficult indeed.
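Such parametrisations are typically applied as object-level efficiencies and resolution smearing on truth-level objects. The following sketch uses invented efficiency and resolution functions purely to show the mechanics; real parameter values would be taken from the relevant performance publication or detector-simulation configuration.

import numpy as np

rng = np.random.default_rng(0)

def electron_efficiency(pt, eta):
    # Invented parametrisation: efficiency rising slowly with pT, zero outside tracker acceptance
    return np.clip(0.7 + 0.002 * np.minimum(pt, 100.0), 0.0, 0.95) * (np.abs(eta) < 2.47)

def smear_pt(pt):
    # Invented Gaussian relative resolution of 3%
    return pt * rng.normal(1.0, 0.03, size=np.shape(pt))

# Truth-level electrons: (pT in GeV, eta)
pt  = np.array([25.0, 60.0, 140.0])
eta = np.array([0.4, -1.9, 2.8])

keep = rng.random(pt.size) < electron_efficiency(pt, eta)
reco_pt = smear_pt(pt[keep])
print("reconstructed electron pTs:", np.round(reco_pt, 1))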
rigorous. More rigorous are the rare cases where experiments publish a correlation
matrix alongside the simplified model results, which allows theorists to calculate an
approximate likelihood.
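Given such a covariance (or a correlation matrix together with the per-region uncertainties), the approximate likelihood is usually taken to be a multivariate Gaussian. A minimal sketch with invented numbers:

import numpy as np

# Hypothetical observed yields, SM backgrounds, BSM signal prediction,
# and published correlation matrix with total uncertainties per region
n_obs = np.array([53.0, 21.0, 8.0])
b_exp = np.array([50.0, 24.0, 6.5])
s_exp = np.array([4.0, 1.5, 2.0])
sigma = np.array([7.0, 4.5, 2.5])
corr  = np.array([[1.0, 0.4, 0.1],
                  [0.4, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])

cov = corr * np.outer(sigma, sigma)

def chi2(mu):
    # Gaussian approximation: chi^2 of the data against background + mu * signal
    resid = n_obs - (b_exp + mu * s_exp)
    return float(resid @ np.linalg.solve(cov, resid))

print("chi2(mu=0) =", round(chi2(0.0), 2), "  chi2(mu=1) =", round(chi2(1.0), 2))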
developing many of the fiducial analysis ideas discussed in Section 11.1. Rivet also
includes code routines corresponding to a significant fraction of suitable LHC
experimental analyses, mostly unfolded but also several making use of smearing-based
fast-simulation. These routines are largely provided by the LHC experiments them-
selves, each of which operates a Rivet-based analysis preservation programme for
fiducial measurements: if you work on a measurement analysis, you should expect to
write a Rivet analysis code to accompany the submission of the numerical results to
HepData. The Rivet platform is further used in applications for MC generator tuning
and development with the Professor and other optimisation toolkits, and in BSM
model interpretations using a package called Contur.
where σi is the Higgs production cross-section starting from the state i, Bf is the
branching ratio to the final state f, and the final two quantities have been defined as
the ratio of production cross-sections and ratio of branching ratios, respectively.
Disentangling the production (μi ) and decays (μf ) can be performed if one supplies
the additional assumption of SM branching ratios in order to extract μi , or of SM
production cross-sections in order to extract μf .
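As a trivial worked example of this factorisation, with invented cross-section and branching-ratio values, the signal strength is simply the product of the production and decay ratios:

def signal_strength(sigma_i, sigma_i_sm, B_f, B_f_sm):
    # mu = mu_i * mu_f, where mu_i = sigma_i / sigma_i^SM and mu_f = B_f / B_f^SM
    mu_i = sigma_i / sigma_i_sm
    mu_f = B_f / B_f_sm
    return mu_i * mu_f

# e.g. a 10% enhanced production rate with an SM-like branching ratio gives mu = 1.1
print(signal_strength(1.1, 1.0, 0.05, 0.05))   # arbitrary illustrative units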
Figure 12.2. Feynman diagram for Higgs boson production via gluon–gluon fusion, with subsequent decay to
W bosons (one of which must be off-shell).
which is nothing more than the statement that the total Higgs width has changed
because the partial widths have changed. Why have we dressed the SM cross-
sections and partial widths, rather than simply parameterising the couplings
directly? The answer is that it allows us to retain the current best knowledge for
the Higgs cross-sections, which include higher-order quantum chromodynamics and
electroweak (EW) corrections. The κi correspond to additional leading-order degrees
of freedom whose values tell us whether the observed Higgs couplings are compat-
ible with SM predictions or not and, if not, which couplings show hints of BSM
physics. Such a result would then motivate tests of explicit models of BSM physics in
order to find which best matches the observed data.
To get a better idea of what the κi represent, consider the Feynman diagram for
Higgs boson production via gluon–gluon fusion, with subsequent decay to W
bosons, shown in Figure 12.2. Since the total mass of two W bosons exceeds
125 GeV, one of these W bosons must be off-shell. For the decay vertex, we are
assuming that the vertex factor gets modified from the SM value by a factor κW , such
that the branching fraction scales as κW². For the production, things are more
complicated since there is no direct Higgs-gluon vertex. The lowest-order diagram
instead involves a loop, and the ability of different particles to run in this loop means
that a variety of κi factors are in principle relevant to the modification of the Higgs
production cross-section in this case. We have two choices for how to proceed. The
first is to simply replace the loop with an effective vertex, and treat the gluon fusion
production cross-section as if it is modified by a single effective factor κg . The second
is to use the known combination of diagrams in the SM alongside the κi factors for
each diagram to calculate a resolved scaling-factor. For the present case, this can be
shown to be κg² ≈ 1.06 · κt² + 0.01 · κb² − 0.07 · κt κb. Whether to use an effective or
resolved scaling factor depends on how many parameters a physicist wants to test
against the Higgs data. It might be worth dropping multiple κi factors if they are
poorly constrained by the data, since a lower-dimensional space requires less
computational effort to explore. A full list of the resolved scaling factors for
different production and decay modes is given in table 12.1.
Given a choice of which κi factors to include, experimental results can then be
presented via a global fit of the set of κi parameters.
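As a concrete illustration, the resolved gluon-fusion factor quoted above can be evaluated directly and combined with a decay factor. The sketch below neglects the accompanying rescaling of the total Higgs width, which a complete κ fit must include, and is intended only to show how signal-strength scalings are assembled from the κi.

def kappa_g_sq(kt, kb):
    # Resolved LO scaling of the gluon-fusion rate, using the coefficients quoted above
    return 1.06 * kt**2 + 0.01 * kb**2 - 0.07 * kt * kb

def mu_ggF_WW(kt, kb, kW):
    # Scaling of sigma(ggF) x B(H -> WW*), with the total-width change neglected for brevity
    return kappa_g_sq(kt, kb) * kW**2

print(mu_ggF_WW(1.0, 1.0, 1.0))    # SM point: 1.06 + 0.01 - 0.07 = 1.00
print(mu_ggF_WW(1.1, 1.0, 0.95))   # a mild enhancement of the top coupling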
1. they are defined inclusively in Higgs decay channels and kinematics (up to an
overall rapidity-acceptance cut on the Higgs itself ); and
2. they are specific to the Higgs production mode, and to the topology and
kinematics of the associated physics-objects.
The first of these properties enables combination of multiple Higgs decay channels, giving
the necessary statistical stability to unfold to the common set of ‘STXS bins’ in the
production mode and event topology introduced by the second property. The simplifi-
cations of STXS—in integrating over all decay modes and in unfolding to parton-level,
only indirectly observable physics objects—permit the use of MVA techniques such that
the STXS-bin cross-sections can be mapped via a likelihood fit or similar techniques.
The STXS bins are standardised across the HEP community by the LHC Higgs
Cross-section Working Group, such that different experiments have a common set
of ‘semi-fiducial’ phase-spaces in which measurements can be compared and
combined. The schematic design of the STXS bin-set is shown in Figure 12.3, where
the full set of production+decay channels is decomposed into first the ggF, VBF,
and VH production modes; then into sub-bins in jet-multiplicity, pT (V ) etc; and as
events accumulate and finer sub-divisions become possible, further nested bins. The
initial set of STXS bins was denoted ‘Stage 1’, followed by a ‘Stage 1.1’ refinement,
with anticipation of later ‘Stage 2’, etc—eventually, it will become possible to drop
the simplification and perform even higher-precision Higgs interpretations via a
holistic set of fiducial cross-section measurements.
Figure 12.3. Schematic operation of the STXS framework for Higgs-boson production interpretations. The
combination of the many different Higgs-boson decay modes is shown on the left-hand side, feeding into a set of
standardised bins in production topology and kinematics, which become more deeply nested as data statistics permit.
currently comes with a substantial CPU and person-hour cost, it may be possible in
future to use such systems for complex phenomenological studies. In the mean-
time, the system offers great utility for members of the ATLAS and CMS
collaborations who wish to add value to their analyses by studying new theoretical
models.
An even more extreme way to make ATLAS and CMS results fully
interpretable by theorists is to release the actual data itself, and provide theorists
with tools for analysing it. This is occurring through the CERN Open Data Portal,
and it is important to note that it is just as important to release the ATLAS and
CMS simulations of SM background events as it is to release their recorded proton–
proton collision data. At the time of writing, the CMS experiment is leading the
charge, having promised to make 100% of data available within 10 years of it first
being recorded. ATLAS has made data available primarily for outreach and
education projects, but it can be reasonably expected that more serious efforts will
follow within the next few years.
One should never be naïve when starting an Open Data project. Extensive
documentation is required to understand the content of the data, and how to
reconstruct and use physics objects. This is much more natural for an experimen-
talist than a theorist, and there is currently a significant effort to create working
analysis examples and simplified data formats for those not trained in the requisite
techniques. Even if a physicist has the technical skills to process the data, the sheer
size of the datasets means that a realistic analysis will require serious computing
power. With cloud computing becoming ever more affordable, however, perform-
ing physics analyses on real LHC data is becoming a realistic option for
phenomenologists, mirroring developments in other fields such as gamma ray
astronomy.
Further reading
• The HepData database can be found at https://www.hepdata.net/.
• Various parts of this chapter are based on the excellent report ‘Reinterpretation
of LHC Results for New Physics: Status and Recommendations after Run 2’,
published in SciPost Phys. 9 022 (2020). This reference contains links to many
public software tools for the reinterpretation of LHC results.
• Public codes for reinterpreting LHC analyses in the context of new theories
include GAMBIT (https://gambit.hepforge.org), Rivet (https://rivet.hepforge.
org) and MadAnalysis (https://launchpad.net/madanalysis5).
• The κ framework for parameterising Higgs production and decay modes is
described in arXiv:1307.1347.
• The STXS framework was originally introduced in the proceedings of the Les
Houches Physics at the Terascale 2015 workshop (arXiv:1605.04692), and a
report from the LHC Higgs Cross-section Working Group (arXiv:1610.07922).
The Stage 1.1 STXS report (arXiv:1906.02754) also serves as a useful
introduction, with the perspective of several years of Stage 1 operation.
Exercises
12.1 Prove equation (12.5).
12.2 Explain why the exclusion limits in the right-hand plot of Figure 12.1
change with the assumed branching ratios of the sparticles.
12.3 Write a more general form of equation (12.1) that is suitable for the case of
two signal regions, with a 2 × 2 covariance matrix.
12.4 Explain how a cutflow table could be used to debug a generation and
simulation software chain, assuming that one had the SLHA file for the
physics model of interest.
12.5 If correlation information between analyses is not explicitly provided,
which of these sets of analyses can be assumed safe to combine as if
independent?
(a) Any measurements/searches at different experiments,
(b) any measurements/searches at different √s energies,
(c) same experiment and energy analyses with orthogonal cuts,
(d) same experiment and energy analyses with overlapping cuts, one
much tighter than the other?
12.6 SMEFT model spaces include all possible admixtures of BSM operators,
including many configurations which are extremely hard to create through
explicit BSM models. With this in mind, are marginalised/profiled EFT
constraints always better than single-parameter fits? Can you think of a
Bayesian way to better reflect the mixture of top-down and bottom-up
constraints?
12.7 The Contur package for interpretation of fiducial measurements makes a
default assumption that the measured data points represent the SM, with
BSM effects treated additively. What does this imply about use of the
method for BSM-model
(a) discovery,
(b) exclusion?
Chapter 13
Outlook
Figure 13.2. The expected LHC luminosity shown as a function of year. Red dots show the peak luminosity
(which reflect the past or anticipated operating conditions of the LHC), whilst the blue solid line shows the past
or expected integrated luminosity. Credit: https://lhc-commissioning.web.cern.ch/schedule/images/LHC-nom-
lumi-proj-with-ions.png.
run period starting in 2027. This is called the high-luminosity LHC, or HL-LHC. These
machine conditions will persist for the remaining LHC run periods scheduled up to
2039, with the final aim of collecting approximately 50 times more data than has
currently been obtained. The accumulation of integrated luminosity out to 2039 is
shown in Figure 13.2, which makes the impact of the HL-LHC very clear. Making the
most of this deluge of data, however, will require not just development of new detector
elements with the necessary resolutions, readout rates and radiation hardness, but also
major reworkings of the computational data processing chain, both for collision data
and simulation. The scale of this challenge is clear from the comparison of computing
resources to requirements in Figure 13.3.
Note, however, that the accumulation of integrated luminosity is relatively
modest before then, which is often a cause of pessimism in the high-energy physics
Figure 13.3. The ATLAS computing model’s comparison of CPU and disk capacity to expected demand,
looking forward from Run 3 to the HL-LHC runs. The need for intense technical development through the
simulation and data-processing chain is starkly clear. Credit: ATLAS HL-LHC Computing Conceptual
Design Report, CERN-LHCC-2020-015, https://cds.cern.ch/record/2729668.
community. It is often assumed, for example, that the lack of evidence for beyond-the-Standard-Model (BSM) physics in the LHC Run 1 and Run 2 datasets implies
that the mass of new particles is high enough to guarantee a relatively small cross-
section for their production. One therefore needs to accumulate a lot more data in
order to find evidence for these particles, and the relatively slow increase in the
amount of data in the next few years is unlikely to produce a dramatic discovery.
Against this, however, is the fact that it is known that many searches for new
particles contain substantial gaps in their reach even within the Run 1 and Run 2 data
(see, for example, Section 10.1.2), and that we need cleverer and more model-independent analysis techniques to close these gaps. We therefore remain confident
that it is all still to play for at the LHC, and that commencing graduate students can
expect exciting discoveries at any time provided they retain a commitment to
innovation.
One obvious tool—or rather a whole suite of tools—for such innovation is
statistical machine learning. Techniques under this umbrella, such as deep neural
networks, are huge subjects in their own right, and are evolving so rapidly that it is not
currently possible to cover them in this volume without both inflating its size and
ensuring rapid obsolescence. But the reader will undoubtedly be aware of the potential
of such methods for addressing thorny problems, from quantum chromodynamics
(QCD) calculations to detector simulation, to smoothing and interpolation of back-
ground estimates, and directly in statistical inference from measurements. They are no
‘silver bullet’ with guaranteed easy wins, but in combination with solid physical
foundations it is likely that the future of collider physics will heavily rest on novel
applications of such techniques.
collider for discovery (e.g. the Super Proton Synchrotron at CERN) and a lepton
collider for precision (e.g. the Large Electron Positron collider at CERN), before
swinging back again to hadron colliders (e.g. the Tevatron at Fermilab and the Large
Hadron Collider at CERN).
This would suggest that the next wave of collider proposals would be biased
towards lepton colliders, and indeed this is currently the case. The argument for
precision machines stems ultimately from the discovery of the Higgs boson. Much as
the experiments of the Large Electron Positron collider made precision measure-
ments of the W and Z bosons after their discovery by the Super Proton Synchrotron
experiments, future lepton colliders could make precise measurements of the Higgs
boson. By the end of the HL-LHC run, the ATLAS and CMS experiments will be
able to study the main Higgs boson properties with a precision of approximately
5%–10%, limited by the large QCD background and the expected rates of pile-up.
A proposed circular collider in China—the Circular Electron Positron Collider
(CEPC)—would collide electrons and positrons at a centre-of-mass energy of
240 GeV, which represents the optimum energy for the production of a Z boson
with a Higgs boson. The same collider could run at 160 GeV to produce pairs of
W bosons and 91 GeV to produce Z boson events, and the tunnel could be used for a
later hadron collider. The experiment could be operating by 2030, which is the most
aggressive timescale of the current proposals, overlapping substantially with the
running of the HL-LHC.
A competing European proposal, the Future Circular Collider at CERN, would
involve the construction of a new 100 km circumference tunnel at CERN, that
would first host an electron–positron collider running at energies up to 240 GeV,
then on to 350 GeV to 365 GeV by 2045 or so. Again, the tunnel can be used for a
later hadron collider, which in this case is proposed to have a centre-of-mass energy
of 100 TeV. The catch is that, even if funded, the hadron collider would not
realistically start taking data until around 2050 at the earliest.
Competing with these circular collider proposals are linear colliders, which, by
virtue of not suffering from the synchrotron-radiation problem that affects circular
colliders, are able to target higher centre-of-mass energies. Current proposals include
the International Linear Collider (ILC)—that would commence at a centre-of-mass
energy of 250 GeV shortly after 2030, before running at 500 GeV and 350 GeV from
2045—and the Compact Linear Collider (CLIC) that would start at a centre-of-mass
energy of 380 GeV before targeting energies up to 3 TeV in the decades thereafter.
These higher energies would permit an extensive top quark physics programme.
Whatever the future holds (and there are many more proposals, e.g. for lepton–
hadron colliders, muon colliders, or entirely new ways of accelerating particles),
collider physics is sure to remain an active field with tens of thousands of interna-
tional participants. With its blend of topics at the frontier of both theoretical and
experimental physics, it remains hard to imagine a more exciting field in which to
commence study.
Part III
Appendix
Appendix A
Useful relativistic formulae
mass of the particle m, calculating m relies on the subtraction of two large numbers
(through E² − |p|²). This in turn can generate an apparently negative mass through
numerical instability, which will break any code that relies on a positive mass. It is
therefore preferable to store four-vector components in an alternative form such as
(m, px, py, pz) from which one can calculate the energy safely via E² = |p|² + m².
Since analyses typically rely on using η and pT, it can also be convenient to use the choice of variables (pT, η, ϕ, m), or the true-rapidity version (pT, y, ϕ, m). The
polar angle θ with respect to the positive beam direction may also be useful,
particularly for cosmic-ray or forward-proton physics where other parametrisations
are numerically unreliable for expressing small deviations from the z axis.
Below, we provide a shopping list of formulae that will assist in converting
between different choices of variables. Typically, these will be implemented inside an
analysis package, such as the ubiquitous ROOT.
η = −ln tan(θ/2) = sign(pz) ln[ (|p| + |pz|) / pT ]
y = (1/2) ln[ (E + pz) / (E − pz) ]
θ = 2 tan⁻¹(exp(−η))
|p| = pT / sin(θ)
pT = √(px² + py²)
E = √(|p|² + m²) = √(pT² + m²) cosh y
px = pT cos(ϕ)
py = pT sin(ϕ)
pz = √(pT² + m²) sinh y = |p| cos(θ).
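As a simple cross-check of these relations, the conversion from (pT, y, ϕ, m) to Cartesian components can be implemented directly. This is a standalone sketch; in practice one would use a library class such as ROOT's TLorentzVector rather than hand-rolled functions.

import math

def four_momentum_from_pt_y_phi_m(pt, y, phi, m):
    # Convert (pT, y, phi, m) to (E, px, py, pz) using the relations above
    mt = math.sqrt(pt**2 + m**2)      # 'transverse mass' sqrt(pT^2 + m^2)
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = mt * math.sinh(y)
    E  = mt * math.cosh(y)
    return E, px, py, pz

def pseudorapidity(px, py, pz):
    p  = math.sqrt(px**2 + py**2 + pz**2)
    pt = math.sqrt(px**2 + py**2)
    return math.copysign(math.log((p + abs(pz)) / pt), pz)

E, px, py, pz = four_momentum_from_pt_y_phi_m(50.0, 1.2, 0.3, 91.2)
print("recovered mass:", math.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0)))
print("pseudorapidity:", pseudorapidity(px, py, pz))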