Adaptive Control and Machine Learning For Particle
Content from this work may be used under the terms of the CC BY 3.0 licence (© 2021). Any distribution of this work must maintain attribution to the author(s), title of the work, publisher, and DOI
ISBN: 978-3-95450-230-1 ISSN: 2673-5350 doi:10.18429/JACoW-IBIC2021-THOB03
un-modeled disturbance in the dynamics (1) can destabilize the closed-loop nonlinear system [1–3].

For a long time, the main limitation of nonlinear and adaptive control approaches was an inability to handle a sign-changing, time-varying coefficient $b(t)$ in system (1) which multiplies the control input $u(t)$, such as $b(t) = \cos(2\pi f t)$, which changes sign repeatedly and thereby changes the effect of the control input $u(t)$. For particle accelerators such variation comes from the fact that the beam at a certain location is influenced by many upstream components, such as magnet settings, as well as by the initial phase space distribution of the beam entering the particle accelerator. Changes in input beams and in upstream accelerator components influence how quantities such as beam loss respond to downstream components. For example, consider a state $x(t)$ which describes beam loss in a particle accelerator, whose minimization is desired, and which is influenced by a large collection of quadrupole magnets $\mathbf{u} = (u_1, \dots, u_m)$. The effect of a single magnet, $u_m$, depends on the initial beam's phase space as it enters the accelerator from the source and also on the settings of all of the other quadrupole magnets that are upstream, $u_{i<m}$, and changes with time as the upstream magnets are adjusted and as the initial beam conditions change. One day decreasing $x$ may require decreasing the current of magnet $u_m$ and another day it might have to be increased.

The control and stabilization of time-varying systems is notoriously difficult; even simple linear time-varying systems are difficult to analyze in general because standard eigenvalue techniques break down and stability can only be proven by using Lyapunov theory [1]. Recently, a nonlinear extremum seeking (ES) feedback control method was developed which could stabilize and minimize the analytically unknown outputs of a wide range of dynamic systems, scalar and vector-valued, which can be time-varying, nonlinear, and open-loop unstable with unknown control directions [4–6]. The ES method is applicable to a wide range of nonlinear and time-varying systems of the form

$$
\begin{aligned}
\dot{x} &= a(t)\,x(t) + b(t)\,u(\hat{y}(t)), \\
\dot{\mathbf{x}} &= A(t)\,\mathbf{x}(t) + B(t)\,\mathbf{u}(\hat{y}(t)), \\
\dot{\mathbf{x}} &= \mathbf{f}(\mathbf{x}(t), \mathbf{u}(\hat{y}(t)), t), \\
\hat{y}(\mathbf{x}, t) &= y(\mathbf{x}, t) + n(t),
\end{aligned}
\tag{3}
$$

which include scalar time-varying linear systems, vector-valued time-varying linear systems, and vector-valued nonlinear time-varying systems, where in each case the feedback control $u$ is based only on a noise-corrupted measurement $\hat{y}(t)$ of an analytically unknown cost function $y(\mathbf{x}, t)$. For example, a measurable but analytically unknown cost function can be the sum of beam loss along a many-kilometer-long particle accelerator, which depends on all accelerator parameters and on the initial 6D phase space of the beam being accelerated.

For accelerator applications, the ES method can tune groups of parameters, $\mathbf{p} = (p_1, \dots, p_m)$. For example, tuned parameters might include RF cavity amplitude and phase set points as well as magnet power supply voltages or currents. The adaptive ES algorithm dynamically tunes parameters according to

$$
\dot{p}_j = \psi_j\left(\omega_j t + k\,\hat{y}(\mathbf{x}, t)\right), \tag{4}
$$

where $\omega_j$ are distinct dithering frequencies defined as $\omega_j = \omega r_j$ with $r_i \neq r_j$ for $i \neq j$, and $k$ is a feedback gain. The $\psi_j$ may be chosen from a large class of functions which may be non-differentiable and not even continuous, such as square waves, which are easily implemented in digital systems [6]. The only requirements on the $\psi_j$ are that for a given time interval $[0, t]$ they are measurable with respect to the $L^2$ norm and that they are mutually orthogonal in Hilbert space in the weak sense relative to all measurable functions $f(t) \in L^2[0, t]$ in the limit as $\omega \to \infty$, which can be written as

$$
\begin{aligned}
\lim_{\omega \to \infty} \int_0^t \psi_i(\tau)\,\psi_j(\tau)\,d\tau &= 0, \quad \forall i \neq j, \\
\lim_{\omega \to \infty} \int_0^t \psi_i(\tau)\,f(\tau)\,d\tau &= 0, \quad \forall i,\ \forall f(t) \in L^2[0, t], \\
\lim_{\omega \to \infty} \int_0^t \psi_i^2(\tau)\,f(\tau)\,d\tau &= c_i \int_0^t f(\tau)\,d\tau, \quad \forall i,\ \forall f(t) \in L^2[0, t],\ c_i > 0.
\end{aligned}
$$

One particular implementation of the ES method is especially convenient for particle accelerator applications because the tuning functions $\psi_i$ have analytically guaranteed bounds despite acting on analytically unknown and noisy functions, which guarantees known update rates and limits on all tuned parameters [5]:

$$
u_i = \sqrt{\alpha_i \omega_i}\,\cos\left(\omega_i t + k\,\hat{y}(\mathbf{x}, t)\right). \tag{5}
$$

The utility of this approach is clearly demonstrated by considering a system of the form

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}(t), \mathbf{u}(\hat{y}(t)), t), \qquad \dot{p}_i = \sqrt{\alpha \omega_i}\,\cos\left(\omega_i t + k\,\hat{y}(\mathbf{x}, t)\right), \tag{6}
$$

which results in average dynamics that minimize the noise-corrupted unknown function $y(\mathbf{x}, t)$:

$$
\dot{\bar{p}}_i = -\frac{k\alpha}{2}\,\frac{\partial y(\mathbf{x}, t)}{\partial p_i}. \tag{7}
$$

This method has been utilized for various particle accelerator applications, including real-time betatron oscillation minimization in a time-varying magnetic lattice at the SPEAR3 synchrotron [7], maximization of the output power of the Linac Coherent Light Source (LCLS) free electron laser (FEL) and of the European X-ray FEL [8], real-time multi-objective optimization for simultaneous trajectory control and emittance growth minimization at the AWAKE plasma wakefield acceleration facility at CERN [9], and beam loss minimization by automatically tuning the amplitude and phase set points of multiple RF cavities at the Los Alamos Neutron Science Center (LANSCE) linear ion accelerator. One limitation of adaptive methods such as the ES approach is that they are local feedback-based methods, and it is possible for them to get stuck in a local minimum when optimizing an analytically unknown output function.
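
As a rough illustration of the dithering update in Eq. (6) and its averaged behavior in Eq. (7), the following minimal Python sketch applies a forward-Euler discretization of the ES update to a two-parameter problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the quadratic stand-in cost `beam_loss`, the hidden optimum `P_STAR`, the function name `es_minimize`, and the values chosen for the gain, dither frequencies, time step, and noise level. The feedback only ever sees the noisy scalar measurement, never the gradient.

```python
import math
import random

# Hypothetical hidden optimum; the ES loop never reads it directly.
P_STAR = (0.3, 0.2)


def beam_loss(p, t):
    """Illustrative stand-in for the analytically unknown cost y(x, t)."""
    return sum((pi - si) ** 2 for pi, si in zip(p, P_STAR))


def es_minimize(cost, p0, k=1.0, alpha=1.0, omega=2000.0, ratios=(1.0, 1.7),
                dt=5e-5, t_end=8.0, noise_std=0.05, seed=0):
    """Forward-Euler steps of dp_i/dt = sqrt(alpha*w_i) * cos(w_i*t + k*y_hat)."""
    rng = random.Random(seed)
    p = list(p0)
    omegas = [omega * r for r in ratios]            # distinct dither frequencies
    amps = [math.sqrt(alpha * w) for w in omegas]   # bounded update rates
    steps = int(round(t_end / dt))
    for n in range(steps):
        t = n * dt
        # Only a noise-corrupted scalar measurement is available.
        y_hat = cost(p, t) + rng.gauss(0.0, noise_std)
        for i in range(len(p)):
            p[i] += dt * amps[i] * math.cos(omegas[i] * t + k * y_hat)
    return p


p_final = es_minimize(beam_loss, p0=(1.0, -0.5))
print("final parameters:", p_final)
```

In this sketch each parameter keeps oscillating around the optimum with an amplitude of roughly $\sqrt{\alpha/\omega_i}$, so larger dither frequencies give a tighter final neighborhood at the price of a smaller time step.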
THOB03
08 Feedback Systems and Beam Stability 467
10th Int. Beam Instrum. Conf. IBIC2021, Pohang, Rep. of Korea JACoW Publishing
Figure 1: The accuracy of the 𝜎𝑦 prediction quickly degrades once the system has had time to evolve and leaves the span
of the collected training data set. Such an approach would have to be continuously and repeatedly re-trained to maintain
accuracy, which is infeasible and defeats the purpose of the ML-based diagnostic [16].
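
The degradation described in the Figure 1 caption can be mimicked with a toy model. The sketch below is only an illustration under stated assumptions: the drifting linear response `system`, the helper names `fit_line` and `rms_error`, and the drift rate are hypothetical and unrelated to the LCLS data. A least-squares line is fitted once at time zero and then evaluated, without re-training, as the underlying gain drifts.

```python
def system(x, t):
    """Hypothetical drifting response: the gain grows slowly with time t."""
    return (1.0 + 0.1 * t) * x + 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Train once, on data collected at t = 0 only.
xs = [i / 10 for i in range(-10, 11)]
slope, intercept = fit_line(xs, [system(x, 0.0) for x in xs])


def rms_error(t):
    """RMS error of the frozen model against the system at a later time t."""
    sq = [(slope * x + intercept - system(x, t)) ** 2 for x in xs]
    return (sum(sq) / len(sq)) ** 0.5


times = (0.0, 1.0, 5.0, 10.0)
errors = [rms_error(t) for t in times]
for t, e in zip(times, errors):
    print(f"t = {t:5.1f}  RMS error = {e:.4f}")
```

The frozen model is essentially exact on its training data and its error grows steadily with the drift, which is the behavior that motivates adding feedback rather than periodic re-training.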
LIMITATIONS OF ML FOR TIME-VARYING SYSTEMS

Machine learning (ML) tools are being developed that can learn representations of complex accelerator dynamics directly from data. ML methods have been utilized to develop surrogate models to act as virtual diagnostics [10], powerful polynomial chaos expansion-based surrogate models have been used for uncertainty quantification [11], convolutional neural networks have been used for time-series classification and forecasting in accelerators [12], Bayesian Gaussian processes utilize learned correlations in data/physics-informed kernels [13], surrogate models can help speed up simulation-based optimization [14], and various ML methods have been used for beam dynamics studies at CERN [15].

A major limitation of ML methods, and an active area of research in the ML community, is the problem of time-varying systems, known as distribution shift. If a system changes with time, then the data that was used to train an ML-based tool will no longer provide an accurate representation of the system of interest, and the accuracy of the ML tool will degrade. Distribution shift is a challenge for all ML methods, including neural networks for surrogate models, the use of neural networks to represent cost functions or optimal policies in reinforcement learning, and even methods such as Gaussian processes which utilize learned correlations in their kernels. Incorporating methods to deal with distribution shift is a major need for the accelerator community because accelerators and their beams change unpredictably with time. This challenge is illustrated by Fig. 1, which demonstrates that an ML-based prediction, in this case the $\sigma_y$ beam prediction for the LCLS [16], quickly degrades in accuracy as the system changes over time. Such an approach would have to be continuously and repeatedly re-trained to maintain accuracy, which is infeasible and defeats the purpose of the ML-based diagnostic.

ADAPTIVE ML FOR TIME-VARYING SYSTEMS

Efforts have begun to combine the robustness of adaptive feedback with the global representations that can be learned with ML methods to develop adaptive machine learning (AML) for time-varying systems. The first such result combined neural networks and model-independent feedback to provide fast, robust, and automatic control over the energy vs time phase space of electron bunches in the LCLS [17]. Recently, AML methods have been studied in more generality for adaptive tuning of the inputs and outputs of ML tools such as neural networks for time-varying systems [18].

Some of the most powerful ML tools are encoder-decoder generative convolutional neural networks (CNN), which can be used to find highly efficient nonlinear functions that project incredibly high dimensional input spaces, which may be combinations of images and vectors, down to a very low dimensional latent space, before generating back up to a high dimensional representation [19, 20]. Encoder-decoder networks have been used for anomaly detection [21], time-series data [22], and for optimization of deep generative models [23].

A novel approach to AML for time-varying systems is now being developed which utilizes such generative CNN-based encoder-decoders to adaptively tune the low-dimensional latent space representation directly (as small as 2 dimensions) for incredibly high dimensional systems (hundreds of thousands to millions of parameters) [24–26]. The setup of such an encoder-decoder generative CNN is shown in Figure 2. The network takes inputs that are 2D images of beam phase space distributions together with vectors of accelerator parameter settings such as magnets and RF systems. The high dimensional inputs are squeezed down to a low-dimensional latent space representation from which a collection of distributions is then generated, as shown in Figure 3 for a 2-dimensional latent space representation.

Figure 2: The HiRES encoder-decoder CNN structure for the AML setup is shown, with layer sizes such as (224, 224, 15) representing an output of 15 filters of image size 224 × 224 each. The dense layer widths are shown as single numbers.

Figure 3: An encoder-decoder convolutional neural network setup is shown which takes an image of an electron beam's $(x, y)$ phase space distribution as an input together with a vector of accelerator parameters (A). The high dimensional inputs are squeezed down to a 2 dimensional latent space (B), from which 75 2D distributions are then generated which are all 15 2D projections of the beam's 6D phase space at 5 different particle accelerator locations (C). Some of the projections, such as the $(z, E)$ longitudinal phase space distributions, can be compared to TCAV-based measurements to guide adaptive feedback which takes place in the low dimensional latent space to compensate for unknown changes in both the accelerator parameters and in the initial beam distribution (D). The variation of the $(x', y')$ and $(z, E)$ 2D phase space projections is shown as one moves through the 2D latent space learned by the network and adaptively tuned (E) [25].

The method works by first performing a supervised learning-based training in which we have access to input-output pairs of the form $(\mathbf{x}_{\mathrm{in}}, \mathbf{X}_{\mathrm{in}}, \hat{\mathbf{Y}}_{\mathrm{out}})$, where $\mathbf{x}_{\mathrm{in}}$ are vectors of accelerator parameter inputs and $\mathbf{X}_{\mathrm{in}}$ are stacks of 2D phase space image inputs. The generative half of the encoder-decoder CNN builds back up to a high dimensional output $\hat{\mathbf{Y}}_{\mathrm{out}}$, which is a 752,640 = 224 × 224 × 15 dimensional output with the 15 channels representing the 15 2D projections of the 6D phase space: $(x, y)$, $(x, z)$, $(x, x')$, $(x, y')$, $(x, E)$, $(x', y)$, $(x', z)$, $(x', y')$, $(x', E)$, $(y, z)$, $(y, y')$, $(y, E)$, $(y', z)$, $(y', E)$, $(z, E)$ in the HiRES UED, as shown in Figure 2. In a similar setup the output is a 1,228,800 = 128 × 128 × 75 dimensional object with the 75 channels representing the 15 2D projections of the 6D phase space at 5 different locations in FACET-II, as shown in Figure 3. By forcing the generative half of the CNN to predict such high dimensional outputs, which contain all of the projections of the beam's 6D phase space simultaneously, we are forcing the CNN to learn the relationships between various phase space projections as well as their correlations and physics constraints within the particle accelerator system for which the network is being trained.

In both the HiRES and the FACET-II setup we are considering a mapping of inputs to outputs of the form

$$
\hat{\mathbf{Y}}_{\mathrm{out}}(t) = \mathbf{F}\left(\mathbf{x}_{\mathrm{in}}(t), \mathbf{X}_{\mathrm{in}}(t)\right), \tag{8}
$$

where both the accelerator parameters $\mathbf{x}_{\mathrm{in}}(t)$ and the input beam $\mathbf{X}_{\mathrm{in}}(t)$ are expected to change unpredictably with time, and furthermore we assume that we will not have access to non-invasive and accurate measurements of these changes. Furthermore, once the accelerator is operational, we lose access to most of the true measurements of the beam's phase space $\mathbf{Y}_{\mathrm{out}}(t)$ which could be compared to their predictions from the generative CNN, $\hat{\mathbf{Y}}_{\mathrm{out}}(t)$. However, most advanced accelerators do have access to non-invasive measurements of some subset of the beam's phase space; for example, transverse deflecting cavities together with dipole magnets can be used to measure the beam's longitudinal phase space (LPS) 2D $(z, E)$ distribution, as is routinely done at the LCLS.

In order to accurately predict $\hat{\mathbf{Y}}_{\mathrm{out}}(t)$ without knowledge of the time-varying accelerator beam and component measurements $(\mathbf{x}_{\mathrm{in}}(t), \mathbf{X}_{\mathrm{in}}(t))$, we rely on the fact that the generative CNN has learned the correlations within the system and respects the physics constraints in the data. We therefore use just the available measurements, such as the LPS distribution or energy spread spectrum measurements, which we denote as $\hat{\mathbf{Y}}_i(t) \in \hat{\mathbf{Y}}_{\mathrm{out}}(t)$. We compare just these predictions to their measurements and operate the trained generative CNN in an unsupervised adaptive manner in which we apply feedback directly on the low-dimensional latent space representation in order to track the time-varying measurements by actively minimizing a cost function in real time, of the form

$$
C\left(\mathbf{Y}_i(t), \hat{\mathbf{Y}}_i(t)\right) = \iint \left|\mathbf{Y}_i(t) - \hat{\mathbf{Y}}_i(t)\right| dY_i, \tag{9}
$$

which is minimized by adaptively tuning the latent space parameters $\mathbf{y}_L = (y_1, \dots, y_n)$ according to the model-independent ES algorithm described above:

$$
\frac{dy_j(t)}{dt} = \sqrt{\alpha_j \omega_j}\,\cos\left(\omega_j t + k_j\,C\left(\mathbf{Y}_i(t), \hat{\mathbf{Y}}_i(t)\right)\right), \tag{10}
$$

as shown in Figure 2.

Note that with this implementation, the relationship in Equation (8) is now being approximated by

$$
\hat{\mathbf{Y}}_{\mathrm{out}}(t) \approx \hat{\mathbf{F}}\left(\mathbf{y}_L(t)\right), \tag{11}
$$

where $\hat{\mathbf{F}}$ is the generative half of the CNN and $\hat{\mathbf{Y}}_{\mathrm{out}}(t)$ is now parameterized by the low dimensional latent space vector $\mathbf{y}_L(t)$ without needing access to measurements of $(\mathbf{x}_{\mathrm{in}}(t), \mathbf{X}_{\mathrm{in}}(t))$.

One example of such convergence for the FACET-II setup with a 7-dimensional latent space is shown in Figure 4, which shows the trajectory taken by ES in the latent space from a starting point very far from the correct input distribution and accelerator parameters $(\mathbf{x}_{\mathrm{in}}(t), \mathbf{X}_{\mathrm{in}}(t))$ as it converges to the global minimum, with the components $(y_1, y_n)$ for $n \in \{2, 3, 4, 5, 6, 7\}$ shown overlaid on top of the cost function surface. Figure 5 shows the results of the convergence
Figure 4: Several 3D projections (𝑦 1 , 𝑦 𝑛 ) for 𝑛 ∈ {2, 3, 4, 5, 6, 7} of convergence within the 7D latent space are shown
with the adaptively tuned trajectory shown as black dots lifted slightly above the surface of the cost function. The cost
convergence is also shown and seen to take approximately 400 steps to converge.
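
The kind of latent space convergence summarized in the Figure 4 caption can be imitated on a very small scale. The sketch below is a toy under loud assumptions and not the authors' implementation: `decoder` is a hypothetical two-parameter Gaussian profile standing in for the generative half of the CNN, the "measurement" is synthesized from a hidden latent vector, and the gains, frequencies, and step size are illustrative choices. It tunes only the latent vector, using a discretized form of the cost in Eq. (9) inside the ES update of Eq. (10).

```python
import math

# Grid standing in for the axis of a measured 1D profile (illustrative).
Z = [-4.0 + 0.2 * i for i in range(41)]
DZ = 0.2


def decoder(latent):
    """Hypothetical stand-in for the generative half: latent -> 1D profile."""
    center, width = latent[0], math.exp(latent[1])
    norm = 1.0 / (width * math.sqrt(2.0 * math.pi))
    return [norm * math.exp(-0.5 * ((z - center) / width) ** 2) for z in Z]


def cost(measured, predicted):
    """Riemann-sum version of Eq. (9): integral of |Y_i - Y_hat_i|."""
    return sum(abs(m - q) for m, q in zip(measured, predicted)) * DZ


def track_latent(measured, y0, k=3.0, alpha=1.0, omega=600.0,
                 ratios=(1.0, 1.7), dt=2e-4, steps=40000):
    """Forward-Euler steps of Eq. (10), acting on the latent vector only."""
    y = list(y0)
    omegas = [omega * r for r in ratios]            # distinct dither frequencies
    amps = [math.sqrt(alpha * w) for w in omegas]
    for n in range(steps):
        c = cost(measured, decoder(y))              # scalar feedback signal
        for j in range(len(y)):
            y[j] += dt * amps[j] * math.cos(omegas[j] * n * dt + k * c)
    return y


true_latent = (0.5, -0.2)        # hidden from the feedback loop
measured = decoder(true_latent)  # plays the role of the available measurement
y_start = (-1.0, 0.5)
y_final = track_latent(measured, y_start)

c_start = cost(measured, decoder(y_start))
c_final = cost(measured, decoder(y_final))
print("cost before:", c_start, "cost after:", c_final, "latent:", y_final)
```

Because the feedback acts on two latent numbers rather than on the decoder's inputs, the loop needs no knowledge of the upstream parameters, which is the design choice expressed by Eq. (11).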
Figure 5: Predictions of the 7D latent space model of the phase space at bunch compressor 20 of FACET-II (BC20) and at the interaction point (IP). The red dashed box shows an LPS diagnostic that was used as part of the cost function, while the other 2D phase space projections in the green dashed box were unseen by the CNN, which is correctly predicting projections of the beam's 2D phase space not only at BC20, but also at the unseen IP location.
which gives a close match of various 2D phase space projections throughout the accelerator despite feedback acting only based on a single LPS measurement.

CONCLUSION

This work demonstrates an adaptive ML approach to high dimensional time-varying systems in general, and in particular to particle accelerator applications in which both the accelerator components and the input beams change unpredictably with time due to various external disturbances. By training a deep convolutional encoder-decoder style generative neural network and forcing it to predict all 2D projections of the beam's 6D phase space simultaneously, this physics-informed approach gives accurate predictions for unseen phase space projections by adaptively matching only a measurable distribution.

[7] A. Scheinker, X. Huang, and J. Wu, “Minimization of betatron oscillations of electron beam injected into a time-varying lattice via extremum seeking,” IEEE Transactions on Control Systems Technology, vol. 26, no. 1, pp. 336-343, 2017. doi:10.1109/TCST.2017.2664728
[8] A. Scheinker et al., “Model-independent tuning for maximizing free electron laser pulse energy,” Physical Review Accelerators and Beams, vol. 22, no. 8, p. 082802, 2019. doi:10.1103/PhysRevAccelBeams.22.082802
[9] A. Scheinker et al., “Online multi-objective particle accelerator optimization of the AWAKE electron beam line for simultaneous emittance and orbit control,” AIP Advances, vol. 10, no. 5, p. 055320, 2020. doi:10.1063/5.0003423
[10] C. Emma et al., “Machine learning-based longitudinal phase space prediction of particle accelerators,” Physical Review Accelerators and Beams, vol. 21, no. 11, p. 112802, 2018. doi:10.1103/PhysRevAccelBeams.21.112802
ical Journal Plus, vol. 136, no. 4, pp. 1-19, 2021. doi:10.1140/epjp/s13360-021-01348-5
[16] Figure shared by collaborators at SLAC National Accelerator Laboratory.
[17] A. Scheinker et al., “Demonstration of model-independent control of the longitudinal phase space of electron beams in the linac-coherent light source with femtosecond resolution,” Physical Review Letters, vol. 121, no. 4, p. 044801, 2018. doi:10.1103/PhysRevLett.121.044801
[18] A. Scheinker, “Adaptive Machine Learning for Robust Diagnostics and Control of Time-Varying Particle Accelerator Components and Beams,” Information, vol. 12, no. 4, p. 161, 2021. doi:10.3390/info12040161
[19] A. L. Caterini, A. Doucet, and D. Sejdinovic, “Hamiltonian variational auto-encoder,” arXiv preprint arXiv:1805.11328, 2018. https://arxiv.org/abs/1805.11328
[20] Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, C. A. Stevens, and L. Carin, “Variational autoencoder for deep learning of images, labels and captions,” Advances in Neural Information Processing Systems, vol. 29, pp. 2352-2360, 2016. https://proceedings.neurips.cc/paper/2016/file/eb86d510361fc23b59f18c1bc9802cc6-Paper.pdf
[21] J. An and S. Cho, “Variational autoencoder based anomaly detection using reconstruction probability,” Special Lecture on IE, vol. 2, no. 1, pp. 1-18, 2015. http://dm.snu.ac.kr/static/docs/TR/SNUDM-TR-2015-03.pdf
[22] R. Maulik, A. Mohan, B. Lusch, S. Madireddy, P. Balaprakash, and D. Livescu, “Time-series learning of latent-space dynamics for reduced-order model closure,” Physica D: Nonlinear Phenomena, vol. 405, p. 132368, 2020. doi:10.1016/j.physd.2020.132368
[23] A. Tripp, E. Daxberger, and J. M. Hernández-Lobato, “Sample-efficient optimization in the latent space of deep generative models via weighted retraining,” Advances in Neural Information Processing Systems, vol. 33, 2020. https://proceedings.neurips.cc/paper/2020/file/81e3225c6ad49623167a4309eb4b2e75-Paper.pdf
[24] A. Scheinker, F. Cropp, S. Paiagua, and D. Filippetto, “Adaptive deep learning for time-varying systems with hidden parameters: Predicting changing input beam distributions of compact particle accelerators,” arXiv preprint arXiv:2102.10510, 2021. https://arxiv.org/abs/2102.10510
[25] A. Scheinker, F. Cropp, S. Paiagua, and D. Filippetto, “Adaptive Latent Space Tuning for Non-Stationary Distributions,” arXiv preprint arXiv:2105.03584, 2021. https://arxiv.org/abs/2105.03584
[26] A. Scheinker, “Adaptive Machine Learning for Time-Varying Systems: Low Dimensional Latent Space Tuning,” arXiv preprint arXiv:2107.06207, 2021. https://arxiv.org/abs/2107.06207