-
Machine learning evaluation in the Global Event Processor FPGA for the ATLAS trigger upgrade
Authors:
Zhixing Jiang,
Scott Hauck,
Dennis Yin,
Bowen Zuo,
Ben Carlson,
Shih-Chieh Hsu,
Allison Deiana,
Rohin Narayan,
Santosh Parajuli,
Jeff Eastlack
Abstract:
The Global Event Processor (GEP) FPGA is an area-constrained, performance-critical element of the Large Hadron Collider's (LHC) ATLAS experiment. It needs to very quickly determine which small fraction of detected events should be retained for further processing, and which other events will be discarded. This system involves a large number of individual processing tasks, brought together within th…
▽ More
The Global Event Processor (GEP) FPGA is an area-constrained, performance-critical element of the Large Hadron Collider's (LHC) ATLAS experiment. It needs to very quickly determine which small fraction of detected events should be retained for further processing, and which other events will be discarded. This system involves a large number of individual processing tasks, brought together within the overall Algorithm Processing Platform (APP), to make filtering decisions at an overall latency of no more than 8ms. Currently, such filtering tasks are hand-coded implementations of standard deterministic signal processing tasks.
In this paper we present methods to automatically create machine learning based algorithms for use within the APP framework, and demonstrate several successful such deployments. We leverage existing machine learning to FPGA flows such as hls4ml and fwX to significantly reduce the complexity of algorithm design. These have resulted in implementations of various machine learning algorithms with latencies of 1.2us and less than 5% resource utilization on an Xilinx XCVU9P FPGA. Finally, we implement these algorithms into the GEP system and present their actual performance.
Our work shows the potential of using machine learning in the GEP for high-energy physics applications. This can significantly improve the performance of the trigger system and enable the ATLAS experiment to collect more data and make more discoveries. The architecture and approach presented in this paper can also be applied to other applications that require real-time processing of large volumes of data.
△ Less
Submitted 7 May, 2024;
originally announced June 2024.
-
Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml
Authors:
Elham E Khoda,
Dylan Rankin,
Rafael Teixeira de Lima,
Philip Harris,
Scott Hauck,
Shih-Chieh Hsu,
Michael Kagan,
Vladimir Loncar,
Chaitanya Paikara,
Richa Rao,
Sioni Summers,
Caterina Vernieri,
Aaron Wang
Abstract:
Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neura…
▽ More
Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.
△ Less
Submitted 1 July, 2022;
originally announced July 2022.
-
Physics Community Needs, Tools, and Resources for Machine Learning
Authors:
Philip Harris,
Erik Katsavounidis,
William Patrick McCormack,
Dylan Rankin,
Yongbin Feng,
Abhijith Gandrakota,
Christian Herwig,
Burt Holzman,
Kevin Pedro,
Nhan Tran,
Tingjun Yang,
Jennifer Ngadiuba,
Michael Coughlin,
Scott Hauck,
Shih-Chieh Hsu,
Elham E Khoda,
Deming Chen,
Mark Neubauer,
Javier Duarte,
Georgia Karagiorgi,
Mia Liu
Abstract:
Machine learning (ML) is becoming an increasingly important component of cutting-edge physics research, but its computational requirements present significant challenges. In this white paper, we discuss the needs of the physics community regarding ML across latency and throughput regimes, the tools and resources that offer the possibility of addressing these needs, and how these can be best utiliz…
▽ More
Machine learning (ML) is becoming an increasingly important component of cutting-edge physics research, but its computational requirements present significant challenges. In this white paper, we discuss the needs of the physics community regarding ML across latency and throughput regimes, the tools and resources that offer the possibility of addressing these needs, and how these can be best utilized and accessed in the coming years.
△ Less
Submitted 30 March, 2022;
originally announced March 2022.
-
Graph Neural Networks for Charged Particle Tracking on FPGAs
Authors:
Abdelrahman Elabd,
Vesal Razavimaleki,
Shi-Yu Huang,
Javier Duarte,
Markus Atkinson,
Gage DeZoort,
Peter Elmer,
Scott Hauck,
Jin-Xuan Hu,
Shih-Chieh Hsu,
Bo-Cheng Lai,
Mark Neubauer,
Isobel Ojalvo,
Savannah Thais,
Matthew Trahms
Abstract:
The determination of charged particle trajectories in collisions at the CERN Large Hadron Collider (LHC) is an important but challenging problem, especially in the high interaction density conditions expected during the future high-luminosity phase of the LHC (HL-LHC). Graph neural networks (GNNs) are a type of geometric deep learning algorithm that has successfully been applied to this task by em…
▽ More
The determination of charged particle trajectories in collisions at the CERN Large Hadron Collider (LHC) is an important but challenging problem, especially in the high interaction density conditions expected during the future high-luminosity phase of the LHC (HL-LHC). Graph neural networks (GNNs) are a type of geometric deep learning algorithm that has successfully been applied to this task by embedding tracker data as a graph -- nodes represent hits, while edges represent possible track segments -- and classifying the edges as true or fake track segments. However, their study in hardware- or software-based trigger applications has been limited due to their large computational cost. In this paper, we introduce an automated translation workflow, integrated into a broader tool called $\texttt{hls4ml}$, for converting GNNs into firmware for field-programmable gate arrays (FPGAs). We use this translation tool to implement GNNs for charged particle tracking, trained using the TrackML challenge dataset, on FPGAs with designs targeting different graph sizes, task complexites, and latency/throughput requirements. This work could enable the inclusion of charged particle tracking GNNs at the trigger level for HL-LHC experiments.
△ Less
Submitted 23 March, 2022; v1 submitted 3 December, 2021;
originally announced December 2021.
-
Applications and Techniques for Fast Machine Learning in Science
Authors:
Allison McCarn Deiana,
Nhan Tran,
Joshua Agar,
Michaela Blott,
Giuseppe Di Guglielmo,
Javier Duarte,
Philip Harris,
Scott Hauck,
Mia Liu,
Mark S. Neubauer,
Jennifer Ngadiuba,
Seda Ogrenci-Memik,
Maurizio Pierini,
Thea Aarrestad,
Steffen Bahr,
Jurgen Becker,
Anne-Sophie Berthold,
Richard J. Bonventre,
Tomas E. Muller Bravo,
Markus Diefenthaler,
Zhen Dong,
Nick Fritzsche,
Amir Gholami,
Ekaterina Govorkova,
Kyle J Hazelwood
, et al. (62 additional authors not shown)
Abstract:
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML ac…
▽ More
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.
△ Less
Submitted 25 October, 2021;
originally announced October 2021.
-
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
Authors:
Farah Fahim,
Benjamin Hawks,
Christian Herwig,
James Hirschauer,
Sergo Jindariani,
Nhan Tran,
Luca P. Carloni,
Giuseppe Di Guglielmo,
Philip Harris,
Jeffrey Krupa,
Dylan Rankin,
Manuel Blanco Valentin,
Josiah Hester,
Yingyi Luo,
John Mamish,
Seda Orgrenci-Memik,
Thea Aarrestad,
Hamza Javed,
Vladimir Loncar,
Maurizio Pierini,
Adrian Alan Pol,
Sioni Summers,
Javier Duarte,
Scott Hauck,
Shih-Chieh Hsu
, et al. (5 additional authors not shown)
Abstract:
Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-h…
▽ More
Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends include an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.
△ Less
Submitted 23 March, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
FPGAs-as-a-Service Toolkit (FaaST)
Authors:
Dylan Sheldon Rankin,
Jeffrey Krupa,
Philip Harris,
Maria Acosta Flechas,
Burt Holzman,
Thomas Klijnsma,
Kevin Pedro,
Nhan Tran,
Scott Hauck,
Shih-Chieh Hsu,
Matthew Trahms,
Kelvin Lin,
Yu Lou,
Ta-Wei Ho,
Javier Duarte,
Mia Liu
Abstract:
Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs…
▽ More
Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs are an extremely promising option as well. A series of workflows are developed to establish the performance capabilities of FPGAs as a service. Multiple different devices and a range of algorithms for use in high energy physics are studied. For a small, dense network, the throughput can be improved by an order of magnitude with respect to GPUs as a service. For large convolutional networks, the throughput is found to be comparable to GPUs as a service. This work represents the first open-source FPGAs-as-a-service toolkit.
△ Less
Submitted 16 October, 2020;
originally announced October 2020.
-
GPU coprocessors as a service for deep learning inference in high energy physics
Authors:
Jeffrey Krupa,
Kelvin Lin,
Maria Acosta Flechas,
Jack Dinsmore,
Javier Duarte,
Philip Harris,
Scott Hauck,
Burt Holzman,
Shih-Chieh Hsu,
Thomas Klijnsma,
Mia Liu,
Kevin Pedro,
Dylan Rankin,
Natchanon Suaysom,
Matt Trahms,
Nhan Tran
Abstract:
In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two issues will confront one another as the collider is upgraded for high luminosity running. Alternative processors such as graphics processing units (GPUs) can resolv…
▽ More
In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two issues will confront one another as the collider is upgraded for high luminosity running. Alternative processors such as graphics processing units (GPUs) can resolve this confrontation provided that algorithms can be sufficiently accelerated. In many cases, algorithmic speedups are found to be largest through the adoption of deep learning algorithms. We present a comprehensive exploration of the use of GPU-based hardware acceleration for deep learning inference within the data reconstruction workflow of high energy physics. We present several realistic examples and discuss a strategy for the seamless integration of coprocessors so that the LHC can maintain, if not exceed, its current performance throughout its running.
△ Less
Submitted 23 April, 2021; v1 submitted 20 July, 2020;
originally announced July 2020.
-
FPGA-accelerated machine learning inference as a service for particle physics computing
Authors:
Javier Duarte,
Philip Harris,
Scott Hauck,
Burt Holzman,
Shih-Chieh Hsu,
Sergo Jindariani,
Suffian Khan,
Benjamin Kreis,
Brian Lee,
Mia Liu,
Vladimir LonĨar,
Jennifer Ngadiuba,
Kevin Pedro,
Brandon Perez,
Maurizio Pierini,
Dylan Rankin,
Nhan Tran,
Matthew Trahms,
Aristeidis Tsaris,
Colin Versteeg,
Ted W. Way,
Dustin Werran,
Zhenbin Wu
Abstract:
New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of mach…
▽ More
New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) milliseconds with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600--700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
△ Less
Submitted 16 October, 2019; v1 submitted 18 April, 2019;
originally announced April 2019.
-
Production and Integration of the ATLAS Insertable B-Layer
Authors:
B. Abbott,
J. Albert,
F. Alberti,
M. Alex,
G. Alimonti,
S. Alkire,
P. Allport,
S. Altenheiner,
L. Ancu,
E. Anderssen,
A. Andreani,
A. Andreazza,
B. Axen,
J. Arguin,
M. Backhaus,
G. Balbi,
J. Ballansat,
M. Barbero,
G. Barbier,
A. Bassalat,
R. Bates,
P. Baudin,
M. Battaglia,
T. Beau,
R. Beccherle
, et al. (352 additional authors not shown)
Abstract:
During the shutdown of the CERN Large Hadron Collider in 2013-2014, an additional pixel layer was installed between the existing Pixel detector of the ATLAS experiment and a new, smaller radius beam pipe. The motivation for this new pixel layer, the Insertable B-Layer (IBL), was to maintain or improve the robustness and performance of the ATLAS tracking system, given the higher instantaneous and i…
▽ More
During the shutdown of the CERN Large Hadron Collider in 2013-2014, an additional pixel layer was installed between the existing Pixel detector of the ATLAS experiment and a new, smaller radius beam pipe. The motivation for this new pixel layer, the Insertable B-Layer (IBL), was to maintain or improve the robustness and performance of the ATLAS tracking system, given the higher instantaneous and integrated luminosities realised following the shutdown. Because of the extreme radiation and collision rate environment, several new radiation-tolerant sensor and electronic technologies were utilised for this layer. This paper reports on the IBL construction and integration prior to its operation in the ATLAS detector.
△ Less
Submitted 6 June, 2018; v1 submitted 2 March, 2018;
originally announced March 2018.