Search | arXiv e-print repository

Machine learning evaluation in the Global Event Processor FPGA for the ATLAS trigger upgrade

Authors: Zhixing Jiang, Scott Hauck, Dennis Yin, Bowen Zuo, Ben Carlson, Shih-Chieh Hsu, Allison Deiana, Rohin Narayan, Santosh Parajuli, Jeff Eastlack

Abstract: The Global Event Processor (GEP) FPGA is an area-constrained, performance-critical element of the Large Hadron Collider's (LHC) ATLAS experiment. It needs to very quickly determine which small fraction of detected events should be retained for further processing, and which other events will be discarded. This system involves a large number of individual processing tasks, brought together within th… ▽ More The Global Event Processor (GEP) FPGA is an area-constrained, performance-critical element of the Large Hadron Collider's (LHC) ATLAS experiment. It needs to very quickly determine which small fraction of detected events should be retained for further processing, and which other events will be discarded. This system involves a large number of individual processing tasks, brought together within the overall Algorithm Processing Platform (APP), to make filtering decisions at an overall latency of no more than 8ms. Currently, such filtering tasks are hand-coded implementations of standard deterministic signal processing tasks. In this paper we present methods to automatically create machine learning based algorithms for use within the APP framework, and demonstrate several successful such deployments. We leverage existing machine learning to FPGA flows such as hls4ml and fwX to significantly reduce the complexity of algorithm design. These have resulted in implementations of various machine learning algorithms with latencies of 1.2us and less than 5% resource utilization on an Xilinx XCVU9P FPGA. Finally, we implement these algorithms into the GEP system and present their actual performance. Our work shows the potential of using machine learning in the GEP for high-energy physics applications. This can significantly improve the performance of the trigger system and enable the ATLAS experiment to collect more data and make more discoveries. The architecture and approach presented in this paper can also be applied to other applications that require real-time processing of large volumes of data. △ Less

Submitted 7 May, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures, 6 tables. Accepted by JINST on April 3, 2024

arXiv:2207.00559 [pdf, other]

Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Authors: Elham E Khoda, Dylan Rankin, Rafael Teixeira de Lima, Philip Harris, Scott Hauck, Shih-Chieh Hsu, Michael Kagan, Vladimir Loncar, Chaitanya Paikara, Richa Rao, Sioni Summers, Caterina Vernieri, Aaron Wang

Abstract: Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neura… ▽ More Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider. △ Less

Submitted 1 July, 2022; originally announced July 2022.

Comments: 12 pages, 6 figures, 5 tables

arXiv:2203.16255 [pdf, other]

Physics Community Needs, Tools, and Resources for Machine Learning

Authors: Philip Harris, Erik Katsavounidis, William Patrick McCormack, Dylan Rankin, Yongbin Feng, Abhijith Gandrakota, Christian Herwig, Burt Holzman, Kevin Pedro, Nhan Tran, Tingjun Yang, Jennifer Ngadiuba, Michael Coughlin, Scott Hauck, Shih-Chieh Hsu, Elham E Khoda, Deming Chen, Mark Neubauer, Javier Duarte, Georgia Karagiorgi, Mia Liu

Abstract: Machine learning (ML) is becoming an increasingly important component of cutting-edge physics research, but its computational requirements present significant challenges. In this white paper, we discuss the needs of the physics community regarding ML across latency and throughput regimes, the tools and resources that offer the possibility of addressing these needs, and how these can be best utiliz… ▽ More Machine learning (ML) is becoming an increasingly important component of cutting-edge physics research, but its computational requirements present significant challenges. In this white paper, we discuss the needs of the physics community regarding ML across latency and throughput regimes, the tools and resources that offer the possibility of addressing these needs, and how these can be best utilized and accessed in the coming years. △ Less

Submitted 30 March, 2022; originally announced March 2022.

Comments: Contribution to Snowmass 2021, 33 pages, 5 figures

arXiv:2112.02048 [pdf, other]

doi 10.3389/fdata.2022.828666

Graph Neural Networks for Charged Particle Tracking on FPGAs

Authors: Abdelrahman Elabd, Vesal Razavimaleki, Shi-Yu Huang, Javier Duarte, Markus Atkinson, Gage DeZoort, Peter Elmer, Scott Hauck, Jin-Xuan Hu, Shih-Chieh Hsu, Bo-Cheng Lai, Mark Neubauer, Isobel Ojalvo, Savannah Thais, Matthew Trahms

Abstract: The determination of charged particle trajectories in collisions at the CERN Large Hadron Collider (LHC) is an important but challenging problem, especially in the high interaction density conditions expected during the future high-luminosity phase of the LHC (HL-LHC). Graph neural networks (GNNs) are a type of geometric deep learning algorithm that has successfully been applied to this task by em… ▽ More The determination of charged particle trajectories in collisions at the CERN Large Hadron Collider (LHC) is an important but challenging problem, especially in the high interaction density conditions expected during the future high-luminosity phase of the LHC (HL-LHC). Graph neural networks (GNNs) are a type of geometric deep learning algorithm that has successfully been applied to this task by embedding tracker data as a graph -- nodes represent hits, while edges represent possible track segments -- and classifying the edges as true or fake track segments. However, their study in hardware- or software-based trigger applications has been limited due to their large computational cost. In this paper, we introduce an automated translation workflow, integrated into a broader tool called $\texttt{hls4ml}$, for converting GNNs into firmware for field-programmable gate arrays (FPGAs). We use this translation tool to implement GNNs for charged particle tracking, trained using the TrackML challenge dataset, on FPGAs with designs targeting different graph sizes, task complexites, and latency/throughput requirements. This work could enable the inclusion of charged particle tracking GNNs at the trigger level for HL-LHC experiments. △ Less

Submitted 23 March, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

Comments: 28 pages, 17 figures, 1 table, published version

Journal ref: Front. Big Data 5 (2022) 828666

arXiv:2110.13041 [pdf, other]

doi 10.3389/fdata.2022.787421

Applications and Techniques for Fast Machine Learning in Science

Authors: Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bahr, Jurgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomas E. Muller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J Hazelwood , et al. (62 additional authors not shown)

Abstract: In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML ac… ▽ More In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs. △ Less

Submitted 25 October, 2021; originally announced October 2021.

Comments: 66 pages, 13 figures, 5 tables

Report number: FERMILAB-PUB-21-502-AD-E-SCD

Journal ref: Front. Big Data 5, 787421 (2022)

arXiv:2103.05579 [pdf, other]

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

Authors: Farah Fahim, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo Jindariani, Nhan Tran, Luca P. Carloni, Giuseppe Di Guglielmo, Philip Harris, Jeffrey Krupa, Dylan Rankin, Manuel Blanco Valentin, Josiah Hester, Yingyi Luo, John Mamish, Seda Orgrenci-Memik, Thea Aarrestad, Hamza Javed, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, Javier Duarte, Scott Hauck, Shih-Chieh Hsu , et al. (5 additional authors not shown)

Abstract: Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-h… ▽ More Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends include an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery. △ Less

Submitted 23 March, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

Comments: 10 pages, 8 figures, TinyML Research Symposium 2021

Report number: FERMILAB-CONF-21-080-SCD

arXiv:2010.08556 [pdf, other]

doi 10.1109/H2RC51942.2020.00010

FPGAs-as-a-Service Toolkit (FaaST)

Authors: Dylan Sheldon Rankin, Jeffrey Krupa, Philip Harris, Maria Acosta Flechas, Burt Holzman, Thomas Klijnsma, Kevin Pedro, Nhan Tran, Scott Hauck, Shih-Chieh Hsu, Matthew Trahms, Kelvin Lin, Yu Lou, Ta-Wei Ho, Javier Duarte, Mia Liu

Abstract: Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs… ▽ More Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs are an extremely promising option as well. A series of workflows are developed to establish the performance capabilities of FPGAs as a service. Multiple different devices and a range of algorithms for use in high energy physics are studied. For a small, dense network, the throughput can be improved by an order of magnitude with respect to GPUs as a service. For large convolutional networks, the throughput is found to be comparable to GPUs as a service. This work represents the first open-source FPGAs-as-a-service toolkit. △ Less

Submitted 16 October, 2020; originally announced October 2020.

Comments: 10 pages, 7 figures, to appear in proceedings of the 2020 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing

Report number: FERMILAB-CONF-20-426-SCD

Journal ref: 2020 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC), 2020, pp. 38-47

arXiv:2007.10359 [pdf, other]

doi 10.1088/2632-2153/abec21

GPU coprocessors as a service for deep learning inference in high energy physics

Authors: Jeffrey Krupa, Kelvin Lin, Maria Acosta Flechas, Jack Dinsmore, Javier Duarte, Philip Harris, Scott Hauck, Burt Holzman, Shih-Chieh Hsu, Thomas Klijnsma, Mia Liu, Kevin Pedro, Dylan Rankin, Natchanon Suaysom, Matt Trahms, Nhan Tran

Abstract: In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two issues will confront one another as the collider is upgraded for high luminosity running. Alternative processors such as graphics processing units (GPUs) can resolv… ▽ More In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two issues will confront one another as the collider is upgraded for high luminosity running. Alternative processors such as graphics processing units (GPUs) can resolve this confrontation provided that algorithms can be sufficiently accelerated. In many cases, algorithmic speedups are found to be largest through the adoption of deep learning algorithms. We present a comprehensive exploration of the use of GPU-based hardware acceleration for deep learning inference within the data reconstruction workflow of high energy physics. We present several realistic examples and discuss a strategy for the seamless integration of coprocessors so that the LHC can maintain, if not exceed, its current performance throughout its running. △ Less

Submitted 23 April, 2021; v1 submitted 20 July, 2020; originally announced July 2020.

Comments: 26 pages, 7 figures, 2 tables

Report number: FERMILAB-PUB-20-338-E-SCD

Journal ref: Mach. Learn.: Sci. Technol. 2 (2021) 035005

arXiv:1904.08986 [pdf, other]

doi 10.1007/s41781-019-0027-2

FPGA-accelerated machine learning inference as a service for particle physics computing

Authors: Javier Duarte, Philip Harris, Scott Hauck, Burt Holzman, Shih-Chieh Hsu, Sergo Jindariani, Suffian Khan, Benjamin Kreis, Brian Lee, Mia Liu, Vladimir Lončar, Jennifer Ngadiuba, Kevin Pedro, Brandon Perez, Maurizio Pierini, Dylan Rankin, Nhan Tran, Matthew Trahms, Aristeidis Tsaris, Colin Versteeg, Ted W. Way, Dustin Werran, Zhenbin Wu

Abstract: New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of mach… ▽ More New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) milliseconds with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600--700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective. △ Less

Submitted 16 October, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

Comments: 16 pages, 14 figures, 2 tables

Report number: FERMILAB-PUB-19-170-CD-CMS-E-ND

Journal ref: Comput Softw Big Sci (2019) 3: 13

arXiv:1803.00844 [pdf, other]

doi 10.1088/1748-0221/13/05/T05008

Production and Integration of the ATLAS Insertable B-Layer

Authors: B. Abbott, J. Albert, F. Alberti, M. Alex, G. Alimonti, S. Alkire, P. Allport, S. Altenheiner, L. Ancu, E. Anderssen, A. Andreani, A. Andreazza, B. Axen, J. Arguin, M. Backhaus, G. Balbi, J. Ballansat, M. Barbero, G. Barbier, A. Bassalat, R. Bates, P. Baudin, M. Battaglia, T. Beau, R. Beccherle , et al. (352 additional authors not shown)

Abstract: During the shutdown of the CERN Large Hadron Collider in 2013-2014, an additional pixel layer was installed between the existing Pixel detector of the ATLAS experiment and a new, smaller radius beam pipe. The motivation for this new pixel layer, the Insertable B-Layer (IBL), was to maintain or improve the robustness and performance of the ATLAS tracking system, given the higher instantaneous and i… ▽ More During the shutdown of the CERN Large Hadron Collider in 2013-2014, an additional pixel layer was installed between the existing Pixel detector of the ATLAS experiment and a new, smaller radius beam pipe. The motivation for this new pixel layer, the Insertable B-Layer (IBL), was to maintain or improve the robustness and performance of the ATLAS tracking system, given the higher instantaneous and integrated luminosities realised following the shutdown. Because of the extreme radiation and collision rate environment, several new radiation-tolerant sensor and electronic technologies were utilised for this layer. This paper reports on the IBL construction and integration prior to its operation in the ATLAS detector. △ Less

Submitted 6 June, 2018; v1 submitted 2 March, 2018; originally announced March 2018.

Comments: 90 pages in total. Author list: ATLAS IBL Collaboration, starting page 2. 69 figures, 20 tables. Published in Journal of Instrumentation. All figures available at: https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PLOTS/PIX-2018-001

Journal ref: Journal of Instrumentation JINST 13 T05008 (2018)

Showing 1–10 of 10 results for author: Hauck, S