Article
Report number arXiv:2411.09851
Title SymbolFit: Automatic Parametric Modeling with Symbolic Regression
Author(s) Tsoi, Ho Fung (UPenn, Philadelphia) ; Rankin, Dylan (UPenn, Philadelphia) ; Caillol, Cecile (CERN) ; Cranmer, Miles (Cambridge U. (main)) ; Dasu, Sridhara (U. Wisconsin, Madison (main)) ; Duarte, Javier (UC, San Diego (main)) ; Harris, Philip (MIT ; Harvard U. (main) ; IAIFI, Cambridge) ; Lipeles, Elliot (UPenn, Philadelphia) ; Loncar, Vladimir (MIT ; Belgrade, Inst. Phys.)
Publication 2025
Imprint 2024-11-14
Number of pages 45
Note The API can be used out-of-the-box and is available at https://github.com/hftsoi/symbolfit
In: Comput. Softw. Big Sci. 9 (2025) 12
DOI 10.1007/s41781-025-00140-9 (publication)
Subject category Other Fields of Physics ; Computing and Computers ; Particle Physics - Experiment
Abstract We introduce SymbolFit, a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data while simultaneously providing uncertainty estimates in a single run. Traditionally, constructing a parametric model to accurately describe binned data has been a manual and iterative process, requiring an adequate functional form to be determined before the fit can be performed. The main challenge arises when the appropriate functional forms cannot be derived from first principles, especially when there is no underlying true closed-form function for the distribution. In this work, we develop a framework that automates and streamlines the process by utilizing symbolic regression, a machine learning technique that explores a vast space of candidate functions without requiring a predefined functional form because the functional form itself is treated as a trainable parameter, making the process far more efficient and effortless than traditional regression methods. We demonstrate the framework in high-energy physics experiments at the CERN Large Hadron Collider (LHC) using five real proton-proton collision datasets from new physics searches, including background modeling in resonance searches for high-mass dijet, trijet, paired-dijet, diphoton, and dimuon events. We show that our framework can flexibly and efficiently generate a wide range of candidate functions that fit a nontrivial distribution well using a simple fit configuration that varies only by random seed, and that the same fit configuration, which defines a vast function space, can also be applied to distributions of different shapes, whereas achieving a comparable result with traditional methods would have required extensive manual effort.
Copyright/License publication: © 2025 The Author(s) (License: CC-BY-4.0)
preprint: (License: CC BY 4.0)



Corresponding record in: Inspire


 Record created 2024-11-21, last modified 2025-08-27

