
Running head: DATA ANALYSIS WITH PYTHON FOR SUPER-BIGBITE

A Journey into High-Energy Data Analysis: Enabling In-Depth Analysis with Python for the Super-BigBite Spectrometer
_____________________________________________

Dominic Weir

High School Summer Honors Program

Thomas Jefferson National Accelerator Facility

Prepared in partial fulfillment of the requirements of the Department of Energy’s Office of


Science and Thomas Jefferson National Accelerator Facility under the direction of Alexandre Camsonne,
PhD, in the Hall A/C division at Thomas Jefferson National Accelerator Facility.

Participant: ________________________________________
Signature

Research Advisor: ________________________________________


Signature

Abstract

Scientists at Jefferson Lab leverage the use of ROOT, a CERN-developed object-oriented

computer program and library, to conduct analysis on high-energy particle events. ROOT

employs a hierarchical tree structure for the organization of detector data, thereby enabling

efficient computational capabilities that meet the demanding requirements of nuclear physicists

at major laboratories worldwide. In addition to its inherent graphical user interface, ROOT

incorporates histogramming, graphing of distributions and functions, 3-dimensional

visualization, and statistical treatment. However, ROOT can be difficult for beginners without

C++ background, and has various flaws in its design and implementation, in addition to limited

applications outside of particle physics. Uproot provides an alternative data analysis framework

independent of C++ ROOT, intended to stream data into machine learning libraries in Python.

The input/output framework relies on NumPy and Matplotlib, two popular Python libraries most

users are well acquainted with, to cast blocks of data from the ROOT file and perform the same

graphical analysis as possible in ROOT. This paper explores the implementation of Uproot for

the purpose of analyzing Run 13747 from Hall A’s Super-BigBite detector and includes the

production of a Jupyter Notebook to disseminate the process and findings.



Acknowledgments

I would like to express my heartfelt gratitude to everyone who made my Jefferson Lab

internship an exceptionally rewarding experience.

First and foremost, I extend my sincere thanks to my mentor, Dr. Alexandre Camsonne,

for his invaluable guidance, unwavering support, and prompt assistance throughout the

internship. His willingness to answer all my questions has been instrumental in my learning and

growth. I am also grateful to my alternative mentor, Sanghwa Park, for providing me with

essential fundamentals at the beginning of the internship, which laid a strong foundation for my

work. Furthermore, I extend my appreciation to the JLab Science Education team, especially

Steve Gagnon and Carol McKisson, for their continuous support, engaging physics

demonstrations, and facility tours. Their efforts have enhanced my overall experience at

Jefferson Lab. Additionally, I am thankful to all the summer series lecture speakers for

generously sharing their knowledge and experiences, enriching my understanding of various

topics related to high-energy particle events. Special thanks go to the Director of this wonderful

Lab, Dr. Stuart Henderson, for not only managing the facility but also taking the time to interact

with us interns and share his personal experiences and wisdom. His leadership has been truly

inspiring.

I owe a special debt of gratitude to my partner, Connor Carpenter, for our fruitful

collaboration on exploring the SuperBigBite spectrometer. His computer science skills

complemented my physics expertise, and together, we achieved remarkable results. I also extend

my appreciation to him for authoring the PyROOT tutorial. In addition, I would like to

acknowledge Angelina Nair and Esha Sing for their contributions to the Main notebook and the

ROOT notebook.

I also want to express my gratitude to all the teachers at Ocean Lakes High School,

particularly the magnet physics teacher, William Isel, for instilling critical thinking skills that

proved invaluable during my nuclear physics internship. I would also like to extend thanks to

Allison Graves, my Senior Project Advisor, for her encouragement and support in pursuing this

internship opportunity. Likewise, I am grateful to Ruoming Shen for informing me about the

opportunity at Jefferson Lab, as her positive experience here inspired me to apply.

Next, my heartfelt thanks go out to my mom and dad for their unwavering support and

encouragement throughout this journey. Their belief in me and my dreams has been a constant

source of motivation. Lastly, I want to express my sincere gratitude to my grandpa, Dr. Daniel

Larusso, for listening to all my stories from Jefferson Lab during my long car rides home. His

continuous support and enlightened wisdom have been a source of inspiration for me, and I am

grateful for his presence in my life.

To all the individuals mentioned and to those behind the scenes who contributed to my

internship experience, thank you for making it a truly memorable and transformative journey.

Glossary

Particles:

1. Antiquark: A fundamental particle that possesses properties identical to those of a quark,

except for the sign of certain quantum numbers. Specifically, each quark flavor has a

corresponding antiquark flavor with the same mass and spin, but opposite electric and

color charge.

2. Baryons: Composite particles composed of three quarks, participating in strong interactions through the color charge of their constituent quarks and carrying half-integer spin.

3. Electrons: Elementary particles with a negative charge, orbiting the nucleus of atoms, and

having a spin of 1/2.

4. Gluons: Elementary particles mediating the strong force that binds quarks together within

hadrons.

5. Hadrons: Composite particles consisting of quarks, including baryons (e.g., protons and

neutrons) and mesons.

6. Kaons: Mesons containing a strange quark or antiquark paired with an up or down antiquark or quark, playing crucial roles in particle interactions.

7. Leptons: A family of fundamental particles, including electrons, that do not experience

strong interactions and have a spin of 1/2.

8. Mesons: Hadrons composed of one quark and one antiquark (e.g., pions and kaons),

exhibiting integer spins.

9. Neutrons: Neutral subatomic particles found in atomic nuclei, made up of quarks with

fractional electric charges and a spin of 1/2.



10. Photons: Elementary particles of light and electromagnetic radiation, carrying no electric

charge and having a spin of 1.

11. Pions: Mesons composed of an up quark and a down antiquark, or a down quark and an up antiquark. They play essential roles in the

strong nuclear force, mediating interactions between nucleons (protons and neutrons) in

atomic nuclei.

12. Proton: A positively charged subatomic particle found in atomic nuclei, composed of

quarks (two up quarks and one down quark) and exhibiting a spin of 1/2.

13. Quarks: Elementary particles that combine to form protons and neutrons (baryons), as

well as other hadrons like mesons, characterized by fractional electric charges and half-

integer spins. Quark flavors refer to the distinct types of quarks: up, down, strange,

charm, top, and bottom. Each flavor possesses unique properties, such as electric charge

and mass.

14. Virtual photon: a particle-like entity in quantum field theory that cannot be directly

measured or observed, which momentarily arises during particle interactions and

mediates the electromagnetic force, facilitating the exchange of energy and momentum

between charged particles.

Detectors and Processes:

1. Calorimeter: A particle detector that measures the energy of particles by stopping them

and analyzing the energy they deposit in the detector material.

2. Dynodes: Electron multipliers used in detectors to amplify weak signals generated by

particles interacting with the detector material by increasing the number of electrons.

High voltages applied to the dynodes cause the multiplication of electrons through the

process of secondary emission, where each incident electron striking a dynode releases

multiple electrons, leading to an overall increase in the number of detected electrons.

3. Feynman diagram: A graphical representation in quantum field theory depicting particle interactions, underlying processes, and the role of mathematical integrals in calculating probabilities. The axes of a Feynman diagram are conventionally read as space and time, although the diagram is a schematic rather than a literal record of particle positions.

4. Form factors: Mathematical functions describing how the internal structure of a particle

influences its interactions.

5. Kinematic phase space: The region of possible values for particle momenta and energies

in a particular interaction.

6. Quark confinement: The phenomenon wherein quarks are bound together within hadrons

and cannot exist in isolated form due to the strong force.

7. Resonance: A short-lived particle state formed during high-energy collisions, providing

valuable information about the underlying forces and particles.

8. Scintillator: A material that emits light when struck by particles, assisting in particle

detection and measurement.

9. Short-range correlations: Interactions between nucleons within atomic nuclei at close

distances, offering crucial insights into nuclear structure, often visualized by the nucleons

overlapping.

10. Spectrometer: A precision instrument equipped with magnetic or electric fields that enables the precise measurement of the energy and momentum of charged particles, facilitating intricate investigations into particle properties, interactions, and kinematic characteristics.

Table of Contents

Abstract .................................................................................................................................................... ii

Acknowledgments................................................................................................................................ iii

Glossary ..................................................................................................................................................... v

Physics ...................................................................................................................................................... 1

Detectors .................................................................................................................................................. 5

Data ............................................................................................................................................................ 8

Materials and Methods ...................................................................................................................... 11

Uproot ............................................................................................................................................................................... 11

Jupyter .............................................................................................................................................................................. 12

Uproot Notebook........................................................................................................................................................... 13

ROOT Trio ...................................................................................................................................................................... 17

Jefferson Lab Experience .................................................................................................................. 18

References ............................................................................................................................................. 23

A Journey into High-Energy Data Analysis

Enabling In-Depth Analysis with Python for the Super-BigBite Spectrometer

At the Thomas Jefferson National Accelerator Facility (JLab), Experimental Hall A is

devoted to investigating the structure of nuclei through two high resolution spectrometers at

precise angles (Alcorn et al., 2004). Initiating each experiment, an electron beam with energies as high as 11 GeV is aimed at a target;¹ after the reaction, the scattered electron is measured by the BigBite spectrometer, while the recoil particle (the scattered particle ejected from the target) is

measured by the Super-BigBite detector. By examining the scattered particles under different

initial conditions, scientists are able to infer the properties of the nucleons and their constituent

quarks.

Physics

According to Conseil Europeen Pour La Recherche Nucleaire (CERN), elastic lepton-

hadron scattering can be used to measure the size of the hadron (Smirnova & Hedberg, 2005).

Run 13747 was an experiment at Jefferson Lab designed to investigate the size of a proton

(Benmokhtar et al., 2008). This experiment involved aiming the beam of electrons at a liquid

hydrogen-1 target with the intention of an elastic collision ejecting a proton. The primary

observable to measure such experiments is the scattering cross section. The differential cross-

𝑑2 σ
section gives the probability of protons scattering into a particular solid angle 𝑑Ω =
𝑑Ω𝑑ν

𝑑𝜙𝑑𝜃𝑠𝑖𝑛(𝜃) and change in energy transferred to the proton in the proton-rest frame 𝑑𝜈 (Zheng,

2021). Figure 1 shows the spectrum of scattering cross-section for 𝑒𝑝 collisions at a fixed four-

momentum transfer squared 𝑄 2 .

¹ All equations and quantities in this report are in natural units: $c = \hbar = 1$.

Figure 1

As shown in Figure 1, elastic collisions between the incident electron and the proton have

a relatively small energy transfer as this process involves no internal excitation or change in the

quantum state of the proton. On the other hand, in semi-inelastic delta collisions, the scattering

process involves exciting the proton to a higher energy state; one possible state is the delta

resonance (Δ). The delta resonance occurs when one of the quarks in the proton is excited to a

higher energy state while remaining bound within the baryon. The Δ baryons have a mass of

about 1232 MeV, as opposed to 939 MeV of an ordinary nucleon; however, they quickly decay

via the strong interaction into a nucleon and a pion of appropriate charge (Nave, 1998). Other

semi-inelastic collisions are those that result in the N* excitations; these resonances are higher in

energy, usually corresponding to one of the quarks having a flipped spin state or a different orbital angular momentum. As opposed to Δ baryons, which decay almost exclusively

through pion production, N* resonances decay through various channels. That being said, the

most common decay mode is still the emission of pions that are responsible for carrying away

excess energy and angular momentum. However, for sufficiently high energy resonance states,

heavier mesons, including eta mesons and kaons, may be emitted. Note that both eta mesons and kaons contain strange quark content, while protons consist only of up and down

quarks. This evolution is possible through the strong interaction mediated by gluons. For

example, the down quark of a proton may annihilate an anti-down quark of a pion emitting a

gluon. This gluon then materializes into a strange and anti-strange pair, the latter pairing with the

up quark to form a positively charged kaon. This reaction is summarized by the Feynman

diagram in Figure 2 (Govind, 2020).

Figure 2

Nevertheless, both delta and N* resonances decay back to stable nucleons by emitting

particles, preserving the overall baryon number and charge of the nucleon system. On the other

hand, Deep Inelastic Scattering (DIS) involves high-energy electrons (or exchange photons)

scattering off individual quarks within the nucleon. During this process, the virtual photon

interacts with a quark, probing the nucleon's internal structure at short distances and high

momentum transfers, and probing the nucleon's substructure through parton distribution

functions (PDFs). These PDFs provide essential information about the momentum distributions

of quarks and gluons within the nucleon, revealing their contributions to the nucleon's total

momentum and spin. In fact, deep inelastic electron-proton scattering experiments led to the

discovery of quarks in 1968 (O’Luanaigh, 2019). However, due to the phenomenon of quark

confinement, isolated quarks cannot be observed directly. Instead, the scattered quarks fragment

into collimated sprays of hadrons in the final state, a process known as hadronization. Thus,

detectors only observe a tight cone of hadrons.

As an elastic electron-proton scattering experiment, run 13747 is summarized by the expression $k + p = k' + p'$, where $k$ and $p$ are the 4-momentum vectors of the initial electron beam and proton, respectively, and the scattered states are denoted by the prime symbol ($'$);

this is best visualized by the Feynman diagram in Figure 3.

Figure 3

After the collision, the electron beam with relativistic energy $E_e$ traveling along the z-axis scatters at some angle $\theta_k$ and the proton scatters at an angle $\theta_p$ in the xz-plane. At a speed near that of light, the mass of the electron is negligibly small, and by the Einstein energy–momentum relation (Equation 1), its momentum is approximately equal to its relativistic energy.

$$E_e^2 = m_e^2 + p_e^2 \qquad (1)$$

On the other hand, in the lab frame the proton's relativistic energy is its invariant mass ($m_p = 938$ MeV). Consequently, the two initial 4-momentum vectors, written with components $(E, p_x, p_z)$, are defined as shown below:

$$k = \begin{pmatrix} E \\ p_x \\ p_z \end{pmatrix} = \begin{pmatrix} E_e \\ 0 \\ E_e \end{pmatrix} \qquad (2)$$

$$p = \begin{pmatrix} E \\ p_x \\ p_z \end{pmatrix} = \begin{pmatrix} m_p \\ 0 \\ 0 \end{pmatrix} \qquad (3)$$

After the collision, 4-momentum is conserved between the two particles. Assuming only

scattering in the xz-plane, and once again that the mass of the electron is negligibly small, the

conservation equation is derived resulting in Equation 4.

$$\begin{pmatrix} E_e \\ 0 \\ E_e \end{pmatrix} + \begin{pmatrix} m_p \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} E_e' \\ E_e' \sin\theta_k \\ E_e' \cos\theta_k \end{pmatrix} + \begin{pmatrix} E_p' \\ \sqrt{(E_p')^2 - (m_p)^2}\,\sin\theta_p \\ \sqrt{(E_p')^2 - (m_p)^2}\,\cos\theta_p \end{pmatrix} \qquad (4)$$
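A useful consequence of Equation 4, under the same approximation of a negligible electron mass, is the standard elastic-scattering relation giving the scattered electron energy in terms of the beam energy and the electron scattering angle:

$$E_e' = \frac{E_e}{1 + \dfrac{2E_e}{m_p}\sin^2\!\left(\frac{\theta_k}{2}\right)}$$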

The final energy states, as well as the position and angles of the scattered particles are

measured by the detectors. Because the physical quantities associated with the scattering of electrons on protons depend on the electric and magnetic form factors of the proton, measurements of the cross section can be used to determine the form factors and hence the charge

distribution, or size, of the proton (Smirnova & Hedberg, 2005).

Detectors

Hall A at Jefferson Lab is currently equipped with two state-of-the-art spectrometers, the

BigBite Spectrometer and the SuperBigBite Spectrometer, both of which play crucial roles in

high-energy particle event analysis. BigBite, named for its large momentum and angular

acceptance, is the first of the two high resolution spectrometer detectors and was installed in

2007 (Liyanage & Wojtsekhowski, 2007). It is designed for detecting, tracking, and identifying

scattered electrons at high luminosity to map out the kinematic phase space. In 2021, the

SuperBigBite detector (SBS) was commissioned on the right side of the beam to detect high-

energy protons and neutrons (Puckett, 2021). The two spectrometers complement each other

well, providing unique kinematic coverage and excellent systematics control.



The SBS has several components intended to measure the momentum and energy of the

hadrons. An overview of the spectrometer is shown in Figure 4.

Figure 4

Upon first entering the spectrometer, the charged particles are deflected by the 48D48 warm coil

magnet. This deflection is vital to distinguishing protons from neutrons once they make it to the

Hadronic Calorimeter.

First though, the particles travel through the Gas Electron Multiplier (GEM) detectors.

These detectors take advantage of the electrons stripped from the gas molecules' orbitals by

ionizing radiation. A suitable voltage is applied across the polyimide foil guiding the electrons to

microholes spread across the GEM. In the presence of strong electric fields, microholes serve as

sites of electron acceleration, leading to collisions with surrounding atoms that release additional

electrons. As these affected regions accumulate sufficient electrons, they transform into an

electrically conductive medium, generating a significant current that can be detected and read by

electronic devices (Sauli, 2016). This information reveals the position of the particles

passing through, and ultimately the momentum by analyzing the particle’s deflection in the

presence of the magnetic field.



After traveling through the 3 low interference GEMs, the particles enter the Hadronic

Calorimeter (HCal) shown in Figure 5.

Figure 5

The HCal consists of 288 detector modules aligned in a 12 wide by 24 high array. Figure 6

shows an individual module.

Figure 6

The module consists of iron plates interleaved with scintillator planes. Particles hit the iron plates

initiating a hadronic shower dominated by a succession of inelastic hadronic interactions. The

cascade of secondary particles excites the electrons in the scintillator tiles producing photons

proportional to the energy absorbed. Running the length of each module is a single wavelength shifter that directs the photons to a 2-inch diameter Photomultiplier Tube (PMT) mounted on

the back. A photocathode is located at the opening of the PMT. As dictated by the photoelectric

effect, the photocathode ejects an electron that is directed to the electron multiplier by the

focusing electrode. The electron multiplier consists of several dynodes arranged in increasing

potential. The electrons are accelerated towards the dynode, striking the surface producing

several more electrons by secondary emission. Each voltage difference increases the total energy of the electrons, while each collision increases the number of electrons. By the time the

group of electrons travels to the anode, a large enough current is produced to be read by the electronics. This data is crucial in determining the energy of the original hadron, but also gives information regarding the locations of the clusters. A diagram of a PMT is shown in Figure 7.

Figure 7

Data

There are several steps involved in converting the signals produced by the detectors into

data that the physicists can work with. However, the first step is the trigger system that rapidly

evaluates which events in a particle detector to keep based on a trigger menu because only a

small fraction of the total can be recorded. At the same time, the Data Acquisition System

(DAQ) is responsible for temporarily storing the data pending the trigger decision, and then

recording data from the selected events in a suitable format. A complex interaction between the

initial steps of the DAQ, the triggering system, and time-to-digital converters is incorporated to

minimize dead time (time periods when interesting interactions cannot be selected) and correctly

assign the cross sections to their respective events (Ellis, n.d.).

One of the most crucial aspects of preparing the data is the Analog-to-digital conversion

(ADC). The ADC process transforms the continuous analog signals into discrete digital values.

This conversion is performed by sampling the continuous signal at regular intervals and

assigning numerical values to these samples based on their amplitude. The greater the resolution, as determined by the converter's bit length, the more closely the generated piecewise signal represents the original analog signal (Gudino, 2018). The digital data is then temporarily stored in Data Buffers, high-

speed memory elements that can handle the large data rates produced by the detectors. They

store the raw data from multiple events before the data is forwarded to the Event Builders for

further processing. The Event Builders are responsible for assembling the data fragments from

the Data Buffers to form complete events, each of which is the data related to a specific particle

interaction. Event Builders efficiently combine the data from multiple detectors and channels to

create coherent event data. Next, the Readout Controllers manage the flow of data from the

Event Builders to the next stage of data processing. They organize the data into packets and

ensure that the data is properly transmitted and recorded. After passing through the Event

Builders, the data from different detectors and channels are combined to form complete events.
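Returning briefly to the analog-to-digital step described above, a toy NumPy sketch of the sampling and quantization idea (with a made-up pulse, input range, and bit depth) might look like this:

    import numpy as np

    # Illustrative only: sample a made-up analog pulse and quantize it
    # with a hypothetical 12-bit converter over a hypothetical 0-2 V range.
    n_bits = 12
    v_min, v_max = 0.0, 2.0                          # input range (V)
    t = np.linspace(0.0, 1e-6, 200)                  # sampling times (s)
    analog = 1.5 * np.exp(-((t - 4e-7) / 1e-7) ** 2) # made-up detector pulse

    # Map each sample onto one of 2**n_bits discrete levels.
    lsb = (v_max - v_min) / 2 ** n_bits              # size of one quantization step
    codes = np.clip(np.round((analog - v_min) / lsb), 0, 2 ** n_bits - 1)
    digitized = v_min + codes * lsb                  # piecewise approximation of the pulse
    print(codes[:5], digitized[:5])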

However, the data is still in a raw and unorganized format. In the back end of the DAQ,

the data is formatted into a standardized structure to ensure consistency and ease of analysis. To

manage the storage and transfer of the large amount of data efficiently, data compression

techniques may be employed. Additionally, the back end of the DAQ organizes the data into

logical units, such as data files or data streams. The data is then divided into smaller chunks to

facilitate parallel processing and easy access during analysis. This organized and compressed

data is stored in a standardized format known as ROOT files, produced by a framework developed by CERN to store a hierarchical tree of columnar data. Along with the raw event data, additional

information known as metadata is also stored. Metadata includes details about the experiment

setup, detector calibration, event timestamps, and other relevant information that is crucial for

data analysis and interpretation.

At the core of ROOT is the TTree class, representing the tree structure that organizes data

hierarchically. The TTree efficiently stores events as entries, each associated with specific data

variables or properties, known as branches. These branches act as columns, holding arrays or

simple data types that represent measurements or attributes of the events. One of the essential

features of the TTree is its ability to store metadata in the form of TNamed objects. Additionally,

the TTree supports data compression to reduce storage requirements and batch processing for

efficient parallelization. Furthermore, ROOT is not only limited to data storage and

manipulation; it also provides a comprehensive suite of data visualization tools. Researchers can

create a wide range of plots, histograms, and graphs to gain insights into their data and explore

patterns or trends visually (ROOT, n.d.). Overall, ROOT's Tree structure and accompanying

features make it an indispensable tool for analyzing complex scientific datasets, providing

researchers with the means to explore fundamental physics principles and make significant

contributions to particle physics. However, while a powerful and widely used (in particle or

nuclear physics) framework, ROOT has faced criticism regarding its limited documentation and

steep learning curve. Novices often find it challenging to grasp the intricacies of the C++

programming language used in the complex framework of ROOT. Moreover, its design and

implementation have been subject to scrutiny due to issues such as code bloat, heavy reliance on

global variables, and an overly complex class hierarchy. These aspects have occasionally led to

frustration among developers and have prompted discussions about improving the framework's

usability and architecture to address these concerns.

Materials and Methods

This research sought to ascertain the viability of Uproot, a Python input/output library

specialized in reading ROOT files, as an alternative to the more challenging C++ ROOT

framework for physicists' data analysis requirements. The exploration involved a systematic

evaluation of Uproot's capabilities to replicate the essential functionalities offered by ROOT,

serving as a suitable platform for comprehensive data processing and manipulation tasks. To

foster broader understanding and proficiency among the scientific community, a Jupyter

Notebook tutorial was formulated, providing a structured guide for researchers to harness Uproot

effectively in their analytical pursuits.

Uproot

Building upon the utilization of Uproot, this section delves into its technical capabilities,

comparing it to the C++ ROOT framework, and showcasing its seamless integration with

NumPy and Matplotlib, offering physicists a user-friendly solution for comprehensive data

analysis. As a Python-based library, this solution leverages the simplicity, flexibility, and

extensive ecosystem of Python. For example, while C++ ROOT requires researchers to manage

multiple objects, such as TFile, TBranch, and TTree, to access and extract data properly, Uproot

streamlines the process with a more intuitive and straightforward Python syntax. Videlicet,

researchers can effortlessly access data from a ROOT file using Uproot's simple syntax, such as

uproot.open() to open a file and treeName["branchName"].array() to extract

data from branches. Leveraging Python's versatility and eliminating the need for low-level

memory management, Uproot empowers researchers to concentrate on their scientific

exploration rather than grappling with programming intricacies.
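As a rough, minimal sketch of the workflow just described (the file name and tree name below are placeholders rather than the actual names used in the tutorial):

    import uproot

    # Open a ROOT file and grab a TTree by name; "run13747.root" and "T"
    # are placeholders, not the actual Run 13747 file and tree names.
    file = uproot.open("run13747.root")
    tree = file["T"]

    # List the available branches, then read one branch into a NumPy array.
    print(tree.keys())
    energies = tree["sbs.hcal.e"].array(library="np")
    print(energies[:10])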

However, while C++ ROOT has its own numerical computation and graphics features,

Uproot incorporates these powerful capabilities through integration with the NumPy and Matplotlib libraries, respectively. By employing NumPy arrays and vectorized operations, Uproot

efficiently processes and manipulates large datasets, offering competitive (albeit slightly inferior)

performance compared to C++ ROOT. Furthermore, Uproot seamlessly integrates with

Matplotlib, a widely used Python library for data visualization. This tight integration with

Matplotlib streamlines the data visualization process, allowing for immediate exploration and

representation of data in various types of histograms and scatter plots.

Jupyter

The tutorial was created in JupyterHub, a web-based platform that enables the

development of interactive computing environments for multiple users. By harnessing

JupyterHub, the research project gained valuable access to Jefferson Lab’s computational

environments and resources. This access proved to be beneficial, as it provided a scalable and

powerful computing infrastructure that significantly enhanced the project's data analysis

capabilities, while providing direct access to the ROOT files. Jupyter Notebook, an integral

component of JupyterHub, provided a versatile environment where code execution occurs in

cells, allowing segments of code to be run individually and specific portions to be modified iteratively to

analyze graphs and results interactively. This cell-based structure facilitated flexible data

analysis, enabling experimenting with code to visualize immediate outcomes, and refine

analytical approaches step-by-step.



The tutorial's creation took advantage of Jupyter Notebook's unique capabilities to

combine Markdown (a lightweight markup language) and Python code in separate cells.

Markdown integration allowed for the seamless inclusion of HTML/CSS and LaTeX, enhancing

the tutorial's explanatory power and technical formatting. This integration provided

comprehensive explanations and descriptions of key concepts, offering learners a structured

guide to comprehensively understand Uproot and its practical implementation.

In addition to its accessibility through JupyterHub, the tutorial was designed to be

compatible with JupyterLab, an integrated development environment independent of Jefferson

Lab’s infrastructure. Users could access the tutorial by downloading the ROOT file locally and

working within JupyterLab. This adaptability made the tutorial widely accessible to researchers

beyond Jefferson Lab’s computational environment, enabling them to leverage Uproot

effectively in their analytical pursuits.

Uproot Notebook

The Uproot Jupyter Notebook tutorial commences with importing the ROOT file, and

then the TTree, that contains the data from Run 13747. Next, the tutorial describes the event data

and how to access it. For the remainder of the notebook, the tutorial works with the energy

(MeV) of the largest clusters in the HCal: TTree["sbs.hcal.e"].array(). This array

would ideally consist only of the energy deposited by the proton; however, due to the inevitable

background noise and some inelastic collisions, the array includes other miscellaneous energies

as well.

Next, the tutorial delineates plotting the energies using Matplotlib. The first example is a

1D histogram plot of the energy clusters. This example of the plotting section is shown below in

Figure 8.

Figure 8
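A plotting cell along these lines might look like the following sketch, reusing an energies array read from the sbs.hcal.e branch as above (the binning is an illustrative choice, not necessarily the tutorial's):

    import matplotlib.pyplot as plt

    # 1D histogram of the HCal cluster energies read earlier with Uproot.
    plt.hist(energies, bins=100, histtype="step")
    plt.xlabel("sbs.hcal.e")
    plt.ylabel("Counts")
    plt.title("HCal cluster energy, Run 13747")
    plt.show()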

After the 1D histogram, the tutorial introduces the x and y positions from the center of

the HCal for the energy clusters to graph a 2D positional plot. Succeeding the 2D graph is a 3D

scatter diagram that plots the energy of the clusters against their 2-dimensional position — first

without color and then with a color axis corresponding to the energies, the latter of which is

shown in figure 9. To conclude the section on plotting, the 3D scatter plot is reduced to 2 spatial

dimensions and 1 color dimension.



Figure 9
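A rough sketch of the colored 3D scatter plot described above; the position branch names used here follow the pattern of sbs.hcal.e but are assumptions, not confirmed names from the tutorial:

    import matplotlib.pyplot as plt

    # Read the cluster x/y positions; "sbs.hcal.x" and "sbs.hcal.y" are
    # placeholder branch names assumed for illustration.
    x = tree["sbs.hcal.x"].array(library="np")
    y = tree["sbs.hcal.y"].array(library="np")

    # 3D scatter of cluster energy versus position, colored by energy.
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    points = ax.scatter(x, y, energies, c=energies, cmap="viridis", s=2)
    fig.colorbar(points, label="Cluster energy")
    ax.set_xlabel("x position from HCal center")
    ax.set_ylabel("y position from HCal center")
    ax.set_zlabel("Cluster energy")
    plt.show()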

The final section of the notebook is designated to applying selection criteria, or cuts, to

the data set. In this segment, two different types of cuts are demonstrated. The first is energy

cuts, limiting the graph to events where the clusters meet a certain energy threshold. Figure 10

shows the example energy cut provided in the tutorial where the clusters are restricted to those

above 0.5 GeV, effectively reducing the amount of the lower energy background noise. The

second type of selection criteria is positional cuts, restricting the data set to events where the

position vectors of the particles satisfy certain criteria. The first position cut necessary is cutting

the end border of the position space because only some of the energy of the proton is detected by

the HCal, and the rest is lost to the external environment. Including these events would bias the

average energy of the protons. Another effective strategy for data selection is limiting the y-

values to the region specific to the charge of the proton. This approach is facilitated by the

varying deflection of particles in the 48D48 magnetic field, enabling distinct trajectories based

on their charge.

Figure 10
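Expressed with NumPy boolean masks, such cuts might look like the following sketch (the 0.5 threshold follows the tutorial's example; the position limits are placeholders, not the tutorial's actual values):

    import numpy as np

    # Energy cut: keep only clusters above the example threshold of 0.5.
    energy_mask = energies > 0.5

    # Positional cut: keep clusters away from the detector border
    # (limits below are placeholders for illustration).
    position_mask = (np.abs(x) < 0.8) & (np.abs(y) < 1.2)

    selected = energies[energy_mask & position_mask]
    print(f"{selected.size} of {energies.size} clusters pass the cuts")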

In conclusion, the Uproot Jupyter Notebook provides users with essential skills for

efficiently analyzing event data stored in ROOT files through the aforementioned Python

libraries. Subsequently, physicists can leverage their expertise in the field to further integrate

data from different detectors, apply customized selection criteria, and conduct in-depth analyses

at a more advanced and sophisticated level. An illustrative example of this is demonstrated by

Dr. Provakar Datta, wherein he crafted a proton spot plot (Figure 11) for Run 13747, adeptly

applying advanced cuts to enhance the visibility of protons in the HCal detector.

Figure 11

ROOT Trio

In pursuit of advancing data analysis capabilities and fostering ease of access to ROOT,

this project is part of a trio of tutorials geared towards simplifying data analysis using different

interfaces of the ROOT framework. Alongside the Uproot tutorial described earlier, two

additional tutorials have been crafted by other interns, each catering to distinct aspects of ROOT

data analysis.

The first tutorial concentrates on the fundamentals of ROOT and provides comprehensive

guidance on utilizing the native C++ interface. Delving into ROOT's powerful features, this

tutorial equips researchers with a foundation in C++ ROOT, enabling them to harness its

potential for high energy particle analysis.

The second tutorial centers around PyROOT, a Python interface that serves as a bridge

between ROOT's capabilities and the simplicity of Python programming. With PyROOT,

researchers can seamlessly work with ROOT data, harnessing Python's versatility and powerful

data analysis libraries to complement ROOT's functionalities. Note that while both Uproot and

PyROOT offer Python interfaces for ROOT data analysis, they have distinct differences in their

approach and usage. Uproot specializes in reading ROOT files directly into Python data

structures like NumPy arrays, providing a more streamlined and user-friendly experience. On the

other hand, PyROOT allows for a closer integration with ROOT's C++ interface, enabling

researchers to access and manipulate ROOT objects directly, making it suitable for those who are

already familiar with C++ ROOT or need to interact more closely with ROOT's internal

functionalities.
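As a rough illustration of that difference in style (not taken from the interns' tutorials; the file, tree, and histogram names are placeholders), the same kind of histogram might be produced through PyROOT like this:

    import ROOT

    # Open the ROOT file and retrieve the TTree through ROOT's own C++ objects.
    f = ROOT.TFile.Open("run13747.root")   # placeholder file name
    tree = f.Get("T")                      # placeholder tree name

    # Book a histogram and fill it with TTree::Draw, mirroring the C++ workflow.
    hist = ROOT.TH1F("hcal_e", "HCal cluster energy", 100, 0.0, 2.0)
    tree.Draw("sbs.hcal.e>>hcal_e")
    hist.Draw()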

As all tutorials focus on data from Run 13747, conducted in Hall A, a Main file offers a

comprehensive description of the detectors employed in the Hall A experimental setup, as well

as a description of the kinematics of the reaction. This collective knowledge base equips

researchers with essential contextual understanding to effectively analyze data from this specific

experiment.

Consolidated in a GitHub repository, these four notebooks collectively provide a

powerful toolkit for physicists and researchers, offering multiple entry points to the ROOT

framework based on individual preferences and expertise (Jefferson Lab, 2023). By streamlining

and simplifying ROOT data analysis through these tutorials, the trio aims to enhance researchers'

productivity and efficiency, enabling them to make meaningful scientific discoveries from the

wealth of data produced in high-energy physics experiments at Jefferson Lab.

Jefferson Lab Experience

Throughout the course of this internship, I embarked on a journey that initially seemed

daunting, with a vast realm of new knowledge to acquire and challenges to overcome. However,

I found great enjoyment in delving into this exciting world of nuclear physics, as well as the

opportunity to meet remarkable individuals at Jefferson Lab who shared their expertise and

passion for the subject. This experience has been immensely valuable, as it not only facilitated

significant growth in my technical skills but also provided a profound opportunity for self-

discovery. Immersed in the captivating field of nuclear physics, I have gained a deeper

appreciation for the complexities of scientific research and its potential to unravel the mysteries

of the universe.

Throughout this project, the key to my success lay in integrating numerous resources that

facilitated my learning journey in accelerator and nuclear physics. First and foremost, my

mentors played a pivotal role in providing crucial guidance and support. Their expertise and

willingness to patiently answer my questions proved invaluable in navigating the complexities of

the equipment and understanding the kinematics involved. Their mentorship not only enabled me

to grasp the fundamentals of the field but also provided me with a solid foundation to embark on

more sophisticated analyses.

Nonetheless, self-learning became an indispensable part of my progress, especially in the

realm of mathematics and programming required for this project. I dedicated time to refining my

understanding of rudimentary special relativity and linear algebra, enabling me to delve deeper

into the physics concepts involved. Additionally, I immersed myself in scientific journal articles,

which proved to be an invaluable source of information, expanding my knowledge beyond what

textbooks could offer. In addition to mathematics, I also learned many technical skills that I can

continue to apply outside of Jefferson Lab. The first of which is learning to program in Python.

While a rather moderate transition from my experience in Java, Python still posed certain

challenges, such as adapting to Python's dynamic typing and different approach to object-

oriented programming, which required a degree of adjustment in my coding practices.

Ultimately, despite these initial difficulties, mastering Python's ease of use and readily accessible

data analysis libraries will prove highly beneficial in my future endeavors, equipping me with a

crucial and versatile skill set for diverse scientific pursuits. Another vital technical skill was

learning LaTeX, a typesetting system widely used in academia, which will be instrumental in

preparing physics reports if I choose to pursue a PhD. In fact, its usefulness is evident in the

present report, as I have employed LaTeX to write the mathematical equations shown throughout the document, demonstrating its significance in delivering professional and well-

formatted scientific content.

Another crucial resource to my learning experience was interacting with the

undergraduate interns. Their willingness to share their knowledge and insights garnered by a

slightly longer internship (15 weeks as opposed to 6) enriched my understanding of various

scientific inquiries and offered me diverse perspectives on nuclear physics research. I also had

the opportunity to converse with several graduate students who were able to give me a deeper

understanding in a language that was still accessible.

Additionally, the workplace experience was an invaluable aspect of my journey, offering

me opportunities to observe my mentors in their meetings, interact with scientists at Jefferson

Lab during lectures, and engage with the scientific community. Being part of this collaborative

and supportive environment fostered personal and professional growth, instilling in me a deeper

sense of purpose and dedication to my work.

However, one of the most enlightening experiences was participating in the summer

lecture series. These lectures were intended for an undergraduate physics audience, and hence

were the most digestible source of information and tied together my knowledge. These lecturers

were highly enthusiastic about their subject and deepened my understanding of various research

areas within high-energy physics. The lectures could be divided into 3 different categories.

The first of these consisted of lectures on Jefferson Lab’s accelerator and detector equipment,

from superconducting radio frequency technology to calorimetry. From this category, the

presentation which stood out the most was Dr. Joe Grames’s lecture on the polarization of the

electron beam. In addition to his charismatic presentation skills, he effectively explained how the

electrons are produced via photoemission from the gallium arsenide phosphide alloy, with a bias

towards a particular spin dependent on the polarization of the light source. From there, he

elucidated the significance of polarized electron beams in exploring asymmetries in particle

physics, revealing the interdisciplinarity of nuclear physics and the pivotal role of chemistry in

advancing our understanding of higher energy phenomena. I particularly enjoyed how this

lecture underscored the rich connections between various scientific disciplines, further fueling

my fascination with nuclear physics and its broader implications.

The second group of lectures introduced us to the recent and near future experiments at

Jefferson Lab and their applications. This varied from Molecular Breast Imaging to the positron

beam upgrade. My favorite lecture was on proton therapy by Dr. Cynthia Keppel; I enjoyed

learning about how nuclear physics, and particularly the Bragg peak of ions, is applied to cure

cancer. As she eloquently characterized the partnership between nuclear physics and radiation

oncologists, I found myself deeply intrigued by both fields and their collaborative potential in

advancing medical treatments.

The final group of lectures consisted of pertinent topics encompassing essential

knowledge and skills for research and for working as a mentee in research and development. In

this set of lectures, the topics varied from solving Fermi problems by an MIT physicist to the

ethics regarding research presented by the dean of the ODU College of Science. Nevertheless,

my most cherished lecture in this series was delivered by Dr. Douglas Higinbotham, focusing on

"The Lifecycle of Nuclear Physics Experiments at Jefferson Lab and the Future Electron Ion

Collider." It was the culminating presentation of the summer lecture series, and fascinating to

see the larger picture. By showcasing the remarkable discoveries of the quasi-elastic electron

scattering experiment that verified the existence of short-range correlations, Dr. Higinbotham

took us on a captivating journey. He masterfully elucidated how this venture commenced with a

discrepancy between the quasi-elastic electron-proton knockout rate and the mean-field theory

prediction, leading to the approval of the experiment by the advisory board in 2001. The

ambitious undertaking involved the construction of a new detector in Hall A to detect neutrons,

and it wasn't until 2008 that the experiment ran, yielding successful results published thereafter

(and several PhDs). Furthermore, Dr. Higinbotham emphasized the iterative nature of this

scientific process, revealing that at the most recent Program Advisory Committee meeting the board approved a short-range correlation experiment extending into the realm of exotic three-body correlations.

Overall, the integration of these diverse resources has been instrumental in shaping my

growth and learning experience during this internship. The combination of mentorship, self-

learning, interactions with peers, exposure to lectures, and the engaging workplace environment

has not only equipped me with valuable technical skills but also fueled my lifelong passion for

science. I am deeply grateful for the rich learning experience this project has provided and look

forward to applying this newfound knowledge and enthusiasm in my future scientific endeavors.

References

Alcorn, J., Anderson, B. D., Aniol, K. A., Annand, J. R. M., Auerbach, L., Arrington, J., Averett,

T., Baker, F. T., Baylac, M., Beise, E. J., Berthot, J., Bertin, P. Y., Bertozzi, W., Bimbot,

L., Black, T., Boeglin, W. U., Boykin, D. V., Brash, E. J., Breton, V., & Breuer, H.

(2004). Basic instrumentation for Hall A at Jefferson Lab. Nuclear Instruments and

Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and

Associated Equipment, 522(3), 294–346. https://doi.org/10.1016/j.nima.2003.11.415

Benmokhtar, F., Franklin, G., Quinn, B., Schumacher, R., Camsonne, A., Chen, J., Chudakov,

E., Dejager, C., Degtyarenko, P., Gomez, J., Hansen, O., Higinbotham, D., Jones, M.,

Lerose, J., Michaels, R., Nanda, S., Saha, A., Sulkosky, V., Wojtsekhowski, B., & Fassi,

L. (2008). Precision Measurement of the Neutron Magnetic Form Factor up to Q^2 =

18.0 (GeV/c)^2 by the Ratio Method. https://www.jlab.org/exp_prog/proposals/09/PR12-

09-019.pdf

Ellis, N. (n.d.). Trigger and data acquisition. CERN. Retrieved August 2, 2023, from

https://cds.cern.ch/record/1017829/files/p241.pdf

Sauli, F. (2016). The gas electron multiplier (GEM): Operating principles and applications.

Nuclear Instruments and Methods in Physics Research, 805, 2–24.

https://doi.org/10.1016/j.nima.2015.07.060

Govind, K. (2020, January 6). Pion+ and proton make kaon+ and another strange particle, X.

Why is this the strong interaction? Physics Stack Exchange.

https://physics.stackexchange.com/q/523428

Gudino, M. (2018, April 17). Engineering Resources: Basics of Analog-to-Digital Converters.

Arrow.com; Arrow.com. https://www.arrow.com/en/research-and-

events/articles/engineering-resource-basics-of-analog-to-digital-converters

Jefferson Lab. (2016). GSPDA Wiki. Jlab.org.

https://gspda.jlab.org/wiki/index.php/Main_Page#tab=Summer_Lecture_Series

Jefferson Lab. (2023, July 6). Data Analysis of Hall A Using ROOT Applications in Jupyter.

GitHub. https://github.com/JeffersonLab/JupyterAnalysis

Liyanage, N., & Wojtsekhowski, B. (2007). BigBite: A new large acceptance spectrometer for

Jefferson Lab Hall A. NASA ADS, E16.008.

https://ui.adsabs.harvard.edu/abs/2007APS..APRE16008L/abstract

Nave, R. (1998). The Delta Baryon. Hyperphysics.phy-Astr.gsu.edu. http://hyperphysics.phy-

astr.gsu.edu/hbase/Particles/delta.html

O’Luanaigh, C. (2019, October 11). Fifty years of quarks. CERN.

https://home.cern/news/news/physics/fifty-years-quarks

Puckett, A. (2021, August 2). SBS Installation in Hall A at Jefferson Lab, July 2021 | Professor

Puckett’s Research Homepage. Professor Puckett’s Research Homepage.

https://puckett.physics.uconn.edu/2021/08/02/sbs-installation-in-hall-a-at-jefferson-lab-

july-2021/

ROOT. (n.d.). ROOT Manual. ROOT; CERN. Retrieved August 2, 2023, from

https://root.cern/manual

Smirnova, O., & Hedberg, V. (2005). Elastic electron-proton scattering.

https://hedberg.web.cern.ch/hedberg/lectures/ch7_2005_lec2.pdf

Zheng, X. (2021). A Crash Course for Summer Research.

https://inpp.ohio.edu/~rochej/group_page/tips/crash_course_for_summer_undergrad_rese

arch.pdf
