Article

How to Build a Software Quantum Simulator


Gilberto Díaz 1, Luiz Steffenel 2, Carlos Barrios 3 and Jean Couturier 4

1 Universidad Industrial de Santander, Bucaramanga, Colombia; [email protected]


2 Université de Reims Champagne-Ardenne, Reims, France; [email protected]
3 Universidad Industrial de Santander, Bucaramanga, Colombia; [email protected]
4 Université de Reims Champagne-Ardenne, Reims, France; [email protected]

Abstract: Software quantum simulators are the most accessible tools for designing and testing quantum algorithms. This paper presents a comprehensive approach to building a software-based quantum simulator designed to run on classical computing architectures. We explore fundamental quantum computing concepts, including state vector representations, quantum gates, and memory management techniques. The simulator prototype implements various memory optimization strategies, such as full-state representation, dynamic state pruning, shared-memory parallelization with OpenMP, and distributed-memory parallelization with MPI. Additionally, data compression techniques such as ZFP are explored to enhance simulation performance by reducing the memory footprint. The results are validated through performance comparisons with leading open-source quantum simulators, such as Intel-QS, QuEST, and qsim. Our findings highlight the trade-offs between computational overhead and memory efficiency and show that a hybrid approach combining distributed memory and compression offers the best scalability for simulating large quantum systems. This work provides a foundation for developing efficient quantum simulators that support more complex quantum algorithms on classical hardware.

Keywords: Quantum Computing; High-Performance Computing; Parallel Computing.

1. Introduction

One of the primary reasons to develop quantum computing is that it has been shown, theoretically, to allow efficient solutions to some complex problems whose best-known classical solutions have a cost that grows exponentially with the input size. Quantum superposition, quantum uncertainty, and quantum entanglement are powerful resources that can be used to encode, decode, transmit, and process information in a highly efficient way that is impossible in the classical world.


Recent technological advances have enabled the development of real quantum devices accessed through the cloud. However, these devices are expected to remain limited in the short term in the number and quality of their fundamental component, the qubit. Most current quantum devices have a limited number of qubits. In the quantum circuit model, Atom Computing's system has 1180 qubits and IBM Osprey has 433 qubits; in the adiabatic model, the D-Wave 2000Q has 2000 qubits. These quantum computers are prototypes that are neither scalable nor sufficient to test complex quantum algorithms. Constructing a full-scale quantum computer comprising millions of qubits is a longer-term prospect.

The growing interest in quantum computing and the limitations of real quantum devices have led many organizations to focus on developing software quantum simulators that run on classical computers. These simulators are popular tools for testing quantum computing concepts under ideal conditions, avoiding hardware challenges such as the limited number and quality of physical qubits and the need for quantum error correction. Lists of recent initiatives are maintained on several websites [1–4]. This large number of projects reflects the area's growth and makes it difficult for researchers to decide which tool to use in their research.

Citation: Díaz, G.; Steffenel, L.; Barrios, C.; Couturier, J. How to Build a Software Quantum Simulator. Quantum Rep. 2024, 1, 1–21. https://doi.org/

Received:
Revised:
Accepted:
Published:

Copyright: © 2024 by the authors. Submitted to Quantum Rep. for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Version October 24, 2024 submitted to Quantum Rep. https://www.mdpi.com/journal/quantumrep



Quantum computing simulators, which operate on classical computers, have emerged as valuable and widely used tools in the field of quantum computing. These simulators play a crucial role in the development, testing, and validation of quantum algorithms before they are implemented on actual quantum hardware. One of the primary advantages of quantum simulators is their accessibility. Unlike quantum computers, which are still relatively scarce and often require significant resources and expertise to operate, simulators can be run on conventional computers. This accessibility allows a broader range of researchers and developers to explore quantum algorithms and concepts without the need for physical quantum computing resources.

Quantum simulators offer a controlled environment for designing and refining quantum algorithms. They can simulate ideal quantum systems without the noise and error rates present in current quantum hardware, providing clearer insights into the theoretical performance of an algorithm. This is particularly useful for educational purposes and theoretical research, where understanding the principles of quantum computation and algorithm design is the main focus.

However, simulating quantum computing models on classical computers requires time and memory that grow exponentially with the number of qubits, and it involves highly complex memory management. Using conventional techniques to simulate an arbitrary quantum process significantly larger than any existing quantum prototype would soon require a prohibitive amount of classical memory. For instance, a 60-qubit quantum state has 2^60 complex amplitudes; at 16 bytes per amplitude (two double-precision numbers), storing it would take about 18,000 petabytes (18 exabytes) of classical computer memory. Therefore, researchers try to mitigate these challenges by proposing efficient simulators.
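The 18-exabyte figure follows directly from counting amplitudes. As a minimal C++ sketch, assuming two double-precision numbers (16 bytes) per amplitude, the hypothetical helper below computes the memory needed to store a full state vector:

```cpp
#include <cassert>
#include <cmath>

// Memory (in bytes) for a full n-qubit state vector: 2^n complex amplitudes,
// each stored as two floating-point numbers of `bytesPerReal` bytes
// (8 for double precision, 4 for single precision).
// The result is a double because 16 * 2^60 = 2^64 already overflows a
// 64-bit unsigned integer.
double stateVectorBytes(unsigned nQubits, unsigned bytesPerReal = 8) {
    assert(bytesPerReal > 0);
    // std::ldexp(x, e) computes x * 2^e exactly for these magnitudes.
    return std::ldexp(2.0 * bytesPerReal, static_cast<int>(nQubits));
}
```

Under these assumptions, stateVectorBytes(30) returns 16 GiB and stateVectorBytes(60) returns 2^64 ≈ 1.84 × 10^19 bytes, about 18.4 exabytes; every additional qubit doubles the requirement.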

This work explores the fundamental principles of developing a software-based quantum simulator capable of performing simulations on classical computers.

2. Fundamental Concepts of Quantum Computing

To better understand the quantum computing model, it is necessary to know the key aspects it inherits from quantum mechanics. This section describes the fundamental concepts on which quantum computing is based. Readers familiar with the field may wish to skip this section.

A Logical Qubit is a unit vector in a two-dimensional Hilbert space. The Boolean states 0 and 1 are represented by a prescribed pair of normalized and mutually orthogonal quantum states, denoted in Dirac's notation as |0⟩ and |1⟩ [5]. These two states form a "computational basis," and any other (pure) state of the qubit can be written as a superposition α|0⟩ + β|1⟩ [6]. Formally, a Quantum State |ψ⟩ is a superposition of the basis elements |β₁⟩ and |β₂⟩ if it is a nontrivial linear combination of them, that is, if |ψ⟩ = a₁|β₁⟩ + a₂|β₂⟩ with a₁ and a₂ both non-zero. The state of a classical bit can be seen as a particular case of a two-dimensional vector in which only two vectors of the whole two-dimensional space have a real meaning: the two orthogonal vectors |0⟩ and |1⟩, as depicted in Figure 1a. Qubits do not suffer from this limitation. The general state of a qubit is |ψ⟩ = α₀|0⟩ + α₁|1⟩, where α₀ and α₁ are two complex numbers constrained only by the requirement that |ψ⟩, like |0⟩ and |1⟩, be a unit vector in the complex vector space; in other words, only by the normalization condition |α₀|² + |α₁|² = 1.

The Bloch sphere is commonly used to depict a qubit (Figure 1b). The state is represented by two angles, 0 ≤ θ ≤ π and 0 ≤ ϕ < 2π, so that |ψ⟩ can be rewritten as

$$|\psi\rangle = e^{i\gamma}\left(\cos\frac{\theta}{2}\,|0\rangle + e^{i\phi}\sin\frac{\theta}{2}\,|1\rangle\right).$$

The vector from the origin to the point representing the state makes an angle θ with the z-axis, and its component in the x-y plane makes an angle ϕ with the x-axis. γ is the global phase, which does not affect the measurable probabilities of the quantum state (it only introduces a uniform phase shift to the whole state). The state |0⟩ is the North Pole of the sphere, and the state |1⟩ is the South Pole.
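To make the Bloch-sphere parametrization concrete, the following C++ sketch maps the angles (θ, ϕ) to the amplitudes α₀ and α₁ and back. It assumes γ = 0, since the global phase is not observable; the names fromBloch, toBloch, and norm2 are illustrative only and are not taken from any simulator discussed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>

using cplx = std::complex<double>;
const double kPi = std::acos(-1.0);

// A single-qubit state alpha0|0> + alpha1|1>.
struct Qubit { cplx alpha0, alpha1; };

// |alpha0|^2 + |alpha1|^2, which must equal 1 for a valid state.
double norm2(const Qubit& q) { return std::norm(q.alpha0) + std::norm(q.alpha1); }

// Amplitudes for polar angle theta in [0, pi] and azimuth phi in [0, 2*pi),
// with the global phase gamma set to 0.
Qubit fromBloch(double theta, double phi) {
    assert(theta >= 0.0 && theta <= kPi);
    return {cplx(std::cos(theta / 2.0), 0.0), std::polar(std::sin(theta / 2.0), phi)};
}

// Inverse map. Using the relative phase arg(alpha1) - arg(alpha0) discards any
// global phase; phi is meaningless at the poles (theta = 0 or pi).
void toBloch(const Qubit& q, double& theta, double& phi) {
    theta = 2.0 * std::acos(std::min(1.0, std::abs(q.alpha0)));
    phi = std::arg(q.alpha1) - std::arg(q.alpha0);
    if (phi < 0.0) phi += 2.0 * kPi;
}
```

Because cos²(θ/2) + sin²(θ/2) = 1, every point on the sphere yields a normalized state, which is why two real angles are enough to describe a qubit once the global phase is ignored.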
The general equation of an n-qubit state is

$$|\psi\rangle = \sum_{x=0}^{2^n-1} \alpha_x\,|x\rangle,$$

or, in its expanded form:

$$|\psi\rangle = \alpha_0|0\ldots00\rangle + \alpha_1|0\ldots01\rangle + \cdots + \alpha_{2^n-1}|1\ldots11\rangle \tag{1}$$



Figure 1. Visual representation of a qubit: (a) bits vs. qubits; (b) Bloch sphere. A qubit can be written as a superposition α₀|0⟩ + α₁|1⟩.

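In Equation (1), the basis states are tensor products of single-qubit states:

$$|0\ldots00\rangle = |0\rangle\otimes\cdots\otimes|0\rangle\otimes|0\rangle, \quad \ldots, \quad |1\ldots11\rangle = |1\rangle\otimes\cdots\otimes|1\rangle\otimes|1\rangle.$$

Since a single complex number specifies a single-qubit state up to normalization and global phase, n complex numbers can specify any tensor product of n individual single-qubit states, whereas a general n-qubit state requires all 2^n amplitudes of Equation (1).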
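The gap between n and 2^n numbers can be seen by building a product state explicitly. The C++ sketch below, whose function names are hypothetical, forms the Kronecker product of n single-qubit vectors (2n amplitudes in total) and returns the resulting 2^n amplitudes. It assumes the convention of Equation (2), where qubit 0 is the rightmost tensor factor and therefore the least significant bit of the basis-state index.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using StateVector = std::vector<cplx>;

// Kronecker (tensor) product a (x) b of two state vectors.
// Entry i * b.size() + j holds a[i] * b[j], so `a` supplies the more
// significant bits of the basis-state index.
StateVector tensor(const StateVector& a, const StateVector& b) {
    StateVector out(a.size() * b.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            out[i * b.size() + j] = a[i] * b[j];
    return out;
}

// Builds |q_{n-1}> (x) ... (x) |q_1> (x) |q_0> from qubits[0] = q_0 (rightmost
// factor) up to qubits[n-1] (leftmost factor).
StateVector productState(const std::vector<StateVector>& qubits) {
    assert(!qubits.empty());
    StateVector psi{cplx(1.0, 0.0)};
    for (const StateVector& q : qubits) {
        assert(q.size() == 2);
        psi = tensor(q, psi);  // the new qubit becomes the most significant bit
    }
    return psi;
}
```

For three qubits in the state |+⟩ = (|0⟩ + |1⟩)/√2, productState returns eight equal amplitudes 1/(2√2). An entangled state, by contrast, cannot be produced this way from any choice of factors.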

The special characteristic of quantum states is that they allow a system to be in several states simultaneously; this is called superposition [7].

Quantum bits are not constrained to be wholly 0 or wholly 1 at a given instant. In quantum physics, if a quantum system can be found in one of a discrete set of states, which we write as |0⟩ or |1⟩, then, whenever it is not being observed, it may also exist in a superposition, or blend, of those states simultaneously [8].

Because a qubit can take on any one of infinitely many states, one might think that a single qubit could store a large amount of classical information. However, the properties of quantum measurement severely restrict the amount of information that can be extracted from a qubit. Information about a quantum bit can be obtained only by measurement, and any measurement yields one of only two states, the two basis states associated with the measuring device; thus, a single measurement yields at most a single classical bit of information [9].
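Although the prototype described in Section 4 does not implement measurement, the Born rule behind this restriction is easy to express in code: outcome x occurs with probability |αₓ|². The following C++ sketch (hypothetical names; it uses the standard random-number facilities and does not model the collapse of the state after measurement) computes these probabilities and samples one outcome per call.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <random>
#include <vector>

using cplx = std::complex<double>;

// Born rule: the probability of observing basis state |x> is |alpha_x|^2.
std::vector<double> probabilities(const std::vector<cplx>& psi) {
    assert(!psi.empty());
    std::vector<double> p(psi.size());
    for (std::size_t x = 0; x < psi.size(); ++x) p[x] = std::norm(psi[x]);
    return p;
}

// Samples one outcome (a basis-state index) in the computational basis.
// A shot reveals only this index, i.e. n classical bits for n qubits,
// no matter how many amplitudes the state vector holds.
std::size_t measureAll(const std::vector<cplx>& psi, std::mt19937_64& rng) {
    const std::vector<double> p = probabilities(psi);
    std::discrete_distribution<std::size_t> dist(p.begin(), p.end());
    return dist(rng);
}
```

For the state 0.6|0⟩ + 0.8i|1⟩, probabilities returns {0.36, 0.64}, and repeated calls to measureAll return 1 in roughly 64% of the shots; each individual shot reveals a single classical bit.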

Quantum entanglement describes a correlation between different parts of a quantum system that surpasses anything classically possible. It arises when subsystems interact in such a way that the resulting state of the whole system cannot be expressed as the direct product of the states of its parts [5]. States that cannot be written as the tensor product of n single-qubit states are called entangled states, and in fact most quantum states are entangled [9]. States that can be written as such a tensor product are said to be separable (or product) states.
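For the smallest case of two qubits there is a simple separability test: a state a|00⟩ + b|01⟩ + c|10⟩ + d|11⟩ is a product state exactly when ad = bc, because a product (α₀|0⟩ + α₁|1⟩) ⊗ (β₀|0⟩ + β₁|1⟩) gives a = α₀β₀, b = α₀β₁, c = α₁β₀, and d = α₁β₁. The C++ sketch below uses the quantity 2|ad − bc| (the concurrence of a pure two-qubit state); the function names and the tolerance are illustrative assumptions, and the test does not generalize directly to more qubits.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// For a normalized two-qubit state a|00> + b|01> + c|10> + d|11>
// (stored in index order 0..3), returns 2|ad - bc|: 0 for product states,
// 1 for maximally entangled states such as (|00> + |11>)/sqrt(2).
double concurrence(const std::vector<cplx>& psi) {
    assert(psi.size() == 4);
    return 2.0 * std::abs(psi[0] * psi[3] - psi[1] * psi[2]);
}

// A pure two-qubit state is separable exactly when its concurrence is zero;
// `tol` absorbs floating-point rounding.
bool isSeparable(const std::vector<cplx>& psi, double tol = 1e-12) {
    return concurrence(psi) < tol;
}
```

For example, the Bell state (|00⟩ + |11⟩)/√2 has concurrence 1, while |+⟩ ⊗ |0⟩ = (|00⟩ + |10⟩)/√2 has concurrence 0.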

In the Quantum Circuits model, the fundamental transformation of a quantum state is carried out using Quantum Gates, which are the basic components of quantum circuits. Quantum gates are analogous to classical logic gates but operate on qubits instead of classical bits.

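To transform the state of Equation (1), we need 2^n × 2^n unitary matrices. Applying a single-qubit gate G to the i-th qubit of an n-qubit quantum state amounts to multiplying the state vector of coefficients αₓ by the matrix

$$\underbrace{\mathbb{1}_2 \otimes \cdots \otimes \mathbb{1}_2}_{n-i-1} \otimes\, G \,\otimes \underbrace{\mathbb{1}_2 \otimes \cdots \otimes \mathbb{1}_2}_{i} \tag{2}$$

where 𝟙₂ is the 2 × 2 identity matrix and the qubits are numbered from 0, starting at the rightmost tensor factor.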
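Equation (2) can be implemented literally with Kronecker products, which makes its cost visible: the operator has 4^n complex entries, so this approach is impractical beyond a handful of qubits. The following C++ sketch (hypothetical names; not the prototype's applyGate method) builds the operator of Equation (2) with qubit 0 as the rightmost factor and applies it through a dense matrix-vector product. It is mainly useful as a reference against which faster implementations can be checked.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using Matrix = std::vector<std::vector<cplx>>;

// Kronecker product A (x) B of two square matrices.
Matrix kron(const Matrix& A, const Matrix& B) {
    const std::size_t m = A.size(), p = B.size();
    Matrix C(m * p, std::vector<cplx>(m * p));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < m; ++j)
            for (std::size_t k = 0; k < p; ++k)
                for (std::size_t l = 0; l < p; ++l)
                    C[i * p + k][j * p + l] = A[i][j] * B[k][l];
    return C;
}

// Full 2^n x 2^n operator of Equation (2): n-i-1 identities, then G, then
// i identities, so qubit 0 is the rightmost (least significant) factor.
Matrix liftGate(const Matrix& G, unsigned i, unsigned n) {
    assert(i < n);
    const cplx one(1.0, 0.0), zero(0.0, 0.0);
    const Matrix I2{{one, zero}, {zero, one}};
    Matrix U(1, std::vector<cplx>(1, one));
    for (unsigned q = n; q-- > 0;)  // leftmost factor (qubit n-1) first
        U = kron(U, q == i ? G : I2);
    return U;
}

// Dense matrix-vector product: O(4^n) time, plus O(4^n) memory for U itself.
std::vector<cplx> applyMatrix(const Matrix& U, const std::vector<cplx>& psi) {
    assert(U.size() == psi.size());
    std::vector<cplx> out(psi.size());
    for (std::size_t r = 0; r < U.size(); ++r)
        for (std::size_t c = 0; c < psi.size(); ++c)
            out[r] += U[r][c] * psi[c];
    return out;
}
```

For example, applying liftGate(X, 0, 2) to |00⟩ yields |01⟩, while liftGate(X, 1, 2) yields |10⟩, consistent with qubit 0 being the least significant bit.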

Figure 2 depicts some popular quantum gates.

A classical (or non-quantum) algorithm is a finite sequence of instructions, or a step-by-step procedure, for solving a problem, where each step or instruction can be performed on a classical computer. Similarly, a Quantum Algorithm is a step-by-step procedure in which each step can be performed on a quantum computer. Although all classical algorithms can also be performed on a quantum computer, the term quantum algorithm is generally reserved for algorithms that incorporate some essential feature of quantum computing, such as superposition or entanglement. There are three classes of quantum algorithms with clear advantages over known classical algorithms: algorithms based on quantum versions of the Fourier transform, quantum search algorithms, and quantum simulations.

Figure 2. Quantum gates. The Pauli X gate acts linearly and takes the state α|0⟩ + β|1⟩ to the corresponding state in which the roles of |0⟩ and |1⟩ are interchanged; it is the quantum equivalent of the classical NOT gate. The Hadamard gate is the first genuinely quantum gate shown, because it can generate superposition states from basis states. The Phase Shift gate is a single-qubit gate that leaves the basis state |0⟩ unchanged and maps |1⟩ to e^{iϕ}|1⟩.
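The three gates of Figure 2 are small 2 × 2 matrices. As an illustrative C++ sketch (the Gate and Qubit types and the function names are assumptions, not the prototype's QuantumGate class), they can be written and applied to a single-qubit state as follows:

```cpp
#include <cassert>
#include <cmath>
#include <complex>

using cplx = std::complex<double>;

// A single-qubit gate as the 2x2 matrix [[u00, u01], [u10, u11]].
struct Gate { cplx u00, u01, u10, u11; };
// A single-qubit state alpha0|0> + alpha1|1>.
struct Qubit { cplx alpha0, alpha1; };

// Pauli X: swaps the roles of |0> and |1> (quantum NOT).
Gate pauliX() { return {0.0, 1.0, 1.0, 0.0}; }

// Hadamard: |0> -> (|0> + |1>)/sqrt(2), |1> -> (|0> - |1>)/sqrt(2).
Gate hadamard() {
    const double s = 1.0 / std::sqrt(2.0);
    return {s, s, s, -s};
}

// Phase shift: leaves |0> unchanged and maps |1> to e^{i phi}|1>.
Gate phaseShift(double phi) { return {1.0, 0.0, 0.0, std::polar(1.0, phi)}; }

// Matrix-vector product G * (alpha0, alpha1)^T.
Qubit applyGate(const Gate& g, const Qubit& q) {
    // Gates are unitary: they expect a normalized state and keep it normalized.
    assert(std::abs(std::norm(q.alpha0) + std::norm(q.alpha1) - 1.0) < 1e-9);
    return {g.u00 * q.alpha0 + g.u01 * q.alpha1,
            g.u10 * q.alpha0 + g.u11 * q.alpha1};
}
```

Applying hadamard() to |0⟩ gives probability 1/2 for each outcome, applying it again returns |0⟩, pauliX() turns |0⟩ into |1⟩, and phaseShift(π/2) then maps |1⟩ to i|1⟩.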

3. Leading Open-Source Simulators for Quantum Computing

From a recent review, we selected several quantum simulators that currently lead the field and characterized their main properties, performance, execution modes, and simulation results for comparison and analysis. To facilitate this task, we restricted the selection to open-source simulators. These simulators are considered state of the art for several reasons [10]:

• Innovative Features: Each simulator offers unique capabilities that set it apart, such as optimized algorithms, integration with widely used programming frameworks, or novel approaches to handling quantum state representations. For example, qsim's integration with Cirq and its ability to simulate up to 40 qubits on a high-performance workstation make it a significant tool for developers and researchers.

• Adoption and Partnerships: Some of these simulators are backed by major technology companies and have extensive industry partnerships, which increase their influence and credibility.

• Academic and Commercial Use: These tools are used not only in academic research but also, increasingly, by industry for practical applications, which demonstrates their effectiveness and robustness.

• Recent Updates and Community Support: Continual updates, community support, and the documentation available for these tools contribute to their status as leaders in the field. This ongoing development ensures that they remain relevant and useful as quantum computing technology evolves.

• Open Collaboration: Open-source projects encourage collaboration among developers, researchers, and users. Making the source code available for modification and redistribution fosters a community-driven development approach, which can lead to rapid improvements and innovations because a diverse group of contributors can work on the software.

Together, these factors make these simulators stand out in the current quantum computing landscape and reflect their innovation and technological leadership.

3.1. Algorithm Selected for Testing

The quantum Fourier transform (QFT) was selected to carry out the simulations because it offers several advantages. It is a well-studied quantum algorithm with known properties, making it a reliable benchmark for validating the accuracy and efficiency of quantum simulators on classical hardware. The performance of the QFT scales predictably with the number of qubits, allowing researchers to analyze how a simulator handles increasing complexity. Performing QFT simulations also helps estimate the computational resources (memory, processing power) required for larger, more complex quantum algorithms. Finally, the QFT is a crucial component of many quantum algorithms, such as Shor's algorithm for factoring large integers, so simulating it provides a foundation for testing and understanding these more complex algorithms.

3.2. Test Platforms

To evaluate the performance of the selected simulators, the following platforms were used:

• Platform 1: A node of the Guane cluster at the Supercomputing Center of the Universidad Industrial de Santander, with two AMD EPYC 9554 64-core processors (two threads per core) at 3.1 GHz and 375 GB of RAM.

• Platform 2: A workstation with one Intel Xeon E-2136 6-core processor (two threads per core) at 3.30 GHz, 32 GiB of RAM, and an NVIDIA Quadro P2000 GPU (GP106GL) with 5 GB of memory. This node is used only for GPU-capable simulators.

3.3. Intel-QS

Intel-QS is an open-source quantum circuit simulator implemented in C++. It supports multiprocessing and has an intuitive Python interface. It is a full-state-vector simulator that supports arbitrary single-qubit gates and two-qubit controlled gates [11]. The Intel Quantum Simulator leverages the full capabilities of an HPC system through its shared- and distributed-memory implementation. The single-node implementation incorporates enhancements such as vectorization, threading, and cache optimization through gate fusion. The primary object in the Intel Quantum Simulator (IQS) is the QubitRegister, which represents the quantum state of the qubits in the system of interest. When a QubitRegister is declared, the number of qubits must be specified so that enough memory can be allocated to describe their state. The state can then be initialized to any computational basis state, uniquely identified by its index.

3.4. Quantum++

Quantum++ is a general-purpose, multithreaded, high-performance quantum simulator. The library is not restricted to qubit systems or to specific quantum information processing tasks, and it can simulate arbitrary quantum processes [12]. Quantum++ is written in standard C++17 and has minimal external dependencies. It primarily uses the header-only Eigen 3 linear algebra template library and, when available, the OpenMP library for multi-processing. Its primary data types are complex vectors and complex matrices, such as complex dynamic matrices, double dynamic matrices, complex dynamic column vectors, and complex dynamic row vectors.

3.5. qsim

Developed by Google, qsim is an optimized quantum circuit simulator that can simulate up to 40 qubits on a powerful workstation [13]. Integrated with Cirq, it provides a robust environment for developing and testing quantum algorithms. To achieve cutting-edge performance, it uses gate fusion, AVX/FMA vectorized instructions, and OpenMP multithreading, and it relies on cuQuantum for GPU support.

3.6. cuQuantum

NVIDIA's cuQuantum SDK is another leading tool, designed to accelerate quantum circuit simulations on GPUs. It allows developers to leverage GPUs to improve simulation performance and scalability. It is complemented by NVIDIA's CUDA Quantum platform, which provides an integrated programming model for hybrid environments in which CPUs, GPUs, and QPUs operate together.



3.7. QuEST

QuEST, the Quantum Exact Simulation Toolkit, is a high-performance open-source quantum computing simulator for quantum circuits, state vectors, and density matrices. Developed by the Quantum Technology Theory Group at the University of Oxford, QuEST is distinguished by its ability to use multithreading, GPU acceleration, and distributed computing, which makes it effective across computing environments ranging from laptops to networked supercomputers. The toolkit can simulate both pure and mixed quantum states with high precision and supports a wide array of quantum operations. Its open-source nature and its simple, flexible interface, which supports various back-end hardware, make its simulations extensible and adaptable [14]. QuEST represents a pure state of n qubits using 2^n complex floating-point numbers, with each real and imaginary component in double precision by default; it can also be configured to use single or quad precision. Because the simulator stores the state using C/C++ primitives, the state vector alone consumes 16 × 2^n bytes of memory by default.

3.8. Qrack

Qrack is a high-performance quantum computer simulator written in C++ that supports OpenCL and CUDA [15,16]. It is particularly notable for its ability to simulate arbitrary numbers of entangled qubits, limited only by system resources. Qrack is designed to be embedded in other projects and includes a comprehensive suite of standard quantum gates, along with variations suitable for register operations and arbitrary rotations. The simulator is integrated with other quantum computing frameworks, such as ProjectQ and Qiskit, which enhances its versatility. Qrack also features optimizations for noiseless pure-state simulations and includes tools that aid in the control, extension, and visualization of data from quantum circuits. Qrack maintains the state representation in a factorized form to improve simulation efficiency: a general ket state |ψ⟩ of n qubits is described by O(2^n) complex amplitudes, so keeping separable subsystems as separate factors, when possible, avoids storing the full amplitude vector.

3.9. Simulators Comparison

Regarding academic, community, and industry support, the continual updates, active support, and documentation for these tools contribute to their status as leaders in the field, and this ongoing development ensures that they remain relevant and valuable as quantum computing technology evolves. Each of these simulators offers unique features and optimizations, which makes them suitable for different aspects of quantum computing research and development. Their continual evolution is critical as the field strives to solve more complex problems and improve algorithm efficiency. Table 1 compares the design properties and optimization mechanisms of the evaluated simulators.

Table 1. Comparison of leading open-source simulators for quantum computing.

| Features | Intel-QS | Quantum++ | QuEST | Qrack | qsim |
|---|---|---|---|---|---|
| Optimization Techniques | Uses MPI for distributed computing; optimizes for multi-core and multi-node systems; supports MKL for mathematical operations | Employs template metaprogramming for compile-time optimizations; supports OpenMP for parallelization | Uses GPU acceleration and multithreading, and can be distributed across networked supercomputers | Leverages gate fusion and OpenCL for parallel execution across different hardware platforms | Optimizes simulations with gate fusion, AVX/FMA vectorized instructions, and OpenMP |
| Hardware Support | Optimized for high-performance computing systems; can be deployed on cloud infrastructures | Compatible with various architectures via C++ standardization; no specific hardware acceleration mentioned | Supports laptops to supercomputers; compatible with GPUs and distributed systems | Designed for broad compatibility with OpenCL-supporting GPUs and CPUs | Runs efficiently on high-core-count CPUs and potentially on any system supported by Cirq |
| Programming Model | Provides C++ and Python interfaces; supports state vector simulation | C++ library with emphasis on flexibility and ease of integration | Offers a C library that is easy to integrate and extend, with optional Python bindings | C++ based, with a focus on integrating with other quantum computing frameworks such as Qiskit | Integrated with Cirq; emphasizes ease of use in Python for simulating quantum circuits |
| Design Properties | Focuses on scalability and performance across different computational environments | Prioritizes modular, generic programming for ease of adaptation and maintenance | Designed for precision and versatility in quantum state manipulation | Prioritizes rapid prototyping and flexibility for embedding in various applications | Designed to simulate large quantum circuits with high precision |
| Unique Features | Supports dynamic circuit simulation and state manipulation during runtime | Highly adaptable to various quantum computing models due to its generic programming approach | Extensible; supports detailed state analysis tools such as fidelity and entanglement measures | Integrates classical computing elements within quantum simulations for enhanced functionality | Deeply integrated with Google's quantum computing framework, providing extensive simulation capabilities |

Other projects, such as XACC and Qiskit, provide a full-stack approach to quantum computing, including simulators, compilers, and the ability to run programs on real quantum processors.

For convenience and agility, the simulators that provide the QFT among their examples were compared under equal conditions. Figure 3a shows the OpenMP-capable simulators, and Figure 3b shows the GPU-capable simulators.

Figure 3. Comparison of the quantum Fourier transform using different simulators and optimization techniques: (a) shared-memory performance on Platform 1; (b) GPU performance on Platform 2.

4. Implementing a Simulator

To gain a deeper understanding of the fundamental operations of quantum computing and to test various memory management approaches, a software quantum simulator prototype was developed in C++ (The Memory eFficient Quantum Simulator, TMFQS) [17]. The prototype was designed so that strategies can be changed through minimal modifications; in particular, the data structures that represent the fundamental concepts of quantum computing, as well as the use of compression libraries, can be adjusted easily. To keep the demonstration of how to build a software quantum simulator simple, the scope of this work was limited to optimization techniques based on shared memory and distributed memory. The GPU implementation is left for future work.

It should be pointed out that this prototype does not implement all the concepts of quantum computing, such as quantum error correction, entanglement, measurement, and an extended set of quantum gates. The measurement operation, a fundamental aspect of quantum computing, was not implemented because the primary focus of this research was to evaluate and optimize memory management strategies for quantum state simulation. The objective was to explore methods such as state pruning, data compression, and parallelization to improve the efficiency of memory use in large-scale quantum simulations. Since these techniques do not inherently depend on measurement, implementing a measurement function was not essential to achieving the research goals. However, measurement could be incorporated in future iterations to extend the simulator's capabilities for practical quantum algorithm execution. Several scenarios were implemented to carry out the tests:

• Dynamic memory management: The primary purpose is to test the strategy of removing less probable states.

• Full State: The objective is to accelerate the simulations by avoiding the overhead introduced by searching for states.

• Full State with OpenMP: The intention is to accelerate the simulations of the previous scenario through shared-memory parallelism.

• Full State with data compression: The purpose is to test a lossy compression library such as ZFP.

• Full State with MPI: The main objective of this scenario is to distribute the amplitude vector among different computing nodes, allowing a greater number of qubits to be simulated.

• Full State with MPI and data compression: Data compression is incorporated into the previous scenario.
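The dynamic memory management scenario removes less probable states. The exact data structure and threshold used by the prototype are not detailed here, so the following C++ sketch is only one possible interpretation: it stores amplitudes in a hash map keyed by basis-state index, drops those whose probability falls below a hypothetical threshold, and renormalizes the survivors. Renormalization is an assumption made for illustration; the discarded probability mass is returned so that the approximation error can be tracked.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstdint>
#include <unordered_map>

using cplx = std::complex<double>;
// Sparse state: only basis states that have not been pruned are stored.
using SparseState = std::unordered_map<std::uint64_t, cplx>;

// Removes basis states whose probability |alpha|^2 is below `threshold`
// and rescales the survivors so that the total probability is 1 again.
// Returns the probability mass that was discarded (the pruning error).
double pruneState(SparseState& psi, double threshold) {
    assert(threshold >= 0.0);
    double kept = 0.0, dropped = 0.0;
    for (auto it = psi.begin(); it != psi.end();) {
        const double p = std::norm(it->second);
        if (p < threshold) {
            dropped += p;
            it = psi.erase(it);  // erase returns the next valid iterator
        } else {
            kept += p;
            ++it;
        }
    }
    if (kept > 0.0) {
        const double scale = 1.0 / std::sqrt(kept);
        for (auto& entry : psi) entry.second *= scale;
    }
    return dropped;
}
```

The hash-map lookups needed to locate a basis state when applying a gate are the kind of search overhead that the Full State scenario avoids by indexing a dense array directly.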

4.1. Representation of Fundamental Concepts

As described in the previous section, the basic simulation concepts include the following elements:

• Quantum Register: composed of a states vector and an amplitudes vector.

• Quantum Gates: the matrix representation of quantum gates. Only one-qubit and two-qubit gates were considered.

• The application of quantum gates to a quantum register.

We used an array of double-precision floating-point numbers to store the amplitudes. No data structure is used to represent the states, since the vector indices refer to them. To implement the method that applies a quantum gate to a quantum register (QuantumRegister::applyGate), we use the technique proposed in [18] and represented in Equation 6. Figure 4 shows the main classes of the prototype.

Figure 4. Class Diagram of the Prototype: QuantumRegister class represents a quantum state and
implements the main method to transform a quantum state (applyGate). The QuantumGate class
implements a small set of quantum gates using the matrix representation.

4.2. Single Processor Case

To evaluate the quantum simulator's performance and memory management strategies, we first consider the case in which the simulation is executed on a single processor. This section explores the key elements involved in simulating quantum systems on a single processor, focusing on the representation of the state vector and the application of quantum gates.

4.2.1. State Vector

In this subsection, we discuss how the simulator represents the quantum state. Specifically, we describe the structure of the state vector, which stores the amplitudes of all possible basis states, and explain how these amplitudes are organized to optimize memory usage and computational efficiency.

The state vector is the linear combination of basis states given by the following expression:

$$|\psi\rangle = \alpha_0|0\ldots00\rangle + \alpha_1|0\ldots01\rangle + \cdots + \alpha_{2^n-1}|1\ldots11\rangle \tag{3}$$


where $\alpha_i$ are the amplitudes. As mentioned previously, these amplitudes are complex numbers, so two float or two double values are needed to represent each of them in the code. The whole state vector must fit in local memory.

The amplitudes of the states are stored in a one-dimensional double-precision array in contiguous memory. To increase performance, a single array holds both the real and the imaginary parts of each amplitude; that is, the state vector is linearized. The real parts are placed in the odd positions of this array, and the imaginary parts in the even positions. This strategy avoids jumping between two arrays, one for the real parts and one for the imaginary parts. Figure 5 depicts this data structure.

Figure 5. Linearized state vector
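The following C++ sketch illustrates this linearized layout. It assumes 0-based indexing, so the real part of amplitude i sits at index 2i and its imaginary part at index 2i+1 (the odd and even positions of the description above when positions are counted from one). The alias LinearStateVector and the helpers makeStateVector, getAmplitude and setAmplitude are hypothetical names for illustration, not the TMFQS API.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helpers (not the TMFQS API). Amplitude i of a linearized state
// vector occupies two consecutive doubles: real part at 2*i, imaginary at 2*i+1.
using LinearStateVector = std::vector<double>;

// 2^numQubits amplitudes need 2^(numQubits+1) doubles, all initialized to zero.
LinearStateVector makeStateVector(unsigned numQubits) {
    return LinearStateVector(std::size_t{2} << numQubits, 0.0);
}

// Read amplitude i back as a complex number.
std::complex<double> getAmplitude(const LinearStateVector& v, std::size_t i) {
    assert(2 * i + 1 < v.size());
    return {v[2 * i], v[2 * i + 1]};
}

// Write amplitude i into its two slots.
void setAmplitude(LinearStateVector& v, std::size_t i, std::complex<double> a) {
    assert(2 * i + 1 < v.size());
    v[2 * i] = a.real();
    v[2 * i + 1] = a.imag();
}
```

As a design note, C++11 guarantees that std::complex<double> has the same layout as an array of two doubles (real part first), so a std::vector<std::complex<double>> produces the same interleaved memory pattern while keeping complex arithmetic available.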

4.2.2. Quantum Gates

Like other simulators such as Intel-QS, this prototype implements only single-qubit gates and controlled two-qubit gates. The minimal set of quantum gates needed to implement the Quantum Fourier Transform algorithm is: Identity, Hadamard, ControlledPhaseShift, ControlledNot and Swap. All these quantum gates are implemented as two-dimensional double-precision arrays. This reduced set of gates limits the simulation of algorithms that require additional gates; however, adding new single-qubit and controlled two-qubit gates is straightforward: the corresponding matrix only has to be inserted into the quantumGate.cpp source file.
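A minimal sketch of how the 2×2 matrices of the single-qubit gates, and the target matrices of the controlled gates, could be stored. It uses std::complex entries rather than plain double arrays so that the phase $e^{i\phi}$ fits in the same type; Gate2x2, makeGate, the gate factory functions and isUnitary are hypothetical names, not the contents of quantumGate.cpp. Entry g[r][c] corresponds to $g_{(r+1)(c+1)}$ in the notation of equation 5. Swap, a genuine two-qubit gate, is not covered here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

using Complex = std::complex<double>;
// g[r][c] is row r, column c; it corresponds to g_{(r+1)(c+1)} in equation 5.
using Gate2x2 = std::array<std::array<Complex, 2>, 2>;

Gate2x2 makeGate(Complex g11, Complex g12, Complex g21, Complex g22) {
    Gate2x2 g{};
    g[0][0] = g11; g[0][1] = g12;
    g[1][0] = g21; g[1][1] = g22;
    return g;
}

Gate2x2 identityGate() { return makeGate(1.0, 0.0, 0.0, 1.0); }

Gate2x2 hadamardGate() {
    const double s = 1.0 / std::sqrt(2.0);
    return makeGate(s, s, s, -s);
}

// Pauli-X: the matrix applied to the target qubit by ControlledNot.
Gate2x2 notGate() { return makeGate(0.0, 1.0, 1.0, 0.0); }

// diag(1, e^{i*phi}): the matrix applied to the target qubit by ControlledPhaseShift.
Gate2x2 phaseShiftGate(double phi) { return makeGate(1.0, 0.0, 0.0, std::polar(1.0, phi)); }

// Sanity check for a newly added matrix: G^dagger * G must equal the identity.
bool isUnitary(const Gate2x2& g, double tol = 1e-12) {
    for (int r = 0; r < 2; ++r) {
        for (int c = 0; c < 2; ++c) {
            Complex sum = 0.0;
            for (int k = 0; k < 2; ++k) sum += std::conj(g[k][r]) * g[k][c];
            const Complex expected = (r == c) ? 1.0 : 0.0;
            if (std::abs(sum - expected) > tol) return false;
        }
    }
    return true;
}
```

A unitarity check such as isUnitary is a cheap safeguard when a new matrix is inserted by hand, since a typo in one entry would otherwise silently break the normalization of the state.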

4.2.3. Applying a Quantum Gate to a Quantum Register

Applying a quantum gate $G_k$ to the k-th qubit of a state vector $|\psi\rangle$ yields the following result.

$$G_k|\psi\rangle = |\psi'\rangle = \begin{pmatrix} \alpha'_{0\ldots00} \\ \alpha'_{0\ldots01} \\ \vdots \\ \alpha'_{1\ldots10} \\ \alpha'_{1\ldots11} \end{pmatrix} \tag{4}$$

Applying Single-Qubit Gates

The traditional approach to this problem uses sparse-matrix methods. However, [18] and [19] state that applying a single-qubit gate

$$G_k = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix} \tag{5}$$

to the k-th qubit of a quantum register of N qubits is equivalent to applying the gate to each pair of amplitudes whose binary indices differ only in the k-th bit:

$$\begin{aligned} \alpha'_{*\ldots*0_k*\ldots*} &= g_{11}\,\alpha_{*\ldots*0_k*\ldots*} + g_{12}\,\alpha_{*\ldots*1_k*\ldots*} \\ \alpha'_{*\ldots*1_k*\ldots*} &= g_{21}\,\alpha_{*\ldots*0_k*\ldots*} + g_{22}\,\alpha_{*\ldots*1_k*\ldots*} \end{aligned} \tag{6}$$
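This implies that all state vector elements must be processed in pairs. As an example, consider applying the Hadamard gate to the first qubit of the state |00⟩. The values are: k = 0, $*\ldots*0_k*\ldots* = 00$, $*\ldots*1_k*\ldots* = 10$, $\alpha_{00} = 1 + 0i = 1$, $\alpha_{10} = 0$, and the Hadamard gate

$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \tag{7}$$

Replacing these values in equation 6, we obtain the following results.

$$\begin{aligned} \alpha'_{00} &= \frac{1}{\sqrt{2}}\cdot 1 + \frac{1}{\sqrt{2}}\cdot 0 = \frac{1}{\sqrt{2}} \\ \alpha'_{10} &= \frac{1}{\sqrt{2}}\cdot 1 - \frac{1}{\sqrt{2}}\cdot 0 = \frac{1}{\sqrt{2}} \end{aligned} \tag{8}$$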
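The arithmetic of equation 8 can be checked with a few lines of C++. The helper applyToPair is a hypothetical name; it evaluates equation 6 for one pair, taking the amplitude whose k-th bit is 0 (a0) and the one whose k-th bit is 1 (a1).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <utility>

using Complex = std::complex<double>;

// Equation 6 for one pair of amplitudes: a0 has the k-th bit equal to 0,
// a1 has it equal to 1. Returns the new values (a0', a1').
std::pair<Complex, Complex> applyToPair(Complex g11, Complex g12, Complex g21, Complex g22,
                                        Complex a0, Complex a1) {
    return {g11 * a0 + g12 * a1, g21 * a0 + g22 * a1};
}
```

Calling applyToPair with the Hadamard entries $1/\sqrt{2}$, $1/\sqrt{2}$, $1/\sqrt{2}$, $-1/\sqrt{2}$ and the amplitudes 1 and 0 returns $1/\sqrt{2}$ for both members of the pair, as in equation 8.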

Applying Two-Qubit Gates

Similarly, to apply a controlled two-qubit quantum gate to a quantum register, using a control qubit c and a target qubit t, the authors of [19] state that the new amplitudes can be obtained with the following operations (amplitudes whose control bit is 0 are left unchanged):

$$\begin{aligned} \alpha'_{*..*1_c*..*0_t*..*} &= g_{11}\,\alpha_{*..*1_c*..*0_t*..*} + g_{12}\,\alpha_{*..*1_c*..*1_t*..*} \\ \alpha'_{*..*1_c*..*1_t*..*} &= g_{21}\,\alpha_{*..*1_c*..*0_t*..*} + g_{22}\,\alpha_{*..*1_c*..*1_t*..*} \end{aligned} \tag{9}$$

As an example, consider applying the controlled phase shift (CPS) gate, whose target matrix is $\mathrm{diag}(1, e^{i\phi})$, to the second qubit of the state |11⟩, controlled by the first qubit. All amplitudes are equal to 0 except $\alpha_{11}$, which is equal to 1. Replacing these values in equation 9, we have:

$$\begin{aligned} \alpha'_{10} &= 1\cdot 0 + 0\cdot 1 = 0 \\ \alpha'_{11} &= 0\cdot 0 + e^{i\phi}\cdot 1 = e^{i\phi} \end{aligned} \tag{10}$$

Thus, we obtain the amplitude values for the states |10⟩ and |11⟩.
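A sketch of equation 9 applied over a whole state vector, assuming the natural qubit order used in this work (qubit 0 is the leftmost bit). The function applyControlledGate is hypothetical and is not the TMFQS implementation; each pair is visited once, starting from the member whose control bit is 1 and target bit is 0.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Equation 9 on a full state vector (qubit 0 = leftmost bit of the index).
// Only pairs whose control bit is 1 are updated; all others stay untouched.
void applyControlledGate(std::vector<Complex>& amp, unsigned numQubits,
                         unsigned control, unsigned target,
                         Complex g11, Complex g12, Complex g21, Complex g22) {
    assert(amp.size() == (std::size_t{1} << numQubits));
    assert(control != target && control < numQubits && target < numQubits);
    const std::size_t cMask = std::size_t{1} << (numQubits - 1 - control);
    const std::size_t tMask = std::size_t{1} << (numQubits - 1 - target);
    for (std::size_t i = 0; i < amp.size(); ++i) {
        // Start each pair from the member with control = 1 and target = 0.
        if ((i & cMask) == 0 || (i & tMask) != 0) continue;
        const std::size_t j = i | tMask;  // partner with target = 1
        const Complex a0 = amp[i], a1 = amp[j];
        amp[i] = g11 * a0 + g12 * a1;
        amp[j] = g21 * a0 + g22 * a1;
    }
}
```

With $g_{11} = 1$, $g_{12} = g_{21} = 0$ and $g_{22} = e^{i\phi}$, this reproduces equation 10; with the NOT matrix it behaves as ControlledNot.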

Qubit Order

Some simulators, like Qiskit, reverse the order of the qubits so that qubit 0 corresponds to the least significant bit of the binary representation of the state. In this case, the distance between $\alpha'_{*\ldots*0_k*\ldots*}$ and $\alpha'_{*\ldots*1_k*\ldots*}$ is equal to $2^k$.

In this work, we maintain the natural order of the qubits. For example, in the state |011⟩, qubit 0 is the leftmost, qubit 1 is in the middle, and qubit 2 is the rightmost. Therefore, the distance between $\alpha'_{*\ldots*0_k*\ldots*}$ and $\alpha'_{*\ldots*1_k*\ldots*}$ is equal to $2^{(\text{numQubits}-1)-k}$, where k is the index of the qubit being operated on. To illustrate this, Figure 6 shows the distance between the paired states of a 4-qubit state vector.

Figure 6. State pairs affected by a qubit operation
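A small sketch of both conventions, with numQubits and the qubit index k as defined above; the function names are hypothetical. pairsForQubit enumerates the pairs in the natural order, which should correspond to the pairing depicted in Figure 6.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Natural order (this work): qubit 0 is the leftmost, most significant bit.
std::size_t pairDistanceNatural(unsigned numQubits, unsigned k) {
    assert(k < numQubits);
    return std::size_t{1} << ((numQubits - 1) - k);
}

// Reversed order (e.g. Qiskit): qubit 0 is the least significant bit.
std::size_t pairDistanceReversed(unsigned numQubits, unsigned k) {
    assert(k < numQubits);
    return std::size_t{1} << k;
}

// Every (state with qubit k = 0, state with qubit k = 1) pair in natural order.
std::vector<std::pair<std::size_t, std::size_t>> pairsForQubit(unsigned numQubits, unsigned k) {
    const std::size_t d = pairDistanceNatural(numQubits, k);
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < (std::size_t{1} << numQubits); ++i) {
        if ((i & d) == 0) pairs.emplace_back(i, i + d);
    }
    return pairs;
}
```

For a 4-qubit register, qubit 0 pairs states 8 positions apart in the natural order (for example |0101⟩ with |1101⟩), whereas the reversed convention pairs them 1 position apart.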

In general, a single-qubit gate can be applied to a quantum register with the following pseudo-code.

    for each state in the state vector whose k-th qubit is 0
    do
        find the pair corresponding to the current state
        calculate the new amplitude of the current state (equation 6)
        calculate the new amplitude of the pair state (equation 6)
    done

In summary, the amplitudes of the current state and of the affected state are calculated as follows: find the pair corresponding to the current state; then, using the original amplitudes of both states, calculate the new amplitude of the current state with equation 6, and finally calculate the new amplitude of its pair with the same equation.

To find the pair corresponding to the current state, we can use two methods. The first calculates the distance using the relation $2^{(\text{numQubits}-1)-k}$, as explained before. The second applies an XOR operation between the binary representation of the current state and a mask of 0s with a single 1 in the position of the k-th qubit, the one we are working on. For example, when applying a quantum gate to qubit 0 of the 4-qubit state |0101⟩, the corresponding pair is found with the following operation.

$$0101 \oplus 1000 = 1101 \tag{11}$$

This result can be verified in Figure 6. C++ provides bitwise operators that perform this operation efficiently:

    // bit position of the qubit in the state index (qubit 0 is the leftmost bit)
    unsigned int pos = numQubits - qubit - 1;
    unsigned int pairState = currentState ^ (1u << pos);
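Putting the pseudo-code and the XOR pair search together, a single-qubit gate application on the linearized state vector could look like the following sketch. applySingleQubitGate is a hypothetical name and signature, not QuantumRegister::applyGate itself; it assumes the interleaved layout of Figure 5 with 0-based indexing (real part at 2i, imaginary part at 2i+1) and skips states whose k-th bit is 1, so that every pair is updated exactly once.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Equation 6 on a linearized state vector: amplitude i is stored as
// (real, imaginary) at positions 2*i and 2*i+1. Qubit 0 is the leftmost bit.
void applySingleQubitGate(std::vector<double>& state, unsigned numQubits, unsigned qubit,
                          Complex g11, Complex g12, Complex g21, Complex g22) {
    const std::size_t numStates = std::size_t{1} << numQubits;
    assert(state.size() == 2 * numStates && qubit < numQubits);
    const unsigned pos = numQubits - qubit - 1;      // bit position of the qubit
    const std::size_t mask = std::size_t{1} << pos;
    for (std::size_t currentState = 0; currentState < numStates; ++currentState) {
        if (currentState & mask) continue;           // visit each pair only once
        const std::size_t pairState = currentState ^ mask;  // XOR locates the partner
        // Read both old amplitudes before overwriting either of them.
        const Complex a0(state[2 * currentState], state[2 * currentState + 1]);
        const Complex a1(state[2 * pairState], state[2 * pairState + 1]);
        const Complex n0 = g11 * a0 + g12 * a1;
        const Complex n1 = g21 * a0 + g22 * a1;
        state[2 * currentState] = n0.real();
        state[2 * currentState + 1] = n0.imag();
        state[2 * pairState] = n1.real();
        state[2 * pairState + 1] = n1.imag();
    }
}
```

Reading both old amplitudes before writing either one is what makes the in-place update safe; writing the first result before reading the partner would feed a new value into the second line of equation 6.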

4.2.4. State Pruning

In the version of the simulator where the least probable states are pruned, we use dynamic memory management, because the states are stored non-sequentially in memory. This arrangement results from computing them via equation 6. Consequently, a state search method was developed to access a specific state for calculations in subsequent iterations. However, this hurts performance, because a lot of time is spent searching for a state's values before they can be used in a calculation. Figure 7 depicts the order of a 3-qubit state vector after applying a single-qubit gate (Hadamard) to qubit 0.

Figure 7. State vector order using dynamic memory approach
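To illustrate why the search is costly, the following sketch stores only the surviving states as (state, amplitude) entries in the order they were produced. SparseEntry, SparseRegister, findState and pruneStates are hypothetical names, the threshold is an arbitrary parameter, and the prototype's actual data structure may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstdint>
#include <vector>

// Hypothetical sparse register for the pruning variant: only surviving states are
// kept, as (state index, amplitude) entries in the order they were produced.
struct SparseEntry {
    std::uint64_t state;
    std::complex<double> amplitude;
};
using SparseRegister = std::vector<SparseEntry>;

// The entries are not sorted by state, so a lookup is a linear scan: O(number of
// stored states). Returns nullptr if the state is not stored (e.g. it was pruned).
const SparseEntry* findState(const SparseRegister& reg, std::uint64_t state) {
    const auto it = std::find_if(reg.begin(), reg.end(),
                                 [state](const SparseEntry& e) { return e.state == state; });
    return it == reg.end() ? nullptr : &*it;
}

// Remove the states whose probability |amplitude|^2 is below the threshold.
void pruneStates(SparseRegister& reg, double threshold) {
    reg.erase(std::remove_if(reg.begin(), reg.end(),
                             [threshold](const SparseEntry& e) {
                                 return std::norm(e.amplitude) < threshold;
                             }),
              reg.end());
}
```

Since every gate application needs the partner of each stored state, each update pays for such a scan, and when few states are pruned the register holds close to $2^n$ entries.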

Because of this search overhead, the approach of eliminating the least probable states was discarded early. Instead, we focus on pure states represented by the full state vector, which contains the complete information about the quantum state; this approach was adopted for the rest of the simulator's design.

4.2.5. Compressing the State Vector

The large volumes of data produced by extreme-scale scientific research and applications have driven the development of various data compression techniques over the years. These compression methods are optimized for floating-point data; however, they require additional computation to compress and decompress the data before working with it. Leading data compressors can be classified into two categories:

• Lossless compressors: these compressors preserve all the data.

– They use variable-length encoding algorithms such as Huffman encoding, arithmetic encoding and dictionary encoders.

– They often produce low compression ratios, typically around 2:1.

• Error-bounded lossy compressors: these allow some controlled distortion. They can be broadly classified into:

– The data-prediction-based compression model.

– The domain-transform-based compression model.

A key objective of this work is to identify a method for reducing memory consumption in a software quantum simulator. To achieve this, we chose error-bounded lossy compressors, since they offer higher compression ratios than lossless ones.

To compress the amplitude vector, we use the ZFP library [20], as it offers a good balance between accuracy and data size reduction. Although ZFP supports both lossy and lossless compression, as stated before, we use the lossy mode to obtain a better compression ratio.



To switch from an amplitude vector using traditional data types to a compressed vector, change the corresponding line in the types.h source file from typedef std::vector<double> AmplitudesVector; to typedef zfp::array1<double> AmplitudesVector;. The corresponding header file from the ZFP library must also be included.
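ZFP's read-write compressed arrays, such as zfp::array1, use its fixed-rate mode, in which every value is stored with a chosen number of bits (the rate). The sketch below is only a back-of-the-envelope estimate of the resulting memory for the amplitude vector: the rate is a hypothetical parameter (the rate used in the experiments is not stated here), rounding of the rate and any metadata are ignored, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed amplitude vector: 2^numQubits amplitudes, two 8-byte doubles each.
std::uint64_t uncompressedBytes(unsigned numQubits) {
    assert(numQubits < 59);
    return (std::uint64_t{1} << numQubits) * 2 * 8;
}

// Fixed-rate estimate: each of the 2 * 2^numQubits doubles is stored in rateBits
// bits. Approximation only: ignores rate rounding and array metadata.
std::uint64_t fixedRateBytes(unsigned numQubits, unsigned rateBits) {
    assert(numQubits < 55 && rateBits >= 1 && rateBits <= 64);
    const std::uint64_t numDoubles = (std::uint64_t{1} << numQubits) * 2;
    return numDoubles * rateBits / 8;
}
```

For example, at a hypothetical rate of 16 bits per value the estimate gives a 4:1 reduction: a 30-qubit amplitude vector drops from 16 GiB to 4 GiB.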

4.2.6. Shared Memory Case

To improve performance, the code must be parallelized. The first method is to apply a shared memory programming model, which was done using OpenMP. We used Valgrind to profile the program and determine the sections of code that consume the most resources. The profiling showed that the QuantumRegister::applyGate method is the component of the simulator on which the performance effort had to focus. Figure 8 shows the profiling results.

Figure 8. Simulator profiling

The QuantumRegister::applyGate method iterates through the state vector, implementing equation 6. To enhance performance, we partition the data and process segments of the state vector in parallel, thereby speeding up the simulation. The method's internal variables must be managed carefully to prevent race conditions.
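A sketch of this partitioning with OpenMP, written against a std::vector<std::complex<double>> for readability. Instead of testing every index, the loop runs over the $2^{\text{numQubits}-1}$ pairs and builds the index of each pair's first member by inserting a 0 at the target bit, so no two iterations touch the same amplitude. Locals such as i0, i1, a0 and a1 are declared inside the loop body, which makes them private to each thread. The function name is hypothetical; the pragma is ignored when compiling without OpenMP support, and the code then runs serially.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

using Complex = std::complex<double>;

// Shared-memory sketch of equation 6: pairs are split among OpenMP threads.
// Each iteration reads and writes only its own two amplitudes, so no locks are needed.
void applySingleQubitGateOmp(std::vector<Complex>& amp, unsigned numQubits, unsigned qubit,
                             Complex g11, Complex g12, Complex g21, Complex g22) {
    assert(qubit < numQubits && amp.size() == (std::size_t{1} << numQubits));
    const unsigned pos = numQubits - qubit - 1;                   // bit of the target qubit
    const std::int64_t bit = std::int64_t{1} << pos;
    const std::int64_t lowMask = bit - 1;                         // bits below pos
    const std::int64_t numPairs = std::int64_t{1} << (numQubits - 1);
#pragma omp parallel for schedule(static)
    for (std::int64_t p = 0; p < numPairs; ++p) {
        // Insert a 0 at bit `pos` of p: index of the pair member with the qubit = 0.
        const std::int64_t i0 = ((p & ~lowMask) << 1) | (p & lowMask);
        const std::int64_t i1 = i0 | bit;                         // partner with the qubit = 1
        const Complex a0 = amp[i0];
        const Complex a1 = amp[i1];
        amp[i0] = g11 * a0 + g12 * a1;
        amp[i1] = g21 * a0 + g22 * a1;
    }
}
```

A signed loop variable is used because older OpenMP versions require it for parallel loops; schedule(static) gives each thread a contiguous block of pairs.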

4.3. Distributed Memory Case

In the distributed memory model, the state vector needs to be divided among numProcs processes. Since equation 6, proposed in [18], requires the amplitudes of the states to be calculated in pairs, we must guarantee that the number of amplitudes per process is even. To achieve this, we use the relationship

$$\text{numProcs} = \frac{2^{\text{numQubits}}}{2^m} \tag{12}$$

where $2^m$ (with m ≥ 1) is the number of states per process. Two cases can then arise:

• The pair corresponding to the current state is stored locally.

• The pair corresponding to the current state is located in another process. In this case, values and results must be communicated.

Figure 9 shows the pairwise calculation scheme for a 5-qubit state vector, for each qubit. Partitions with 2, 4 and 8 processes are also shown, to make the communication pattern of the distributed programming model easy to visualize.

For instance, consider performing a calculation on qubit 0 of the state |00010⟩; the corresponding pair is |10010⟩. If two processes are used, communication must be established with process 1. If four processes are used, the remote process is process 2. Finally, if eight processes are used, the remote process is process 4.

We use the following expression (an integer division) to calculate the identifier of the process where the corresponding pair is located.

$$\text{remoteProcID} = \left\lfloor \frac{\text{pairState}}{2^m} \right\rfloor \tag{13}$$
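Equations 12 and 13 translate directly into integer arithmetic when numProcs is a power of two, which equation 12 implies. The sketch below uses hypothetical function names and assumes a block distribution in which process r owns the states $r\cdot 2^m$ to $(r+1)\cdot 2^m - 1$, consistent with equation 13. Division by $2^m$ becomes a right shift, and minQubits shows why, with 64 processes, the relationship only holds from 7 qubits onwards.

```cpp
#include <cassert>
#include <cstdint>

// log2 of a power of two (equation 12 only yields an integer 2^m in that case).
unsigned log2Exact(std::uint64_t x) {
    assert(x > 0 && (x & (x - 1)) == 0);
    unsigned k = 0;
    while ((std::uint64_t{1} << k) < x) ++k;
    return k;
}

// Equation 12 rearranged: 2^m = 2^numQubits / numProcs, so m = numQubits - log2(numProcs).
// Requiring m >= 1 keeps the number of amplitudes per process even.
unsigned statesPerProcessLog2(unsigned numQubits, std::uint64_t numProcs) {
    const unsigned logProcs = log2Exact(numProcs);
    assert(numQubits >= logProcs + 1);
    return numQubits - logProcs;
}

// Smallest register that numProcs processes can share under that constraint.
unsigned minQubits(std::uint64_t numProcs) { return log2Exact(numProcs) + 1; }

// Equation 13: process that owns pairState; dividing by 2^m is a right shift by m.
std::uint64_t remoteProcID(std::uint64_t pairState, unsigned m) { return pairState >> m; }
```

With the example above, the pair of |00010⟩ for qubit 0 is |10010⟩ (state 18), which is owned by process 1, 2 or 4 when 2, 4 or 8 processes are used.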

Figure 9. Pairwise calculation scheme for a 5-qubit state vector

Figure 9 shows that, for 2 processes and qubit 0, the number of communications required is $2^{\text{numQubits}}/2$, which substantially degrades performance. To mitigate the overhead caused by this large number of communications, the entire segment of the state vector is exchanged between the peer processes involved, the calculations of equation 6 are made locally, and the results are communicated back to the original process.

For this reason, we cannot use the total local memory of all nodes to increase the number of qubits; only half of the combined memory of all nodes is available for the state vector. Figure 10 depicts this idea.
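Under this exchange scheme, a rough per-process memory estimate can be written as follows. It assumes double-precision complex amplitudes (16 bytes each) and a receive buffer as large as the local segment; bytesPerProcess is a hypothetical helper, and buffers used internally by MPI are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Rough per-process memory under the segment-exchange scheme: the process keeps
// its 2^numQubits / 2^log2Procs amplitudes (16 bytes each, two doubles) plus a
// receive buffer of the same size for the peer's segment. MPI internals ignored.
std::uint64_t bytesPerProcess(unsigned numQubits, unsigned log2Procs) {
    assert(numQubits > log2Procs && numQubits - log2Procs < 58);
    const std::uint64_t localAmplitudes = std::uint64_t{1} << (numQubits - log2Procs);
    const std::uint64_t bytesPerAmplitude = 16;
    return 2 * localAmplitudes * bytesPerAmplitude;  // own segment + received copy
}
```

For instance, 30 qubits on 2 processes require 16 GiB per process, the same amount as the whole 30-qubit vector on a single node, which is the halving of usable memory described above.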

Combining amplitude vector compression with amplitude vector distribution across multiple processes can be effective both in terms of memory usage and overall simulation performance. To achieve this, the version that uses a compressed vector to store the amplitudes was parallelized. To obtain good performance, the state vector portions are transmitted in compressed form, which makes communications faster because message sizes are reduced.

To achieve compressed messaging, the compressed portions of the state vector must be serialized, and a custom MPI data type must be used in the send and receive functions.

4.4. Simulator Verification

To validate the accuracy of our quantum simulator, we executed different tests and compared the outputs with those of Intel-QS and Quantum++.

4.4.1. Quantum Gates Tests

To test the superposition principle, we apply the Hadamard gate to a 4-qubit quantum register. The following quantum circuit was used:



Figure 10. Data exchange between processes

Figure 11. Test Quantum Circuit



The test was executed by initializing the first state with probability equal to one, that is, 1 × |0000⟩. Then, we repeated the experiment with 1 × |0001⟩, and so on, up to the last state, 1 × |1111⟩. The results of executing this quantum circuit with Intel-QS, Quantum++ and TMFQS were identical.
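This check can also be reproduced without reference simulators, because the expected output is known in closed form: applying a Hadamard gate to every qubit of $|b\rangle$ gives amplitude $(-1)^{\mathrm{popcount}(b \wedge x)}/\sqrt{2^n}$ for each state $|x\rangle$. The sketch assumes that the test circuit of Figure 11 applies one Hadamard gate to each of the four qubits; applyGate and hadamardAllMatches are hypothetical helpers, not part of TMFQS.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Equation 6 applied to every pair for one qubit (qubit 0 = leftmost bit).
void applyGate(std::vector<Complex>& amp, unsigned numQubits, unsigned qubit,
               const Complex g[2][2]) {
    assert(qubit < numQubits);
    const std::size_t mask = std::size_t{1} << (numQubits - 1 - qubit);
    for (std::size_t i = 0; i < amp.size(); ++i) {
        if (i & mask) continue;
        const Complex a0 = amp[i], a1 = amp[i | mask];
        amp[i] = g[0][0] * a0 + g[0][1] * a1;
        amp[i | mask] = g[1][0] * a0 + g[1][1] * a1;
    }
}

// Applies H to every qubit of the basis state |b> and compares with the closed
// form: the amplitude of |x> is (-1)^popcount(b & x) / sqrt(2^numQubits).
bool hadamardAllMatches(unsigned numQubits, std::size_t b, double tol = 1e-12) {
    const std::size_t dim = std::size_t{1} << numQubits;
    std::vector<Complex> amp(dim, 0.0);
    amp[b] = 1.0;
    const double s = 1.0 / std::sqrt(2.0);
    const Complex h[2][2] = {{s, s}, {s, -s}};
    for (unsigned q = 0; q < numQubits; ++q) applyGate(amp, numQubits, q, h);
    for (std::size_t x = 0; x < dim; ++x) {
        unsigned ones = 0;
        for (std::size_t v = b & x; v != 0; v &= v - 1) ++ones;  // popcount(b & x)
        const double expected = ((ones % 2 == 0) ? 1.0 : -1.0) / std::sqrt(double(dim));
        if (std::abs(amp[x] - expected) > tol) return false;
    }
    return true;
}
```

Looping b over 0 to 15 covers the same sixteen initial states, from |0000⟩ to |1111⟩, used in the test described above.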

5. Results

This section presents the results of several quantum simulation tests performed with TMFQS, the software quantum simulator developed in C++. By simulating fundamental quantum operations, we assess how well the proposed memory management strategies reduce memory consumption and improve the efficiency of quantum computing simulations on classical hardware. Throughout the section, we compare the simulator's performance with and without these techniques, highlighting the improvements achieved. By providing a comprehensive evaluation of these memory management strategies, this section aims to contribute to ongoing efforts to make quantum computing simulations more efficient and scalable, ultimately advancing the field of quantum computing.

5.1. Algorithm Selected for Testing

TMFQS was evaluated using the Quantum Fourier Transform, as were the simulators presented previously.
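For reference, a compact sketch of the textbook QFT circuit built from the gates listed in Section 4.2.2 (Hadamard, ControlledPhaseShift and Swap), on a std::vector<std::complex<double>> with qubit 0 as the leftmost bit. The function names are hypothetical and this is not the TMFQS code; the circuit maps $|x\rangle$ to $\frac{1}{\sqrt{N}}\sum_y e^{2\pi i xy/N}|y\rangle$ with $N = 2^n$, which can be checked against the direct formula.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Bit mask of qubit q in an n-qubit state index (qubit 0 = leftmost bit).
std::size_t qubitMask(unsigned n, unsigned q) {
    assert(q < n);
    return std::size_t{1} << (n - 1 - q);
}

// Hadamard on qubit q (equation 6 with the matrix of equation 7).
void hadamard(std::vector<Complex>& amp, unsigned n, unsigned q) {
    const double s = 1.0 / std::sqrt(2.0);
    const std::size_t m = qubitMask(n, q);
    for (std::size_t i = 0; i < amp.size(); ++i) {
        if (i & m) continue;
        const Complex a0 = amp[i], a1 = amp[i | m];
        amp[i] = s * (a0 + a1);
        amp[i | m] = s * (a0 - a1);
    }
}

// Controlled phase shift: only amplitudes with control and target equal to 1 change.
void controlledPhase(std::vector<Complex>& amp, unsigned n, unsigned c, unsigned t, double phi) {
    const std::size_t both = qubitMask(n, c) | qubitMask(n, t);
    const Complex phase = std::polar(1.0, phi);
    for (std::size_t i = 0; i < amp.size(); ++i)
        if ((i & both) == both) amp[i] *= phase;
}

// Swap qubits a and b by exchanging the amplitudes of |..1_a..0_b..> and |..0_a..1_b..>.
void swapQubits(std::vector<Complex>& amp, unsigned n, unsigned a, unsigned b) {
    const std::size_t ma = qubitMask(n, a), mb = qubitMask(n, b);
    for (std::size_t i = 0; i < amp.size(); ++i)
        if ((i & ma) != 0 && (i & mb) == 0) std::swap(amp[i], amp[(i & ~ma) | mb]);
}

// Textbook QFT: on each qubit j, a Hadamard followed by controlled phases
// 2*pi / 2^(k-j+1) controlled by the later qubits k; then reverse the qubit order.
void qft(std::vector<Complex>& amp, unsigned n) {
    const double pi = std::acos(-1.0);
    for (unsigned j = 0; j < n; ++j) {
        hadamard(amp, n, j);
        for (unsigned k = j + 1; k < n; ++k)
            controlledPhase(amp, n, k, j, 2.0 * pi / double(std::size_t{1} << (k - j + 1)));
    }
    for (unsigned j = 0; j < n / 2; ++j) swapQubits(amp, n, j, n - 1 - j);
}
```

The circuit needs $n(n-1)/2$ controlled phase shifts, $n$ Hadamard gates and $\lfloor n/2 \rfloor$ swaps, so the gate count grows only quadratically while each gate touches the entire $2^n$-element state vector.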

5.2. Test Platform

To run the simulations, we used two high-performance nodes from the scientific computing center of the Universidad Industrial de Santander (SC3-UIS), each with two AMD EPYC 9554 64-core processors (two threads per core) @ 3.1 GHz and 375 GB of RAM.

5.2.1. State Pruning

We can free memory by eliminating the quantum states that have the smallest chance of occurring, that is, those whose amplitude is close to zero. However, besides the drawbacks of this approach, such as loss of fidelity, impact on algorithm accuracy, error accumulation, threshold sensitivity and impact on quantum entanglement, it requires dynamic memory management, which introduces a lot of extra work: before applying a quantum gate to a state, we need to search for it in the array, because the states are not ordered. That is, to apply each quantum gate, we need to execute $2^n$ search operations. This approach was tested using the Quantum Fourier Transform algorithm. At the end of the Quantum Fourier Transform, the quantum register contains all the states. Likewise, due to the initial superposition, the quantum register also contains all the states in the first stage of Grover's algorithm. Therefore, this approach does not work well for these algorithms.

For these reasons, along with the risks outlined previously, we decided to discard this approach, because its numerous disadvantages outweigh its benefits.

5.2.2. Full-State Quantum Register

A quantum register with all the states arranged in sequence reduces the overhead of searching for quantum states. It also eliminates the need for an extra data structure to store the states, since the indices of the amplitude vector identify the quantum states. The total memory needed by this strategy is $2^{\text{numQubits}} \times 16$ bytes for double-precision floating-point amplitudes. Moreover, we save $2^{\text{numQubits}} \times 4$ bytes by not storing a state array.

The graph in Figure 12a compares the performance of the QFT with the state pruning (dynamic memory) and full-state strategies. Simulations using dynamic memory with more than 20 qubits were discarded because their execution time exceeded one day.

In the dynamic memory approach, exponential growth is observed from an 18-qubit state vector onwards. This is a consequence of the substantial increase in the memory needed to represent the state and, therefore, in the processing time required. Comparing these strategies shows that the workload overhead of dynamic memory is significant. We parallelized the full-state version with the shared memory model using OpenMP to increase performance; the results are shown in Figure 12b.

(a) Dynamic Memory vs Full-State Approach (b) QFT Full-State Approach with OpenMP (log10)
Figure 12. QFT performance with different approaches

We observe a significant decrease in processing time between the serial and parallel executions of the full-state version as the number of qubits increases; that is, parallelizing the simulation yields considerable acceleration. For clearer interpretation, the results are shown as the base-10 logarithm of the simulation time. Figure 12b also shows that, for small numbers of qubits, there is an overhead caused by setting up the parallel environment.

5.2.3. Data Compression

We selected ZFP, one of the most widely used C++ libraries for data compression, to test this approach. We modified the full-state version of the simulator to compress the amplitude vector. The graph in Figure 13a compares the performance of the full-state version with and without ZFP. The base-10 logarithm is used to highlight the differences between the two simulations more clearly. The overhead introduced by the compression and decompression process is clearly substantial.

The graph in Figure 13b shows the amount of memory used by both simulator versions. The compression approach is highly memory-efficient, which makes it possible to increase the number of qubits in simulations.

(a) QFT Full-State with ZFP (log10) (b) QFT Full-State with ZFP Data Size
Figure 13. QFT performance with ZFP

5.2.4. Distributing the Quantum Register Across Multiple Nodes

We developed a version of the simulator that uses MPI to increase memory capacity by leveraging the RAM of additional compute nodes. To maintain computational efficiency, an appropriate ratio between the number of processes and the number of qubits per process must be kept. The graph in Figure 14a shows the performance of the Quantum Fourier Transform for 25 to 30 qubits, using 2, 4, 8, 16, 32 and 64 processes. With up to 64 processes, the relationship in equation 12 holds only from seven qubits onwards; however, to visualize the performance more clearly, we use the range of 25 to 30 qubits. In this case, a logarithmic scale was unnecessary, since the graphs were clear enough.

In addition to achieving better performance, increasing the number of processes allows us to increase the number of qubits and reduces the size of the messages required to exchange partial results between processes. Parallelism also proves helpful for more than 26 qubits.

5.2.5. Combination of Distributed Memory and Shared Memory Approaches

We developed a simulator version that combines MPI with OpenMP to achieve better performance. In this approach, the state vector is evenly distributed across the processes using MPI, and OpenMP is then used to parallelize the applyGate method, further enhancing performance. The graph in Figure 14b shows the performance of the Quantum Fourier Transform using this hybrid approach. Once again, the graph is clear enough that a logarithmic scale is not necessary.

(a) QFT Full-State with MPI (b) QFT Full-State with MPI and OpenMP
Figure 14. QFT performance using parallel techniques. Each line corresponds to a specific number of qubits.

Comparing the graphs in Figures 14a and 14b, the combination of MPI and OpenMP increases performance, especially when the portion of the state vector at each node is large.

5.2.6. Combination of Distributed Memory and Data Compression Approaches

Taking advantage of distributed resources to obtain more memory, combined with data compression, makes it possible to perform simulations with a larger number of qubits.

We have already seen that the compression process introduces a processing overhead; however, transmitting compressed data contributes positively to overall performance. We modified TMFQS to test this approach. Figure 15 shows the performance of the distributed memory approach with data compression for 25 to 30 qubits, with 2, 4, 8, 16, 32 and 64 processes.

Figure 15. QFT with MPI with Data Compression

This graph shows that performance decreases; however, the reduction in required memory is significant, because the same strategy used in the data compression section is adopted here. Note that this strategy is valid only if the portions of the quantum register are transmitted in compressed form.

5.2.7. Quantum Simulators Comparison

To validate the results obtained with TMFQS, we compared it with other simulators. First, specific conditions were established to allow a fair comparison: all simulators used a single compute node with a shared memory model. Simulators that use GPUs were excluded because their performance is much higher than that of the others, but their scaling is limited. The distributed memory case was also excluded because only some of the simulators support it. The graph in Figure 16 shows the performance of the Quantum Fourier Transform for Intel-QS, Quantum++, QuEST and TMFQS using the shared memory model. All simulators were compiled with the Intel oneAPI suite and the standard optimization flag (-O2). The selected simulators use the C++ complex number data type for memory management. They use a full-state vector scheme, which allows better performance but does not reduce memory consumption.

Figure 16. Quantum Simulators Comparison

As Figure 16 shows, the Intel-QS simulator performs worse than the other simulators, and QuEST exhibits the best performance. TMFQS performs acceptably compared with these mature tools, which have been optimized, for example, by using libraries such as MKL in the case of Intel-QS.



6. Conclusions

In conclusion, building a software quantum simulator requires a delicate balance between theoretical understanding and practical implementation strategies. The limitations of current quantum hardware, including qubit count and quality, drive the need for quantum simulators that allow researchers to explore quantum algorithms on classical computers. This work has shown that memory management techniques, such as dynamic pruning, full-state representation and data compression, are essential for optimizing the simulation of quantum systems. While pruning techniques introduce challenges such as fidelity loss and increased computational complexity, full-state representation with parallelization (via OpenMP or MPI) provides a robust framework for simulating larger quantum states. Data compression with ZFP further extends the capacity to simulate a greater number of qubits without exceeding memory limits, although it introduces some overhead in processing time.

The comparative performance of the prototype simulator against established simulators such as Intel-QS, QuEST and Quantum++ demonstrates the viability of these memory management techniques. By combining distributed and shared memory models with data compression, the simulator can handle increasingly complex simulations. Ultimately, this work contributes valuable insights into making quantum computing simulations more scalable and efficient, supporting the broader field of quantum computing as it continues to evolve.

6.1. Further Work

While it is critical to investigate the potential of data compression and distribution to enable the simulation of a larger number of qubits, such experiments would require a substantial amount of additional time and computational resources, especially given the exponential growth of memory demands with each added qubit. This work was framed within a larger project whose main goal was to explore memory management techniques in practical and scalable scenarios. A thorough evaluation of the maximum number of qubits achievable through compression and distribution, while essential, is beyond the scope of the current research. This evaluation will be addressed in future work, together with a more detailed exploration of the trade-offs between memory compression efficiency and computational overhead.

Author Contributions: G.D.: project development, conceptualization, investigation, formal analysis, software development, methodology, simulation executions, manuscript writing. L.S.: supervision and review. C.B.: supervision and review. J.C.: supervision and review. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: This work did not use any input data or generate new output data for analysis. Instead, simulations were conducted, and their results were compared with those of other leading simulators in the field. The outcomes are fully reproducible by running the example scenarios provided with these simulators.

Conflicts of Interest: The authors declare no conflicts of interest.

Bibliography

1. Quantum Computing Report. Qubit Count. https://quantumcomputingreport.com/scorecards/qubit-count/, 2019.

2. Quantiki. List of QC simulators. https://www.quantiki.org/wiki/list-qc-simulators, 2019.

3. Fingerhuth, M. Open-Source Quantum Software Projects. https://github.com/qosf/os_quantum_software, 2019.

4. Quantum Open Source Foundation Team. Quantum Open Source Foundation. https://qosf.org/, 2019.



5. Bergou, J.A.; Hillery, M. Introduction to the Theory of Quantum Information Processing; Springer Publishing Company, Incorporated, 2013.

6. Ekert, A.; Hayden, P.; Inamori, H. Basic concepts in quantum computation. Coherent atomic matter waves 2001, pp. 661–701.

7. Shoshany, B. In layman's term, what is a quantum state? https://www.quora.com/In-laymans-term-what-is-a-quantum-state, 2018.

8. Williams, C.P. Explorations in Quantum Computing, Second Edition; Texts in Computer Science, Springer, 2011. https://doi.org/10.1007/978-1-84628-887-6.

9. Rieffel, E.; Polak, W. Quantum Computing: A Gentle Introduction; The MIT Press, 2011.

10. The Quantum Insider. Top 63 Quantum Computer Simulators For 2024. https://thequantuminsider.com, 2024. Accessed: 2024-05-12.

11. Guerreschi, G.G.; Hogaboam, J.; Baruffa, F.; Sawaya, N. Intel Quantum Simulator: A cloud-ready high-performance simulator of quantum circuits, 2020, [arXiv:quant-ph/2001.10554].

12. Gheorghiu, V. Quantum++: A modern C++ quantum computing library, 2014, [arXiv:1412.4704]. https://doi.org/10.1371/journal.pone.0208073.

13. Quantum AI team and collaborators. qsim, 2020. https://doi.org/10.5281/zenodo.4023103.

14. Jones, T.; Brown, A.; Bush, I.; Benjamin, S.C. QuEST and High Performance Simulation of Quantum Computers. Scientific Reports 2019, 9, 10736. https://doi.org/10.1038/s41598-019-47174-9.

15. Strano, D. Qrack. https://vm6502q.readthedocs.io/en/latest/, 2019.

16. Strano, D.; Bollay, B.; Blaauw, A.; Shammah, N.; Zeng, W.J.; Mari, A. Exact and approximate simulation of large quantum circuits on a single GPU, 2023, [arXiv:quant-ph/2304.14969].

17. Díaz, G. Prototype Quantum Computing Simulator. https://github.com/diaztoro/TMFQSfullstate.git, 2024.

18. Trieu, D.B. Large-Scale Simulations of Error-Prone Quantum Computation Devices. Dissertation, Universität Wuppertal, Jülich, 2009.

19. Smelyanskiy, M.; Sawaya, N.P.D.; Aspuru-Guzik, A. qHiPSTER: The Quantum High Performance Software Testing Environment. CoRR 2016, abs/1601.07195.

20. Lindstrom, P. Fixed-Rate Compressed Floating-Point Arrays. IEEE Transactions on Visualization and Computer Graphics 2014, 20, 2674–2683. https://doi.org/10.1109/TVCG.2014.2346458.

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
