A Modular Quantum Compilation Framework For Distributed Quantum Computing
ABSTRACT For most practical applications, quantum algorithms require large resources in terms of qubit
number, much larger than those available with current noisy intermediate-scale quantum processors. With
the network and communication functionalities provided by the quantum Internet, distributed quantum
computing (DQC) is considered a scalable approach for increasing the number of available qubits
for computational tasks. For DQC to be effective and efficient, a quantum compiler must find the best
partitioning for the quantum algorithm and then perform smart remote operation scheduling to optimize
Einstein–Podolsky–Rosen (EPR) pair consumption. At the same time, the quantum compiler should also find
the best local transformation for each partition. In this article, we present a modular quantum compilation
framework for DQC that takes into account both network and device constraints and characteristics. We
implemented and tested a quantum compiler based on the proposed framework with some circuits of
interest, such as the VQE and QFT ones, considering different network topologies, with quantum processors
characterized by heavy-hexagon coupling maps. We also devised a strategy for remote scheduling that can
exploit both TeleGate and TeleData operations and tested the impact of using either only TeleGates or both.
The evaluation results show that TeleData operations can have a positive impact on the number of consumed
EPR pairs, depending on the characteristics of the compiled circuit. Meanwhile, choosing a more connected
network topology helps reduce the number of layers dedicated to remote operations.
INDEX TERMS Distributed quantum computing (DQC), quantum compilation, quantum Internet.
EPR pair. The chosen gate set contains every one-qubit gate and two two-qubit gates, namely the CNOT
and the CZ gate (i.e., the controlled version of the Z gate). The authors considered no restriction
on the connectivity between QPUs. Then, they reduced the problem of distributing a circuit across
multiple QPUs to hypergraph partitioning. The proposed approach, which was evaluated against five
quantum circuits (including QFT), presents a caveat: there is no way to customize the number of
communication qubits of each QPU.
Sundaram et al. [20] presented a two-step solution, where the first step was qubit assignment.
Circuits were represented as edge-weighted graphs with qubits as vertices. The edge weights
corresponded to an estimate of the number of cat-entanglement operations. The problem was then
solved as a minimum k-cut, where partitions had roughly the same size. The second step was finding
the smallest set of cat-entanglement operations that would enable the execution of all TeleGates.
The authors started with the assumption that each remote gate could be executed, by means of one
cat-entanglement, only in the partition of one of its operands. In this setting, the problem could
be reduced to a vertex-cover problem over bipartite graphs, allowing for a polynomial-time optimal
solution based on integer linear programming. They also provided an O(log n)-approximate solution,
where n was the total number of global gates, for a generalized setting, where remote gates could
be executed on an intermediary partition, by means of a greedy search algorithm. In [21], the same
authors extended their approach to the case of an arbitrary-topology network of heterogeneous
quantum computers by means of a Tabu search algorithm.
In [22], Daei et al. represent the circuit as an undirected graph with qubits as vertices, while
edge weights correspond to the number of two-qubit gates between them. Then, the graph is
partitioned using the Kernighan–Lin (K–L) algorithm for VLSI design [34], so that the number of
edges between partitions is minimized. Finally, each graph partition is converted to a quantum
circuit.
In [23], the authors represented circuits as bipartite graphs with two sets of vertices (one set
for the qubits and one for the gates) and edges to encode dependencies of qubits and gates. Then,
for the qubit assignment problem, they proposed a partitioning algorithm via dynamic programming to
minimize the number of TeleData operations.
Nikahd et al. [24] formulated the minimum k-cut partitioning problem as an ILP optimization
problem, to minimize the number of remote interactions. They employed a moving time window and
applied the partitioning algorithm to small sections of the circuit, so that the partition might
change with the moving window by means of TeleData operations.
In [25], Cuomo et al. modeled the compilation problem with an integer linear programming
formulation. The formulation is inspired by the vast theory on dynamic network problems. The
authors manage to define the problem as a generalization of quickest multicommodity flow. This
result allows optimization to be performed by means of techniques coming from the literature, such
as a time-expanded representation of the distributed architecture.
Ovide et al. [26] investigated the performance of the qubit assignment strategy proposed by Baker
et al. [27] on the Cuccaro and QFT adder circuits [35], [36], under the assumption of local and
network all-to-all connectivity. In [27], qubit assignment was treated as a graph partitioning
problem, under the assumption that a SWAP operation primitive exists to exchange data qubits
between different QPUs, i.e., it is not required to check if there are free data qubits available
on the QPUs. Ovide et al. showed that, in general, the wider the circuit, the higher the number of
remote operations, although it highly depends on the specific circuit to be compiled.
III. MODULAR QUANTUM COMPILATION FRAMEWORK
As mentioned in Section I, there is a lack of a modular framework for compiling quantum circuits to
DQC architectures. Such a framework should be circuit agnostic, i.e., able to compile any circuit
to any suitable DQC architecture. Current proposals from the literature tackle the problems of
qubit assignment and remote gate scheduling, but do not take into account the local connectivity of
each QPU. The framework presented in this article and illustrated in Fig. 3 fills the gap between
local compilation and compilation for DQC. The proposed framework is modular, meaning that each
module tackles a different aspect of the quantum compilation problem for DQC. Within the framework,
modules do not depend on the specific implementation of the others, but only on their functionality
and outputs, meaning that one could use a better implementation of one module without changing any
other module.
The quantum compilation framework can take any quantum circuit and any network configuration as
input. Fig. 4 depicts an example of a network configuration, which describes how QPUs are connected
into the target DQC architecture, including the quantum channel capacity, i.e., the number of
communication qubits for each channel. The network configuration includes descriptions of the
internal configurations of the QPUs, i.e., the coupling map and the set of available data qubits
and communication qubits. More specifically, data qubits are a qubit subset dedicated to
computational tasks, while communication qubits are another subset reserved for entanglement
generation over the network [37]. The coupling map may have any shape. The coupling map is a
directed graph where each vertex corresponds to a qubit and directed edges determine the
possibility of executing two-qubit gates between the connected qubits.¹ Fig. 5 shows an example of
a coupling map with 20 data qubits and 8 communication qubits, highlighted in blue.
¹ Specifically, the source and destination vertices of a directed edge can be the control and
target qubit, respectively, of a two-qubit gate. An edge could be undirected, meaning that both
qubits can act as control or target.
Having these inputs, the first module of the framework regards the qubit assignment. Once a
suitable qubit assignment
FIGURE 3. Workflow of the proposed modular quantum compilation framework for DQC architectures.
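The workflow of Fig. 3 can be read as a pipeline of interchangeable modules. The following Python sketch is only an illustration of that idea under stated assumptions, not the compiler evaluated in this article: every name in it (`QPUConfig`, `NetworkConfig`, `compile_for_dqc`, the gate encoding, and the four callable signatures) is hypothetical, and the placeholder modules in the usage part perform no real optimization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical encodings, chosen only for this sketch.
Gate = Tuple[str, Tuple[int, ...]]   # (gate name, qubit indices)
Circuit = List[Gate]
Assignment = Dict[int, int]          # circuit qubit -> QPU index


@dataclass
class QPUConfig:
    coupling_map: List[Tuple[int, int]]  # directed edges (control, target)
    data_qubits: List[int]
    comm_qubits: List[int]


@dataclass
class NetworkConfig:
    qpus: List[QPUConfig]
    channels: Dict[Tuple[int, int], int]  # (QPU a, QPU b) -> channel capacity


# Each module is a plain callable, so a module depends only on the
# outputs of the others and can be swapped independently.
AssignFn = Callable[[Circuit, NetworkConfig], Assignment]
ScheduleFn = Callable[[Circuit, NetworkConfig, Assignment], Circuit]
MapFn = Callable[[Assignment, NetworkConfig], Dict[int, int]]
RouteFn = Callable[[Circuit, NetworkConfig, Dict[int, int]], Circuit]


def compile_for_dqc(circuit: Circuit, network: NetworkConfig, assign: AssignFn,
                    schedule: ScheduleFn, local_map: MapFn, route: RouteFn) -> Circuit:
    assignment = assign(circuit, network)               # qubit assignment
    scheduled = schedule(circuit, network, assignment)  # remote gate scheduling
    mapping = local_map(assignment, network)            # local mapping
    return route(scheduled, network, mapping)           # local routing


# Placeholder modules (assumptions): they only show how modules plug in.
def round_robin_assign(circuit, network):
    qubits = sorted({q for _, qs in circuit for q in qs})
    return {q: i % len(network.qpus) for i, q in enumerate(qubits)}


def passthrough_schedule(circuit, network, assignment):
    return list(circuit)


def identity_map(assignment, network):
    return {q: q for q in assignment}


def passthrough_route(circuit, network, mapping):
    return [(name, tuple(mapping[q] for q in qs)) for name, qs in circuit]


qpu = QPUConfig(coupling_map=[(0, 1)], data_qubits=[0], comm_qubits=[1])
net = NetworkConfig(qpus=[qpu, qpu], channels={(0, 1): 1})
demo = [("h", (0,)), ("cx", (0, 1))]
compiled = compile_for_dqc(demo, net, round_robin_assign, passthrough_schedule,
                           identity_map, passthrough_route)
```

Replacing `round_robin_assign` with a partitioning-based function would leave the other three callables untouched, which is the property the framework description emphasizes.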
has been found, the next module schedules remote gates accordingly. Meanwhile, the local mapping of
qubits assigned to each QPU is performed. Finally, the last module performs local routing while
taking into account the given schedule for remote gates and the local mappings.
A. QUBIT ASSIGNMENT
As mentioned in Section I, the goal is to partition the circuit in order to minimize the
communication cost, i.e., the number of remote operations and, consequently, the number of consumed
EPR pairs. To this aim, a quantum circuit qc can be represented as an undirected weighted graph
Gqc(V, E), as shown in Fig. 6, where each edge e ∈ E has weight W(e) ∈
FIGURE 7. (a) Example of initial graph partitioning. There are three partitions, each holding 4 qubits, and 8 edges between different partitions. The total
communication cost is equal to 8. White nodes represent available qubits that have not been utilized. (b) Graph partitioning where qubits have been
moved to achieve a better solution. The new total communication cost is now instead equal to 6.
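The graph model and the refinement of an initial partition described in Section III-A can be sketched as follows. This is a simplified illustration under assumptions: the initial partition is written by hand instead of being computed with METIS, the gate encoding and the `capacity` list (data qubits available per QPU) are hypothetical, and the greedy pass is one possible reading of "choose the best useful change until every qubit has moved once or nothing improves".

```python
from collections import Counter


def interaction_graph(circuit):
    """Edge weights: number of two-qubit gates between each pair of qubits."""
    weights = Counter()
    for _name, qubits in circuit:
        if len(qubits) == 2:
            weights[frozenset(qubits)] += 1
    return weights


def communication_cost(weights, assignment):
    """Sum of the weights of edges whose endpoints are on different QPUs."""
    total = 0
    for edge, weight in weights.items():
        a, b = tuple(edge)
        if assignment[a] != assignment[b]:
            total += weight
    return total


def improve(weights, assignment, capacity):
    """Greedy refinement: apply the single-qubit move with the largest cost
    reduction, one move at a time; each qubit is moved at most once."""
    assignment = dict(assignment)
    moved = set()
    while True:
        load = Counter(assignment.values())
        current = communication_cost(weights, assignment)
        best = None  # (gain, qubit, destination partition)
        for q in assignment:
            if q in moved:
                continue
            for p in range(len(capacity)):
                if p == assignment[q] or load[p] >= capacity[p]:
                    continue  # same partition, or destination QPU is full
                trial = dict(assignment)
                trial[q] = p
                gain = current - communication_cost(weights, trial)
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, q, p)
        if best is None:
            return assignment
        _, q, p = best
        assignment[q] = p
        moved.add(q)


# Illustrative 4-qubit circuit: qubits 0-1 and 2-3 interact heavily.
circuit = ([("h", (0,))] + [("cx", (0, 1))] * 3
           + [("cx", (1, 2))] + [("cx", (2, 3))] * 3)
weights = interaction_graph(circuit)
start = {0: 0, 1: 1, 2: 0, 3: 1}          # deliberately poor initial partition
improved = improve(weights, start, capacity=[3, 3])
```

In this toy case the cost drops from 7 to 1 after two moves, the same kind of improvement that Fig. 7 illustrates on a larger graph.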
ℕ. The set of vertices V corresponds to the qubits in qc and the weight of each edge is equal to
the number of two-qubit gates between the corresponding qubits.
The qubit assignment problem can then be treated as a graph partitioning problem where the
objective is to compute a k-way partitioning such that the sum of inter-partition edge weights is
minimized. Given k available QPUs, the result of k-way partitioning is k roughly equally sized
circuit partitions. There are several algorithms available that can efficiently find a solution. In
this work, METIS's multilevel k-way partitioning [39] was used. This is an initial solution and not
an optimal one, as the QPUs would probably be underutilized and the circuit's qubits unnecessarily
scattered through all QPUs. Therefore, for each partition, the algorithm checks whether changing
the assignment of a qubit to another partition would benefit the overall communication cost. Among
all the useful changes found, the best one is chosen and the search for possible changes continues
iteratively, until either all qubits have changed partition once or no further improvements can be
found. An example of improvement from the initial solution is depicted in Fig. 7. Regarding the
computational complexity, we could not find a rigorous analysis of the complexity of the METIS
algorithm, but the authors presented a technical report [40] showing outstanding performance when
partitioning graphs with millions of nodes and edges. Focusing on the initial solution improvement,
the algorithm needs to iterate over all qubits and partitions. In the worst-case scenario, the
complexity of this step is O(n^3 p), where n is the number of qubits in the circuit and p the
number of partitions.
B. REMOTE GATE SCHEDULING
We implemented an algorithm to schedule remote gates for DQC architectures in order to investigate
the impact of using both TeleData and TeleGate operations. The main strategy, described in
Algorithm 1, requires three inputs: 1) the quantum circuit to distribute; 2) the configuration of
the network onto which such circuit will be executed; and 3) a suitable qubit assignment, as
computed by the previous module, described in Section III-A. To schedule remote gates, the
algorithm described in Algorithm 1 scans the quantum circuit gate by gate and stops when it
encounters a gate that, based on the current partitioning, involves qubits on different QPUs. The
algorithm then searches for feasible TeleData operations that could cover the gate under
consideration.² TeleData operations are scheduled by taking into account the memory capacity of
each QPU, otherwise data qubits storing valuable information would get overwritten by a
teleportation. Finally, each possible TeleData operation is assigned the following cost function:
$$\frac{n_{\mathrm{EPR}}}{n_{\mathrm{cov}}} \, \frac{\mathit{delay}}{\bar{d}_t} \tag{1}$$
where n_EPR is the number of consumed EPR pairs, n_cov is the number of covered gates, which may
include more gates than the one under direct analysis, as shown in Fig. 1(a), and delay is the
time, measured in discrete intervals, that must be waited before actually executing the gate. The
delay is estimated by the cost function, based on when the quantum links for entanglement
generation were last used and when the gate should be executed. It may be the case that before
executing a TeleData operation, one needs to wait for a previously scheduled one to complete. The
aim is to evaluate possible remote operations against resource consumption (n_EPR) and execution
time (delay), both crucial for the performance of the system, with respect to the number of covered
gates (n_cov). The delay is scaled with the mean decoherence time d̄_t of the physical qubits,
turning it into a dimensionless number.
The algorithm selects the TeleData operation with the lowest cost and then covers a portion of the
following gates³ with TeleGate operations. TeleGate operations are chosen and scheduled in a
similar manner to the TeleData ones. TeleGates exploit the Cat-Ent primitive, as
² Here, “to cover” means to make a gate executable.
³ The size of the portion to cover is set by a customizable parameter.
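The selection of the cheapest remote operation in Section III-B can be illustrated with a short Python sketch. It rests on an assumption about cost function (1): the function is read here as the product of the ratio between consumed EPR pairs and covered gates and the ratio between delay and mean decoherence time. The `Candidate` type, its field names, and all numeric values are hypothetical and serve only as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    n_epr: int    # EPR pairs the operation would consume
    n_cov: int    # gates the operation would make executable
    delay: float  # waiting time before execution, in discrete intervals


def cost(candidate, mean_decoherence_time):
    """Cost (1), read as (n_epr / n_cov) * (delay / mean decoherence time).

    Assumption: the two ratios are multiplied. With this reading, a
    candidate with zero delay always has zero cost.
    """
    if candidate.n_cov <= 0 or mean_decoherence_time <= 0:
        raise ValueError("n_cov and the mean decoherence time must be positive")
    return (candidate.n_epr / candidate.n_cov) * (candidate.delay / mean_decoherence_time)


def cheapest(candidates, mean_decoherence_time):
    """Return the candidate remote operation with the lowest cost."""
    return min(candidates, key=lambda c: cost(c, mean_decoherence_time))


# Illustrative candidates for covering one remote gate.
candidates = [
    Candidate("teledata q1 -> QPU1", n_epr=1, n_cov=3, delay=2),
    Candidate("teledata q4 -> QPU0", n_epr=1, n_cov=1, delay=1),
    Candidate("teledata q1, q4 -> QPU2", n_epr=2, n_cov=3, delay=2),
]
best = cheapest(candidates, mean_decoherence_time=100.0)
```

Here the first candidate wins because one EPR pair covers three gates, even though it waits longer than the second one.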
shown in Fig. 1(b) [16]. Indeed, TeleGates can be divided into three steps. First, with the Cat-Ent
primitive (the blue box in Fig. 1(b)), the control qubit of a remote gate is entangled with a
communication qubit on the QPU holding the target qubit. This communication qubit is called a
“shared” control. Then, gates controlled by the same shared control qubit can be executed locally.
Finally, with the Cat-DisEnt primitive (the violet box in Fig. 1(b)), the shared control qubit is
measured and a correction operation is applied to the original control qubit at the first QPU.
Both TeleGate and TeleData can either migrate one qubit to the other's QPU or both to a different
one, as shown in Fig. 8, depending on the cost of such operation (computed as in (1)). Fig. 8(a)
shows the first case, in which gate g0 is covered by sharing qubits q1 and q4 with QPU1 through two
TeleGate operations. Fig. 8(b) depicts the second case, where gate g0 is covered by sharing qubits
q1 and q6 with QPU1, using two TeleData operations. By doing this, gates g1 and g2 are also
covered.
The scheduler also compiles the same portion of the circuit by scheduling only TeleGate operations.
It then computes a cost for the two strategies (one with TeleData and TeleGate, the other with just
TeleGate) and selects the one with the lowest number of consumed EPR pairs. Finally, the scheduler
resumes scanning gates in search of the next gate to cover. Given that the scheduler has to search
for both TeleGate and TeleData operations to cover each remote gate, and that we also take into
account a portion of
C. LOCAL ROUTING
The designed local routing algorithm takes as input a partitioned circuit with already scheduled
remote operations and handles the local routing accordingly. The algorithm requires the partitioned
circuit, with scheduled remote operations, the network configuration, and all QPU configurations,
specifically coupling maps including connections between data qubits and communication qubits.
The core strategy is the following. The algorithm scans the circuit and, for every gate that
involves qubits not directly connected on their specific QPU, computes the shortest sequence of
necessary SWAP gates. When it encounters a TeleData or TeleGate operation, it first checks if the
involved data qubits are in proximity of one of the available communication qubits. If not, it
computes the shortest paths to the least recently used communication qubit. The least recently used
communication qubit is chosen to avoid as much delay as possible in entanglement generation. At
this stage of compilation, due to local SWAPs, the state of a data qubit may now reside on a
communication qubit and vice versa. This is not necessarily an issue [41], but it is better to move
the communication qubit back to its original position, after the remote operation is completed and
before it is used again. This is crucial to avoid losing the state of a data qubit physically
stored at a communication qubit location, due to a new remote operation. An example of such an
instance is shown in Fig. 9. The computational complexity of this local routing implementation
depends directly on the number of two-qubit gates and remote operations, and can
FIGURE 9. Example of remote gate scheduling and local routing. Local gates are interlaced with Cat-Ent, Cat-DisEnt, and TeleData operations as
well as SWAP gates.
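Two building blocks of the local routing step in Section III-C, the shortest SWAP sequence between two qubits and the choice of the least recently used communication qubit, can be sketched as follows. This is an illustration under assumptions rather than the routing code of the compiler: the coupling map is treated as an undirected adjacency dictionary, breadth-first search stands in for Dijkstra's algorithm because all edges have unit cost here, and the function names and the `last_used` bookkeeping are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the qubits on a shortest path."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in adjacency[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    raise ValueError("qubits are not connected on this coupling map")


def swaps_to_make_adjacent(adjacency, a, b):
    """SWAPs that move the state on `a` along the path until it neighbours `b`."""
    path = shortest_path(adjacency, a, b)
    return [(path[i], path[i + 1]) for i in range(len(path) - 2)]


def least_recently_used(comm_qubits, last_used):
    """Pick the communication qubit with the oldest last-use time step.

    Qubits that were never used (absent from `last_used`) are preferred.
    """
    return min(comm_qubits, key=lambda q: last_used.get(q, -1))


# Illustrative 5-qubit line: 0 - 1 - 2 - 3 - 4, with 3 and 4 as communication qubits.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
target_comm = least_recently_used([3, 4], last_used={3: 5, 4: 2})
swaps = swaps_to_make_adjacent(line, 0, 3)
```

Applying the same SWAP list in reverse order moves the states back to their original positions, which corresponds to restoring a communication qubit after the remote operation has completed.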
FIGURE 10. (a) Example of graph state used to create graph state
circuits. Each vertex represents a qubit in the graph state, and there is an
edge between every pair of interacting qubits. This graph state has 6
qubits, but can be scaled up to an arbitrarily large number of qubits.
(b) Quantum circuit to create the 6-qubit graph state represented in (a).
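A graph state circuit of the kind described in Fig. 10 follows the standard construction: a Hadamard gate on every qubit, then a CZ gate for every edge of the graph. The Python sketch below builds such a circuit as a plain gate list. The gate encoding and the function name are hypothetical, and the 6-qubit ring used in the example is an arbitrary choice that does not necessarily match the graph drawn in Fig. 10(a).

```python
def graph_state_circuit(num_qubits, edges):
    """Return the gate list preparing the graph state of the given graph."""
    circuit = [("h", (q,)) for q in range(num_qubits)]
    seen = set()
    for a, b in edges:
        if a == b or not (0 <= a < num_qubits and 0 <= b < num_qubits):
            raise ValueError(f"invalid edge ({a}, {b})")
        key = (min(a, b), max(a, b))
        # CZ is symmetric and self-inverse, so each pair is added only once.
        if key not in seen:
            seen.add(key)
            circuit.append(("cz", key))
    return circuit


# Illustrative graph: a ring over 6 qubits.
ring = [(i, (i + 1) % 6) for i in range(6)]
gs = graph_state_circuit(6, ring)
```

Scaling to more qubits only requires a larger edge list, which is why this circuit class is convenient for testing the compiler at different widths.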
be estimated as O((g + r) n log n). This result stems from the fact that, in the worst-case
scenario, for each of the g two-qubit gates and r remote operations, the algorithm uses Dijkstra's
shortest path algorithm, with a complexity of O(n log n), on a coupling map with n qubits.
IV. EVALUATION
We implemented a quantum compiler based on the modular framework presented in Section III. The
compiler was tested against three classes of quantum circuits, namely, VQE, QFT, and graph state
circuits (an example of the graph state used is shown in Fig. 10). QFT and VQE were selected
because they are widely used in many different contexts. Together with the Grover Operator (GO)
[42] and the Harrow/Hassidim/Lloyd (HHL) method for linear systems [43], these circuits cover most
practical scenarios [44]. The graph state circuit was included as graph states are relevant for
applications like clock synchronization, secure communication, distributed sensing, and distributed
one-way quantum computations [45].
The purpose of the proposed evaluation is twofold. On the one hand, it shows that the proposed
framework is modular and flexible enough to let the user test different compilation strategies
against multiple network topologies. On the other hand, it illustrates the specific impact of
different configurations on the considered quantum circuits. For example, it will be shown that QFT
circuit compilation benefits from using both TeleGate and TeleData, but heavily relies on remote
operations, while VQE circuit compilation does not benefit from TeleData, but requires only a
handful of remote operations.
The circuits were compiled for the DQC architectures illustrated in Figs. 4 and 11, comprising,
respectively, 3 and 5 QPUs, denoted as Net-3 and Net-5. To increase the number
FIGURE 12. Compilation results for the Net-3 with QPU-21 of (a) VQE, (b) QFT, and (c) Graph circuits. The number of qubits of the circuits varies from
40 to 50, while the channel capacity varies from 2 to 4.
FIGURE 14. Compilation results for the Net-3 with QPU-63 of (a) VQE, (b) QFT, and (c) Graph circuits. The number of qubits of the circuits varies from
80 to 140, while the channel capacity varies from 2 to 6.
Fig. 13(a). If a sequence involves remote operations, it is better to teleport the control qubit
and execute all subsequent two-qubit gates in the sequence locally. On the other hand, the VQE
circuits used are characterized by chains of two-qubit gates, such that each qubit in the chain
acts as the control of a gate and the target of the subsequent gate, as shown in Fig. 13(b). This
means that there will be only a handful of remote operations and there is no need to teleport a
control qubit. Regarding Graph state circuits, there is an increase of one unit in the remote
operation layers, as shown in Fig. 12(c), set against a decrease, also of one unit, in the number
of EPR pairs consumed, when using TeleData operations.
All figures show a slight increase in total circuit depth when the channel capacity changes from 2
to 4. At first glance, this may seem counterintuitive. Indeed, it is worth pointing out that the
adopted local routing does not change the number of consumed EPR pairs defined by the qubit
assignment and remote scheduling modules, but may increase the number of remote operation layers.
The reason is that the local routing tries to use all available communication qubits, regardless of
their distance from data qubits in the local coupling map. Designing a different strategy is out of
the scope of this article, whose purpose is to illustrate a modular framework and show the
multiplicity of circuits and network configurations that can be evaluated by means of that
framework. From previous work, we know that local routing could be improved by means of circuit
optimization techniques [52] and noise-aware strategies [53]. Interestingly, there seems to be no
difference in the number of layers dedicated to remote operations with respect to the channel
capacity. We suppose that, due to the low connectivity between data qubits and communication qubits
on each QPU, local routing operations create an upstream bottleneck with deleterious effects
despite the increase in channel capacity.
Some tests were also made using Net-3 with QPU-63 devices, with the number of data qubits used by
the circuits set to 80, 110, and 140. The results are reported in Fig. 14. While there is still not
much of a difference when changing the channel capacity, the use of TeleData operations is greatly
beneficial when distributing QFT circuits, which, judging from the number of EPR pairs consumed,
appear to be the circuit class that depends most heavily on remote operations among those tested.
By maintaining Net-3 but changing to QPU-125 devices, the compiler was tested on circuits with up
to 250 qubits.
FIGURE 15. Compilation results for the Net-3 with QPU-125 of (a) VQE, (b) QFT, and (c) Graph circuits. The number of qubits of the circuits varies from
150 to 200, while the channel capacity varies from 6 to 10.
FIGURE 16. Compilation results for the Net-5 with QPU-125 of (a) VQE, (b) QFT, and (c) Graph circuits. The number of qubits of the circuits varies from
300 to 600, while the channel capacity varies from 2 to 4.
At this stage, an interesting observation can be made from Fig. 15. It seems that, for Graph state
circuits, when the number of qubits grows and the data qubit capacity of the network is filled,
while the number of EPR pairs consumed remains unchanged, there is an almost unnoticeable increase
in the number of layers for remote operations.
Finally, the total number of data qubits was further increased by exploiting Net-5 with QPU-125
devices. Therefore, it was possible to compile circuits of up to 600 qubits, as depicted in
Fig. 16. There are two results that stand out in these figures. Firstly, for VQE circuits, there is
a slight increase in the layers of remote operations when the maximum number of qubits is reached
and TeleData operations are employed. The opposite can be observed for Graph state circuits, where
the number of remote operation layers decreases, although marginally, when the maximum number of
data qubits allowed by the network is reached. This trend goes against the observation made
previously for the same type of circuits with the Net-3 topology, which highlights the impact of
different network topologies and suggests that choosing a more connected network is in fact
beneficial.
A final remark should be made regarding the general difference between the results for different
classes of circuits. Although the QFT circuits greatly benefit from the use of TeleData, they still
need a massive number of remote operations. In contrast, the VQE and Graph circuits, which are not
particularly affected by the use of TeleData, require very few remote operations for their
distributed execution, which makes them better candidates for DQC.
V. CONCLUSION
In this work, we introduced a general-purpose modular quantum compilation framework for DQC that
takes into account both network and device constraints and characteristics. We illustrated the
experimental evaluation of a quantum compiler based on the proposed framework, using some circuits
of interest (VQE, QFT, graph state preparation) characterized by different widths (up to 600
qubits). We considered different network topologies, with quantum processors characterized by
heavy-hexagon coupling maps. We also presented a strategy for remote scheduling that can exploit
both TeleGate and TeleData operations, and tested the impact of using either only TeleGates or both
operations. We observed that TeleData operations may have a positive impact on the number of
consumed EPR pairs, depending on the specific characteristics of the circuit. In fact, we also
observed that some classes of circuits are more suitable for DQC than others, i.e., they can be
distributed more efficiently. Furthermore, we showed that choosing a more connected network
topology helps reduce the number of layers dedicated to remote operations.
Regarding future work, we will focus on integrating noise-adaptive compilation strategies into the
framework, both for local routing [53] and remote gate scheduling. We shall then evaluate the
impact of different strategies on the quality of computation results, which depend also on the
selection of suitable metrics. To produce such metrics we need to actually execute the compiled
circuits, either by means of a quantum network simulator or on real hardware. In the first case,
there are already available simulators with different levels of abstraction, depending on how
realistic the simulations need to be. These simulations will be crucial to understand the impact
that remote operations, and any resulting local routing overhead, have on the quality of the
computation due to the effects of noise.
ACKNOWLEDGMENT
This research benefits from the High Performance Computing facility of the University of Parma,
Italy.
DATA AVAILABILITY
All data and code required to reproduce all plots shown herein are available at
https://doi.org/10.5281/zenodo.7896588.
REFERENCES
[1] M. Caleffi et al., “Distributed quantum computing: A survey,” 2022, arXiv:2212.10609,
doi: 10.48550/arXiv.2212.10609.
[2] R. V. Meter and S. J. Devitt, “The path to scalable distributed quantum computing,” Computer,
vol. 49, no. 9, pp. 31–42, Sep. 2016, doi: 10.1109/MC.2016.291.
[3] S. Pirandola and S. L. Braunstein, “Physics: Unite to build a quantum internet,” Nature,
vol. 532, no. 7598, pp. 169–171, Apr. 2016, doi: 10.1038/532169a.
[4] E. Gibney, “Chinese satellite is one giant step for the quantum internet,” Nature, vol. 535,
no. 7613, pp. 478–479, Jul. 2016, doi: 10.1038/535478a.
[5] W. Dür, R. Lamprecht, and S. Heusler, “Towards a quantum internet,” Eur. J. Phys., vol. 38,
no. 4, May 2017, Art. no. 043001, doi: 10.1088/1361-6404/aa6df7.
[6] C. Simon, “Towards a global quantum network,” Nature Photon., vol. 11, no. 11, pp. 678–680,
2017, doi: 10.1038/s41566-017-0032-0.
[7] S. Wehner, D. Elkouss, and R. Hanson, “Quantum internet: A vision for the road ahead,” Science,
vol. 362, no. 6412, 2018, Art. no. eaam9288, doi: 10.1126/science.aam9288.
[8] M. Zomorodi-Moghadam, M. Houshmand, and M. Houshmand, “Optimizing teleportation cost in
distributed quantum circuits,” Int. J. Theor. Phys., vol. 57, pp. 848–861, 2018,
doi: 10.1007/s10773-017-3618-x.
[9] M. Caleffi, A. S. Cacciapuoti, and G. Bianchi, “Quantum internet: From communication to
distributed computing!,” in Proc. 5th ACM Int. Conf. Nanoscale Comput. Commun., 2018, pp. 1–4,
doi: 10.1145/3233188.3233224.
[10] M. Caleffi, D. Chandra, D. Cuomo, S. Hasaanpour, and A. S. Cacciapuoti, “The rise of the
quantum internet,” Computer, vol. 53, no. 6, pp. 67–72, Jun. 2020, doi: 10.1109/MC.2020.2984871.
[11] L. Gyongyosi and S. Imre, “Entanglement concentration service for the quantum internet,”
Quantum Inf. Process., vol. 19, no. 8, 2020, Art. no. 221, doi: 10.1007/s11128-020-02716-3.
[12] L. Gyongyosi and S. Imre, “Routing space exploration for scalable routing in the quantum
internet,” Sci. Rep., vol. 10, no. 1, 2020, Art. no. 11874, doi: 10.1038/s41598-020-68354-y.
[13] M. Amoretti and S. Carretta, “Entanglement verification in quantum networks with tampered
nodes,” IEEE J. Sel. Areas Commun., vol. 38, no. 3, pp. 598–604, Mar. 2020,
doi: 10.1109/JSAC.2020.2967955.
[14] A. S. Cacciapuoti, M. Caleffi, R. V. Meter, and L. Hanzo, “When entanglement meets classical
communications: Quantum teleportation for the quantum internet,” IEEE Trans. Commun., vol. 68,
no. 6, pp. 3808–3833, Jun. 2020, doi: 10.1109/TCOMM.2020.2978071.
[15] J. Eisert, K. Jacobs, P. Papadopoulos, and M. B. Plenio, “Optimal local implementation of
nonlocal quantum gates,” Phys. Rev. A, vol. 62, Oct. 2000, Art. no. 052317,
doi: 10.1103/PhysRevA.62.052317.
[16] A. Yimsiriwattana and S. Lomonaco, “Generalized GHZ states and distributed quantum computing,”
Contemp. Math., vol. 381, pp. 131–147, 2005, doi: 10.1090/conm/381/07096.
[17] R. V. Meter, K. Nemoto, W. J. Munro, and K. M. Itoh, “Distributed arithmetic on a quantum
multicomputer,” in Proc. 33rd Int. Symp. Comput. Architecture, 2006, pp. 354–365.
[18] R. V. Meter, W. J. Munro, and K. Nemoto, “Architecture of a quantum multicomputer implementing
Shor’s algorithm,” in Theory of Quantum Computation, Communication, and Cryptography, Y. Kawano and
M. Mosca, Eds. Berlin, Heidelberg: Springer, 2008, pp. 105–114, doi: 10.1007/978-3-540-89304-2_10.
[19] P. Andrés-Martínez and C. Heunen, “Automated distribution of quantum circuits via hypergraph
partitioning,” Phys. Rev. A, vol. 100, Sep. 2019, Art. no. 032308,
doi: 10.1103/PhysRevA.100.032308.
[20] R. G. Sundaram, H. Gupta, and C. R. Ramakrishnan, “Efficient distribution of quantum
circuits,” in Proc. 35th Int. Symp. Distrib. Comput., 2021, pp. 41:1–41:20,
doi: 10.4230/LIPIcs.DISC.2021.41.
[21] R. G. Sundaram, H. Gupta, and C. R. Ramakrishnan, “Distribution of quantum circuits over
general quantum networks,” in Proc. IEEE Int. Conf. Quantum Comput. Eng., 2022, pp. 415–425,
doi: 10.1109/QCE53715.2022.00063.
[22] O. Daei, K. Navi, and M. Zomorodi-Moghadam, “Optimized quantum circuit partitioning,” Int. J.
Theor. Phys., vol. 59, no. 12, pp. 3804–3820, Dec. 2020, doi: 10.1007/s10773-020-04633-8.
[23] Z. Davarzani, M. Zomorodi-Moghadam, M. Houshmand, and M. Nouri-baygi, “A dynamic programming
approach for distributing quantum circuits by bipartite graphs,” Quantum Inf. Process., vol. 19,
2020, Art. no. 360, doi: 10.1007/s11128-020-02871-7.
[24] E. Nikahd, N. Mohammadzadeh, M. Sedighi, and M. S. Zamani, “Automated window-based
partitioning of quantum circuits,” Physica Scripta, vol. 96, no. 3, Jan. 2021, Art. no. 035102,
doi: 10.1088/1402-4896/abd57c.
[25] D. Cuomo et al., “Optimized compiler for distributed quantum computing,” ACM Trans. Quantum
Comput., vol. 4, no. 2, pp. 1–29, Feb. 2023, doi: 10.1145/3579367.
[26] A. Ovide et al., “Mapping quantum algorithms to multi-core quantum computing architectures,”
2023, arXiv:2303.16125, doi: 10.48550/arXiv.2303.16125.
[27] J. M. Baker, C. Duckering, A. Hoover, and F. T. Chong, “Time-sliced quantum circuit
partitioning for modular architectures,” in Proc. 17th ACM Int. Conf. Comput. Front., 2020,
pp. 98–107, doi: 10.1145/3387902.3392617.
[28] D. Ferrari, A. S. Cacciapuoti, M. Amoretti, and M. Caleffi, “Compiler
[39] “Metis github repository,” 2023. [Online]. Available: https://github.com/KarypisLab/METIS
[40] G. Karypis and V. Kumar, “A fast and high quality multilevel scheme for partitioning irregular
graphs,” SIAM J. Sci. Comput., vol. 20, no. 1, pp. 359–392, 1998, doi: 10.1137/S1064827595287997.
[41] A. Dahlberg et al., “NetQASM—a low-level instruction set architecture for hybrid
quantum–classical programs in a quantum internet,” Quantum Sci. Technol., vol. 7, no. 3, Jun. 2022,
Art. no. 035023, doi: 10.1088/2058-9565/ac753f.
[42] L. K. Grover, “A fast quantum mechanical algorithm for database search,” in Proc. 28th Annu.
ACM Symp. Theory Comput., 1996, pp. 212–219, doi: 10.1145/237814.237866.
[43] A. W. Harrow, A. Hassidim, and S. Lloyd, “Quantum algorithm for linear systems of equations,”
Phys. Rev. Lett., vol. 103, Oct. 2009, Art. no. 150502, doi: 10.1103/PhysRevLett.103.150502.
[44] A. Jayakumar et al., “Quantum algorithm implementations for beginners,” ACM Trans. Quantum
Comput., vol. 3, no. 4, pp. 1–92, Jul. 2022, doi: 10.1145/3517340.
[45] C. Meignant, D. Markham, and F. Grosshans, “Distributing graph states over arbitrary quantum
networks,” Phys. Rev. A, vol. 100, Nov. 2019, Art. no. 052333, doi: 10.1103/PhysRevA.100.052333.
[46] Y. Kim et al., “Scalable error mitigation for noisy quantum circuits produces competitive
expectation values,” Nature Phys., vol. 19, no. 5, pp. 752–759, May 2023,
doi: 10.1038/s41567-022-01914-3.
[47] Google Quantum AI, “Suppressing quantum errors by scaling a surface code logical qubit,”
Nature, vol. 614, no. 7949, pp. 676–681, Feb. 2023, doi: 10.1038/s41586-022-05434-1.
[48] T. V. Der et al., “Decoherence-protected quantum gates for a hybrid solid-state spin
register,” Nature, vol. 484, no. 7392, pp. 82–86, Apr. 2012, doi: 10.1038/nature10900.
[49] P. Hrmo et al., “Native qudit entanglement in a trapped ion quantum processor,” Nature
Commun., vol. 14, no. 1, Apr. 2023, Art. no. 2242, doi: 10.1038/s41467-023-37375-2.
[50] IBM, “Expanding the IBM Quantum roadmap to anticipate the future of quantum-centric
supercomputing,” 2022. [Online]. Available: https://research.ibm.com/blog/ibm-quantum-roadmap-2025
[51] S. L. N. Hermans, M. Pompili, H. K. C. Beukers, S. Baier, J. Borregaard, and R. Hanson, “Qubit
teleportation between non-neighbouring nodes in a quantum network,” Nature, vol. 605, no. 7911,
pp. 663–668, May 2022,
design for distributed quantum computing,” IEEE Trans. Quantum Eng., doi: 10.1038/s41586-022-04697-y.
vol. 2, 2021, Art. no. 4100720, doi: 10.1109/TQE.2021.3053921. [52] D. Ferrari, I. Tavernelli, and M. Amoretti, “Deterministic algo-
[29] IBM, “The IBM Quantum heavy hex lattice,” 2021. [Online]. Avail- rithms for compiling quantum circuits with recurrent patterns,” Quan-
able:https://research.ibm.com/blog/heavy-hex-lattice tum Inf. Process., vol. 20, no. 6, Jun. 2021, Art. no. 213, doi:
[30] M. Takita, A. W. Cross, A. D. Córcoles, J. M. Chow, and J. M. Gam- 10.1007/s11128-021-03150-9.
betta, “Experimental demonstration of fault-tolerant state preparation [53] D. Ferrari and M. Amoretti, “Noise-adaptive quantum compilation
with superconducting qubits,” Phys. Rev. Lett., vol. 119, Oct. 2017, strategies evaluated with application-motivated benchmarks,” in
Art. no. 180501, doi: 10.1103/PhysRevLett.119.180501. Proc. 19th ACM Int. Conf. Comput. Front., 2022, pp. 237–243, doi:
[31] N. Sundaresan, I. Lauer, E. Pritchett, E. Magesan, P. Jurcevic, and J. 10.1145/3528416.3530250.
M. Gambetta, “Reducing unitary and spectator errors in cross reso-
nance with optimized rotary echoes,” PRX Quantum, vol. 1, Dec. 2020,
Art. no. 020318, doi: 10.1103/PRXQuantum.1.020318.
Davide Ferrari (Member, IEEE) received the Ph.D. degree in information technologies from the University of Parma, Parma, Italy, in 2023.
During his doctoral studies, he worked on quantum compiling, quantum optimization, and distributed quantum computing. He was a Research Scholar with Future Technology Lab, University of Parma, working on the design of efficient algorithms for quantum compiling. He is currently a Research Fellow with the Department of Engineering and Architecture, University of Parma. He is involved in the Quantum Information Science research initiative with the University of Parma, where he is a member of the Quantum Software Laboratory. His research interests include quantum optimization applications and efficient quantum compiling for local and distributed quantum computing.
Dr. Ferrari was the recipient of the IBM Quantum Awards Circuit Optimization Developer Challenge in 2020.

Stefano Carretta received the Ph.D. degree in physics from the University of Parma, Parma, Italy, in 2005.
He is currently a Full Professor in physics of matter with the University of Parma. He contributed some of the first proposals for the use of magnetic molecules as qubits and the first proposal for exploiting molecular nanomagnets as quantum simulators. He has authored or coauthored more than 150 research papers published in international journals. His research interests include the theoretical modeling of the quantum behavior of magnetic molecules and quantum information processing.
Dr. Carretta was the recipient of the Le Scienze (the Italian version of Scientific American) medal and of the President of the Italian Republic medal for his research on molecular nanomagnets in 2006, and the prestigious Olivier Kahn International Award for his contribution to the theory of molecular magnetism in 2011. He was a member of the commission of experts on quantum technologies for the 2021–27 Italian National Research Program (PNR), and he is involved in several national and European projects involving quantum technologies. He is currently one of the Principal Investigators of a European ERC Synergy Project.

Michele Amoretti (Senior Member, IEEE) received the Ph.D. degree in information technologies from the University of Parma, Parma, Italy, in 2006.
In 2013, he was a Visiting Researcher with LIG Lab, Grenoble, France. He is currently an Associate Professor of computer engineering with the University of Parma. He has authored or coauthored more than 130 research papers in refereed international journals, conference proceedings, and books. He is involved in the Quantum Information Science research and teaching initiative with the University of Parma, where he leads the Quantum Software Laboratory. His current research interests include quantum computing, high-performance computing, and the Internet of Things.
Dr. Amoretti is an Associate Editor for the journals IEEE TRANSACTIONS ON QUANTUM ENGINEERING and International Journal of Distributed Sensor Networks. He is currently one of the Principal Investigators of the European HE project Quantum Internet Alliance. He is the CINI Consortium delegate in the UNI/CT 535 “Quantum Technologies” UNINFO Commission, which is the national mirror of ISO/IEC JTC 1 WG 14 “Quantum Information Technologies” and of CEN/CENELEC JTC 22 “Quantum Technologies.”

Open Access funding provided by ‘Università degli Studi di Parma’ within the CRUI CARE Agreement