
Resource-Efficient Quantum Computing by Breaking Abstractions

This document summarizes a research article that proposes more efficient approaches to quantum computing by breaking abstractions between software layers. The article reviews works that break the quantum instruction set architecture abstraction and error correction/information processing schemes. It suggests hardware-aware compilation optimizations and breaking qubit abstractions could accelerate the development of practical quantum computing applications.


This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

Resource-Efficient Quantum Computing by Breaking Abstractions

By YUNONG SHI, PRANAV GOKHALE, PRAKASH MURALI, JONATHAN M. BAKER, CASEY DUCKERING, YONGSHAN DING, NATALIE C. BROWN, CHRISTOPHER CHAMBERLAND, ALI JAVADI-ABHARI, ANDREW W. CROSS, DAVID I. SCHUSTER, KENNETH R. BROWN, MARGARET MARTONOSI, AND FREDERIC T. CHONG

ABSTRACT | Building a quantum computer that surpasses the computational power of its classical counterpart is a great engineering challenge. Quantum software optimizations can provide an accelerated pathway to the first generation of quantum computing (QC) applications that might save years of engineering effort. Current quantum software stacks follow a layered approach similar to the stack of classical computers, which was designed to manage the complexity. In this review, we point out that greater efficiency of QC systems can be achieved by breaking the abstractions between these layers. We review several works along this line, including two hardware-aware compilation optimizations that break the quantum instruction set architecture (ISA) abstraction and two error-correction/information-processing schemes that break the qubit abstraction. Last, we discuss several possible future directions.

KEYWORDS | Quantum computing (QC), software design, system analysis and design.

Manuscript received October 1, 2019; revised December 29, 2019 and March 23, 2020; accepted May 5, 2020. This work was supported in part by Enabling Practical-scale Quantum Computing (EPiQC), an NSF Expedition in Computing, under Grant CCF-1730449/1832377/1730082; in part by Software-Tailored Architectures for Quantum co-design (STAQ) under Grant NSF Phy-1818914; and in part by DOE under Grant DE-SC0020289 and Grant DE-SC0020331. Yunong Shi is funded in part by the NSF QISE-NET fellowship under grant number 1747426. Pranav Gokhale is supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program. This work was completed in part with resources provided by the University of Chicago Research Computing Center. (Corresponding author: Frederic T. Chong.)
Yunong Shi and David I. Schuster are with the Department of Physics, The University of Chicago, Chicago, IL 60637 USA.
Pranav Gokhale, Jonathan M. Baker, Casey Duckering, Yongshan Ding, and Frederic T. Chong are with the Department of Computer Science, The University of Chicago, Chicago, IL 60637 USA (e-mail: [email protected]).
Prakash Murali and Margaret Martonosi are with the Department of Computer Science, Princeton University, Princeton, NJ 08544 USA.
Natalie C. Brown and Kenneth R. Brown are with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708 USA.
Christopher Chamberland is with the AWS Center for Quantum Computing, Pasadena, CA 91125 USA, and also with the Institute for Quantum Information and Matter, California Institute of Technology, Pasadena, CA 91125 USA.
Ali Javadi-Abhari and Andrew W. Cross are with the IBM Thomas J. Watson Research Center, Ossining, NY 10598 USA.
Digital Object Identifier 10.1109/JPROC.2020.2994765
0018-9219 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
PROCEEDINGS OF THE IEEE
Authorized licensed use limited to: Princeton University. Downloaded on June 18, 2020 at 11:45:31 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Quantum computing (QC) has recently transitioned from a theoretical prediction to a nascent technology. With development of noisy intermediate-scale quantum (NISQ) devices, cloud-based quantum information processing (QIP) platforms with up to 53 qubits are currently accessible to the public. It has also been recently demonstrated by the Quantum Supremacy experiment on the Sycamore quantum processor, a 53-qubit QC device manufactured by Google, that quantum computers can outperform current classical supercomputers in certain computational tasks [7], although alternative classical simulations have been proposed that scale better [73], [74]. These developments suggest that the future of QC is promising. Nevertheless, there is still a gap between the ability and reliability of current QIP technologies and the requirements of the first useful QC applications. The gap is mostly due to the presence of qubit decoherence and systematic errors including gate errors, state preparation, and measurement (SPAM) errors. As an example, the best reported qubit decoherence time on a superconducting (SC) QIP platform is around 500 μs (meaning that in 500 μs, the probability of a logical 1 state staying unflipped drops to 1/e ≈ 0.368), the best error rate of 2-qubit gates is around 0.3%–1% in a device, measurement error of a single qubit is between 2% and 5% [1], [75]. In addition to the errors in the elementary operations, emergent error modes such
as crosstalk are reported to make significant contributions to the current noise level in quantum devices [18], [60]. With these sources of errors combined, we are only able to run quantum algorithms of very limited size on current QC devices.

Thus, it will require tremendous effort and investment to solve these engineering challenges, and we cannot expect a definite timeline for their success. Because of the uncertainties and difficulties in relying on hardware breakthroughs, it will also be crucial in the near term to close the gap using higher-level quantum optimizations and software/hardware codesign, which could maximally utilize noisy devices and potentially provide an accelerated pathway to real-world QC applications.

Currently, major quantum programming environments, including Qiskit [6] by IBM, Cirq [3] by Google, PyQuil [58] by Rigetti, and Strawberry Fields [66] by Xanadu, follow the quantum circuit model. These programming environments support users in configuring, compiling, and running their quantum programs in an automated workflow and roughly follow a layered approach as illustrated in Fig. 1. In these environments, the compilation stack is divided into layers of subroutines that are built upon the abstraction provided by the next layer. This design philosophy is similar to that of its classical counterpart, which has slowly converged to this layered approach over many years to manage the increasing complexity that comes with the exponentially growing hardware resources. In each layer, burdensome hardware details are well encapsulated and hidden behind a clean interface, which offers a well-defined, manageable optimization task to solve. Thus, this layered approach provides great portability and modularity. For example, the Qiskit compiler supports both the SC QIP platform and the trapped ion QIP platform as the backend (see Fig. 2). In the Qiskit programming environment, these two backends share a unified, hardware-agnostic programming frontend even though the hardware characteristics and the qubit control methods of the two platforms are rather different. SC qubits are macroscopic LC circuits placed inside dilution fridges at temperatures near absolute zero. These qubits can be regarded as artificial atoms and are protected from environmental noise by a metal transmission line. For SC QIP platforms, qubit control is achieved by sending microwave pulses into the transmission line that surrounds the LC circuits to change the qubit state, and those operations are usually done within several hundreds of nanoseconds. On the other hand, trapped ion qubits are ions confined in the potential of electrodes in vacuum chambers. Trapped ion qubits have a much longer coherence time (>1 s) and a modulated laser beam is utilized (in addition to microwave pulse control) in performing quantum operations. The quantum gates are also much slower than those of SC qubits but the qubit connectivity (for 2-qubit gates) is much better. In Qiskit's early implementation, the hardware characteristics of the two QIP platforms are abstracted away in the quantum circuit model so that the higher level programming environment can work with both backends.

Fig. 1. Workflow of the QC stack roughly followed by current programming environments (e.g., Qiskit, Cirq, ScaffCC) based on the quantum circuit model.

However, the abstractions introduced in the layered approach of current QC stacks restrict opportunities for cross-layer optimizations. For example, without accessing the lower level noise information, the compiler might not be able to properly optimize gate scheduling and qubit mapping with regard to the final fidelity. For near-term QC, maximal utilization of the scarce quantum resources and reconciling quantum algorithms with noisy devices are of more importance than managing the complexity of the classical control system. In this review, we propose a shift of the QC stack toward a more vertically integrated architecture. We point out that breaking the abstraction layers in the stack by exposing enough lower level details could substantially improve the quantum efficiency. This claim is not that surprising: there are many supporting examples from the classical computing world, such as the emergence of application-specific architectures like the graphics processing unit (GPU) and the tensor processing unit (TPU). However, this view is often overlooked in the software/hardware design in QC.

We examine this methodology by looking at several previous works along this line. We first review two compilation optimizations that break the instruction set

architecture (ISA) abstraction by exposing pulse level information (see Section II) and noise information (see Section III). Then, we discuss an information processing scheme that improves general circuit latency by exposing the third energy level of the underlying physical space, that is, breaking the qubit abstraction using qutrits (see Section IV). Then, we discuss the Gottesman–Kitaev–Preskill (GKP) qubit encoding in a quantum harmonic oscillator (QHO) that exposes error information in the form of small shifts in the phase space to assist the upper level error mitigation/correction procedure (see Section V).

Finally, we envision several future directions that could further explore the idea of breaking abstractions and assist the realization of the first quantum computers for real-world applications.

Fig. 2. Same abstractions in the QC stack on the logical level can be mapped to different physical implementations. Here, we take the SC QIP platform and the trapped ion QIP platform as examples of the physical implementations. (Left) In the quantum circuit model, both SC qubits and trapped-ion qubits are abstracted as two-level quantum systems and their physical operations are abstracted as quantum gates, even though these two systems have different physical properties. (Middle) SC qubits are SC circuits placed inside a long, metal transmission line. The apparatus requires a dilution fridge at a temperature near absolute zero. The orange standing waves are oscillations in the transmission line, which are driven by external microwave pulses and used to control the qubit states. (Right) Trapped ion qubits are confined in the potential of cylindrical electrodes. Modulated laser beams can provide elementary quantum operations for trapped ion qubits. The apparatus is usually contained inside a vacuum chamber at a pressure of around $10^{-8}$ Pa. The two systems require different high-level optimizations for better efficiency due to their distinct physical features.

II. BREAKING THE ISA ABSTRACTION USING PULSE-LEVEL COMPILATION

In this section, we describe a quantum compilation methodology proposed in [28] and [67] that achieves an average of 5× speedup in terms of generated circuit latency, by employing the idea of breaking the ISA abstraction and compiling directly to control pulses.

A. Quantum Compilation

Since the early days of QC, quantum compilation has been recognized as one of the central tasks in realizing practical quantum computation. Quantum compilation was first defined as the problem of synthesizing quantum circuits for a given unitary matrix. The celebrated Solovay–Kitaev theorem [34] states that such synthesis is always possible if a universal set of quantum gates is given. Now the term quantum compilation is used more broadly, and almost all stages in Fig. 1 can be viewed as part of the quantum compilation process.

There are many indications that the current quantum compilation stack (see Fig. 1) is highly inefficient. First, current circuit synthesis algorithms are far from saturating (or being close to) the asymptotic lower bound in the general case [34], [49]. Also, the formulated circuit synthesis problem is based on the fundamental abstraction of quantum ISA (see Section II-B) and has largely been discussed in hardware-agnostic settings in previous work, but the underlying physical operations cannot be directly described by the logical level ISA (as shown in Fig. 2). The translation from the logical ISA to the operations directly supported by the hardware is typically done in an ad hoc way. Thus, there is a mismatch between the expressive logical gates and the

set of instructions that can be efficiently implemented on a real system. This mismatch significantly limits the efficiency of the current QC stack; it underutilizes the underlying quantum devices' computing ability and wastes precious quantum coherence. While improving the computing efficiency is always valuable, improving QC efficiency is do-or-die: computation has to finish before qubit decoherence or the results will be worthless. Thus, improving the compilation process is one of the most, if not the most, crucial goals in near-term QC system design.

By identifying this mismatch and the fundamental limitation in the ISA abstraction, in [28] and [66], we proposed a quantum compilation technique that optimizes across existing abstraction barriers to greatly reduce latency while still being practical for large numbers of qubits. Specifically, rather than limiting the compiler to use 1- and 2-qubit quantum instructions, our framework aggregates the instructions in the logical ISA into a customized set of instructions that corresponds to optimized control pulses. We compare our methodology to the standard compilation workflow on several promising NISQ quantum applications and conclude that our compilation methodology has an average speedup of 5× with a maximum speedup of 10×. We use the rest of this section to introduce this compilation methodology, starting with defining some basic concepts.

B. Quantum ISA

In the QC stack, a restricted set of 1- and 2-qubit quantum instructions is provided for describing the high-level quantum algorithms, analogous to the ISA abstraction in classical computing. In this article, we call this instruction set the logical ISA. The 1-qubit gates in the logical ISA include the Pauli gates, P = {X, Y, Z}. It also includes the Hadamard H gate, whose symbol in the circuit model is given as an example in the left column of Fig. 2. The typical 2-qubit instruction in the logical instruction set is the controlled-NOT (CNOT) gate, which flips the state of the target qubit based on the state of the control qubit.

However, QC devices usually do not directly support the logical ISA. Based on the system characteristics, we can define the physical ISA that can be directly mapped to the underlying control signals. For example, SC devices typically have the cross-resonance (CR) gate or the iSWAP gate as their intrinsic 2-qubit instruction, whereas for trapped-ion devices the intrinsic 2-qubit instruction can be the Mølmer–Sørensen gate or the controlled phase gate.

C. Quantum Control

As shown in Fig. 2 and discussed in Section I, underlying physical operations in the hardware such as microwave control pulses and modulated laser beams are abstracted as quantum instructions. A quantum instruction is simply a predefined control pulse sequence.

The underlying evolution of the quantum system is continuous and so are the control signals. The continuous control signals offer much richer and more flexible controllability than the quantum ISA. The control pulses can drive the QC hardware to a desired quantum state by varying a system-dependent and time-dependent quantity called the Hamiltonian. The Hamiltonian of a system determines the evolution path of the quantum states. The ability to engineer the real-time system Hamiltonian allows us to navigate the quantum system to the quantum state of interest through generating accurate control signals. Thus, quantum computation can be done by constructing a quantum system in which the system Hamiltonian evolves in a way that aligns with a QC task, producing the computational result with high probability upon final measurement of the qubits. In general, the path to a final quantum state is not unique, and finding the optimal evolution path is a very important but challenging problem [25], [39], [62].

D. Mismatch Between ISA and Control

Being hardware-agnostic, the quantum operation sequences composed from the logical ISA have limited freedom in terms of controllability and usually will not be mapped to the optimal evolution path of the underlying quantum system; thus there is a mismatch between the ISA and low-level quantum control. We demonstrate this mismatch with two simple examples.

1) Consider an instruction sequence that consists of a CNOT gate followed by an X gate on the control bit. In the current compilation workflow, these two logical gates will be further decomposed into the physical ISA and be executed sequentially. However, on SC QIP platforms, the microwave pulses that implement these two instructions could in fact be applied simultaneously (because of their commutativity). Even though the commutativity can be captured by the ISA abstraction, in the current compilation workflow, the compiled control signals are suboptimal.

2) The SWAP gate is an important quantum instruction for circuit mapping. The SWAP operation is usually decomposed into three CNOT operations with alternating control and target qubits. This decomposition can be thought of as the counterpart of implementing in-place memory swaps with three alternating XORs in classical computation. However, for systems like quantum dots [41], the SWAP operation is directly supported by applying particular constant control signals for a certain period of time. In this case, this decomposition of SWAP into three CNOTs introduces substantial overhead.

In experimental physics settings, equivalences from simple gate sequences to control pulses can be hand optimized [61]. However, when circuits become larger and more complicated, this kind of hand optimization becomes less efficient and the standard decomposition becomes less favorable, motivating a shift toward numerical optimization methods that are not limited by the ISA abstraction.
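The three-CNOT decomposition of SWAP mentioned in example 2) can be checked with a few lines of code. The sketch below is only an illustration and is not taken from [28] or [65]; the helper names `cnot`, `swap_via_cnots`, and `swap_direct` are hypothetical. Because CNOT and SWAP are permutation matrices with 0/1 entries, agreement on every computational basis state is enough to establish that the two circuits implement the same unitary.

```python
from itertools import product

def cnot(bits, control, target):
    """Apply a CNOT to a tuple of basis-state bits (XOR the control into the target)."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)

def swap_via_cnots(bits, a, b):
    """SWAP built from three CNOTs with alternating control and target."""
    for control, target in ((a, b), (b, a), (a, b)):
        bits = cnot(bits, control, target)
    return bits

def swap_direct(bits, a, b):
    """Reference SWAP: exchange the two bits directly."""
    out = list(bits)
    out[a], out[b] = out[b], out[a]
    return tuple(out)

# Compare the two constructions on all four 2-qubit basis states.
basis_states = list(product((0, 1), repeat=2))
decomposition_matches = all(
    swap_via_cnots(state, 0, 1) == swap_direct(state, 0, 1)
    for state in basis_states
)
print(decomposition_matches)
```

On hardware where SWAP is native, the decomposed version costs three 2-qubit instructions where one would do, which is the overhead the example refers to.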

E. Quantum Optimal Control

Quantum optimal control (QOC) theory provides an alternative way of finding the optimal evolution path for quantum compilation tasks. QOC algorithms typically use analytical or numerical methods for this optimization, among which gradient ascent methods, such as the GRadient Ascent Pulse Engineering (GRAPE) [15], [33] algorithm, are widely used. The basic idea of GRAPE is as follows: to optimize control signals with M parameters $(u_1, \ldots, u_M)$ for a target quantum state, in every iteration, GRAPE reduces the deviation of the system evolution from the target by calculating the gradient of the final fidelity with respect to the M control parameters in the M-dimensional space. GRAPE then updates the parameters in the direction of the gradient with an adaptive step size [15], [33], [39]. After a large number of iterations, the control signals are expected to converge to optimized pulses.

In [65], we utilize GRAPE to optimize our aggregated instructions that are customized for each quantum circuit, as opposed to selecting instructions from a predefined set of pulse sequences. However, one disadvantage of numerical methods like GRAPE is that the running time and memory use grow exponentially with the size of the quantum system being optimized. In our work, we are able to use GRAPE for optimizing quantum systems of up to 10 qubits with the GPU-accelerated optimal control unit [39]. As shown in our result, the limit of 10 qubits does not put restrictions on the result of our compilation methodology.
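The gradient-ascent idea can be illustrated on a toy problem. The following is a minimal sketch under several assumptions of ours, not the optimal control unit of [39] or the setup of [65]: a single qubit with Hamiltonian $H = (\delta/2) Z + (u_k/2) X$, piecewise-constant amplitudes $u_k$, zero drift ($\delta = 0$) in the demo, a finite-difference gradient in place of GRAPE's analytic gradient, and a fixed step size rather than an adaptive one. All function names and numeric values are hypothetical.

```python
import math

def step_unitary(u, delta, dt):
    """exp(-i*dt*H) for H = (delta/2)*Z + (u/2)*X, as a 2x2 nested list."""
    omega = math.hypot(u, delta)
    if omega == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]
    c = math.cos(omega * dt / 2)
    s = math.sin(omega * dt / 2)
    nx, nz = u / omega, delta / omega
    return [[c - 1j * s * nz, -1j * s * nx],
            [-1j * s * nx, c + 1j * s * nz]]

def fidelity(controls, delta=0.0, dt=0.5):
    """Probability of ending in |1> after starting in |0>."""
    a, b = 1.0 + 0j, 0.0 + 0j
    for u in controls:
        m = step_unitary(u, delta, dt)
        a, b = m[0][0] * a + m[0][1] * b, m[1][0] * a + m[1][1] * b
    return abs(b) ** 2

def gradient(controls, eps=1e-6):
    """Central finite-difference gradient of the fidelity (a simplification)."""
    grad = []
    for k in range(len(controls)):
        up, down = list(controls), list(controls)
        up[k] += eps
        down[k] -= eps
        grad.append((fidelity(up) - fidelity(down)) / (2 * eps))
    return grad

def optimize(controls, iterations=50, step=1.0):
    """Plain gradient ascent on the M control amplitudes with a fixed step size."""
    controls = list(controls)
    for _ in range(iterations):
        grad = gradient(controls)
        controls = [u + step * g for u, g in zip(controls, grad)]
    return controls

initial = [0.1] * 8                      # M = 8 control parameters
initial_fidelity = fidelity(initial)
optimized = optimize(initial)
final_fidelity = fidelity(optimized)
print(initial_fidelity, final_fidelity)
```

The state here has only two amplitudes; for n qubits it has $2^n$, which is the source of the exponential cost noted above.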

F. Pulse-Level Optimization: A Motivating Example

Next, we will illustrate the workflow of our compilation methodology with a circuit instance of the quantum approximate optimization algorithm (QAOA) for solving the MAXCUT problem on the triangle graph (see Fig. 3).¹ This QAOA circuit with logical ISA (or variants of it up to single qubit gates) can be reproduced by most existing quantum compilers. This instance of the QAOA circuit is generated by the ScaffCC compiler, as shown in Fig. 3(a). We assume this circuit is executed on an SC architecture with 1-D nearest neighbor qubit connectivity. A SWAP instruction is inserted in the circuit to satisfy the linear qubit connectivity constraints.

On the other hand, our compiler generates the aggregated instruction set G1–G5 as illustrated in Fig. 3(b) automatically, and uses GRAPE to produce highly optimized pulse sequences for each aggregated instruction. In this minimal circuit instance, our compilation method reduces the total execution time of the circuit by about 2.97× compared to compilation with restricted ISA. Fig. 3(c) and (d) show the generated pulses for G3 with ISA-based compilation and with our aggregated instruction based, pulse-level optimized compilation.

¹ The angle parameters γ and β can be determined by variational methods [44] and are set to 5.67 and 1.26.

Fig. 3. Example of a QAOA circuit. (a) QAOA circuit with the logical ISA. (b) QAOA circuit with aggregated instructions. (c) Generated control pulses for G3 in the ISA-based compilation. (d) Control pulses for G3 from aggregated instructions based compilation. Each curve is the amplitude of a relevant control signal. The pulse sequences in (d) provide a 3× speedup compared to the pulse sequences in (c). Pulse sequences reprinted with permission from [65].

G. Optimized Pulse-Level Compilation Using Gate Aggregation: The Workflow

Now, we give a systematic view of the workflow of our compiler. First, at the program level, our compiler

performs module flattening and loop unrolling to produce the quantum assembly (QASM), which represents a schedule of the logical operations. Next, the compiler enters the commutativity detection phase. Different from the ISA-based approach, in this phase, our compilation process converts the QASM code to a more flexible logical schedule that explores the commutativity between instructions. To further explore the commutativity in the schedule, the compiler aggregates instructions in the schedule to produce a new logical schedule with instructions that represent diagonal matrices (and are of high commutativity). Then the compiler enters the scheduling and mapping phase. Because of commutativity awareness, our compiler can generate a much more efficient logical schedule by rearranging the aggregated instructions with high commutativity. The logical schedule is then converted to a physical schedule after the qubit mapping stage. Then the compiler generates the final aggregated instructions for pulse optimization and uses GRAPE to produce the corresponding control pulses. The goal of the final aggregation is to find the optimal instruction set that produces the lowest-latency control pulses while preserving, as much as possible, the parallelism in the circuit by keeping aggregations small. Finally, our compiler outputs an optimized physical schedule along with the corresponding optimized control pulses. Fig. 4 shows the gate dependence graph (GDG) of the QAOA circuit in Fig. 5 in different compilation stages. Next, we walk through the compilation backend with this example, starting from the commutativity detection phase.

Fig. 4. Example in Fig. 5 in the form of GDG. (a) Input GDG. (b) Commutativity detection. (c) Commutativity-aware scheduling.

Fig. 5. Example of CLS. With commutativity detected, the circuit depth can be shortened. (a) Input circuit. (b) Commutativity detection. (c) Commutativity-aware scheduling.

1) Commutativity Detection: In the commutativity detection phase, the false dependences between commutative instructions are removed and the GDG is restructured. This is because if a pair of gates commutes, the gates can be scheduled in either order. Also, it can be further noticed that, in many NISQ quantum algorithms, it is common for instructions within an instruction block not to commute, while the full instruction blocks commute with each other [19], [37]. As an example, in Fig. 5, the CNOT-Rz-CNOT instruction blocks commute with each other because these blocks correspond to diagonal unitary matrices. However, each individual instruction in these

circuit blocks does not commute. Thus, after aggregating these instructions, the compiler is able to schedule the new aggregated instructions in any order, which was impossible before. This commutativity detection procedure opens up opportunities for more efficient scheduling.
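The block-level commutativity described above can be verified numerically. The sketch below is our own illustration with hypothetical helper names and arbitrary rotation angles; it is not the detection procedure of [65]. It builds two CNOT-Rz-CNOT blocks on three qubits as dense matrices, checks that each block is diagonal and that the blocks commute, and then checks that two of their constituent CNOT gates do not.

```python
import cmath

N_QUBITS = 3
DIM = 2 ** N_QUBITS

def bit(index, q):
    """Value of qubit q in the basis state labelled by index."""
    return (index >> q) & 1

def cnot(control, target):
    """CNOT as a DIM x DIM permutation matrix."""
    m = [[0j] * DIM for _ in range(DIM)]
    for col in range(DIM):
        row = col ^ (bit(col, control) << target)
        m[row][col] = 1 + 0j
    return m

def rz(theta, q):
    """Rz(theta) on qubit q: a diagonal matrix of phases."""
    m = [[0j] * DIM for _ in range(DIM)]
    for i in range(DIM):
        sign = 1 if bit(i, q) else -1
        m[i][i] = cmath.exp(1j * sign * theta / 2)
    return m

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(DIM)) for j in range(DIM)]
            for i in range(DIM)]

def commute(a, b, tol=1e-12):
    ab, ba = matmul(a, b), matmul(b, a)
    return all(abs(ab[i][j] - ba[i][j]) < tol
               for i in range(DIM) for j in range(DIM))

def is_diagonal(m, tol=1e-12):
    return all(abs(m[i][j]) < tol
               for i in range(DIM) for j in range(DIM) if i != j)

def cnot_rz_cnot(theta, a, b):
    """Aggregated block: CNOT(a, b), then Rz(theta) on b, then CNOT(a, b)."""
    return matmul(cnot(a, b), matmul(rz(theta, b), cnot(a, b)))

block_a = cnot_rz_cnot(0.7, 0, 1)   # angles are arbitrary illustrative values
block_b = cnot_rz_cnot(1.3, 1, 2)

blocks_commute = commute(block_a, block_b)
gates_commute = commute(cnot(0, 1), cnot(1, 2))
print(is_diagonal(block_a), is_diagonal(block_b), blocks_commute, gates_commute)
```

A gate-by-gate dependence analysis would see the non-commuting CNOT pair and serialize the two blocks; treating each block as one aggregated instruction removes that false dependence.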
2) Scheduling and Mapping:
a) Commutativity-aware logical scheduling (CLS): In our scheduling phase, our logical scheduling algorithm is able to fully utilize the commutativity detected in the previous compilation phase. The CLS iteratively schedules the available instructions on each qubit. At each iteration, the CLS draws the instruction candidates that can be executed in the earliest time step and schedules them.
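To make the effect of commutativity on scheduling concrete, the sketch below implements a simple greedy as-soon-as-possible scheduler. It is a simplified stand-in written for illustration, not the CLS algorithm of [65]; the `schedule` function, the `commutes` predicate, and the four-instruction ring example are all assumptions of ours. An instruction must follow any earlier instruction that shares a qubit with it and does not commute with it; otherwise it may take the earliest time step in which its qubits are free.

```python
def schedule(instructions, commutes):
    """Greedy ASAP scheduling.

    instructions: list of (name, qubits) in program order.
    commutes: predicate telling whether two instructions commute.
    Returns the time step assigned to each instruction.
    """
    steps = []   # steps[i] is the time step of instruction i
    busy = []    # busy[t] is the set of qubits occupied in step t
    for i, (_, qubits) in enumerate(instructions):
        qubit_set = set(qubits)
        earliest = 0
        for j in range(i):
            shares_qubit = qubit_set & set(instructions[j][1])
            if shares_qubit and not commutes(instructions[i], instructions[j]):
                earliest = max(earliest, steps[j] + 1)
        t = earliest
        while t < len(busy) and busy[t] & qubit_set:
            t += 1
        while len(busy) <= t:
            busy.append(set())
        busy[t] |= qubit_set
        steps.append(t)
    return steps

def depth(steps):
    return max(steps) + 1

# Four diagonal two-qubit blocks on a ring of four qubits (hypothetical example).
program = [("ZZ", (0, 1)), ("ZZ", (1, 2)), ("ZZ", (2, 3)), ("ZZ", (0, 3))]

# Dependence-only scheduling: every shared qubit forces program order.
plain_steps = schedule(program, lambda a, b: False)
# Commutativity-aware scheduling: diagonal blocks all commute with each other.
aware_steps = schedule(program, lambda a, b: True)

print(depth(plain_steps), depth(aware_steps))
```

In this example the dependence-only schedule needs four time steps, while the commutativity-aware one fits the same instructions into two, because blocks on disjoint qubit pairs can share a step once their order is no longer fixed.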
b) Qubit mapping: In this phase of the compilation, the compiler transforms the circuit to a form that respects the topological constraints of hardware connectivity [43]. To conform to the device topology, the logical instructions are processed in two steps. First, we place frequently interacting qubits near each other by bisecting the qubit interaction graph along a cut with few crossing edges, computed by the METIS graph partitioning library [32]. Once the initial mapping is generated, 2-qubit operations between nonneighboring qubits are prepended with a sequence of SWAP rearrangements that move the control and target qubits to be adjacent.

3) Instruction Aggregation: In this phase, the compiler iterates with the optimal control unit to generate the final aggregated instructions for the circuit. Then, the optimal control unit optimizes each instruction individually with GRAPE.

4) Physical Execution: Finally, the circuit is scheduled again using the CLS from Section II-G2. The physical schedule is then sent to the control unit of the underlying quantum hardware, which triggers the optimized control pulses at the appropriate times for physical execution.

H. Discussion

In [65], we selected nine important quantum/classical-quantum hybrid algorithms in the NISQ era as our benchmarks. Across all nine benchmarks, our compilation scheme achieves a geometric mean of 5.07× pulse time reduction compared to the standard gate-based compilation. The result in [65] indicates that addressing the mismatch between quantum gates and the control pulses by breaking the ISA abstraction can greatly improve the compilation efficiency. Going beyond the ISA-based compilation, this article opens up a door to new QC system designs.

III. BREAKING THE ISA ABSTRACTION USING NOISE-ADAPTIVE COMPILATION

In recent years, QC compute stacks have been developed using abstractions inspired by classical computing. The ISA is a fundamental abstraction which defines the interface between the hardware and software. The ISA abstraction allows software to execute correctly on any hardware which implements the interface. This enables application portability and decouples hardware and software development.

For QC systems, the hardware–software interface is typically defined as a set of legal instructions and the connectivity topology of the qubits [14], [20]–[22], [58]; it does not include information about qubit quality, gate fidelity, or micro-operations used to implement the ISA instructions. While technology independent abstractions are desirable in the long run, our work [46], [47] reveals that such abstractions are detrimental to program correctness in NISQ quantum computers. By exposing microarchitectural details to software and using intelligent compilation techniques, we show that program reliability can be improved significantly.

Fig. 6. Daily variations in qubit coherence time (larger is better) and gate error rates (lower is better) for selected qubits and gates in IBM's 16-qubit system. The most or least reliable system elements change across days. (a) Coherence time (T2). (b) CNOT gate error rate.

A. Noise Characteristics of QC Systems

QC systems have spatial and temporal variations in noise due to manufacturing imperfections, imprecise qubit control, and external interference. To motivate the necessity for breaking the ISA abstraction barrier, we present real-system statistics of hardware noise in systems from three leading QC vendors: IBM, Rigetti, and University of Maryland. IBM and Rigetti systems use SC qubits [10], [12] and the University of Maryland (UMD) uses


trapped ion qubits [16]. The gates in these systems are periodically calibrated and their error rates are measured. Fig. 6 shows the coherence times and 2-qubit gate error rates in IBM's 16-qubit system (IBMQ16). From daily calibration logs, we find that the average qubit coherence time is 40 μs, the 2-qubit gate error rate is 7%, the readout error rate is 4%, and the single-qubit error rate is 0.2%. The 2-qubit and readout errors are the dominant noise sources and vary up to 9× across gates and calibration cycles. Rigetti's systems also exhibit error rates and variations of comparable magnitude. These noise variations in SC systems emerge from material defects due to lithographic manufacturing, and are expected in future systems as well [35], [36].
Trapped ion systems also have noise fluctuations even though the individual qubits are identical and defect-free. On a 5-qubit trapped ion system from UMD, we observed up to 3× variation in the 2-qubit gate error rates because of fundamental challenges in qubit control using lasers and their sensitivity to motional mode drifts from temperature fluctuations.
We found that these microarchitectural noise variations dramatically influence program correctness. When a program is executed on a noisy QC system, the results may be corrupted by gate errors, decoherence, or readout errors on the hardware qubits used for execution. Therefore, it is crucial to select the most reliable hardware qubits to improve the success rate of the program (the likelihood of correct execution). The success rate is determined by executing a program multiple times and measuring the fraction of runs that produce the correct output. A high success rate is important to ensure that the program execution is not dominated by noise.

B. Noise-Adaptive Compilation: Key Ideas
Our work breaks the ISA abstraction barrier by developing compiler optimizations which use hardware calibration data. These optimizations boost the success rate of a program run by avoiding portions of the machine with poor coherence time and operational error rates.
We first review the key components in a QC compiler. The input to the compiler is a high-level language program (Scaffold in our framework) and the output is machine executable assembly code. First, the compiler converts the program to an intermediate representation (IR) composed of single and 2-qubit gates by decomposing high-level QC operations, unrolling all loops, and inlining all functions. Fig. 7(a) shows an example IR. The qubits in the IR (program qubits) are mapped to distinct qubits in the hardware, typically in a way that reduces qubit communication. Next, gates are scheduled while respecting data dependences. Finally, on hardware with limited connectivity, such as SC systems, the compiler inserts SWAP operations to enable 2-qubit operations between nonadjacent qubits.
Fig. 7(b) and (c) show two compiler mappings for a 4-qubit program on IBM's 16-qubit system. In the first mapping, the compiler must insert SWAPs to perform the 2-qubit gate between p1 and p3. Since SWAP operations are composed of three 2-qubit gates, they are highly error prone. In contrast, the second mapping requires no SWAPs because the qubits required for the CNOTs are adjacent. Although SWAP optimizations can be performed using the device ISA, the second mapping is also noise-optimized, that is, it uses qubits with high coherence time and low operational error rates. By considering microarchitectural noise characteristics, our compiler can determine such noise-optimized mappings to improve the program success rate.

Fig. 7. (a) IR of the Bernstein-Vazirani algorithm (BV4). Each horizontal line represents a program qubit. X and H are single-qubit gates. The CNOT gates from each of the qubits p0–p2 to p3 are marked by vertical lines with XOR connectors. The readout operation is indicated by the meter. (b) Qubit layout in IBMQ16 and a naive mapping of BV4 onto this system. The black circles denote qubits and the edges indicate hardware CNOT gates. The edges are labeled with CNOT gate error (×10⁻²). The hatched qubits and crossed gates are unreliable. In this mapping, a SWAP operation is required to perform the CNOT between p1 and p3 and error-prone operations are used. (c) Mapping for BV4 where qubit movement is not required and unreliable qubits and gates are avoided.

We developed three strategies for noise optimization. First, our compiler maps program qubits onto hardware locations with high reliability, based on the noise data. We choose the initial mapping based on 2-qubit and readout error rates because they are the dominant sources of error. Second, to mitigate decoherence errors, all gates are scheduled to finish before the coherence time of the hardware qubits. Third, our compiler optimizes the reliability of SWAP operations by minimizing the number of SWAPs whenever possible and performing SWAPs along reliable hardware paths.

C. Implementation Using Satisfiability Modulo Theory (SMT) Optimization
Our compiler implements the above strategies by finding the solution to a constrained optimization problem


using an SMT solver. The variables and constraints in the optimization encode program information, device topology constraints, and noise information. The variables express the choices for program qubit mappings, gate start times, and routing paths. The constraints specify that qubit mappings should be distinct, that the schedule should respect program dependences, and that routing paths should be nonoverlapping. Fig. 8 summarizes the optimization-based compilation pipeline for IBMQ16.

Fig. 8. Noise-adaptive compilation using SMT optimization. Inputs are a QC program IR, details about the hardware qubit configuration, and a set of options, such as routing policy and solver options. From these, the compiler generates a set of appropriate constraints and uses them to map program qubits to hardware qubits and schedule operations. The output of the optimization is used to generate an executable version of the program.

The objective of our optimization is to maximize the success rate of a program execution. Since the success rate can be determined only from a real-system run, we model it at compile time as the program reliability. We define the reliability of a program as the product of the reliabilities of all gates in the program. Although this is not a perfect model for the success rate, it serves as a useful measure of overall correctness [7], [40]. For a given mapping, the solver determines the reliability of each 2-qubit and readout operation and computes an overall reliability score. The solver maximizes the reliability score over all mappings by tracking and adapting to the error rates, coherence limits, and qubit movement based on program qubit locations.
In practice, we use the Z3 SMT solver to express and solve this optimization. Since the reliability objective is a nonlinear product, we linearize the objective by optimizing for the additive logarithms of the reliability scores of each gate. We term this algorithm R-SMT. The output of the SMT solver is used to create machine executable code in the vendor-specified assembly language.

D. Real-System Evaluation
We present real-system evaluation on IBMQ16. Our evaluation uses 12 common QC benchmarks, compiled using R-SMT and T-SMT, which are variants of our compiler, and IBM's Qiskit compiler (version 0.5.7) [6], which is the default for this system. R-SMT optimizes the reliability of the program using hardware noise data. T-SMT optimizes the execution time of the program considering real-system gate durations and coherence times, but not operational error rates. IBM Qiskit is also noise-unaware and uses randomized algorithms for SWAP optimization. For each benchmark and compiler, we measured the success rate on the IBMQ16 system using 8192 trials per program. A success rate of 1 indicates a perfect noise-free execution.
Fig. 9 shows the success rate for the three compilers on all the benchmarks. R-SMT has a higher success rate than both baselines on all benchmarks, demonstrating the effectiveness of noise-adaptive compilation. Across benchmarks, R-SMT obtains a geomean 2.9× improvement over Qiskit, with up to 18× gain. Fig. 10 shows the mapping used by Qiskit, T-SMT, and R-SMT for BV4. Qiskit places qubits in a lexicographic order without considering CNOT and readout errors and incurs extra SWAP operations. Similarly, T-SMT is also unaware of noise variations across the device, resulting in mappings which use unreliable hardware. R-SMT outperforms these baselines because it maximizes the likelihood of reliable execution by leveraging microarchitectural noise characteristics during compilation.

Fig. 9. Measured success rate of R-SMT compared to Qiskit and T-SMT. (Of 8192 trials per execution, success rate is the percentage that achieve the correct answer in real-system execution.) ω is a weight factor for readout error terms in the R-SMT objective; 0.5 is equal weight for CNOT and readout errors. R-SMT obtains a higher success rate than Qiskit because it adapts the qubit mappings according to dynamic error rates and also avoids unnecessary qubit communication.

Full results of our evaluation on seven QC systems from IBM, Rigetti, and UMD can be found in [47] and [48].

E. Discussion
Our work represents one of the first efforts to exploit hardware noise characteristics during compilation. We developed optimal and heuristic techniques for noise


adaptivity and performed comprehensive evaluations on several real QC systems [47]. We also developed techniques to mitigate crosstalk, another major source of errors in QC systems, using compiler techniques that schedule programs using crosstalk characterization data from the hardware [48]. In addition, our techniques are already being used in industry toolflows [54], [59]. Recognizing the importance of efficient compilation, other research groups have also recently developed mapping and routing heuristics [11], [72] and techniques to handle noise [67], [68].

Fig. 10. Qubit mappings on IBMQ16 (real-system experiment) for Qiskit and our compiler with three optimization objectives, varying the type of noise-awareness. The edge labels indicate the CNOT gate error rate (×10⁻²), and the node labels indicate the qubit's readout error rate (×10⁻²). The thin red arrows indicate CNOT gates. The thick yellow arrows indicate SWAP operations. ω is a weight factor for readout error terms in the R-SMT objective. (a) Qiskit finds a mapping which requires SWAP operations and uses hardware qubits with high readout errors. (b) T-SMT finds a mapping which requires no SWAP operations, but it uses an unreliable hardware CNOT between p3 and p0. (c) Program qubits are placed on the best readout qubits, but p0 and p3 communicate using SWAPs. (d) R-SMT finds a mapping which has the best reliability, where the best CNOTs and readout qubits are used. It also requires no SWAP operations. Panel titles: (a) IBM Qiskit. (b) T-SMT: Optimize duration without error data. (c) R-SMT (ω = 1): Optimize readout reliability. (d) R-SMT (ω = 0.5): Optimize CNOT and readout reliability.

Our noise-adaptivity optimizations offer large gains in success rate. These gains mean the difference between executions which yield correct and usable results and executions where the results are dominated by noise. These improvements are also multiplicative against benefits obtained elsewhere in the stack and will be instrumental in closing the gap between near-term QC algorithms and hardware. Our work also indicates that it is important to accurately characterize hardware and expose characterization data to software instead of hiding it behind a device-independent ISA layer. Additionally, our work proposes that QC programs should be compiled once per execution using the latest hardware characterization data to obtain the best executions.
Going beyond noise characteristics, we also studied the importance of exposing other microarchitectural information to software. We found that when the compiler has access to the native gates available in the hardware (micro-operations used to implement ISA-level gates), it can further optimize programs and improve success rates. Overall, our work indicates that QC machines are not yet ready for technology-independent abstractions that shield the software from hardware. Bridging the information gap between software and hardware by breaking abstraction barriers will be increasingly important on the path toward practically useful NISQ devices.

IV. BREAKING THE QUBIT ABSTRACTION VIA THE THIRD ENERGY LEVEL
Although quantum computation is typically expressed with the two-level binary abstraction of qubits, the underlying physics of quantum systems is not intrinsically binary. Whereas classical computers operate in binary states at the physical level (e.g., clipping above and below a threshold voltage), quantum computers have natural access to an infinite spectrum of discrete energy levels. In fact, hardware must actively suppress higher level states in order to realize an engineered two-level qubit approximation. In this sense, using three-level qutrits (quantum trits) is simply a choice of including an additional discrete energy level within the computational space. Thus, it is appealing to explore what gains can be realized by breaking the binary qubit abstraction.
In prior work on qutrits (or more generally, d-level qudits), researchers identified only constant factor gains from extending beyond qubits. In general, this prior work [53] has emphasized the information compression advantages of qutrits. For example, N qubits can be expressed as N/log₂(3) qutrits, which leads to log₂(3) ≈ 1.6 constant-factor improvements in runtimes.
Recently, however, our research group demonstrated a novel qutrit approach that leads to exponentially faster runtimes (i.e., shorter in circuit depth) than qubit-only


approaches [26], [27]. The key idea underlying the approach is to use the third state of a qutrit as temporary storage. Although qutrits incur higher per-operation error rates than qubits, this is compensated by dramatic reductions in runtimes and quantum gate counts. Moreover, our approach applies qutrit operations only in an intermediary stage: the input and output are still qubits, which is important for initialization and measurement on practical quantum devices [56], [57].
The net result of our work is to extend the frontier of what quantum computers can compute. In particular, the frontier is defined by the zone in which every machine qubit is a data qubit, for example a 100-qubit algorithm running on a 100-qubit machine. In this frontier zone, we do not have space for nondata workspace qubits known as ancilla. The lack of ancilla in the frontier zone is a costly constraint that generally leads to inefficient circuits. For this reason, typical circuits instead operate below the frontier zone, with many machine qubits used as ancilla. Our work demonstrates that ancilla can be substituted with qutrits, enabling us to operate efficiently within the ancilla-free frontier zone.

A. Qutrit-Assisted AND Gate
We develop the intuition for how qutrits can be useful by considering the example of constructing an AND gate. In the framework of QC, which requires reversibility, AND is not permitted directly. For example, consider the output of 0 from an AND gate with two inputs. With only this information about the output, the value of the inputs cannot be uniquely determined (00, 01, and 10 all yield an AND output of 0). However, these operations can be made reversible by the addition of an extra, temporary workspace bit initialized to 0. Using a single such additional bit, known as an ancilla, the AND operation can be computed reversibly as in Fig. 11. Although this approach works, it is expensive: to express the Toffoli gate in Fig. 11 in terms of hardware-implementable one- and two-input gates, it must be decomposed into at least six CNOT gates.
However, if we break the qubit abstraction and allow occupation of a higher qutrit energy level, the cost of the Toffoli AND operation is greatly diminished. Before proceeding, we review the basics of qutrits, which have three computational basis states: |0⟩, |1⟩, and |2⟩. A qutrit state |ψ⟩ may be represented analogously to a qubit as |ψ⟩ = α|0⟩ + β|1⟩ + γ|2⟩, where |α|² + |β|² + |γ|² = 1. Qutrits are manipulated in a similar manner to qubits; however, there are additional gates which may be performed on qutrits. We focus on the X₊₁ and X₋₁ operations, which are addition and subtraction gates, modulo 3. For example, X₊₁ elevates |0⟩ to |1⟩ and elevates |1⟩ to |2⟩, while wrapping |2⟩ to |0⟩.
Just as single-qubit gates have qutrit analogs, the same holds for two-qutrit gates. For example, consider the CNOT operation, where an X gate is performed conditioned on the control being in the |1⟩ state. For qutrits, an X₊₁ or X₋₁ gate may be performed, conditioned on the control being in any of the three possible basis states. Just as qubit gates are extended to take multiple controls, qutrit gates are extended similarly.

Fig. 12. Toffoli decomposition via qutrits. Each input and output is a qubit. The red controls activate on |1⟩ and the blue controls activate on |2⟩. The first gate temporarily elevates q1 to |2⟩ if both q0 and q1 were |1⟩. We then perform the X operation only if q1 is |2⟩. The final gate restores q0 and q1 to their original state.

In Fig. 12, a Toffoli decomposition using qutrits is given. A similar construction for the Toffoli gate is known from the past work [38], [55]. The goal is to perform an X operation on the last (target) input qubit q2 if and only if the two control qubits, q0 and q1, are both |1⟩. First, a |1⟩-controlled X₊₁ is performed on q0 and q1. This elevates q1 to |2⟩ if and only if q0 and q1 were both |1⟩. Then, a |2⟩-controlled X gate is applied to q2. Therefore, X is performed only when both q0 and q1 were |1⟩, as desired. The controls are restored to their original states by a |1⟩-controlled X₋₁ gate, which undoes the effect of the first gate. The key intuition in this decomposition is that the qutrit |2⟩ state can be used instead of ancilla to store temporary information.
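The three-gate sequence of Fig. 12 can be checked classically on computational basis states. The following Python sketch is illustrative only and is not the authors' verification code: each qutrit is modeled as an integer in {0, 1, 2}, and the helper names `controlled_shift`, `controlled_flip`, and `qutrit_toffoli` are hypothetical.

```python
from itertools import product

def controlled_shift(state, control, target, activate_on, shift):
    """Add `shift` (mod 3) to `target` when `control` holds `activate_on`."""
    state = list(state)
    if state[control] == activate_on:
        state[target] = (state[target] + shift) % 3
    return tuple(state)

def controlled_flip(state, control, target, activate_on):
    """Apply a qubit X (swap 0 and 1) to `target` when `control` holds `activate_on`."""
    state = list(state)
    if state[control] == activate_on and state[target] in (0, 1):
        state[target] ^= 1
    return tuple(state)

def qutrit_toffoli(q0, q1, q2):
    """Toffoli on qubit inputs using the |2> level of q1 as temporary storage."""
    s = (q0, q1, q2)
    s = controlled_shift(s, 0, 1, activate_on=1, shift=+1)  # |1>-controlled X+1 on q1
    s = controlled_flip(s, 1, 2, activate_on=2)             # |2>-controlled X on q2
    s = controlled_shift(s, 0, 1, activate_on=1, shift=-1)  # |1>-controlled X-1 restores q1
    return s

# Truth table over all qubit inputs: controls are preserved and
# the target flips exactly when q0 = q1 = 1.
truth_table = {bits: qutrit_toffoli(*bits) for bits in product((0, 1), repeat=3)}
for bits, out in truth_table.items():
    print(bits, "->", out)
```

This basis-state check only confirms the classical action of the circuit; it says nothing about phases or noise, which require a full quantum simulation.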

Fig. 11. Reversible AND circuit using a single ancilla bit. The inputs are on the left, and time flows rightward to the outputs. This AND gate is implemented using a Toffoli (CCNOT) gate with inputs q0, q1 and a single ancilla initialized to 0. At the end of the circuit, q0 and q1 are preserved, and the ancilla bit is set to 1 if and only if both other inputs are 1.

B. Generalized Toffoli Gate
The intuition of our technique extends to more complicated gates. In particular, we consider the generalized Toffoli gate, a ubiquitous quantum operation which extends the Toffoli gate to have any number of control inputs. The target input is flipped if and only if all of the controls are activated. Our qutrit-based circuit decomposition for the generalized Toffoli gate is presented in Fig. 13. The decomposition is expressed in terms of three-qutrit gates (two controls and one target) instead of single- and two-qutrit gates because the circuit can be understood purely classically at this granularity.
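Because the decomposition can be understood classically at the three-qutrit-gate level, its logic can be sketched on basis states. The Python sketch below assumes the tree arrangement of Fig. 13, in which each three-qutrit gate elevates a middle control by one level when its two neighboring controls are activated (on |1⟩ at the bottom level, on |2⟩ above). The function names and the recursive index layout are assumptions made for illustration, and the number of controls is assumed to be of the form 2^k − 1.

```python
from itertools import product

def tree_gates(lo, hi):
    """List (ctrl_a, ctrl_b, target, activate_on) gates for controls lo..hi.

    The span must hold 2**k - 1 controls (k >= 2). The middle control is the
    root of the subtree, as q7 is for controls q0..q14.
    """
    mid = (lo + hi) // 2
    if hi - lo == 2:                        # bottom level: |1>-controlled
        return [(lo, hi, mid, 1)]
    left_root = (lo + mid - 1) // 2
    right_root = (mid + 1 + hi) // 2
    return (tree_gates(lo, mid - 1) + tree_gates(mid + 1, hi)
            + [(left_root, right_root, mid, 2)])    # upper levels: |2>-controlled

def generalized_toffoli(controls, target):
    """Flip `target` iff all controls are 1; controls are restored afterwards."""
    state = list(controls)
    gates = tree_gates(0, len(state) - 1)
    root = (len(state) - 1) // 2

    def apply(gate_list, shift):
        for a, b, t, on in gate_list:
            if state[a] == on and state[b] == on:
                state[t] = (state[t] + shift) % 3

    apply(gates, +1)               # compute: propagate |2> toward the root
    if state[root] == 2:           # root is |2> only if every control was |1>
        target ^= 1
    apply(reversed(gates), -1)     # uncompute in reverse order
    return tuple(state), target

# Exhaustive check for 7 controls.
ok = all(
    generalized_toffoli(bits, t) == (bits, t ^ int(all(bits)))
    for bits in product((0, 1), repeat=7) for t in (0, 1)
)
print("7-control check passed:", ok)
```

For 15 controls, `tree_gates(0, 14)` produces seven compute gates in three levels (four, two, and one), matching the halving of active controls per level that gives the construction its logarithmic depth.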


In actual implementation and in our simulation, we used a decomposition [17] that requires six two-qutrit and seven single-qutrit physically implementable quantum gates.
Our circuit decomposition is most intuitively understood by treating the left half of the circuit as a tree. The desired property is that the root of the tree, q7, is |2⟩ if and only if each of the 15 controls was originally in the |1⟩ state. To verify this property, we observe that the root q7 can become |2⟩ if and only if q7 was originally |1⟩ and q3 and q11 were both previously |2⟩. At the next level of the tree, we see q3 could have only been |2⟩ if q3 was originally |1⟩ and both q1 and q5 were previously |2⟩, and similarly for the other triplets. At the bottom level of the tree, the triplets are controlled on the |1⟩ state, which are activated only when the even-index controls are all |1⟩. Thus, if any of the controls were not |1⟩, the |2⟩ states would fail to propagate to the root of the tree. The right half of the circuit performs uncomputation to restore the controls to their original state.

Fig. 13. Our circuit decomposition for the generalized Toffoli gate is shown for 15 controls and 1 target. The inputs and outputs are both qubits, but we allow occupation of the |2⟩ qutrit state in between. The circuit has a tree structure and maintains the property that the root of each subtree can only be elevated to |2⟩ if all of its control leaves were |1⟩. Thus, the U gate is only executed if all controls are |1⟩. The right half of the circuit performs uncomputation to restore the controls to their original state. This construction applies more generally to any multiply controlled U gate. Note that the three-input gates are decomposed into six two-input and seven single-input gates in our actual simulation, based on the decomposition in [17].

After each subsequent level of the tree structure, the number of qubits under consideration is reduced by a factor of ∼2. Thus, the circuit depth is logarithmic in N, which is exponentially smaller than ancilla-free qubit-only circuits. Moreover, each qutrit is operated on by a constant number of gates, so the total number of gates is linear in N.
We verified our circuits, both formally and via simulation. Our verification scripts are available on our GitHub [4].

C. Simulation Results
Table 1 shows the scaling of circuit depths and two-qudit gate counts for all three benchmarked circuits. The QUBIT-based circuit constructions from the past work are linear in depth and have a high linearity constant. Augmenting with a single borrowed ancilla (QUBIT+ANCILLA) reduces the circuit depth by a factor of 8. However, both circuit constructions are significantly outperformed by our QUTRIT construction, which scales logarithmically in N and has a relatively small leading coefficient. Although there is not an asymptotic scaling advantage for two-qudit gate count, the linearity constant for our QUTRIT circuit is 70× smaller than for the equivalent ancilla-free QUBIT circuit.

Table 1. Scaling of Circuit Depths and Two-Qudit Gate Counts for All Three Benchmarked Circuit Constructions for the N-Controlled Generalized Toffoli

We ran simulations under realistic SC and trapped ion device noise. The simulations were run in parallel on over 100 n1-standard-4 Google Cloud instances. These simulations represent over 20 000 CPU hours, which were sufficient to estimate mean fidelity to an error of 2σ < 0.1% for each circuit-noise model pair.
The full results of our circuit simulations are shown in Fig. 14. All simulations are for the 14-input (13 controls and 1 target) generalized Toffoli gate. We simulated each of the three circuit benchmarks against each of our noise models (when applicable), yielding the 16 bars in the figure. Note that our qutrit circuit consistently outperforms qubit circuits, with advantages ranging from 2× to 10 000×.

D. Discussion
The results presented in our work in [26] and [27] are applicable to QC in the near term, on machines that are expected within the next five years. By breaking the qubit abstraction barrier, we extend the frontier of what is computable by quantum hardware right now, without needing to wait for better hardware. As verified by our open-source circuit simulator coupled with realistic noise models, our circuits are more reliable than qubit-only equivalents, suggesting that qutrits offer a promising path toward scaling quantum computers. We propose further investigation into what advantage qutrits or qudits may confer. More broadly, it is critical for quantum architects to bear in mind that standard abstractions in classical computing do not necessarily transfer to quantum computation. Often,


this presents unrealized opportunities, as in the case of qutrits.

Fig. 14. Circuit simulation results for all possible pairs of circuit constructions and noise models. Each bar represents 1000+ trials, so the error bars are all 2σ < 0.1%. Our QUTRIT construction significantly outperforms the QUBIT construction. The QUBIT+ANCILLA bars are drawn with dashed lines to emphasize that it has access to an extra ancilla bit, unlike our construction. Figure reprinted with permission from [27].

V. BREAKING THE QUBIT ABSTRACTION VIA THE GKP ENCODING
Currently, there are many competing physical qubit implementations. For example, the transmon qubits [2] are encoded in the lowest two energy levels of the charge states in SC LC circuits with Josephson junctions; trapped ion qubits can be encoded in two ground state hyperfine levels [9] or a ground state level and an excited level of an ion [13]; quantum dot qubits use electron spin triplets [41]. These QIP platforms have rather distinct physical characteristics, but they are all exposed to the other layers in the stack as qubits, and other implementation details are often hidden. This abstraction is natural for the classical computing stack because the robustness of classical bits decouples the programming logic from the physical properties of the transistors, except for the logical value. In contrast, qubits are fragile, so there is more than (superpositions of) the logical values that we want to know about the implementation. For example, in the transmon qubits and trapped ion qubits, logical states can be transferred to higher levels of the physical space by unwanted operations, and this can cause leakage errors [24], [71]. It will be useful for other layers in the stack to access this error information and develop methods to mitigate it. In Section IV, we discussed the qutrit approach that directly uses the third level for information processing. However, it could be more interesting if we can encode the qubit (qudit) using the whole physical Hilbert space to avoid leakage errors systematically and use the redundant degrees of freedom to reveal information about the noise in the encoding. The encoding proposed by Gottesman et al. [29] provides such an example. GKP encoding is free of leakage errors, and other errors (in the form of small shifts in phase space) can be identified and corrected by quantum nondemolition (QND) measurements and simple linear optical operations. In realistic implementations of approximate GKP states (see Section V-C), there are leakage errors between logical states, but the transfer probability is estimated to be on the order of 10⁻¹⁰ with current technology and is thus negligible.

A. Phase Space Diagram
We describe the GKP qubits in the phase space. For comparison, we first discuss the phase space diagram for a classical harmonic oscillator (CHO) and an SC qubit.

1) Classical Harmonic Oscillators: Examples of CHOs include LC circuits, springs, and pendulums with small displacement. The voltage/displacement (denoted as p) and the current/momentum (denoted as q) values completely characterize the dynamics of CHO systems. The phase space diagram plots p versus q, which for CHOs are circles (up to normalization) with the radius representing the system energy. The energy of CHOs can be any nonnegative real value.

2) Quantum Harmonic Oscillators: The QHO is the quantized version of the CHO and is the physical model for SC LC circuits and SC cavity modes. One of the quantities that is quantized in QHOs is the system energy, which can only take equally spaced, nonzero discrete values (see Fig. 16). The lowest allowed energy is not 0 but (1/2) (up to normalization). We call the quantum state with the lowest energy the ground state. For a motion with a certain energy, the phase space diagram is no longer a circle but a quasidistribution that can be described by the Wigner function. We say the distribution is a "quasi" distribution because the probability can be negative. The phase space diagram for the ground state and first excited state is plotted in Fig. 15.

3) SC Charge Qubits: The QHO does not allow us to selectively address the energy levels; thus, leakage errors will occur if we use the lowest two levels as the qubit logic space. For example, a control signal that provides the energy difference ΔE enables the transition |0⟩ → |1⟩, but will also drive the transition |1⟩ → |2⟩, which brings the state out of the logic space. To avoid this problem,


Fig. 15. Phase space diagrams for a CHO, the ground state, and the first excited state of a QHO and the logic 0 and 1 state of the GKP
qubit. For quantum phase space diagrams, the plotted distribution is the Wigner quasi-probability function, where red indicates positive
values and blue indicates negative values.
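The negative values mentioned in the caption can be checked directly from the closed-form Wigner functions of the two lowest QHO states. The following is a minimal sketch under an assumed convention ($\hbar = 1$ with $[q, p] = i$, the same convention that gives the $\sqrt{\pi}$ grid spacing of the GKP code), for which $W_0(q,p) = e^{-(q^2+p^2)}/\pi$ and $W_1(q,p) = (2(q^2+p^2) - 1)\,e^{-(q^2+p^2)}/\pi$. The function names are illustrative and do not come from the original article.

```python
import math


def wigner_ground(q, p):
    """Wigner function of the QHO ground state (assumes hbar = 1, [q, p] = i)."""
    r2 = q * q + p * p
    return math.exp(-r2) / math.pi


def wigner_first_excited(q, p):
    """Wigner function of the first excited state; negative where q^2 + p^2 < 1/2."""
    r2 = q * q + p * p
    return (2.0 * r2 - 1.0) * math.exp(-r2) / math.pi


def integrate_over_phase_space(w, half_width=6.0, steps=240):
    """Midpoint-rule estimate of the integral of w over a square in phase space."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        q = -half_width + (i + 0.5) * h
        for j in range(steps):
            p = -half_width + (j + 0.5) * h
            total += w(q, p)
    return total * h * h


# The ground state is positive everywhere; the first excited state dips below
# zero at the origin, yet both still integrate to one like a probability density.
print("W0(0,0) =", wigner_ground(0.0, 0.0))
print("W1(0,0) =", wigner_first_excited(0.0, 0.0))
print("integral of W0 ~", integrate_over_phase_space(wigner_ground))
print("integral of W1 ~", integrate_over_phase_space(wigner_first_excited))
```

The negative value at the origin is what makes the first excited state a quasi-probability distribution rather than a classical one, even though its total weight is still one.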

Fig. 16. Left: an LC circuit. In SC LC circuits, normal current becomes SC current. Right: the energy potential of a harmonic oscillator. In QHOs like the SC LC circuits, the system energy becomes equally spaced discrete values. The plotted two levels are the ground state and the first excited state.
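The equal spacing noted in the caption is what makes a bare QHO awkward as a qubit: a drive resonant with the lowest transition is also resonant with the next one. The sketch below works in units where the oscillator quantum is 1, so that $E_n = n + 1/2$. The anharmonic model (a Duffing-type correction with an arbitrarily chosen coefficient `alpha`) is an illustrative assumption used only for contrast; it is not the CPB spectrum.

```python
def qho_energy(n, omega=1.0):
    """Energy of level n of a harmonic oscillator, in units where hbar = 1."""
    return omega * (n + 0.5)


def duffing_energy(n, omega=1.0, alpha=-0.05):
    """Illustrative anharmonic spectrum: harmonic term plus (alpha / 2) * n * (n - 1).

    The value of alpha is an assumption chosen for demonstration only.
    """
    return omega * (n + 0.5) + 0.5 * alpha * n * (n - 1)


def transition_energies(energy, levels=3):
    """Energy gaps between neighbouring levels: [E1 - E0, E2 - E1, ...]."""
    return [energy(n + 1) - energy(n) for n in range(levels - 1)]


harmonic = transition_energies(qho_energy)
anharmonic = transition_energies(duffing_energy)

# In the harmonic case both gaps coincide, so a drive tuned to 0 -> 1 also
# drives 1 -> 2 (leakage). In the anharmonic case the second gap is detuned.
print("harmonic gaps:  ", harmonic)
print("anharmonic gaps:", anharmonic)
print("detuning of 1 -> 2 from 0 -> 1:", anharmonic[1] - anharmonic[0])
```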

Fig. 18. Squeezed vacuum state.
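The trade-off behind Fig. 18 can be stated quantitatively. As a sketch under assumed conventions ($\hbar = 1$, so that the vacuum has $\Delta q = \Delta p = 1/\sqrt{2}$, and a squeezing parameter $r$ that does not appear in the original text), squeezing the vacuum in $p$ by a factor $e^{-r}$ stretches it in $q$ by $e^{r}$, and the product $\Delta q\,\Delta p$ stays at the Heisenberg bound of $1/2$.

```python
import math


def squeezed_vacuum_spreads(r):
    """Standard deviations (dq, dp) of a vacuum state squeezed in p by parameter r.

    Assumes hbar = 1; r = 0 gives the unsqueezed vacuum with dq = dp = 1/sqrt(2).
    """
    dq = math.exp(r) / math.sqrt(2.0)
    dp = math.exp(-r) / math.sqrt(2.0)
    return dq, dp


for r in (0.0, 0.5, 1.0, 2.0):
    dq, dp = squeezed_vacuum_spreads(r)
    # The product never drops below 1/2, however strongly p is squeezed.
    print(f"r = {r:.1f}: dq = {dq:.4f}, dp = {dp:.4f}, dq * dp = {dq * dp:.4f}")
```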

the Cooper pair box (CPB) design of an SC charge qubit replaces the inductor (see Fig. 17) with a Josephson junction, making the circuit an anharmonic oscillator, in which the energy levels are no longer equally spaced. The Wigner functions of the CPB eigenstates are visually similar to those of the QHO and differ from them only to first order in the anharmonicity, so we do not plot them separately in Fig. 15.

Fig. 17. Left: an LC circuit. In SC LC circuits, normal current becomes SC current. Right: the energy potential of a harmonic oscillator. In QHOs like the SC LC circuits, the system energy becomes equally spaced discrete values. The plotted two levels are the ground state and the first excited state.

B. Heisenberg Uncertainty Principle

We hope that by utilizing the whole physical state space (including the higher energy levels), we can use the redundant space to encode and extract error information. However, the Heisenberg uncertainty principle sets a fundamental limit on the error information we can extract from the physical states: the more we know about the q variable, the less we know about the p variable. For example, we can “squeeze” the ground state of the QHO (also known as the vacuum state) in the p-direction; however, the distribution in the q-direction then spreads, as shown in Fig. 18. Usually, we have to know both the p and q values to characterize the error information, unless we know the error is biased. Thus, it is a great challenge to design encodings in phase space that reveal error information.

C. GKP Encoding

The GKP states are also called grid states because each of them is a rectangular lattice in phase space (see Fig. 15). There are also other types of lattice in the GKP family, for example, the hexagonal GKP [29]. Intuitively, the GKP encoding “breaks” the Heisenberg uncertainty principle: we do not know the measured p and q values of the state (thus the expected values of p and q remain uncertain), but we do know that they must be integer multiples of the spacing of the grid. Thus, we have access to the error information in both directions, and if we measure values that are not multiples of the grid spacing, we know there must be errors. Formally, the ideal GKP logical states are given by

$$
|0\rangle_{\mathrm{gkp}} = \sum_{k=-\infty}^{\infty} S_p^{k}\,|q = 0\rangle, \qquad
|1\rangle_{\mathrm{gkp}} = \sum_{k=-\infty}^{\infty} S_p^{k}\,|q = \sqrt{\pi}\rangle \tag{1}
$$
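The grid structure of (1) can be made concrete with a short sketch. It assumes the standard GKP convention that $S_p$ displaces $q$ by $2\sqrt{\pi}$, so that in the $q$-direction $|0\rangle_{\mathrm{gkp}}$ has peaks at even multiples of $\sqrt{\pi}$ and $|1\rangle_{\mathrm{gkp}}$ at odd multiples, and it assigns a measured $q$ value to the nearest peak. The function names are hypothetical and the rounding rule is only the simplest possible decoder, not a protocol taken from the original article.

```python
import math

SQRT_PI = math.sqrt(math.pi)


def nearest_peak_index(q):
    """Index n such that n * sqrt(pi) is the grid point closest to q."""
    return round(q / SQRT_PI)


def decode_q(q):
    """Decode a measured q value to (logical_bit, correction_shift).

    Even grid indices belong to logical 0, odd indices to logical 1.
    correction_shift is the displacement that moves q back onto the peak.
    """
    n = nearest_peak_index(q)
    logical_bit = n % 2
    correction_shift = n * SQRT_PI - q
    return logical_bit, correction_shift


# A logical-0 peak at 2*sqrt(pi), hit by shift errors of growing size.
for fraction in (0.1, 0.4, 0.6):
    shift = fraction * SQRT_PI
    bit, correction = decode_q(2.0 * SQRT_PI + shift)
    print(f"shift = {fraction:.1f} * sqrt(pi): decoded bit = {bit}, "
          f"correction = {correction:+.4f}")
```

Shifts below $\sqrt{\pi}/2$ are rounded back to the original peak, whereas a larger shift lands closer to a neighbouring peak of the other logical state and is decoded incorrectly.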


Here, $S_p = e^{-2i\sqrt{\pi}\,p}$ is the displacement operator in q space, which shifts a wave function in the q-direction by $2\sqrt{\pi}$. These definitions show that, for the GKP logical 0 and 1 states, the spacing of the grid in the q-direction is $2\sqrt{\pi}$ and the spacing in the p-direction is $\sqrt{\pi}$. In the q-direction, the logical $|0\rangle$ state has peaks at even multiples of $\sqrt{\pi}$, and the logical $|1\rangle$ state has peaks at odd multiples of $\sqrt{\pi}$. For the logical $|+\rangle$ and $|-\rangle$ states, the spacings of the p and q grids are switched.

1) Approximate GKP States: The ideal GKP states require infinite energy, so they are not realistic. In the laboratory, we can prepare approximate GKP states as illustrated in Fig. 19, where the peaks and the envelope are Gaussian curves.

Fig. 19. Approximate GKP $|0\rangle$ state in q- and p-axis.

2) Error Correction With GKP Qubits: GKP qubits are designed to correct shift errors in the q- and p-axis. A simple decoding strategy is to shift the GKP state back to the closest peak. For example, if we measure a q value of $2\sqrt{\pi} + \Delta q$, where $|\Delta q| < \sqrt{\pi}/2$, then we can shift it back to $2\sqrt{\pi}$. With this simple decoding, GKP can correct all shift errors smaller than $\sqrt{\pi}/2$. While there are other proposals for encoding qubits in a QHO [45], [50], [52] that are designed for realistic errors such as photon loss, it has been shown that GKP qubits have the most error-correcting ability in the regime of experimental relevance [5].

In addition, GKP qubits can also provide error correction information when concatenated with quantum error correction codes (QECCs) and yield higher thresholds. For example, when combining GKP qubits with a surface code, the continuous p and q values measured in the stabilizer measurement can reveal more about the error distribution than traditional qubits [23], [51], [69].

Finally, it has been shown that, given a supply of GKP-encoded Pauli eigenstates, universal fault-tolerant quantum computation can be achieved using only Gaussian operations [8]. Compared to qubit error correction codes, the GKP encoding enables much simpler fault-tolerant constructions.

D. Fault-Tolerant Preparation of Approximate GKP States

The GKP encoding has straightforward logical operations and promising error-correcting performance. However, the difficulty of using GKP qubits in QIP platforms lies in their preparation, since they live in highly nonclassical states with relatively high mean photon number (i.e., the average energy levels). Thus, reliable preparation of encoded GKP states is an important problem. In [64], we gave fault-tolerance definitions for GKP preparation in SC cavities and designed a protocol that fault-tolerantly prepares the GKP states. We briefly describe the main ideas.

1) Goodness of Approximate GKP States: Naturally, because of the finite width of the peaks of approximate GKP states, it will not be possible to correct a shift error in p or q of magnitude at most $\sqrt{\pi}/2$ with certainty. For example, suppose we have an approximate $|0\rangle$ GKP state with a peak at q = 0 subject to a shift error $e^{-ivp}$ with $|v| \le \sqrt{\pi}/2$. The finite width of the Gaussian peaks will have a nonzero overlap in the regions $\sqrt{\pi}/2 < q < 3\sqrt{\pi}/2$ and $-3\sqrt{\pi}/2 < q < -\sqrt{\pi}/2$. Thus, with nonzero probability the state can be decoded to $|1\rangle$ instead of $|0\rangle$ (see Fig. 20 for an illustration).

In general, if an approximate GKP state is afflicted by a correctable shift error, we would like the probability of decoding to the incorrect logical state to be as small as possible. A smaller overlap of the approximate GKP state in regions in q and p space that lead to decoding the state to the wrong logical state will lead to a higher probability of correcting a correctable shift error by a perfect GKP state.

Fig. 20. Peaks centered at even integer multiples of $\sqrt{\pi}$ in q space. The peak on the left contains large tails that extend into the region where a shift error is decoded to the logical $|1\rangle$ state. The peak on the right is much narrower. Consequently, for some interval δ, the peak on the right will correct shift errors of size $\sqrt{\pi}/2 - \delta$ with higher probability than the peak on the left.

2) Preparation of Approximate GKP States Using Phase Estimation: We observe that the GKP states are the eigenstates of the $S_p$ operator, thus we can use phase estimation to gradually project a squeezed vacuum state onto an approximate GKP state. The phase estimation circuit for preparing an approximate $|\tilde{0}\rangle$ GKP state is given in Fig. 21. The first horizontal line represents the cavity mode in which we want to prepare the GKP states. The second line is a


transmon ancilla initialized to $|+\rangle$. The third line is a transmon flag qubit initialized to $|0\rangle$. The H gate is the Hadamard gate. $\Lambda(e^{i\gamma}) = \mathrm{diag}(1, e^{i\gamma})$ is the gate with a control parameter γ in each round of the phase estimation to increase the probability of projecting the cavity state onto an approximate eigenstate of the displacement operator after the measurement. After applying several rounds of the circuit in Fig. 21, the input squeezed vacuum state is projected onto an approximate eigenstate of $S_p$ with some random eigenvalue $e^{i\theta}$. Additionally, an estimated value for the phase θ is obtained. After computing the phase, the state can be shifted back to an approximate +1 eigenstate of $S_p$.

Fig. 21. Phase estimation circuit with the flag qubit. The protocol is aborted if the flag qubit measurement is nontrivial.

In our protocol, we use a flag qubit to detect any damping event during the controlled-displacement gate; if a nontrivial measurement is obtained, we abort the protocol and start over. Using our simulation results, we also find a subset of output states that are robust to measurement errors in the transmon ancilla and only accept states in that subset. We proved that our protocol is fault-tolerant according to the definition we gave. In practice, our protocol produces “good” approximate GKP states with high probability, and we expect to see experimental efforts to implement our protocol.

E. Discussion

The GKP qubit architecture is a promising candidate for both near-term and fault-tolerant QC implementations. With intrinsic error-correcting capabilities, the GKP qubit breaks the abstraction layer between error correction and the physical implementation of qubits. In [64], we discussed the fault-tolerant preparation of GKP qubits and realistic experimental difficulties. We believe that qubit encodings like the GKP encoding will be useful for reliable QC.

VI. CONCLUSION AND FUTURE DIRECTIONS

In this review, we proposed that greater quantum efficiency can be achieved by breaking abstraction layers in the QC stack. We examined some of the previous work in this direction that is closing the gap between current quantum technology and real-world QC applications. We would also like to briefly discuss some promising future directions along this line.

A. Noise-Tailoring Compilation

We can further explore the idea of breaking the ISA abstraction. Near-term quantum devices have errors from elementary operations like 1- and 2-qubit gates, but also emergent error modes like crosstalk. Emergent error modes are hard to characterize and to mitigate. Recently, it has been shown that randomized compiling could transform complicated noise channels including crosstalk, SPAM errors, and readout errors into simple stochastic Pauli errors [70], which could potentially enable subsequent noise-adaptive compilation optimizations. We believe that if compilation schemes that combine noise tailoring and noise adaptation could be designed, they will outperform existing compilation methods.

B. Algorithm-Level Error Correction

Near-term quantum algorithms such as the variational quantum eigensolver (VQE) and QAOA are tailored for NISQ hardware, breaking the circuit/ISA abstraction. We could take a step further and look at high-level algorithms equipped with customized error correction/mitigation schemes. Prominent examples of this idea are the generalized superfast encoding (GSE) [63] and the Majorana loop stabilizer code (MLSC) [30] for quantum chemistry. In GSE and MLSC, the overhead of mapping Fermionic operators onto qubit operators stays constant with the qubit number (as opposed to linear scaling in the usual Jordan–Wigner encoding or logarithmic in Bravyi–Kitaev encoding). On the other hand, qubit operators in these mappings are logical operators of a distance 3 stabilizer error correction code so that we can correct all weight-1 qubit errors in the algorithm with stabilizer measurements. These works are the first attempts at algorithm-level error correction, and we are expecting to see more efforts of this kind to improve the robustness of near-term algorithms.

C. Dissipation-Assisted Error Mitigation

We generally think of dissipation as competing with quantum coherence. However, with careful design of the quantum system, dissipation can be engineered and used for improving the stability of the underlying qubit state. Previous work on autonomous qubit stabilization [42] and error correction [31] suggests that properly engineered dissipation could largely extend qubit coherence time. Exploring the design space of such systems and their associated error correction/mitigation schemes might provide alternative paths to an efficient and scalable QC stack.
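As a concrete illustration of the noise-tailoring idea in Section VI-A, the sketch below Pauli-twirls a single-qubit coherent over-rotation and checks that the result is a stochastic Pauli channel with probabilities $p_P = |\mathrm{tr}(PU)/2|^2$. This is a deliberately simplified ingredient: it is plain Pauli twirling of one unitary error, not the full randomized compiling protocol of [70], and the error model (an X over-rotation by an arbitrary angle `theta`) and all function names are assumptions made for the example.

```python
import math

I2 = ((1, 0), (0, 1))
X = ((0, 1), (1, 0))
Y = ((0, -1j), (1j, 0))
Z = ((1, 0), (0, -1))
PAULIS = {"I": I2, "X": X, "Y": Y, "Z": Z}


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return tuple(
        tuple(complex(a[j][i]).conjugate() for j in range(2)) for i in range(2)
    )


def conjugate_by(v, rho):
    """Return v * rho * v^dagger."""
    return matmul(matmul(v, rho), dagger(v))


def pauli_probabilities(u):
    """Pauli error probabilities |tr(P U) / 2|^2 of the twirled unitary error u."""
    probs = {}
    for name, p in PAULIS.items():
        pu = matmul(p, u)
        probs[name] = abs((pu[0][0] + pu[1][1]) / 2) ** 2
    return probs


def twirled_channel(u, rho):
    """Average of (P U P) rho (P U P)^dagger over the four Paulis P."""
    out = [[0j, 0j], [0j, 0j]]
    for p in PAULIS.values():
        v = matmul(matmul(p, u), p)  # Paulis are Hermitian, so P^dagger = P
        term = conjugate_by(v, rho)
        for i in range(2):
            for j in range(2):
                out[i][j] += term[i][j] / 4
    return out


def pauli_channel(probs, rho):
    """Stochastic Pauli channel: sum over P of probs[P] * P rho P."""
    out = [[0j, 0j], [0j, 0j]]
    for name, p in PAULIS.items():
        term = conjugate_by(p, rho)
        for i in range(2):
            for j in range(2):
                out[i][j] += probs[name] * term[i][j]
    return out


theta = 0.2  # assumed over-rotation angle about X
c, s = math.cos(theta / 2), math.sin(theta / 2)
u = ((c, -1j * s), (-1j * s, c))  # U = cos(theta/2) I - i sin(theta/2) X
rho = ((0.7, 0.2 - 0.1j), (0.2 + 0.1j, 0.3))  # arbitrary test density matrix

probs = pauli_probabilities(u)
twirled = twirled_channel(u, rho)
stochastic = pauli_channel(probs, rho)
max_diff = max(abs(twirled[i][j] - stochastic[i][j])
               for i in range(2) for j in range(2))
print("Pauli probabilities:", probs)
print("max difference between twirled and Pauli channel:", max_diff)
```

After twirling, the coherent rotation behaves like an X error applied with probability $\sin^2(\theta/2)$, which is the kind of simple stochastic error model that a noise-adaptive compiler can reason about.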


REFERENCES
[1] Cramming More Power Into a Quantum Device. Accessed: Aug. 30, 2010. [Online]. Available: https://www.ibm.com/blogs/research/2019/03/power-quantum-device/
[2] J. Koch, “Charge-insensitive qubit design derived from the Cooper pair box,” Phys. Rev. A, Gen. Phys., vol. 76, no. 4, Oct. 2007, Art. no. 042319.
[3] (2018). Cirq: A Python Framework for Creating, Editing, and Invoking Noisy Intermediate Scale Quantum (NISQ) Circuits. [Online]. Available: https://github.com/quantumlib/Cirq
[4] (2019). Code for Asymptotic Improvements to Quantum Circuits Via Qutrits. [Online]. Available: https://github.com/epiqc/qutrits
[5] V. V. Albert et al., “Performance and structure of single-mode bosonic codes,” Phys. Rev. A, Gen. Phys., vol. 97, no. 3, p. 32346, Mar. 2018.
[6] G. Aleksandrowicz et al., “Qiskit: An open-source framework for quantum computing,” IBM T.J Watson Res. Center, New York, NY, USA, Tech. Rep., 2019, doi: 10.5281/zenodo.2562111.
[7] F. Arute et al., “Quantum supremacy using a programmable superconducting processor,” Nature, vol. 574, no. 7779, pp. 505–510, 2019.
[8] B. Q. Baragiola, G. Pantaleoni, R. N. Alexander, A. Karanjai, and N. C. Menicucci, “All-Gaussian universality and fault tolerance with the Gottesman-Kitaev-Preskill code,” 2019, arXiv:1903.00012. [Online]. Available: http://arxiv.org/abs/1903.00012
[9] B. B. Blinov, D. Leibfried, C. Monroe, and D. J. Wineland, “Quantum computing with trapped ion hyperfine qubits,” in Experimental Aspects of Quantum Computing. New York, NY, USA: Springer, 2005, pp. 45–59.
[10] S. A. Caldwell et al., “Parametrically activated entangling gates using transmon qubits,” Phys. Rev. A, Gen. Phys. Appl., vol. 10, no. 3, Sep. 2018, Art. no. 034050.
[11] A. M. Childs, E. Schoute, and C. M. Unsal, “Circuit transformations for quantum architectures,” Tech. Rep., 2019, doi: 10.4230/LIPIcs.TQC.2019.3.
[12] J. M. Chow et al., “Simple all-microwave entangling gate for fixed-frequency superconducting qubits,” Phys. Rev. Lett., vol. 107, no. 8, Aug. 2011, Art. no. 080502.
[13] J. I. Cirac and P. Zoller, “Quantum computations with cold trapped ions,” Phys. Rev. Lett., vol. 74, no. 20, pp. 4091–4094, May 1995.
[14] W. A. Cross, S. L. Bishop, A. J. Smolin, and M. J. Gambetta, “Open quantum assembly language,” 2017, arXiv:1707.03429. [Online]. Available: https://arxiv.org/abs/1707.03429
[15] P. de Fouquieres, S. G. Schirmer, S. J. Glaser, and I. Kuprov, “Second order gradient ascent pulse engineering,” J. Magn. Reson., vol. 212, no. 2, pp. 412–417, Oct. 2011.
[16] S. Debnath, N. M. Linke, C. Figgatt, K. A. Landsman, K. Wright, and C. Monroe, “Demonstration of a small programmable quantum computer with atomic qubits,” Nature, vol. 536, no. 7614, pp. 63–66, Aug. 2016.
[17] Y.-M. Di and H.-R. Wei, “Elementary gates for ternary quantum logic circuit,” 2011, arXiv:1105.5485. [Online]. Available: http://arxiv.org/abs/1105.5485
[18] A. Erhard et al., “Characterizing large-scale quantum computers via cycle benchmarking,” Nature Commun., vol. 10, no. 1, p. 5347, Nov. 2019, doi: 10.1038/s41467-019-13068-7.
[19] E. Farhi, J. Goldstone, and S. Gutmann, “A quantum approximate optimization algorithm,” 2014, arXiv:1411.4028. [Online]. Available: http://arxiv.org/abs/1411.4028
[20] X. Fu et al., “eQASM: An executable quantum instruction set architecture,” in Proc. IEEE Int. Symp. High Perform. Comput. Archit. (HPCA), 2019, pp. 224–237.
[21] X. Fu et al., “An experimental microarchitecture for a superconducting quantum processor,” in Proc. 50th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), Oct. 2017, pp. 813–825.
[22] X. Fu et al., “A microarchitecture for a superconducting quantum processor,” IEEE Micro, vol. 38, no. 3, pp. 40–47, May 2018.
[23] K. Fukui, A. Tomita, A. Okamoto, and K. Fujii, “High-threshold fault-tolerant quantum computation with analog quantum error correction,” Phys. Rev. X, vol. 8, no. 2, May 2018, Art. no. 021054.
[24] J. Ghosh, A. G. Fowler, J. M. Martinis, and M. R. Geller, “Understanding the effects of leakage in superconducting quantum-error-detection circuits,” Phys. Rev. A, Gen. Phys., vol. 88, no. 6, Dec. 2013, Art. no. 062329.
[25] S. J. Glaser, U. Boscain, T. Calarco, C. P. Koch, W. Köckenberger, R. Kosloff, I. Kuprov, B. Luy, S. Schirmer, T. Schulte-Herbrüggen, D. Sugny, and F. K. Wilhelm, “Training Schrödinger’s cat: Quantum optimal control,” Eur. Phys. J. D, vol. 69, no. 12, p. 279, Dec. 2015.
[26] P. Gokhale, J. M. Baker, C. Duckering, F. T. Chong, N. C. Brown, and K. R. Brown, “Extending the frontier of quantum computers with qutrits,” IEEE Micro, vol. 40, no. 3, pp. 64–72, 2020.
[27] P. Gokhale, J. M. Baker, C. Duckering, N. C. Brown, K. R. Brown, and F. T. Chong, “Asymptotic improvements to quantum circuits via qutrits,” in Proc. 46th Int. Symp. Comput. Architectural, New York, NY, USA, Jun. 2019, pp. 554–566, doi: 10.1145/3307650.3322253.
[28] P. Gokhale et al., “Partial compilation of variational algorithms for noisy intermediate-scale quantum machines,” in Proc. 52nd Annu. IEEE/ACM Int. Symp. Microarchitecture, New York, NY, USA, Oct. 2019, pp. 266–278.
[29] D. Gottesman, A. Kitaev, and J. Preskill, “Encoding a qubit in an oscillator,” Phys. Rev. A, Gen. Phys., vol. 64, no. 1, p. 12310, Jun. 2001.
[30] Z. Jiang, J. McClean, R. Babbush, and H. Neven, “Majorana loop stabilizer codes for error mitigation in fermionic quantum simulations,” Phys. Rev. Appl., vol. 12, no. 6, pp. 064041-1–064041-17, Dec. 2019. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevApplied.12.064041, doi: 10.1103/PhysRevApplied.12.064041.
[31] E. Kapit, “Hardware-efficient and fully autonomous quantum error correction in superconducting circuits,” Phys. Rev. Lett., vol. 116, no. 15, Apr. 2016, Art. no. 150501.
[32] G. Karypis and V. Kumar, “A fast and high quality multilevel scheme for partitioning irregular graphs,” SIAM J. Sci. Comput., vol. 20, no. 1, pp. 359–392, Jan. 1998.
[33] N. Khaneja, T. Reiss, C. Kehlet, T. Schulte-Herbrüggen, and S. J. Glaser, “Optimal control of coupled spin dynamics: Design of NMR pulse sequences by gradient ascent algorithms,” J. Magn. Reson., vol. 172, no. 2, pp. 296–305, Feb. 2005.
[34] A. Y. Kitaev, A. H. Shen, and M. N. Vyalyi, Classical and Quantum Computation. Providence, RI, USA: AMS, 2002.
[35] P. V. Klimov et al., “Fluctuations of energy-relaxation times in superconducting qubits,” Phys. Rev. Lett., vol. 121, Aug. 2018, Art. no. 090502.
[36] P. Krantz, M. Kjaergaard, F. Yan, T. P. Orlando, S. Gustavsson, and W. D. Oliver, “A quantum engineer’s guide to superconducting qubits,” Appl. Phys. Rev., vol. 6, no. 2, Jun. 2019, Art. no. 021318.
[37] B. P. Lanyon et al., “Towards quantum chemistry on a quantum computer,” Nature Chem., vol. 2, no. 2, pp. 106–111, Feb. 2010.
[38] B. P. Lanyon et al., “Simplifying quantum logic using higher-dimensional Hilbert spaces,” Nature Phys., vol. 5, no. 2, pp. 134–140, Feb. 2009.
[39] N. Leung, M. Abdelhafez, J. Koch, and D. Schuster, “Speedup for quantum optimal control from automatic differentiation based on graphics processing units,” Phys. Rev. A, Gen. Phys., vol. 95, no. 4, Apr. 2017, Art. no. 042318.
[40] N. M. Linke et al., “Experimental comparison of two quantum computing architectures,” Proc. Nat. Acad. Sci. USA, vol. 114, no. 13, pp. 3305–3310, Mar. 2017.
[41] D. Loss and D. P. DiVincenzo, “Quantum computation with quantum dots,” Phys. Rev. A, Gen. Phys., vol. 57, no. 1, pp. 120–126, Jan. 1998.
[42] Y. Lu et al., “Universal stabilization of a parametrically coupled qubit,” Phys. Rev. Lett., vol. 119, no. 15, Oct. 2017.
[43] D. Maslov, S. M. Falconer, and M. Mosca, “Quantum circuit placement,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 4, pp. 752–763, Apr. 2008.
[44] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, “The theory of variational hybrid quantum-classical algorithms,” New J. Phys., vol. 18, no. 2, 2016, Art. no. 023023.
[45] M. H. Michael et al., “New class of quantum error-correcting codes for a bosonic mode,” Phys. Rev. X, vol. 6, no. 3, Jul. 2016, Art. no. 031006.
[46] P. Murali, J. M. Baker, A. Javadi-Abhari, F. T. Chong, and M. Martonosi, “Noise-adaptive compiler mappings for noisy intermediate-scale quantum computers,” in Proc. 24th Int. Conf. Architectural Support Program. Lang. Operating Syst. (ASPLOS), New York, NY, USA, Apr. 2019, pp. 1015–1029.
[47] P. Murali, N. M. Linke, M. Martonosi, A. J. Abhari, N. H. Nguyen, and C. H. Alderete, “Full-stack, real-system quantum computer studies: Architectural comparisons and design insights,” in Proc. 46th Int. Symp. Comput. Architectural (ISCA), New York, NY, USA, Jun. 2019, pp. 527–540.
[48] P. Murali, D. C. Mckay, M. Martonosi, and A. Javadi-Abhari, “Software mitigation of crosstalk on noisy intermediate-scale quantum computers,” in Proc. 25th Int. Conf. Architectural Support Program. Lang. Operating Syst. (ASPLOS), New York, NY, USA, Mar. 2020, pp. 1001–1016.
[49] A. M. Nielsen and L. I. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition, 10th ed. New York, NY, USA: Cambridge Univ. Press, 2011.
[50] K. Noh, V. V. Albert, and L. Jiang, “Quantum capacity bounds of Gaussian thermal loss channels and achievable rates with Gottesman-Kitaev-Preskill codes,” IEEE Trans. Inf. Theory, vol. 65, no. 4, pp. 2563–2582, Apr. 2019.
[51] K. Noh and C. Chamberland, “Fault-tolerant bosonic quantum error correction with the surface-Gottesman-Kitaev-Preskill code,” Phys. Rev. A, Gen. Phys., vol. 101, no. 1, Jan. 2020, Art. no. 012316.
[52] N. Ofek et al., “Demonstrating quantum error correction that extends the lifetime of quantum information,” 2016, arXiv:1602.04768. [Online]. Available: http://arxiv.org/abs/1602.04768
[53] A. Pavlidis and E. Floratos, “Arithmetic circuits for multilevel qudits based on quantum Fourier transform,” 2017, arXiv:1707.08834. [Online]. Available: http://arxiv.org/abs/1707.08834
[54] Qiskit. (2019). Qiskit NoiseAdaptiveLayout Pass. Accessed: Aug. 1, 2019. [Online]. Available: https://github.com/Qiskit/qiskit-terra/pull/2089
[55] T. C. Ralph, K. J. Resch, and A. Gilchrist, “Efficient Toffoli gates using qudits,” Phys. Rev. A, Gen. Phys., vol. 75, no. 2, Feb. 2007, Art. no. 022313.
[56] J. Randall, A. M. Lawrence, S. C. Webster, S. Weidt, N. V. Vitanov, and W. K. Hensinger, “Generation of high-fidelity quantum control methods for multilevel systems,” Phys. Rev. A, Gen. Phys., vol. 98, no. 4, Oct. 2018, Art. no. 043414.
[57] J. Randall et al., “Efficient preparation and detection of microwave dressed-state qubits and qutrits with trapped ions,” Phys. Rev. A, Gen. Phys., vol. 91, no. 1, Jan. 2015, Art. no. 012322.
[58] Rigetti. (2019). PyQuil. Accessed: Aug. 1, 2019. [Online]. Available: https://github.com/rigetticomputing/pyquil
[59] R. Quilc. (2019). Use Swap Fidelity Instead of Gate Time as a Distance Metric. Accessed: Aug. 1, 2019. [Online]. Available: https://github.com/rigetti/quilc/pull/395
[60] M. Sarovar, T. Proctor, K. Rudinger, K. Young, E. Nielsen, and R. Blume-Kohout, “Detecting crosstalk errors in quantum information processors,” 2019, arXiv:1908.09855. [Online]. Available: https://arxiv.org/abs/1908.09855


[61] N. Schuch and J. Siewert, “Natural two-qubit gate for quantum computation using the XY interaction,” Phys. Rev. A, Gen. Phys., vol. 67, no. 3, Mar. 2003, Art. no. 032301.
[62] T. Schulte-Herbrueggen, A. Spoerl, and S. J. Glaser, “Quantum CISC compilation by optimal control and scalable assembly of complex instruction sets beyond two-qubit gates,” 2007, arXiv:0712.3227. [Online]. Available: http://arxiv.org/abs/0712.3227
[63] K. Setia, S. Bravyi, A. Mezzacapo, and J. D. Whitfield, “Superfast encodings for fermionic quantum simulation,” Phys. Rev. Res., vol. 1, no. 3, pp. 033033-1–033033-8, Oct. 2019. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevResearch.1.033033, doi: 10.1103/PhysRevResearch.1.033033.
[64] Y. Shi, C. Chamberland, and A. Cross, “Fault-tolerant preparation of approximate GKP states,” New J. Phys., vol. 21, no. 9, 2019, Art. no. 093007.
[65] Y. Shi et al., “Optimized compilation of aggregated instructions for realistic quantum computers,” in Proc. 24th Int. Conf. Architectural Support Program. Lang. Operating Syst. (ASPLOS), New York, NY, USA, Apr. 2019, pp. 1031–1044, doi: 10.1145/3297858.3304018.
[66] Strawberry Fields, Xanadu, Toronto, ON, Canada, 2016.
[67] S. S. Tannu and M. Qureshi, “Ensemble of diverse mappings: Improving reliability of quantum computers by orchestrating dissimilar mistakes,” in Proc. 52nd Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), New York, NY, USA, Oct. 2019, pp. 253–265.
[68] S. S. Tannu and M. K. Qureshi, “Not all qubits are created equal: A case for variability-aware policies for NISQ-era quantum computers,” in Proc. 24th Int. Conf. Architectural Support Program. Lang. Operating Syst. (ASPLOS), New York, NY, USA, Apr. 2019, pp. 987–999.
[69] C. Vuillot, H. Asasi, Y. Wang, L. P. Pryadko, and B. M. Terhal, “Quantum error correction with the toric Gottesman-Kitaev-Preskill code,” Phys. Rev. A, Gen. Phys., vol. 99, no. 3, Mar. 2019, Art. no. 032344.
[70] J. J. Wallman and J. Emerson, “Noise tailoring for scalable quantum computation via randomized compiling,” Phys. Rev. A, Gen. Phys., vol. 94, no. 5, Nov. 2016, Art. no. 052325.
[71] C. J. Wood and J. M. Gambetta, “Quantification and characterization of leakage errors,” Phys. Rev. A, Gen. Phys., vol. 97, no. 3, Mar. 2018.
[72] A. Zulehner, A. Paler, and R. Wille, “An efficient methodology for mapping quantum circuits to the IBM QX architectures,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 38, no. 7, pp. 1226–1236, 2019.
[73] E. Pednault, J. A. Gunnels, G. Nannicini, L. Horesh, and R. Wisnieff, “Leveraging secondary storage to simulate deep 54-qubit sycamore circuits,” 2019, arXiv:1910.09534. [Online]. Available: https://arxiv.org/abs/1910.09534
[74] C. Huang et al., “Classical simulation of quantum supremacy circuits,” 2020, arXiv:2005.06787. [Online]. Available: https://arxiv.org/abs/2005.06787
[75] M. Kjaergaard et al., “Superconducting qubits: Current state of play,” Annu. Rev. Condens. Matter Phys., vol. 11, no. 1, pp. 369–395, 2020, doi: 10.1146/annurev-conmatphys-031119-050605.
