Qiskit 1
July 3, 2018
Abstract
Recently, the development of quantum chips has made great progress: the
number of qubits is increasing and the fidelity is getting higher.
However, the qubits of these chips are not always fully connected, which
sets additional barriers for implementing quantum algorithms and writing
quantum programs. In this paper, we introduce a general circuit
optimizing scheme that can efficiently adjust and optimize quantum
circuits according to an arbitrary given qubit layout by adding
additional quantum gates, exchanging qubits and merging single-qubit
gates. Compared with the optimizing algorithm of IBM’s QISKit, the gate
cost of our scheme is 74.7% and the execution time is only 12.9% on
average.
1 Introduction
Quantum computing has attracted increasing attention in recent years because
of its tremendous computing power [1–3]. More and more companies and
scientific research institutions are devoting themselves to developing
quantum chips with more qubits and higher fidelity. While most theoretical
studies assume that interactions between arbitrary pairs of qubits are
available, almost all realistic chips have certain constraints on qubit
connectivity [4,5]. For example, IBM’s 5-qubit superconducting chips Tenerife
and Yorktown [6] adopt neighboring connectivity (illustrated in Fig.1 (a) and
1 (b), respectively). The 4-qubit superconducting chip used in [7] has four
qubits that are not directly connected to each other but are coupled through
a central resonator; that is, the layout of this chip is central, as shown in
Fig.1 (c). In addition, CAS-Alibaba Quantum Laboratory’s 11-qubit
superconducting chip [8] and Tsinghua University’s 4-qubit NMR chip [9] both
reduce full connectivity to linear nearest-neighbor connectivity, as shown in
Fig.1 (d). Clearly, such limited connectivity sets additional barriers for
implementing quantum algorithms and writing quantum programs.
[Figure 1: (a) Tenerife, (b) Yorktown, (c) Central layout, (d) Linear layout]
number of qubits. In 2017, IBM developed a quantum information science kit,
namely QISKit [13], which contains an algorithm that can adjust and optimize
quantum programs according to any layout. Recently, in order to find more
efficient solutions, IBM organized the QISKit Developer Challenge [14]. As
for the optimization of quantum circuits, in order to simulate more qubits on
classical computers, E. Pednault et al. proposed a method, namely slice [15],
to split the original quantum circuit into multiple subcircuits. In this way,
they simulate a random quantum circuit with depth 27 in a 2D lattice of
7 × 7 qubits and a circuit with depth 23 in a 2D lattice of 8 × 7 qubits on the
IBM Blue Gene/Q supercomputer, which improved the number of entangled
qubits that classical computers can simulate. However, the slice approach
focuses on simulating more entangled qubits, so it does not take the physical
layout into account and is only applicable to programs with small circuit
depth.
In this paper, we propose a sufficiently general quantum circuit optimizing
scheme that can efficiently adjust and optimize any quantum circuit according
to any layout. The remainder of this paper is organized as follows: Section 2
briefly introduces the necessary concepts. In Section 3, the design of our
optimizing scheme is presented in detail. In Section 4, we compare the cost
and efficiency of our scheme with QISKit’s optimizing method. The conclusion
and future research can be found in Section 5.
2 Preliminaries
2.1 QISKit
QISKit is a quantum information science kit developed by IBM, which
takes quantum programs written in Open-QASM [16] as input. It adjusts and
optimizes the input programs according to the given layout, and then
executes them on its built-in QASM-simulator or on cloud-based quantum
chips.
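Open-QASM is a variant of QASM [17], which is designed to control a
physical system with a parameterized gate set. Specifically, Open-QASM
takes {u1, u2, u3, CNOT} as the basic quantum gate set, where

$$
\begin{aligned}
u1(\lambda) &= \begin{pmatrix} 1 & 0 \\ 0 & e^{\lambda i} \end{pmatrix}, \\
u2(\phi, \lambda) &= \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -e^{\lambda i} \\ e^{\phi i} & e^{(\lambda i + \phi i)} \end{pmatrix}, \\
u3(\theta, \phi, \lambda) &= \begin{pmatrix} \cos\frac{\theta}{2} & -e^{\lambda i} \sin\frac{\theta}{2} \\ e^{\phi i} \sin\frac{\theta}{2} & e^{(\lambda i + \phi i)} \cos\frac{\theta}{2} \end{pmatrix}.
\end{aligned}
\tag{1}
$$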
The set {u1, u2, u3, CNOT} contains an infinite number of single-qubit
gates and is universal [18]. For comparison with QISKit, our optimizing
scheme also takes it as the basic set of quantum gates.
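The matrices in Equation (1) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`u1`, `u2`, `u3`, `is_unitary`) are chosen here for illustration and are not QISKit API calls.

```python
import cmath
import math


def u3(theta, phi, lam):
    """2x2 matrix of u3(theta, phi, lam) as nested lists, following Equation (1)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [c, -cmath.exp(1j * lam) * s],
        [cmath.exp(1j * phi) * s, cmath.exp(1j * (lam + phi)) * c],
    ]


def u2(phi, lam):
    """u2 is u3 with theta fixed to pi/2, which yields the 1/sqrt(2) prefactor."""
    return u3(math.pi / 2, phi, lam)


def u1(lam):
    """Diagonal phase gate diag(1, e^{i*lam}) from Equation (1)."""
    return [[1.0, 0.0], [0.0, cmath.exp(1j * lam)]]


def is_unitary(m, tol=1e-12):
    """Check that the conjugate transpose of m times m is the identity."""
    for i in range(2):
        for j in range(2):
            entry = sum(complex(m[k][i]).conjugate() * m[k][j] for k in range(2))
            if abs(entry - (1.0 if i == j else 0.0)) > tol:
                return False
    return True
```

Because the three gates take continuous parameters, every choice of angles gives a different single-qubit unitary, which is why the basic set contains infinitely many single-qubit gates.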
another way to accomplish the same task, such as the circuit shown in Fig.3.
However, the additional overhead of this solution is costly, especially for
sparse physical layouts. Specifically,

$$\text{cost} = 2m \times \text{cost}_{\text{SWAP}}, \tag{3}$$

where $m$ stands for the number of intermediate nodes on the shortest path
between the control-qubit and the target-qubit, and $\text{cost}_{\text{SWAP}}$
stands for 3 CNOT gates and 4 H gates.

[Figure 3: An equivalent circuit of cnot(q1, q4), where SWAP(q0, q4) is
implemented by (b). (a) An implementation of cnot(q1, q4); (b) SWAP(q0, q4)]
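As a rough illustration of Equation (3), the sketch below finds the intermediate nodes with a breadth-first search and evaluates the cost. The layout representation (a dict of neighbour sets), the function names, and the numeric value of one SWAP (34, obtained by weighting a CNOT as 10 and an H gate as 1, the weights used in Section 4) are assumptions made for this example.

```python
from collections import deque

COST_CNOT = 10    # assumed weight of one CNOT (as in the cost function of Section 4)
COST_SINGLE = 1   # assumed weight of one single-qubit gate
COST_SWAP = 3 * COST_CNOT + 4 * COST_SINGLE  # 3 CNOT + 4 H = 34


def shortest_path(layout, src, dst):
    """BFS shortest path in an undirected layout given as {qubit: set(neighbours)}."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(layout[node]):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    raise ValueError("qubits are not connected")


def swap_back_cost(layout, control, target):
    """Equation (3): swap across m intermediate nodes and then swap back."""
    m = len(shortest_path(layout, control, target)) - 2
    return 2 * m * COST_SWAP


# Hypothetical linear layout 0 - 1 - 2 - 3 - 4.
linear = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

On this hypothetical layout, a CNOT between qubits 0 and 3 has two intermediate nodes, so the swap-and-restore approach costs 2 × 2 × 34 = 136, while a CNOT between adjacent qubits costs nothing extra.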
3.1 The global adjustment of qubits
The global adjustment of qubits means that, before the quantum program is
executed, we compare the connectivity required by the program with the given
layout and directly exchange qubits. The greatest advantage of this step is
that it consumes no additional quantum gates. Therefore, the number of
additional quantum gates is minimal if all illegal CNOT gates can be handled
in this step. For simplicity, we assume that every edge in the given layout
is bidirectional in this step and in the Local adjustment step; that is,
Obstacle-1 is ignored in both steps.
Specifically, this step is described in Algorithm 1. In Algorithm 1, we
extract all CNOT gates from the quantum program and traverse them from front
to back. On encountering an illegal CNOT gate, we try to find an available
qubit mapping that adjusts the whole Open-QASM code without making any
already-traversed CNOT gate illegal. At each adjustment, we have
$(d_{cq} \times d_{tq} - t)$ available mappings to choose from, where $t$
stands for the number of mappings that make some traversed CNOT gates
illegal, and $d_{cq}$ and $d_{tq}$ stand for the number of qubits adjacent to
the control-qubit and the target-qubit in the given layout, respectively. The
traversal terminates when there is no illegal CNOT gate or
$(d_{cq} \times d_{tq} - t) = 0$.
Suppose that there are M possible mappings, where M depends on the given
layout and the connectivity of the quantum program. At this point, we
estimate the cost of solving Obstacle-2 in the program adjusted according to
each of these (M + 1) mappings (M mappings and one empty mapping), and take
the mapping with the smallest cost as the global adjustment mapping. The
reason for estimating rather than calculating exactly, and the estimation
process itself, are explained in the next part. Finally, we adjust the qubits
of the original Open-QASM code according to the global mapping. The classical
register, which stores the measurement results, does not need to be modified.
For example, cnot(q1, q4) is illegal in Fig.2, and it can be made legal by
the global mapping {1 : 3, 3 : 1}, as shown in Fig.4.
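The effect of a global mapping can be illustrated with a short sketch. The layout and CNOT list here are hypothetical (they are not the ones in Fig.2 and Fig.4), and the helper names are chosen for this example only; edges are treated as bidirectional, as assumed in this step.

```python
def apply_mapping(cnots, mapping):
    """Relabel qubits in a list of (control, target) pairs; unmapped qubits keep their ID."""
    return [(mapping.get(c, c), mapping.get(t, t)) for c, t in cnots]


def illegal_cnots(cnots, edges):
    """Return the CNOTs whose two qubits are not adjacent in the layout."""
    undirected = {frozenset(e) for e in edges}
    return [gate for gate in cnots if frozenset(gate) not in undirected]


# Hypothetical 5-qubit ring layout and CNOT sequence.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
program = [(1, 4), (1, 2), (3, 2)]

before = illegal_cnots(program, edges)
after = illegal_cnots(apply_mapping(program, {1: 3, 3: 1}), edges)
```

In this example cnot(q1, q4) is the only illegal gate before relabeling, and exchanging the labels of qubits 1 and 3 makes every CNOT legal without inserting a single gate.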
Algorithm 1: Global Adjustment
Input: The set of CNOT in QP, C; the set of legal CNOT, A; the
       record of all possible costs, costs; the record of all possible
       mappings, maps; the current mapping, amap;
Output: The mapping of qubits’ ID, map

GlobalAdjust(costs, maps, amap)
    costs ← [ ], maps ← [ ] and amap ← [ ];
    Adjust(C, A, amap, costs, maps);
    i ← getIndexofMinValue(costs);
    return maps[i];

Adjust(C, A, amap, costs, maps)
    alternativeMap ← [ ];
    for CNOT c in C do
        if c not in A then
            cq ← c[0] and tq ← c[1];
            cqAdj ← getAdjacentQubit(cq) and tqAdj ← getAdjacentQubit(tq);
            tMaps ← {cq : tqAdj, tq : cqAdj};
            for map m in tMaps do
                tempC ← C;
                change qubits’ ID in tempC according to m;
                if no illegal CNOT in tempC then
                    add m to alternativeMap;
            break;
    if alternativeMap == [ ] then
        cost ← estimateCost();
        add cost to costs and add amap to maps;
    for map am in alternativeMap do
        tempC ← C and add am to amap;
        change qubits’ ID in tempC according to am;
        if no illegal CNOT in tempC then
            add amap to maps and add 0 to costs;
        else
            Adjust(C, A, amap, costs, maps);
[Figure 4: Adjusting the circuit according to {1 : 3, 3 : 1}; (b) can be
executed on Fig.2 (a). (a) Before adjusting; (b) After adjusting]
where $m_i$ stands for the number of intermediate qubits between the
control-qubit and the target-qubit of the $i$th illegal CNOT, and
$\text{cost}_{\text{SWAP}}$ stands for 3 CNOT gates and 4 H gates. Among the
various estimation formulas we tried, Equation (4) gave the best results. The
correction factor $\left(\frac{n-i}{n}\right)^2$ is included in Equation (4)
because the later a CNOT gate is executed, the more it is influenced by the
previous adjustments; that is, the estimate is less reliable for later CNOT
gates. Multiplying the estimate by this factor, which decreases as the
estimation progresses, partly corrects for this.
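The following sketch shows how such a decaying correction factor damps the contribution of later CNOT gates. The exact form of Equation (4) is assumed here: the sketch takes the estimate to be the sum over illegal CNOTs of m_i × cost_SWAP × ((n − i)/n)², with i counted from 1 and n the number of CNOTs considered. The value 34 for one SWAP and the function names are likewise assumptions for illustration.

```python
def correction_factor(i, n):
    """The factor ((n - i) / n) ** 2, which shrinks as i grows towards n."""
    return ((n - i) / n) ** 2


def estimate_cost(intermediate_counts, cost_swap=34):
    """Assumed form of the estimate: sum of m_i * cost_SWAP * correction_factor(i, n).

    intermediate_counts[i - 1] is m_i, the number of intermediate qubits of the
    i-th CNOT (0 for a legal CNOT).
    """
    n = len(intermediate_counts)
    return sum(
        m * cost_swap * correction_factor(i, n)
        for i, m in enumerate(intermediate_counts, start=1)
    )
```

Under these assumptions, four CNOTs with one intermediate qubit each are weighted by 9/16, 4/16, 1/16 and 0, so the estimate is 34 × 0.875 = 29.75 instead of the undamped 136.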
To improve the accuracy of the estimation, we calculate the top 4 layers of
the binary tree exactly and estimate the cost of the subsequent gates for
each of the resulting $2^4$ cases, where 4 is the value that performed best
in repeated trials. We then add the estimated result and the calculated
result together and choose the smallest of the 16 cases.
Specifically, we traverse the Open-QASM code. Whenever encountering an
illegal CNOT, we call Algorithm 2 to adjust it and then update the subsequent
code and the classical register until the traversal terminates. It can be seen
from Algorithm 2 that the mapping generated by Adjust function only affects
the subsequent code of illC and that is why we call this step Local adjustment.
At this point, there is no Obstacle-2 in quantum programs. Then we
traverse the new Open-QASM code again to handle Obstacle-1 by Equation
(2).
Algorithm 2: Local Adjustment
Input: The Open-QASM code of the quantum program, qasm; the
       first illegal CNOT, illC; the rest CNOTs after illC in qasm,
       Cs; the record of all possible costs, costs; the cost in the
       current case, cost; the record of all possible mappings, maps;
       the current mapping, amap; the depth of recursion, d
Output: The adjusted Open-QASM code, qasm

LocalAdjust(qasm, illC, Cs)
    cost ← 0, costs ← [ ], amap ← [ ], d ← 1 and maps ← [ ];
    Adjust(illC, Cs, cost, costs, amap, maps, d);
    i ← getIndexofMinValue(costs);
    add SWAP gates to qasm according to maps[i];
    change qubits’ ID in qasm according to maps[i];
    return qasm;

Adjust(illC, Cs, cost, costs, map, maps, d)
    interQs ← getIntermediateNode(illC[0], illC[1]);
    cost ← cost + 34 × interQs.length;
    for qubit q in illC do
        tc ← cost;
        if q is control-qubit then
            tc ← tc + 4;
        tMap ← constructMapBetweenQ(interQs, q);
        change qubits’ ID in cnots according to tMap;
        nIllC ← getFirstIllegalCnot(cnots);
        restC ← getAllCnotAfterNewIllC(cnots);
        if map != [ ] then
            tMap ← map;
        if nIllC == None then
            add tc to costs and tMap to maps;
        else if d == 4 then
            tc ← tc + estimateCost();
            add tc to costs and add tMap to maps;
        else
            Adjust(nIllC, restC, tc, costs, tMap, maps, d + 1);
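The depth-limited binary search in Algorithm 2 can be illustrated with a simplified sketch. It is not the authors' implementation: it returns only the best cost rather than the mapping, it assumes a connected layout in which every physical qubit holds one logical qubit, and it replaces the paper's estimate with a plain sum of 34 × m over the remaining CNOTs. The extra 4 on the control-qubit branch simply mirrors Algorithm 2. All names are chosen for this example.

```python
from collections import deque

COST_SWAP = 34   # 3 CNOT x 10 + 4 H x 1, the constant used in Algorithm 2
MAX_DEPTH = 4    # number of tree layers that are calculated exactly


def path_between(adj, a, b):
    """BFS shortest path between physical qubits a and b (layout assumed connected)."""
    prev = {a: None}
    queue = deque([a])
    while queue:
        x = queue.popleft()
        if x == b:
            break
        for y in sorted(adj[x]):
            if y not in prev:
                prev[y] = x
                queue.append(y)
    path = [b]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def move(pos, path):
    """Swap the logical qubit on path[0] along the path until it neighbours path[-1]."""
    pos = dict(pos)                                # logical -> physical
    where = {p: q for q, p in pos.items()}         # physical -> logical
    for a, b in zip(path[:-2], path[1:-1]):
        la, lb = where[a], where[b]
        pos[la], pos[lb] = b, a
        where[a], where[b] = lb, la
    return pos


def rough_estimate(adj, cnots, pos):
    """Stand-in estimate (an assumption): 34 per intermediate qubit, no correction factor."""
    return sum(COST_SWAP * (len(path_between(adj, pos[c], pos[t])) - 2) for c, t in cnots)


def search(adj, cnots, pos, depth=1):
    """Smallest (partly estimated) extra cost needed to make all CNOTs legal."""
    for k, (c, t) in enumerate(cnots):
        path = path_between(adj, pos[c], pos[t])
        if len(path) > 2:                          # first illegal CNOT found
            break
    else:
        return 0                                   # every CNOT is already legal
    m = len(path) - 2
    rest = cnots[k + 1:]
    best = None
    # Branch 1 moves the control-qubit (+4 as in Algorithm 2), branch 2 the target-qubit.
    for moved_path, extra in ((path, 4), (path[::-1], 0)):
        new_pos = move(pos, moved_path)
        cost = COST_SWAP * m + extra
        if depth == MAX_DEPTH:
            cost += rough_estimate(adj, rest, new_pos)
        else:
            cost += search(adj, rest, new_pos, depth + 1)
        best = cost if best is None else min(best, cost)
    return best


# Hypothetical linear layout 0 - 1 - 2 - 3 with the identity placement.
line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
identity = {q: q for q in line}
```

Each illegal CNOT opens two branches, so exploring four layers exactly yields at most 2^4 = 16 leaves, which matches the 16 cases compared in the text.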
[Figure 5: The change of a random quantum circuit before and after merging
single-qubit gates. (a) Before merging; (b) After merging]
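$$
\begin{aligned}
u1(\lambda) &= R_z(\lambda), \\
u2(\phi, \lambda) &= u3(\tfrac{\pi}{2}, \phi, \lambda) = R_z(\phi) \times R_y(\tfrac{\pi}{2}) \times R_z(\lambda), \\
u3(\theta, \phi, \lambda) &= R_z(\phi) \times R_y(\theta) \times R_z(\lambda);
\end{aligned}
\tag{5}
$$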
For the first five cases, we can directly merge them by
$R_z(\lambda) \times R_z(\phi) = R_z(\lambda + \phi)$ [18]. As for the last
four cases, we have:
The key to this kind of merging lies in transforming the Y-Z decomposition of
a quantum gate into the Z-Y decomposition, and we use QISKit’s merge method
proposed in [20] to solve this problem. This completes the adjustment and
optimization of the original quantum program according to any given layout.
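The decomposition in Equation (5) and the rule for merging two Z rotations can be verified numerically. The sketch below assumes the common convention R_z(a) = diag(e^{-ia/2}, e^{ia/2}) and the real rotation matrix for R_y; under that convention the identities in Equation (5) hold up to a global phase, which does not affect measurement results. The helper names are chosen for this example.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def rz(angle):
    """Assumed convention: Rz(a) = diag(e^{-ia/2}, e^{ia/2})."""
    return [[cmath.exp(-1j * angle / 2), 0], [0, cmath.exp(1j * angle / 2)]]


def ry(angle):
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]


def u3(theta, phi, lam):
    """u3 as defined in Equation (1)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [c, -cmath.exp(1j * lam) * s],
        [cmath.exp(1j * phi) * s, cmath.exp(1j * (lam + phi)) * c],
    ]


def equal_up_to_phase(a, b, tol=1e-9):
    """True if a == e^{i*g} * b for some real g."""
    i, j = max(((r, c) for r in range(2) for c in range(2)),
               key=lambda rc: abs(b[rc[0]][rc[1]]))
    phase = a[i][j] / b[i][j]
    return abs(abs(phase) - 1) < tol and all(
        abs(a[r][c] - phase * b[r][c]) < tol for r in range(2) for c in range(2)
    )


theta, phi, lam = 0.7, -1.2, 2.1
decomposed = matmul(matmul(rz(phi), ry(theta)), rz(lam))   # Rz(phi) x Ry(theta) x Rz(lam)
merged = matmul(rz(0.4), rz(1.1))                          # two adjacent Z rotations
```

Here `decomposed` matches `u3(theta, phi, lam)` up to a global phase, and `merged` equals `rz(1.5)`, so two adjacent u1 gates collapse into one gate whose angle is the sum of the two.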
4 Numerical Results
In this section, we take QISKit’s optimizing method as the benchmark to
evaluate the performance of our optimizing scheme on quantum programs of
different scales and on different layouts of quantum chips. In addition, we
use the method proposed in the QISKit Developer Challenge to count the cost
of gates:

$$\text{cost} = n_2 \times 10 + n_1 \times 1, \tag{7}$$

where $n_2$ and $n_1$ stand for the number of CNOT gates and single-qubit
gates in the optimized quantum circuit, respectively.
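Equation (7) amounts to a weighted gate count. A minimal sketch, assuming the circuit is given as a flat list of gate names in which `cx` denotes a CNOT (its name in Open-QASM) and every other entry is a single-qubit gate:

```python
def gate_cost(gates):
    """Equation (7): each CNOT counts 10, each single-qubit gate counts 1."""
    n2 = sum(1 for g in gates if g == "cx")
    n1 = len(gates) - n2
    return n2 * 10 + n1 * 1


# A SWAP built from 3 CNOT gates and 4 H gates, listed in an arbitrary order.
swap_gates = ["cx", "h", "h", "cx", "h", "h", "cx"]
```

With these weights, one SWAP built from 3 CNOT gates and 4 H gates costs 34, which is the constant that appears in Algorithm 2.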
4.1 Platform
Hardware Platform
All the experiments in this paper are executed on a PC with an Intel Core
i7 processor and 8GB of RAM. Furthermore, we have no special hardware
acceleration, such as a GPU.
Software Platform
In order to verify the correctness of our scheme, we use the QASM-simulator
to execute the optimized circuits. In addition, we use a special method to
generate random quantum circuits: it first generates random circuits whose
quantum gates belong to SU(4) [21], and then decomposes these gates into
gates belonging to {SU(2), CNOT} [22]. The advantage of this method is that
it fully tests different connections between qubits and guarantees a fair
comparison between our optimizing scheme and QISKit (version 0.4.11). The
detailed execution flow of our experiments is shown in Fig.6. Note that the
circuit depth mentioned in the following refers to the depth of the SU(4)
circuit; the actual depth is about 7 times larger.
4.2 Results
The number of qubits and the circuit depth are important indicators of the
scale of a quantum program. Therefore, the experiments are designed as
follows: for each of the 14 qubit counts from 3 to 16 and each of the 16
circuit depths from 1 to 16, we generate 10 different random quantum
circuits. In total, 14 × 16 × 10 = 2240 circuits are generated. We then
choose four common connected graphs (linear, central, neighboring and
circular) and use our optimizing scheme and QISKit’s algorithm to adjust and
optimize these 2240 random circuits according to each of these layouts. That
is, each algorithm handles 8960 (2240 × 4) quantum circuits. Finally, the
optimized quantum programs are executed by the QASM-simulator. If the result
of our scheme is consistent with QISKit’s result, we count the cost and the
execution time of each circuit.
All quantum circuits, layouts and the source code of our scheme can be
found on GitHub (see footnote 1).
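The size of the experiment grid follows directly from the ranges given above. A small sketch that enumerates the configurations (the variable names are chosen for this example, and no circuits are actually generated):

```python
from itertools import product

qubit_counts = range(3, 17)    # 3 to 16 qubits: 14 cases
depths = range(1, 17)          # SU(4) circuit depth 1 to 16: 16 cases
repeats = 10                   # random circuits per (qubits, depth) pair
layouts = ["linear", "central", "neighboring", "circular"]

# One tuple (number of qubits, depth, repetition index) per random circuit.
circuits = list(product(qubit_counts, depths, range(repeats)))
# One job per circuit and layout, for each of the two optimizers.
jobs_per_optimizer = len(circuits) * len(layouts)
```

This gives 2240 circuits and 8960 optimization jobs per optimizer, in agreement with the figures quoted in the text.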
Comparison with QISKit’s optimizing method
Table 1 shows the gate cost of the 2240 original random quantum circuits,
together with the average gate cost and compile time required to adjust and
optimize these 2240 circuits by our scheme and by QISKit. The gate cost of
our scheme is 74.7% of QISKit’s, and the execution time is only 12.9%.
Footnote 1: https://github.com/zhangxin20121923/QISKit_Deve_Challenge

[Figure 7: (a) cost of gates and (b) efficiency, plotted against the number
of qubits and the circuit depth]

Specifically, the performance of our scheme varies for different scales of
quantum circuits. Fig.7 (a) and Fig.7 (b) illustrate the ratio of QISKit to
our scheme in terms of the cost of quantum gates and the efficiency,
respectively, for various numbers of qubits n and circuit depths d. The two
formulas are as follows:

$$\text{cost}_{(n,d)} = \frac{qc_{(n,d)}}{c_{(n,d)}}, \qquad \text{efficiency}_{(n,d)} = \frac{qt_{(n,d)}}{t_{(n,d)}}, \tag{8}$$
where qc and qt stand for the gate cost and execution time of QISKit’s
algorithm, and c and t indicate those of our method. Fig.7 shows that in all
cases we executed, our algorithm uses fewer quantum gates to adjust and
optimize the original circuits, and does so in less time. In the worst case
(more qubits and larger circuit depth), we use 6% fewer gates and the
efficiency is about 5 times that of QISKit; in the best case (more qubits and
smaller circuit depth), we use 63% fewer gates and the efficiency is about 20
times.
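A cost ratio from Equation (8) converts to a percentage saving as 1 − 1/ratio. The sketch below applies this to 1.06 and 2.68, the smallest and largest values that survive on the colour scale of Fig.7 (a); treating them as the worst-case and best-case ratios is an assumption made here.

```python
def fewer_gates_fraction(cost_ratio):
    """Fraction of gate cost saved relative to QISKit, given the ratio qc / c."""
    return 1 - 1 / cost_ratio


worst_case_saving = fewer_gates_fraction(1.06)   # about 0.057
best_case_saving = fewer_gates_fraction(2.68)    # about 0.627
```

Rounded to whole percentages, these are the 6% and 63% savings quoted in the text.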
These results are consistent with expectations: when the number of qubits is
large and the circuit depth is small, the choice is more reliable and the
performance is better, because we recursively calculate 4 layers of the
solution space tree; when the number of qubits is small, the layout tends to
be fully connected and our scheme has little advantage; and when the circuit
depth is large, we are easily trapped in a local optimum and the performance
of our scheme is worse than for small depths.
Performance in different physical layouts
For the four layouts we have chosen, there are also significant differences
in the cost of quantum gates and the execution time. In order to treat
circuits of different scales fairly and prevent the statistics from being
dominated by large-scale circuits, we no longer directly sum the gate costs
of different cases (as in Table 1). Specifically, the statistical method is
as follows:

$$\text{cost}_{l,c} = \frac{1}{2240}\left[\sum_{i=1}^{2240}\left(\frac{c_i}{o_i}\right)\right], \qquad \text{efficiency}_{l} = \frac{1}{2240}\left[\sum_{i=1}^{2240}\left(\frac{qt_i}{ot_i}\right)\right], \tag{9}$$

where $l \in$ {Linear, Circle, Center, Neighbour} and $c \in \{oc, qc\}$;
$o_i$, $qc_i$ and $oc_i$ stand for the gate cost of the $i$th original
circuit, the $i$th circuit adjusted by QISKit and the $i$th circuit adjusted
by our scheme, respectively; and $qt_i$ and $ot_i$ stand for the time
required to compile the $i$th circuit by QISKit and by our scheme,
respectively.
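The difference between averaging per-circuit ratios, as in Equation (9), and summing costs directly can be seen on a toy example. The numbers below are invented for illustration and are not taken from the experiments.

```python
def mean_of_ratios(adjusted, original):
    """Equation (9) style: average the per-circuit ratio c_i / o_i."""
    return sum(a / o for a, o in zip(adjusted, original)) / len(original)


def ratio_of_sums(adjusted, original):
    """Direct summation: total adjusted cost divided by total original cost."""
    return sum(adjusted) / sum(original)


# One small circuit whose cost triples and one large circuit that grows by 10%.
original = [10, 1000]
adjusted = [30, 1100]
```

Here the ratio of sums is about 1.12 because the large circuit dominates the totals, whereas the mean of ratios is 2.05 and weights both circuits equally.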
Fig.8 (a) shows that for the central layout, our scheme requires 1.80 times
the gate cost of the original circuit, while the optimizing method of QISKit
requires 3.68 times; for the linear layout, the gate cost of our scheme is
2.28 times the original cost and that of QISKit is about 2.86 times; for the
circle and neighbour layouts, our scheme needs 1.77 times and 1.60 times the
gate cost respectively, while QISKit’s method needs 2.05 times and 2.01
times. Fig.8 (b) illustrates that for the linear, circle and neighbour
layouts, our scheme is about 4 times faster than QISKit; for the central
layout, our scheme is about 17.3 times as fast as QISKit’s method.
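The per-layout figures above can be turned into the relative gate cost and relative compile time of our scheme with respect to QISKit. A short sketch of this arithmetic, using only the numbers quoted in the preceding paragraph (the variable names are chosen for this example):

```python
# Gate cost relative to the original circuit, per layout, as quoted above.
ours = {"Center": 1.80, "Linear": 2.28, "Circle": 1.77, "Neighbour": 1.60}
qiskit = {"Center": 3.68, "Linear": 2.86, "Circle": 2.05, "Neighbour": 2.01}

# Gate cost of our scheme as a fraction of QISKit's, per layout.
relative_cost = {layout: ours[layout] / qiskit[layout] for layout in ours}

# A speed-up of 17.3x on the central layout means 1 / 17.3 of the compile time.
central_time_fraction = 1 / 17.3
```

For the central layout this gives about 49% of the gate cost and about 5.8% of the execution time of QISKit, which are the figures summarized in Section 5.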
[Figure 8: (a) gate cost and (b) efficiency for the Center, Linear, Circle
and Neighbour layouts]
5 Conclusions and Future Research
Considering the cost of physical implementation, the layouts of most existing
quantum chips are not fully connected, which sets additional barriers for
implementing quantum algorithms and writing quantum programs. Therefore, a
better approach is to let the compiler of the quantum computer automatically
adjust and optimize quantum programs according to any given layout. We
propose a general optimizing scheme that accomplishes this task by adding
additional logic gates, exchanging qubits in the quantum register and merging
single-qubit gates. Compared with QISKit’s optimizing method, the gate cost
of our scheme is 74.7% and the execution time is only 12.9% overall. For
circuits with more qubits and smaller circuit depth, this advantage is more
pronounced. In addition, four common connected graphs (linear, central,
neighboring and circular) are compared, and our scheme has advantages in all
four cases. In particular, for the central layout, we need only 49% of the
gates and 5.8% of the execution time of QISKit’s optimizing algorithm to
adjust and optimize the original quantum circuits.
Future Research
In our scheme, we often make greedy choices when the circuit depth of the
quantum program is large. However, the experimental results in Section 4
show that we made wrong choices in some cases and got trapped in a locally
optimal solution. If we can find better selection criteria, or even
calculate the globally optimal solution, we can further reduce the
consumption of additional logic gates.
In addition, merging single-qubit logic gates requires high-precision
floating-point calculation, which takes up about 70% of the total compile
time. Whether more efficient merging methods can be found is a problem worth
considering. In order to further evaluate different physical layouts, we
also plan to discuss with the R&D teams of actual quantum chips how to
combine the actual overhead of designing different layouts with the
software-level cost.
References
[1] Daniel R Simon. On the power of quantum computation. SIAM journal
on computing, 26(5):1474–1483, 1997.
[2] Peter W Shor. Polynomial-time algorithms for prime factorization and
discrete logarithms on a quantum computer. SIAM review, 41(2):303–332,
1999.
[3] Lov K Grover. A fast quantum mechanical algorithm for database search.
In Proceedings of the twenty-eighth annual ACM symposium on Theory
of computing, pages 212–219. ACM, 1996.
[4] Donny Cheung, Dmitri Maslov, and Simone Severini. Translation
techniques between quantum circuit architectures.
[5] Norbert M Linke, Dmitri Maslov, Martin Roetteler, Shantanu Debnath,
Caroline Figgatt, Kevin A Landsman, Kenneth Wright, and Christopher
Monroe. Experimental comparison of two quantum computing architectures.
Proceedings of the National Academy of Sciences, page 201618020, 2017.
[6] The backend information of IBM quantum cloud.
https://github.com/QISKit/qiskit-backend-information/.
[7] YP Zhong, D Xu, P Wang, C Song, QJ Guo, WX Liu, K Xu, BX Xia,
C-Y Lu, Siyuan Han, et al. Emulating anyonic fractional statistical
behavior in a superconducting quantum circuit. Physical review letters,
117(11):110501, 2016.
[8] The URL of Alibaba’s quantum cloud platform.
http://quantumcomputer.ac.cn/index.html.
[9] Tao Xin, Shilin Huang, Sirui Lu, Keren Li, Zhihuang Luo, Zhangqi Yin,
Jun Li, Dawei Lu, Guilu Long, and Bei Zeng. Nmrcloudq: a quantum
cloud experience on a nuclear magnetic resonance quantum computer.
Science Bulletin, 2017.
[10] H Dieter Zeh. On the interpretation of measurement in quantum theory.
Foundations of Physics, 1(1):69–76, 1970.
[11] David P DiVincenzo et al. The physical implementation of quantum
computation. arXiv preprint quant-ph/0002077, 2000.
[12] A Chi-Chih Yao. Quantum circuit complexity. In Foundations of
Computer Science, 1993. Proceedings., 34th Annual Symposium on, pages
352–361. IEEE, 1993.
[13] Qiskit Python API. https://qiskit.org/.
[16] Andrew W Cross, Lev S Bishop, John A Smolin, and Jay M Gambetta.
Open quantum assembly language. arXiv preprint arXiv:1707.03429,
2017.
[17] Krysta M Svore, Alfred V Aho, Andrew W Cross, Isaac Chuang, and
Igor L Markov. A layered software architecture for quantum computing
design tools. Computer, 39(1):74–83, 2006.
[19] Michael A Nielsen and Isaac Chuang. Quantum computation and
quantum information, 2002.
[21] Francesco Iachello. Lie algebras and applications, volume 708. Springer,
2006.
[22] Farrokh Vatan and Colin Williams. Optimal quantum circuits for general
two-qubit gates. Physical Review A, 69(3):032315, 2004.