Quantum Circuit Architecture Search For Variational Quantum Algorithms
Variational quantum algorithms (VQAs) are expected to be a path to quantum advantages on noisy intermediate-scale quantum devices. However, both empirical and theoretical results show that the deployed ansatz heavily affects the performance of VQAs: an ansatz with a larger number of quantum gates enables a stronger expressivity, while the accumulated noise may render poor trainability. To maximally improve the robustness and trainability of VQAs, here we devise a resource- and runtime-efficient scheme termed quantum architecture search (QAS). In particular, given a learning task, QAS automatically seeks a near-optimal ansatz (i.e., circuit architecture) that balances the benefits and side effects of adding more noisy quantum gates to achieve good performance. We implement QAS on both a numerical simulator and real quantum hardware, via the IBM cloud, to accomplish data classification and quantum chemistry tasks. In the problems studied, numerical and experimental results show that QAS can not only alleviate the influence of quantum noise and barren plateaus but also outperform VQAs with pre-selected ansatze.
npj Quantum Information (2022)8:62 ; https://doi.org/10.1038/s41534-022-00570-y
INTRODUCTION
The variational quantum learning algorithms (VQAs)1,2, including quantum neural networks3–5 and variational quantum eigensolvers (VQEs)6–9, are a class of promising candidates to use noisy intermediate-scale quantum (NISQ) devices to solve practical tasks that are beyond the reach of classical computers10. Recently, the effectiveness of VQAs toward small-scale learning problems such as low-dimensional synthetic data classification, image generation, and energy estimation for small molecules has been validated by experimental studies11–14. Despite the promising achievements, the performance of VQAs degrades significantly when the qubit number and circuit depth become large, caused by the tradeoff between expressivity and trainability15. More precisely, under the NISQ setting, involving more quantum resources (e.g., quantum gates) to implement the ansatz has both a positive and a negative aftermath. On the one hand, the expressivity of the ansatz, which determines whether the target concept will be covered by the represented hypothesis space, is strengthened by increasing the number of trainable gates16–19. On the other hand, a deep circuit implies that the gradient information received by the classical optimizer is full of noise and the valid information vanishes exponentially, which may lead to divergent optimization or barren plateaus20–24. With this regard, it is of great importance to design an efficient approach to dynamically control the expressivity and trainability of VQAs to attain good performance.

Initial studies have developed two leading strategies to address the above issue. The first one is quantum error mitigation techniques. Representative methods to suppress the noise effect on NISQ machines are quasi-probability25,26, extrapolation27, quantum subspace expansion28, and data-driven methods29,30. In parallel to quantum error mitigation, another way is constructing an ansatz with a variable structure. Compared with traditional VQAs with a fixed ansatz, this approach can not only maintain a shallow depth to suppress noise and trainability issues, but also keep sufficient expressibility to contain the solution. Current literature generally adopts brute-force strategies to design such a variable ansatz31–33. This implies that the required computational overhead is considerable, since the number of possible ansatze scales exponentially with respect to the qubit count and the circuit depth. How to efficiently seek a near-optimal ansatz remains largely unknown.

In this study, we devise a quantum architecture search scheme (QAS) to effectively generate variable-structure ansatze, which considerably improves the learning performance of VQAs. The advantage of QAS is ensured by unifying the noise inhibition and the enhancement of trainability for VQAs as a single learning problem. In doing so, QAS does not request any ancillary quantum resource and its runtime is almost the same as conventional VQA-based algorithms. Moreover, QAS is compatible with all quantum platforms, e.g., optical, trapped-ion, and superconducting quantum machines, since it can actively adapt to physical restrictions and the weighted noise of varied quantum gates. In addition, QAS can seamlessly integrate with other quantum error mitigation methods25–27 and solutions for resolving barren plateaus21,34–36. Owing to this universality and efficacy, QAS contributes to a broad class of VQAs on various quantum machines.
1JD Explore Academy, Beijing 101111, China. 2School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, NSW 2008, Australia. 3SenseTime Research, Beijing 100080, China. 4Hon Hai Quantum Computing Research Center, Taipei 114, Taiwan. 5Centre for Quantum Software and Information, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia. 6Present address: SenseTime Research, Beijing 100080, China.
✉email: [email protected]; [email protected]; [email protected]
Fig. 1 Paradigm of the quantum architecture search scheme (QAS). In Step 1, QAS sets up the supernet A, which defines the ansatze pool S to be searched and parameterizes each ansatz in S via the specified weight sharing strategy. All possible single-qubit gates are highlighted by hexagons and two-qubit gates are highlighted by the brown rectangle. The unitary Ux refers to the data encoding layer. In Step 2, QAS optimizes the trainable parameters for all candidate ansatze. Given the specified learning task L, QAS iteratively samples an ansatz a(t) ∈ S from A and optimizes its trainable parameters to minimize L. A correlates parameters among different ansatze via the weight sharing strategy. After T iterations, QAS moves to Step 3 and exploits the trained parameters θ(T) and the predefined L to compare the performance among K ansatze. The ansatz with the best performance is selected as the output, indicated by a red smiley face. Last, in Step 4, QAS utilizes the searched ansatz and the parameters θ(T) to retrain the quantum solver with few iterations.
RESULTS
The mechanism of VQAs
Before moving on to present QAS, we first recap the mechanism of VQAs. Given an input Z and an objective function L, a VQA employs a gradient-based classical optimizer that continuously updates the parameters in an ansatz (i.e., a parameterized quantum circuit) U(θ) to find the optimal θ*, i.e.,

$\theta^{\ast} = \arg\min_{\theta \in \mathcal{C}} \mathcal{L}(\theta; Z)$,  (1)

where $\mathcal{C} \subseteq \mathbb{R}^{d}$ is a constraint set and θ are the adjustable parameters of the quantum gates16,18. For instance, when the VQA is specified as an eigensolver6, Z refers to a Hamiltonian and the objective function can be chosen as $\mathcal{L} = \mathrm{Tr}(Z\,|\psi(\theta)\rangle\langle\psi(\theta)|)$, where $|\psi(\theta)\rangle$ is the quantum state generated by U(θ). For compatibility, throughout the whole study, we focus on exploring how QAS enhances the trainability of one typical heuristic ansatz, the hardware-efficient ansatz11,13. Such an ansatz obeys a multi-layer layout,

$U(\theta) = \prod_{l=1}^{L} U_l(\theta) \in SU(2^{N})$,  (2)

where $U_l(\theta)$ consists of a sequence of parameterized single-qubit and two-qubit quantum gates and L denotes the layer number. Note that the arrangement of quantum gates in $U_l(\theta)$ is flexible, enabling VQAs to adequately use available quantum resources and to accord with any physical restriction. Remarkably, the achieved results can be effectively extended to other representative ansatze.
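For concreteness, the following is a minimal PennyLane sketch of a hardware-efficient ansatz in the form of Eq. (2) together with an Eq. (1)-style objective; the device, gate layout, toy observable, and optimizer settings are illustrative assumptions, not the exact circuits used in this work.

```python
import pennylane as qml
from pennylane import numpy as np

N, L = 4, 3                                   # qubit count and layer number (illustrative)
dev = qml.device("default.qubit", wires=N)

# Toy observable standing in for Z in Eq. (1); not one of the Hamiltonians studied here.
Z = qml.Hamiltonian([1.0], [qml.PauliZ(0) @ qml.PauliZ(1)])

@qml.qnode(dev)
def cost(theta):
    # One possible layout of U_l(θ): single-qubit RY rotations followed by a CNOT ladder.
    for l in range(L):
        for i in range(N):
            qml.RY(theta[l, i], wires=i)
        for i in range(N - 1):
            qml.CNOT(wires=[i, i + 1])
    # L(θ; Z) = Tr(Z|ψ(θ)⟩⟨ψ(θ)|), the eigensolver objective mentioned below Eq. (1).
    return qml.expval(Z)

theta = np.random.uniform(0, 2 * np.pi, (L, N), requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.1)
for _ in range(50):                           # gradient-based minimization of Eq. (1)
    theta = opt.step(cost, theta)
```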
The scheme of quantum architecture search
Let us formalize the noise inhibition and trainability enhancement for VQAs as a learning task. Denote by S the ansatze pool that contains all possible ansatze (i.e., circuit architectures) to build U(θ) in Eq. (2). The size of S is determined by the qubit count N, the maximum circuit depth L, and the number of allowed types of quantum gates Q, i.e., $|S| = O(Q^{NL})$. Throughout the whole study, when no confusion occurs, we denote a as the ath ansatz U(θ, a) in S. Notably, the performance of VQAs heavily relies on the employed ansatz selected from S. Suppose that the quantum system noise induced by a is modeled by the quantum channel $\mathcal{E}_a$. Taking into account the circuit architecture information and the related noise, the objective of VQAs can be rewritten as

$(\theta^{\ast}, a^{\ast}) = \arg\min_{\theta \in \mathcal{C},\, a \in S} \mathcal{L}(\theta, a; Z, \mathcal{E}_a)$.  (3)

The learning problem formulated in Eq. (3) forces the optimizer to output the best quantum circuit architecture a* by assessing both the effect of noise and the trainability. Notably, Eq. (3) is intractable via the two-stage optimization strategy that is broadly used in previous literature31–33, i.e., individually optimizing all possible ansatze from scratch and then ranking them to obtain (θ*, a*). This is because the classical optimizer needs to store and update $O(dQ^{NL})$ parameters, which forbids its applicability toward large-scale problems in terms of N and L.

The proposed QAS belongs to the one-stage optimization strategy. Different from the two-stage optimization strategy, which suffers from the computational bottleneck, this strategy ensures the efficiency of QAS. In particular, for the same number of iterations T, the memory cost of QAS is at most T times more than that of conventional VQAs. Meanwhile, their runtime complexity is identical. The protocol of QAS is shown in Fig. 1. Two key elements of QAS are the supernet and the weight sharing strategy. Both of them contribute to locating a good estimation of (θ*, a*) within a reasonable runtime and memory usage. Intuitively, the weight sharing strategy in QAS refers to correlating parameters among different ansatze. In this way, the parameter space, which amounts to the total number of trainable parameters required to be optimized in Eq. (3), can be effectively reduced. As for the supernet, it plays two significant roles in QAS: (1) the supernet serves as the ansatz indicator, which defines the ansatze pool S (e.g., determined by the maximum circuit depth and the choices of quantum gates) to be searched, and (2) the supernet parameterizes each ansatz in S via the specified weight sharing strategy. QAS includes four steps, i.e., initialization (supernet setup), optimization, ranking, and fine tuning. We now elucidate these four steps.
(1) Initialization: QAS employs a supernet A as an indicator for the ansatze pool S. Concretely, the setup of the supernet A amounts to leveraging the indexing technique to track S using a linear memory cost. For instance, when N = 4, L = 1, and the choices of the quantum gates are {RX, RY, RZ} with Q = 3, A indexes RX, RY, and RZ as "0", "1", and "2", respectively. With the range of a, b, c, d set as {0, 1, 2}, the index list ["a", "b", "c", "d"] tracks S, e.g., ["0", "0", "0", "0"] describes the ansatz $\bigotimes_{i=1}^{4} R_X(\theta_i)$ and ["2", "2", "2", "2"] describes the ansatz $\bigotimes_{i=1}^{4} R_Z(\theta_i)$. See Methods for the construction of the ansatze pool S involving two-qubit gates. Meantime, as detailed below, A parameterizes all candidate ansatze via the weight sharing strategy to reduce the parameter space.
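As a plain-Python illustration of this indexing idea (for the N = 4, L = 1, Q = 3 example above), the supernet only needs to record the index space rather than the $Q^{NL}$ circuits themselves; the helper names below are ours, not part of the QAS implementation.

```python
import itertools
import random

GATES = {"0": "RX", "1": "RY", "2": "RZ"}        # the index map from the example
N, L = 4, 1

# One slot per qubit per layer, each with Q = 3 choices: a linear memory cost.
index_space = [list(GATES) for _ in range(N * L)]

def sample_index_list():
    """Uniformly draw one ansatz, e.g. ['0', '0', '0', '0'] -> RX on every qubit."""
    return [random.choice(slot) for slot in index_space]

def describe(index_list):
    return [f"{GATES[c]}(theta_{i})" for i, c in enumerate(index_list)]

print(describe(sample_index_list()))
# Enumerating |S| = Q^{NL} = 81 candidates is done here only as a sanity check;
# QAS itself never materializes the whole pool.
assert len(list(itertools.product(*index_space))) == 3 ** (N * L)
```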
(2) Optimization: QAS jointly optimizes {(a, θ)} in Eq. (3). Similar to conventional VQAs, QAS optimizes the trainable parameters in an iterative manner. At the tth iteration, QAS uniformly samples an ansatz a(t) from S (i.e., an index list indicated by A). To minimize L in Eq. (3), the parameters attached to the ansatz a(t) are updated to $\theta^{(t+1)} = \theta^{(t)} - \eta\, \partial \mathcal{L}(\theta^{(t)}, a^{(t)}; Z, \mathcal{E}_{a^{(t)}})/\partial \theta^{(t)}$, with η being the learning rate. The total number of updates is set as T. Note that since the optimization of VQAs is NP-hard37, empirical studies generally restrict T to be less than O(poly(QNL)) to obtain an estimation within a reasonable runtime cost.
To avoid the computational issue encountered by the two-stage optimization method, QAS leverages the weight sharing strategy developed in deep neural architecture search38 to parameterize ansatze in S via a specified correlation rule. Concretely, for any ansatz a′ ∈ S, if the layout of the single-qubit gates of the lth layer between a′ and a(t) is identical for ∀ l ∈ [L], then A uses the training parameters θ(t) assigned to $U_l(\theta^{(t)}, a^{(t)})$ to parametrize $U_l(\theta', a')$, regardless of variations in the layout of the other layers. We remark that the parameterization shown above is efficient, since it can be accomplished by comparing the generated index list with the stored index lists. In addition, the above correlated updating rule implies that the parameters of unsampled ansatze are never stored in classical memory. To this end, even though the size of the ansatze pool scales exponentially in terms of N and L, QAS harnesses the supernet and weight sharing strategy to guarantee its applicability toward large-scale problems.
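A minimal sketch of how such a correlation rule can be bookkept classically is given below: parameters are stored per (layer, single-qubit layout) key and created lazily, so unsampled ansatze never occupy memory. Here an ansatz is represented as a list of per-layer layouts, the class and function names are illustrative, and `loss_and_grad` stands in for the parameter-shift evaluation on the quantum device.

```python
import numpy as np

class Supernet:
    """Weight sharing rule: ansatze whose lth layer has the same single-qubit
    layout reuse the parameters stored under the key (l, layout)."""

    def __init__(self, n_qubits):
        self.n_qubits = n_qubits
        self.params = {}                       # {(layer, layout): parameter vector}

    def get(self, layer, layout):
        key = (layer, tuple(layout))
        if key not in self.params:             # created on first use only
            self.params[key] = np.random.uniform(0, 2 * np.pi, self.n_qubits)
        return self.params[key]

    def set(self, layer, layout, values):
        self.params[(layer, tuple(layout))] = values

def training_step(supernet, ansatz, loss_and_grad, lr=0.1):
    """One iteration of Step 2: fetch the shared parameters for the sampled ansatz,
    evaluate the loss and gradients on the quantum device, and write them back."""
    theta = [supernet.get(l, layout) for l, layout in enumerate(ansatz)]
    loss, grads = loss_and_grad(ansatz, theta)
    for l, layout in enumerate(ansatz):
        supernet.set(l, layout, theta[l] - lr * grads[l])
    return loss
```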
(3) Ranking: after T iterations, QAS uniformly samples K ansatze from S (i.e., K index lists generated by A), ranks their performance, and then assigns the ansatz with the best performance as the output to estimate a*. Mathematically, denoting by $\mathcal{K}$ the set collecting the sampled K ansatze, the output ansatz is

$\arg\min_{a \in \mathcal{K}} \mathcal{L}(\theta^{(T)}, a; Z, \mathcal{E}_a)$.  (4)

In QAS, K is a hyper-parameter to balance the tradeoff between efficiency and performance. To avoid an exponential runtime complexity of QAS, the setting of K should scale polynomially with N, L, and Q. Besides random sampling, other methods such as evolutionary algorithms can also be used to establish $\mathcal{K}$ with better performance. See Supplementary D for details.
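Continuing the sketch above (with an ansatz again given as a list of per-layer layouts), the ranking step is a simple loop over K sampled index lists evaluated with the shared parameters θ(T); `evaluate` is a placeholder for the noisy objective L(θ(T), a; Z, E_a).

```python
def rank(supernet, sample_ansatz, evaluate, k=500):
    """Step 3 (Eq. (4)): draw K candidates, score each with the trained shared
    parameters, and return the candidate with the smallest objective value."""
    best_loss, best_ansatz = float("inf"), None
    for _ in range(k):
        ansatz = sample_ansatz()
        theta = [supernet.get(l, layout) for l, layout in enumerate(ansatz)]
        loss = evaluate(ansatz, theta)
        if loss < best_loss:
            best_loss, best_ansatz = loss, ansatz
    return best_ansatz, best_loss
```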
(4) Fine tuning: QAS employs the trained parameters θ(T) to fine tune the output ansatz in Eq. (4).

We empirically observe fierce competition among different ansatze in S when optimizing QAS (see Supplementary B for details). Namely, suppose S can be decomposed into two subsets S_good and S_bad, where the subset S_good (S_bad) collects ansatze in the sense that they all attain relatively good (bad) performance via independent training. For instance, in the classification task, an ansatz in S_good (S_bad) promises a classification accuracy above (below) 99%. However, when we apply QAS to accomplish the same classification task, some ansatze in S_bad may outperform certain ansatze in S_good. This observation hints at the hardness of accurately optimizing correlated trainable parameters among all ansatze, where the learning performance of a portion of ansatze in S_good is no better than training them independently.

To relieve the fierce competition among ansatze in S and further boost the performance of QAS, we slightly modify the initialization and optimization steps of QAS. Specifically, instead of exploiting a single supernet, QAS involves W supernets to optimize the objective function in Eq. (3). The weight sharing strategies applied to the W supernets are independent of each other, where the parameters corresponding to the W supernets are separately initialized and updated. At the training and ranking stages, the W supernets separately utilize their weight sharing strategy to parameterize the sampled ansatz a(t) to obtain W values of $\mathcal{L}(\theta^{(t,w)}, a^{(t)}; Z, \mathcal{E}_a)$, where θ(t,w) refers to the parameters corresponding to the wth supernet. Then, the parameters applied to the ansatz a(t) are categorized into the w′th supernet, where $w' = \arg\min_{w \in [W]} \mathcal{L}(\theta^{(t,w)}, a^{(t)}; Z, \mathcal{E}_a)$.

We last emphasize how QAS enhances the learning performance of the hardware-efficient ansatz U(θ) in Eq. (2). Recall that the central aim of QAS is to seek a good ansatz associated with optimized parameters to minimize $\mathcal{L}(\theta, a; Z, \mathcal{E}_a)$ in Eq. (3). In other words, given $U = \prod_{l=1}^{L} U_l(\theta)$, a good ansatz is located by dropping some unnecessary multi-qubit gates and substituting single-qubit gates in $U_l(\theta)$ for ∀ l ∈ [L]. Following this routine, several studies have proved that removing multi-qubit gates to reduce the entanglement of the ansatz contributes to alleviating barren plateaus39,40. In addition, a recent study41 unveiled that the choice of the quantum circuit architecture can significantly affect the expressive power of the ansatz and the learning performance. Since the objective function of QAS implicitly evaluates the effect of different ansatze, our proposal can be employed as a powerful tool to enhance the learning performance of VQAs. Refer to Methods for further explanation of the roles of the supernet and weight sharing, and for an analysis of the memory cost and runtime complexity of QAS.

Simulation and experimental results
The proposed QAS is universal and facilitates a wide range of VQA-based learning tasks, e.g., machine learning42–45, quantum chemistry6,14, and quantum information processing46,47. In the following, we separately apply QAS to accomplish a classification task and a VQE task to confirm its capability toward performance enhancement. All numerical simulations are implemented in Python in conjunction with the PennyLane and Qiskit packages48,49. Specifically, PennyLane is the backbone to implement QAS and Qiskit supports different types of noise models. We defer the explanation of basic terminologies in machine learning and quantum chemistry to Appendices B and C.

Here we first apply QAS to accomplish a binary classification task under both the noiseless and noisy scenarios. Denote by D the synthetic dataset, whose construction rule follows the proposal of the quantum kernel classifier11. The dataset D contains n = 300 samples. For each example {x(i), y(i)}, the feature dimension of the input x(i) is 3 and the corresponding label y(i) ∈ {0, 1} is binary. Examples of D are shown in Fig. 2. At the data preprocessing stage, we split the dataset D into the training set D_tr, validation set D_va, and test set D_te with sizes n_tr = 100, n_va = 100, and n_te = 100. The explicit form of the objective function is

$\mathcal{L} = \frac{1}{n_{tr}} \sum_{i=1}^{n_{tr}} \big( \tilde{y}^{(i)}(\mathcal{A}, x^{(i)}, \theta) - y^{(i)} \big)^2$,  (5)

where $\{x^{(i)}, y^{(i)}\} \in D_{tr}$ and $\tilde{y}^{(i)}(\mathcal{A}, x^{(i)}, \theta) \in [0, 1]$ is the output of the quantum classifier (i.e., a function taking the input x(i), the supernet A, and the trainable parameters θ). The training (validation and test) accuracy is measured by $\sum_i \mathbb{1}_{g(\tilde{y}^{(i)})=y^{(i)}}/n_{tr}$ (respectively $\sum_i \mathbb{1}_{g(\tilde{y}^{(i)})=y^{(i)}}/n_{va}$ and $\sum_i \mathbb{1}_{g(\tilde{y}^{(i)})=y^{(i)}}/n_{te}$), with $g(\tilde{y}^{(i)})$ being the predicted label for x(i). We also apply the quantum kernel classifier proposed in ref. 11 to learn D and compare its performance with QAS, where the implementation of such a quantum classifier is shown in Fig. 2b.
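The loss of Eq. (5) and the accuracy metric reduce to a few lines of NumPy; the 0.5 decision threshold for g(·) is our assumption for a binary classifier whose outputs lie in [0, 1].

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Eq. (5): mean squared error between classifier outputs in [0, 1] and labels."""
    return np.mean((y_pred - y_true) ** 2)

def accuracy(y_pred, y_true, threshold=0.5):
    """Fraction of samples whose thresholded prediction g(y~) matches the label."""
    return np.mean((y_pred >= threshold).astype(int) == y_true)

# Toy usage with made-up classifier outputs.
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.2, 0.9, 0.6, 0.4])
print(mse_loss(y_pred, y_true), accuracy(y_pred, y_true))
```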
Fig. 2 Simulation results for the classification task. a The illustration of some examples in D with the first two features. b The implementation of the quantum kernel classifier for benchmarking. The quantum gates highlighted by the dashed box refer to the encoding layer that transforms the classical input x(i) into a quantum state. The quantum gates located in the solid box refer to Ul(θ) in Eq. (2) with L = 3. c The output ansatz of QAS under the noisy setting. d The validation accuracy of QAS under the noiseless case. The label "Epc = a, W = b" represents that the number of epochs and supernets is T = a and W = b, respectively. The x-axis means that the validation accuracy of the sampled ansatz is in the range [c, d), e.g., c = 0.5 and d = 0.6. e The comparison of QAS between the noiseless and noisy cases. The hyper-parameter setting for both cases is T = 400, K = 500, and W = 5. The labeling of the x-axis is identical to subfigure (d). f The performance of the quantum kernel classifier (labeled "Test_acc_baseline") and QAS (labeled "Train/Test_acc") at the fine tuning stage under the noisy setting.
See Supplementary B for more discussion about the construction of D and the employed quantum kernel classifier.

The hyper-parameters for QAS are as follows. The number of supernets is W = 1 and W = 5, respectively. The circuit depth for all supernets is set as L = 3. The search space of QAS is formed by two types of quantum gates. Specifically, at each layer Ul(θ), the parameterized gates are fixed to be the rotational quantum gate along the Y-axis, RY. For the two-qubit gates, denoting the indices of the three qubits as (0, 1, 2), QAS explores whether or not to apply CNOT gates to the qubit pairs (0, 1), (0, 2), and (1, 2). Hence, the size of S equals $|S| = 8^3$. The number of sampled ansatze for ranking is set as K = 500. The setting K ≈ |S| enables us to understand how the number of supernets W, the number of epochs T, and the system noise affect the learning performance of different ansatze in the ranking stage.
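The size of this search space can be checked directly: each of the L = 3 layers keeps the RY rotations fixed and independently switches each of the three CNOTs on or off, giving 2^3 = 8 configurations per layer. The enumeration below is a sanity check only, and the variable names are ours.

```python
import itertools

CNOT_PAIRS = [(0, 1), (0, 2), (1, 2)]          # searchable two-qubit placements
L = 3

# 8 configurations per layer (each CNOT present or absent), independent across layers.
layer_choices = list(itertools.product([False, True], repeat=len(CNOT_PAIRS)))
search_space = list(itertools.product(layer_choices, repeat=L))
assert len(search_space) == 8 ** 3             # |S| = 512 candidate ansatze

def layer_gates(mask):
    """Translate one layer's choice into a gate list; the RY rotations are always kept."""
    gates = [f"RY(q{i})" for i in range(3)]
    gates += [f"CNOT{pair}" for keep, pair in zip(mask, CNOT_PAIRS) if keep]
    return gates

print(layer_gates(search_space[0][0]))         # a layer with no CNOTs at all
```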
Under the noiseless scenario, the performance of QAS with three different settings is exhibited in Fig. 2d. In particular, QAS with W = 1 and T = 10 attains the worst performance, where the validation accuracy of most ansatze concentrates on 50–60%, highlighted by the green bar. When increasing the number of epochs to T = 400 and fixing W = 1, the performance is slightly improved, i.e., the number of ansatze that achieve a validation accuracy above 90% is 30, highlighted by the yellow bar. When W = 5 and T = 400, the performance of QAS is dramatically enhanced, where the validation accuracy of 151 ansatze is above 90%. The comparison between the first two settings indicates the correctness of utilizing QAS to accomplish VQA-based learning tasks, in which QAS learns useful feature information and achieves better performance with respect to the increased epoch number T. The varied performance of the last two settings reflects the fierce competition phenomenon among ansatze and validates the feasibility of adopting W > 1 to boost the performance of QAS. When we retrain the output ansatz of QAS under the setting W = 5 and T = 400, both the training and test accuracies converge to 100% within 15 epochs, which is identical to the original quantum kernel classifier.

The performance of the original quantum kernel classifier is evidently degraded when the depolarizing error for the single-qubit and two-qubit gates is set as 0.05 and 0.2, respectively. As shown in the lower plot of Fig. 2f, the training and test accuracies of the original quantum kernel classifier drop to 50% (almost a random guess) under the noisy setting. The degraded performance is caused by the large amount of accumulated noise, where the classical optimizer fails to receive valid optimization information. By contrast, QAS can achieve good performance under the same noise setting. As shown in Fig. 2e, with W = 5 and T = 400, the validation accuracy of 115 ansatze is above 90% under the noisy setting. The ansatz that attains the highest validation accuracy is shown in Fig. 2c. Notably, compared with the original quantum kernel classifier in Fig. 2b, the searched ansatz contains fewer CNOT gates. This implies that, under the noisy setting formulated above, QAS suppresses the noise effect and improves the training performance by adopting only a few CNOT gates. When we retrain the obtained ansatz with 10 epochs, both the training and test accuracies achieve 100%, as shown in the upper plot of Fig. 2f. These results indicate the feasibility of applying QAS to achieve noise inhibition and trainability enhancement.

We defer the omitted simulation results and the exploration of the fierce competition to Supplementary B. In particular, we assess the learning performance of the quantum classifier with the hardware-efficient ansatz and the ansatz searched by QAS under the noise model extracted from a real quantum device, i.e., "Ibmq_lima". The achieved simulation result indicates that the ansatz obtained by QAS outperforms the conventional quantum classifier.
We next apply QAS to find the ground state energy of the hydrogen molecule13,50 under both the noiseless and noisy scenarios. The molecular hydrogen Hamiltonian is formulated as

$H_h = g\,\mathbb{1} + \sum_{i=0}^{3} g_i Z_i + \sum_{i=1,k=1,i<k}^{3} g_{i,k} Z_i Z_k + g_a Y_0 X_1 X_2 Y_3 + g_b Y_0 Y_1 X_2 X_3 + g_c X_0 X_1 Y_2 Y_3 + g_d X_0 Y_1 Y_2 X_3$,  (6)

where {Xi, Yi, Zi} denote the Pauli matrices acting on the ith qubit and the real scalars g, with or without subscripts, are efficiently computable functions of the hydrogen–hydrogen bond length (see Supplementary C for details about Hh and g). The ground state energy calculation amounts to computing the lowest energy eigenvalue of Hh, where the accurate value is Em = −1.136 Ha48. To tackle this task, the conventional VQE6 and its variants7–9 optimize the trainable parameters in U(θ) to prepare the ground state $|\psi^{\ast}\rangle = U(\theta^{\ast})|0\rangle^{\otimes 4}$ of Hh, i.e., $E_m = \langle\psi^{\ast}|H_h|\psi^{\ast}\rangle$.
Fig. 3 Simulation results for the ground state energy estimation of hydrogen. a The implementation of the conventional VQE. b The output ansatz of QAS under the noisy setting. c The training performance of VQE under the noisy and noiseless settings. The label "Exact" refers to the accurate result Em. d The performance of the output ansatz of QAS under both the noisy and noiseless settings. e The performance of QAS at the ranking stage. The label "W = b" refers to the number of supernets, i.e., W = b. The x-axis means that the estimated energy of the sampled ansatz is in the range (c, d], e.g., c = −0.6 Ha and d = −0.8 Ha.
The implementation of U(θ) is illustrated in Fig. 3a. Under the noiseless setting, the estimated energy of VQE quickly converges to the target result Em within 40 iterations, as shown in Fig. 3c.

The hyper-parameters of QAS to compute the lowest energy eigenvalue of Hh are as follows. The number of supernets has two settings, i.e., W = 1 and W = 5, respectively. The layer number for all ansatze is L = 3. The number of iterations and the number of sampled ansatze for ranking are T = 500 and K = 500, respectively. The search space of QAS for the single-qubit gates is fixed to be the rotational quantum gates along the Y and Z axes. For the two-qubit gates, denoting the indices of the four qubits as (0, 1, 2, 3), QAS explores whether or not to apply CNOT gates to the qubit pairs (0, 1), (1, 2), and (2, 3). Therefore, the total number of ansatze equals $|S| = 128^3$. The performance of QAS with W = 5 is shown in Fig. 3d. Through retraining the obtained ansatz of QAS with 50 iterations, the estimated energy converges to Em, which is the same as the conventional VQE.

The performance of the conventional VQE and QAS differs markedly when the noise model described in the classification task is deployed. Due to the large amount of gate noise, the estimated ground energy of the conventional VQE converges to −0.4 Ha, as shown in Fig. 3c. In contrast, the estimated ground energy of QAS with W = 1 and W = 5 achieves −0.93 and −1.05 Ha, respectively. Both of them are closer to the target result Em than the conventional VQE. Moreover, as shown in Fig. 3e, a larger W implies a better performance of QAS, since the estimated energy of most ansatze is below −0.6 Ha when W = 5, while the estimated energy of 350 ansatze is above 0 Ha when W = 1. We illustrate the generated ansatz of QAS with W = 5 in Fig. 3b. In particular, to mitigate the effect of gate noise, this generated ansatz does not contain any CNOT gate, to which a very large noise level is applied. Recall that a central challenge in quantum computational chemistry is whether NISQ devices can outperform classical methods already available51. The achieved results of QAS can provide good guidance toward answering this issue. Concretely, the searched ansatz in Fig. 3, which only produces separable states that can be efficiently simulated by classical devices, suggests that the VQE method may not outperform classical methods when NISQ devices contain large gate noise.

Note that more simulation results are deferred to the Supplementary Information. Specifically, in Supplementary C, we exhibit more results of the above task. Furthermore, we implement VQE with the hardware-efficient ansatz and the ansatz searched by QAS on real superconducting quantum hardware, i.e., "Ibmq_ourense", to estimate the ground state energy of Hh. Due to the runtime issue, we complete the optimization and ranking using the classical backend and perform the final runs on the IBMQ cloud. The experimental result indicates that the ansatz obtained by QAS outperforms the conventional VQE, where the estimated energy of the former is −0.96 Ha while the latter is −0.61 Ha. Then, in Supplementary D, we exhibit that utilizing evolutionary algorithms to establish K can dramatically improve the performance of QAS. Subsequently, in Supplementary E, we provide numerical evidence that QAS can alleviate the influence of barren plateaus. Last, we present a variant of QAS to tackle large-scale problems with enhanced performance in Supplementary F.

DISCUSSION
In this study, we devise QAS to dynamically and automatically design ansatze for VQAs. Both simulation and experimental results validate the effectiveness of QAS. Besides good performance, QAS only requests computational resources similar to conventional VQAs with fixed ansatze and is compatible with all quantum systems. Through incorporating QAS with other advanced error mitigation and trainability enhancement techniques, it is possible to seek more applications that can be realized on NISQ machines with potential advantages.

There are many critical questions remaining in the study of QAS. Our future work includes the following directions. First, we will explore better strategies to sample an ansatz at each iteration. For example, reinforcement learning techniques, which are used to construct optimal sequences of unitaries to accomplish quantum simulation tasks52, may contribute to this goal. Next, we will design a more advanced strategy to shrink the parameter space without degrading the learning performance. Subsequently, to further boost the performance of QAS, we will leverage prior information on the learning problem, such as the symmetric property, and some post-processing strategies that remove redundant gates of the searched ansatz. In addition, we will delve into theoretically understanding the fierce competition. In the end, it is intriguing to explore applications of QAS beyond VQAs, such as optimal quantum control and the approximation of a target unitary using limited quantum gates.
Fig. 4 A visualization of the weight sharing strategy. The upper left panel depicts the potential ansatze when N = 3, L = 2, and the choices of quantum gates are {RX, RY} with Q = 2. The total number of ansatze is $Q^{NL} = 64$. The upper right panel illustrates how to use the indexing technique to accomplish the weight sharing. The label "ai" refers to the ansatz a(i). Namely, for any two ansatze, if the indexes in the lth array are identical (highlighted by the blue and brown regions), then their trainable parameters in the lth layer are the same. The two heatmaps in the lower panel visualize the trainable parameters of the 64 ansatze. The label "θi" refers to the parameter assigned to the ith rotational quantum gate. Note that when the weight sharing strategy is applied, the trainable parameters are reused for different ansatze, as indicated by the dashed circles.
METHODS
The classical analog of QAS
The classical analog of the learning problem in Eq. (3) is neural network architecture search38. Recall that the success of deep learning is largely attributed to novel neural architectures for specific learning tasks, e.g., convolutional neural networks for image processing tasks53. However, deep neural networks designed by human experts are generally time-consuming to develop and error-prone38. To tackle this issue, the neural architecture search approach, i.e., the process of automating architecture engineering, has been widely explored and has achieved state-of-the-art performance in many learning tasks54–58. Despite the similar aim, naively generalizing classical results to the quantum scenario to accomplish Eq. (3) is infeasible due to the distinct basic components: neurons versus quantum gates, classical correlation versus entanglement, the barren plateau phenomenon, the quantum noise effect, and physical hardware restrictions. These differences and extra limitations further intensify the difficulty of searching for the optimal quantum circuit architecture a*, compared with the classical setting. In the following, we explain the omitted implementation details of QAS.

Weight sharing strategy
The role of the weight sharing strategy is to reduce the parameter space to enhance the learning performance of QAS within a reasonable runtime and memory usage. Intuitively, this strategy correlates parameters among different ansatze in S based on a specified rule. In this way, we can jointly optimize (θ, a) to estimate (θ*, a*), where the updated parameters for one ansatz can also enhance the learning performance of other ansatze when the correlation criteria are satisfied. As explained in Fig. 4, the weight sharing strategy adopted in QAS squeezes the parameter space from $O(dQ^{NL})$ to $O(dLQ^{N})$. Meantime, our simulation results indicate that the reduction of the parameter space enables QAS to achieve good performance within a reasonable runtime complexity.

We remark that by adjusting the correlation criteria applied to the weight sharing strategy, the parameter space can be further reduced. For instance, when all parameters in an ansatz are correlated, the size of the parameter space reduces to O(1). With this regard, another feasible correlation rule for QAS is unifying the single-qubit gates for all ansatze as U3 = RZ(α)RY(β)RZ(γ). In other words, QAS then only adjusts the arrangement of two-qubit gates to enhance the learning performance. From the practical perspective, this setting is reasonable since the gate error introduced by single-qubit gates is much smaller than that of two-qubit gates.

Supernet
We next elucidate the supernet used in QAS. As explained in the main text, the supernet has two important roles, which are constructing the ansatze pool S and parameterizing each ansatz in S via the specified weight sharing strategy. In other words, the supernet defines the search space, which subsumes all candidate ansatze, and the candidate ansatze in S are evaluated through inheriting weights from the supernet. Rather than training numerous separate ansatze from scratch, QAS trains the supernet just once (Step 2 in Fig. 1), which significantly cuts down the search cost.

We next explain how QAS leverages the indexing technique to construct S when the available quantum gates include both single-qubit and two-qubit gates. Following the notation in the main text, suppose that N = 5, L = 1, and the choices of single-qubit gates and two-qubit gates are {RY, RZ} and {CNOT, I4}, respectively. In QAS, the supernet A indexes {RY, RZ, CNOT, I4} as {"0", "1", "T", "F"}. Moreover, we suppose that the topology of the deployed quantum machine yields a chain structure, i.e., Q1 ↔ Q2 ↔ Q3 ↔ Q4 ↔ Q5.
With a, b, c, d, e ∈ {"0", "1"} and A, B, C, D ∈ {"T", "F"}, the index list ["a", "b", "c", "d", "e", "A", "B", "C", "D"] tracks all candidate ansatze in S, e.g., ["0", "0", "0", "0", "0", "T", "T", "T", "T"] describes the ansatz $(\prod_{i=1}^{4} \mathrm{CNOT}_{i,i+1})(\bigotimes_{i=1}^{5} R_Y(\theta_i))$ and ["1", "1", "1", "1", "1", "F", "F", "F", "F"] describes the ansatz $\bigotimes_{i=1}^{5} R_Z(\theta_i)$.
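A short sketch of this extended index list (single-qubit slots followed by two-qubit slots on the chain) is given below; the encoding helpers are illustrative only.

```python
import random

SINGLE = {"0": "RY", "1": "RZ"}                # index map for single-qubit gates
TWOQ = {"T": "CNOT", "F": "I"}                 # apply a CNOT on the pair, or not
N = 5
CHAIN = [(i, i + 1) for i in range(N - 1)]     # Q1-Q2-Q3-Q4-Q5 chain topology

def sample_index_list():
    """One layer: five single-qubit slots followed by four two-qubit slots."""
    return ([random.choice(list(SINGLE)) for _ in range(N)]
            + [random.choice(list(TWOQ)) for _ in CHAIN])

def describe(index_list):
    singles = [f"{SINGLE[c]}(q{i})" for i, c in enumerate(index_list[:N])]
    twos = [f"CNOT{CHAIN[j]}" for j, c in enumerate(index_list[N:]) if c == "T"]
    return singles + twos

print(describe(["0", "0", "0", "0", "0", "T", "T", "T", "T"]))
# -> RY on every qubit followed by the full CNOT ladder, matching the example above.
```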
Memory cost and runtime complexity
We first analyze the runtime complexity of QAS. In particular, at the first step, the setup of the supernet, i.e., configuring the ansatze pool and the correlating rule, takes O(1) runtime. In the second step, QAS proceeds through T iterations to optimize the trainable parameters. The runtime cost of QAS at each iteration scales with O(d), where d refers to the number of trainable parameters in Eq. (1). Such cost originates from the calculation of gradients via the parameter shift rule, which is similar to the optimization of VQAs with a fixed ansatz. To this end, the total runtime cost of the second step is O(dT). In the ranking step, QAS samples K ansatze and compares their objective values using the optimized parameters. This step takes at most O(K) runtime. In the last step, QAS fine tunes the parameters based on the searched ansatz with few iterations (i.e., a very small constant). The required runtime is identical to conventional VQAs, which satisfies O(d). The total runtime complexity of QAS is hence O(dT + K).

We next analyze the memory cost of QAS. Specifically, the first step requests O(QNL) memory to specify the ansatze pool via the indexing technique. Recall that the memory cost in this step is dominated by configuring the index space, which requests at most O(QNL) memory. This is because, in the worst case, the allowed Q choices of quantum gates for each qubit at each layer are entirely different. To store the information that describes the choices of gates for different qubits at different positions, the memory cost scales with O(QNL). In the second step, QAS outputs in total T index lists corresponding to the architectures of T ansatze. This requires at most O(TNL) memory. Moreover, QAS explicitly updates at most Td parameters (we omit those parameters that are implicitly updated via the weight sharing strategy, since they do not consume memory). To this end, the memory cost of the second step is O(TNL + Td). In the third step, QAS samples K index lists that describe the circuit architectures of K ansatze. This requires at most O(KNL) memory. Moreover, according to the weight sharing strategy, the memory cost of storing the corresponding parameters is O(Kd). The memory cost of the last step is identical to that of conventional VQAs with a fixed ansatz, which is O(d). The total memory cost of QAS is hence O(Td + TNL + Kd).

To better understand how the computational complexity scales with N, L, and Q, in the following we set the total number of iterations in Step 2 and the number of sampled ansatze in Step 3 as T = O(QNL) and K = O(QNL), respectively. Note that since the size of S becomes indefinite, it is reasonable to set K as O(QNL) instead of the constant used in the numerical simulations. Under the above settings, we conclude that the runtime complexity and the memory cost of QAS are O(dQNL) and O(dQNL + QN²L²), respectively.

We remark that when W supernets are involved, the required memory cost and runtime complexity of QAS scale linearly with respect to W. Moreover, employing adversarial bandit learning techniques59 can exactly remove this overhead (see Supplementary A for details).

DATA AVAILABILITY
The datasets generated and/or analyzed during the current study are available from Y.D. on reasonable request.

CODE AVAILABILITY
The source code of QAS to reproduce all numerical experiments is available on the GitHub repository (https://github.com/yuxuan-du/Quantum_architecture_search/).

Received: 31 December 2020; Accepted: 20 April 2022;

REFERENCES
1. Cerezo, M. et al. Variational quantum algorithms. Nat. Rev. Phys. 3, 625–644 (2021).
2. Bharti, K. et al. Noisy intermediate-scale quantum algorithms. Rev. Mod. Phys. 94, 015004 (2022).
3. Beer, K. et al. Training deep quantum neural networks. Nat. Commun. 11, 1–6 (2020).
4. Farhi, E. & Neven, H. Classification with quantum neural networks on near term processors. Preprint at arXiv:1802.06002 (2018).
5. Schuld, M. & Killoran, N. Quantum machine learning in feature Hilbert spaces. Phys. Rev. Lett. 122, 040504 (2019).
6. Peruzzo, A. et al. A variational eigenvalue solver on a photonic quantum processor. Nat. Commun. 5, 4213 (2014).
7. Wang, D., Higgott, O. & Brierley, S. Accelerated variational quantum eigensolver. Phys. Rev. Lett. 122, 140504 (2019).
8. Stokes, J., Izaac, J., Killoran, N. & Carleo, G. Quantum natural gradient. Quantum 4, 269 (2020).
9. Mitarai, K., Yan, T. & Fujii, K. Generalization of the output of a variational quantum eigensolver by parameter interpolation with a low-depth ansatz. Phys. Rev. Appl. 11, 044087 (2019).
10. Preskill, J. Quantum computing in the NISQ era and beyond. Quantum 2, 79 (2018).
11. Havlíček, V. et al. Supervised learning with quantum-enhanced feature spaces. Nature 567, 209 (2019).
12. Huang, H.-L. et al. Experimental quantum generative adversarial networks for image generation. Phys. Rev. Appl. 16, 024051 (2021).
13. Kandala, A. et al. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549, 242–246 (2017).
14. Google AI Quantum and Collaborators. Hartree-Fock on a superconducting qubit quantum computer. Science 369, 1084–1089 (2020).
15. Holmes, Z., Sharma, K., Cerezo, M. & Coles, P. J. Connecting ansatz expressibility to gradient magnitudes and barren plateaus. PRX Quantum 3, 010313 (2022).
16. Benedetti, M., Lloyd, E., Sack, S. & Fiorentini, M. Parameterized quantum circuits as machine learning models. Quantum Sci. Technol. 4, 043001 (2019).
17. Caro, M. C. et al. Generalization in quantum machine learning from few training data. Preprint at arXiv:2111.05292 (2021).
18. Du, Y., Hsieh, M.-H., Liu, T. & Tao, D. Expressive power of parametrized quantum circuits. Phys. Rev. Res. 2, 033125 (2020).
19. Du, Y., Tu, Z., Yuan, X. & Tao, D. Efficient measure for the expressivity of variational quantum algorithms. Phys. Rev. Lett. 128, 080506 (2022).
20. Du, Y., Hsieh, M.-H., Liu, T., You, S. & Tao, D. Learnability of quantum neural networks. PRX Quantum 2, 040337 (2021).
21. Cerezo, M., Sone, A., Volkoff, T., Cincio, L. & Coles, P. J. Cost function dependent barren plateaus in shallow parametrized quantum circuits. Nat. Commun. 12, 1–12 (2021).
22. McClean, J. R., Boixo, S., Smelyanskiy, V. N., Babbush, R. & Neven, H. Barren plateaus in quantum neural network training landscapes. Nat. Commun. 9, 1–6 (2018).
23. Sweke, R. et al. Stochastic gradient descent for hybrid quantum-classical optimization. Quantum 4, 314 (2020).
24. Wang, S. et al. Noise-induced barren plateaus in variational quantum algorithms. Nat. Commun. 12, 1–11 (2021).
25. Temme, K., Bravyi, S. & Gambetta, J. M. Error mitigation for short-depth quantum circuits. Phys. Rev. Lett. 119, 180509 (2017).
26. Endo, S., Benjamin, S. C. & Li, Y. Practical quantum error mitigation for near-future applications. Phys. Rev. X 8, 031027 (2018).
27. Li, Y. & Benjamin, S. C. Efficient variational quantum simulator incorporating active error minimization. Phys. Rev. X 7, 021050 (2017).
28. McClean, J. R., Kimchi-Schwartz, M. E., Carter, J. & De Jong, W. A. Hybrid quantum-classical hierarchy for mitigation of decoherence and determination of excited states. Phys. Rev. A 95, 042308 (2017).
29. Strikis, A., Qin, D., Chen, Y., Benjamin, S. C. & Li, Y. Learning-based quantum error mitigation. PRX Quantum 2, 040330 (2021).
30. Czarnik, P., Arrasmith, A., Coles, P. J. & Cincio, L. Error mitigation with Clifford quantum-circuit data. Quantum 5, 592 (2021).
31. Chivilikhin, D. et al. MoG-VQE: multiobjective genetic variational quantum eigensolver. Preprint at arXiv:2007.04424 (2020).
32. Li, L. et al. Quantum optimization with a novel Gibbs objective function and ansatz architecture search. Phys. Rev. Res. 2, 023074 (2020).
33. Ostaszewski, M., Grant, E. & Benedetti, M. Structure optimization for parameterized quantum circuits. Quantum 5, 391 (2021).
34. Grant, E., Wossnig, L., Ostaszewski, M. & Benedetti, M. An initialization strategy for addressing barren plateaus in parametrized quantum circuits. Quantum 3, 214 (2019).
35. Skolik, A., McClean, J. R., Mohseni, M., van der Smagt, P. & Leib, M. Layerwise learning for quantum neural networks. Quantum Mach. Intell. 3, 1–11 (2021).
36. Zhang, K., Hsieh, M.-H., Liu, L. & Tao, D. Toward trainability of deep quantum neural networks. Preprint at arXiv:2112.15002 (2021).
37. Bittel, L. & Kliesch, M. Training variational quantum algorithms is NP-hard. Phys. Rev. Lett. 127, 120502 (2021).
38. Elsken, T., Metzen, J. H. & Hutter, F. Neural architecture search: a survey. J. Mach. Learn. Res. 20, 1–21 (2019).
39. Marrero, C. O., Kieferová, M. & Wiebe, N. Entanglement-induced barren plateaus. PRX Quantum 2, 040316 (2021).
40. Patti, T. L., Najafi, K., Gao, X. & Yelin, S. F. Entanglement devised barren plateau mitigation. Phys. Rev. Res. 3, 033090 (2021).
41. Haug, T., Bharti, K. & Kim, M. S. Capacity and quantum geometry of parametrized quantum circuits. PRX Quantum 2, 040309 (2021).
42. Huang, H.-Y. et al. Power of data in quantum machine learning. Nat. Commun. 12, 1–9 (2021).
43. Du, Y., Hsieh, M.-H., Liu, T. & Tao, D. A Grover-search based quantum learning scheme for classification. N. J. Phys. 23, 023020 (2021).
44. Cong, I., Choi, S. & Lukin, M. D. Quantum convolutional neural networks. Nat. Phys. 15, 1273–1278 (2019).
45. Wang, X., Du, Y., Luo, Y. & Tao, D. Towards understanding the power of quantum kernels in the NISQ era. Quantum 5, 531 (2021).
46. LaRose, R., Tikku, A., O'Neel-Judy, É., Cincio, L. & Coles, P. J. Variational quantum state diagonalization. npj Quantum Inf. 5, 1–10 (2019).
47. Yin, X.-F. et al. Efficient bipartite entanglement detection scheme with a quantum adversarial solver. Phys. Rev. Lett. 128, 110501 (2022).
48. Bergholm, V. et al. PennyLane: automatic differentiation of hybrid quantum-classical computations. Preprint at arXiv:1811.04968 (2018).
49. Qiskit: an open-source framework for quantum computing (2019).
50. O'Malley, P. J. J. et al. Scalable quantum simulation of molecular energies. Phys. Rev. X 6, 031007 (2016).
51. McArdle, S., Endo, S., Aspuru-Guzik, A., Benjamin, S. C. & Yuan, X. Quantum computational chemistry. Rev. Mod. Phys. 92, 015003 (2020).
52. Yao, J., Lin, L. & Bukov, M. Reinforcement learning for many-body ground-state preparation inspired by counterdiabatic driving. Phys. Rev. X 11, 031070 (2021).
53. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
54. Pham, H., Guan, M., Zoph, B., Le, Q. & Dean, J. Efficient neural architecture search via parameters sharing. In Proceedings of Machine Learning Research 4095–4104 (2018).
55. Huang, T. et al. GreedyNASv2: greedier search with a greedy path filter. Preprint at arXiv:2111.12609 (2021).
56. Liu, C. et al. Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV) 19–34 (Springer, Cham, 2018).
57. You, S., Huang, T., Yang, M., Wang, F., Qian, C. & Zhang, C. GreedyNAS: towards fast one-shot NAS with greedy supernet. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 1999–2008 (Computer Vision Foundation/IEEE, 2020).
58. Yang, Y., Li, H., You, S., Wang, F., Qian, C. & Lin, Z. ISTA-NAS: efficient and consistent neural architecture search by sparse coding. Adv. Neural Inf. Process. Syst. 33, 10503–10513 (2020).
59. Bubeck, S. & Cesa-Bianchi, N. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Mach. Learn. 5, 1–122 (2012).

AUTHOR CONTRIBUTIONS
Y.D. and D.T. conceived this work. Y.D., S.Y., and M.-H.H. accomplished the theoretical analysis. Y.D. and T.H. conducted numerical simulations. All authors reviewed and discussed the analysis and results, and contributed to writing the manuscript.

COMPETING INTERESTS
The authors declare no competing interests.

ADDITIONAL INFORMATION
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41534-022-00570-y.

Correspondence and requests for materials should be addressed to Yuxuan Du, Min-Hsiu Hsieh or Dacheng Tao.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© Crown 2022