
The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)

EG-NAS: Neural Architecture Search with Fast Evolutionary Exploration


Zicheng Cai (1,2), Lei Chen (1), Peng Liu (2,3), Tongtao Ling (1), Yutao Lai (1)
(1) Guangdong University of Technology
(2) Ping An Technology (Shenzhen) Co., Ltd.
(3) The Hong Kong Polytechnic University
[email protected]

Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Differentiable Architecture Search (DARTS) has achieved rapid search for excellent architectures by optimizing architecture parameters through gradient descent. However, this efficiency comes with a significant challenge: the risk of premature convergence to local optima, resulting in subpar performance that falls short of expectations. To address this issue, we propose a novel and effective method called Evolutionary Gradient-Based Neural Architecture Search (EG-NAS). Our approach combines the strengths of both gradient descent and the evolution strategy, allowing the exploration of various optimization directions during the architecture search process. To begin with, we continue to employ gradient descent for updating the network parameters to ensure efficiency. Subsequently, to mitigate the risk of premature convergence, we introduce an evolution strategy with global search capability to optimize the architecture parameters. By leveraging the best of both worlds, our method strikes a balance between efficient exploration and exploitation of the search space. Moreover, we redefine the fitness function to consider not only accuracy but also individual similarity. This inclusion enhances the diversity and accuracy of the optimized directions identified by the evolution strategy. Extensive experiments on various datasets and search spaces demonstrate that EG-NAS achieves highly competitive performance at significantly lower search cost compared to state-of-the-art methods. The code is available at https://github.com/caicaicheng/EG-NAS.

Figure 1: Speed-performance comparison of our proposed EG-NAS with various neural architecture search methods on CIFAR-10.

Introduction

Deep neural networks (DNNs), particularly convolutional neural networks (CNNs) such as Inception-v1 (Szegedy et al. 2015), ResNet (He et al. 2016), and other pioneering network architectures (Simonyan and Zisserman 2014; Howard et al. 2017), as well as the modern architecture ConvNeXt (Liu et al. 2022), have played a crucial role in driving the advancement of computer vision and in addressing various vision-related problems. However, the cost of manually designing network architectures keeps increasing as neural networks advance, and this high cost prevents researchers from pursuing more creative ideas and discourages their enthusiasm. Neural Architecture Search (NAS), proposed as an automated method for searching network architectures, effectively addresses this problem but still faces challenges in terms of expensive computational power and long search times. For example, on the CIFAR-10 dataset, AmoebaNet-B (Real et al. 2018), an evolutionary algorithm (EA) based approach, requires 3150 GPU-days of search time, while NASNet-A (Zoph et al. 2018), a reinforcement learning (RL) based approach, requires 1800 GPU-days. To alleviate the high search cost in NAS, a gradient-based method called Differentiable Architecture Search (DARTS) (Liu, Simonyan, and Yang 2018) was proposed, which shares weights between architectures. Compared to costly search strategies such as AmoebaNet-B and NASNet-A, DARTS can perform the architecture search task in 0.4 GPU-days, offering a far more efficient alternative. Furthermore, PC-DARTS achieves even faster results, completing the network architecture search in just 0.1 GPU-days. However, these gradient-based methods (Xiao et al. 2022a; Chen et al. 2019; Zela et al. 2019; Wang et al. 2021b) still face various dilemmas, and these dilemmas ultimately point to one cause: premature convergence to a local optimum. In traditional optimization problems, the evolution strategy (Hansen and Ostermeier 1996) is a well-established remedy: its global search property makes it better able to explore different search directions and thus avoid falling into a local optimum.


However, in the field of Neural Architecture Search, despite the proposal of numerous evolutionary algorithm (EA) based methods that reduce search time and achieve some results (Lu et al. 2018; Huang et al. 2022), they have not met expectations in effectively balancing the trade-off between search cost and performance. Therefore, striking a balance between premature convergence to local optima and search cost has become a major challenge for researchers.

In this paper, we propose a simple and efficient compound search algorithm, called EG-NAS, to address the above issues by combining an evolution strategy with gradient descent for neural architecture search. We adopt an improved evolution strategy to explore more diverse search directions and tune the architecture parameters, rather than relying exclusively on gradient descent (GD) to update both the architecture and network parameters. This approach allows us to leverage the strengths of both gradient descent and evolutionary strategies, namely efficiency and global searchability. Furthermore, we redesign the fitness function to emphasize individual similarity rather than solely focusing on individual performance, enhancing the diversity of the search directions. When the current individual outperforms the old one, reducing the similarity between individuals fosters a more diverse evolution of the search direction. Conversely, increasing the similarity between individuals allows a timely adjustment of the evolutionary direction back toward the previously promising search direction. By adopting this evolutionary strategy, our approach ensures that the output represents the results obtained from exploring various search directions, alleviating the risk of being trapped in local optima. Moreover, it guarantees that the new search directions possess desirable performance. Finally, we iteratively update the architecture parameters with the new search direction obtained by the evolutionary strategy while optimizing the network parameters using gradient descent. This combination allows our approach to efficiently explore diverse architectures with excellent performance.

To demonstrate the effectiveness of our proposed EG-NAS, we conducted extensive experiments on different datasets and search spaces, showing significantly competitive results compared to other methods. On CIFAR-10, our method achieves remarkable results, requiring only 0.1 GPU-days for the search (see Fig. 1). The architectures discovered not only achieve 97.47% accuracy on CIFAR-10, but also reach 74.4% top-1 accuracy when transferred to ImageNet. Moreover, we directly performed the search and evaluation on ImageNet, achieving an outstanding 75.1% top-1 accuracy with a search cost of just 1.2 GPU-days on 2 RTX 4090 GPUs, the best search speed compared to state-of-the-art methods.

Related Work

The search strategy based on Evolutionary Algorithms (EA) is one of the most common approaches, relying on the global search capability of EA to prevent premature convergence into local optima and to effectively address large-scale NAS tasks with a discrete search space. The general process of an EA-based method involves defining the search space, initializing a population of network architectures, evaluating their fitness based on performance, selecting superior architectures, applying crossover and mutation, updating the population, and repeating the process for a number of iterations. However, the challenge of balancing performance and search cost has been a persistent issue for EA-based methods (Cui et al. 2018; Xue et al. 2021). For example, Liu et al. (2018) and Real et al. (2018) proposed Hierarchical Evolution and AmoebaNet, which achieved high accuracies of 96.25% and 97.45% on the CIFAR-10 dataset, respectively. However, the computation costs of their methods were significantly high, at 300 and 3150 GPU-days, respectively, far exceeding the resources available to most researchers. To reduce the excessive search cost, EA researchers (Lu et al. 2018; Huang et al. 2022; Yuan et al. 2023) have incorporated efficient EA algorithms, such as the Non-dominated Sorting Genetic Algorithm and Particle Swarm Optimization (PSO), into NAS tasks and achieved some notable results. Despite the application of various efficient evolutionary algorithms to improve the efficiency of NAS, an excessive focus on efficiency sometimes hinders these algorithms from fully leveraging their strengths, thereby dramatically impacting the performance of the obtained architectures.

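As a concrete reference point for the generic EA-based pipeline described above, a minimal loop might look as follows. This is an illustrative sketch only, not code from any of the cited methods: the edge/operation encoding, the variation operators, and especially evaluate_fitness, which stands in for the expensive step of training and validating a candidate network, are simplified placeholders.

```python
import random

# Stand-in for "train the candidate briefly and measure validation accuracy".
# In a real EA-based NAS system this is the costly step.
def evaluate_fitness(arch):
    return sum(arch) / len(arch) + random.gauss(0.0, 0.01)

def random_architecture(n_edges=14, n_ops=8):
    # Encode an architecture as one operation index per edge of the cell.
    return [random.randrange(n_ops) for _ in range(n_edges)]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(arch, n_ops=8, rate=0.1):
    return [random.randrange(n_ops) if random.random() < rate else g for g in arch]

def ea_nas(pop_size=20, generations=10):
    population = [random_architecture() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate_fitness, reverse=True)
        parents = ranked[: pop_size // 2]            # selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children              # update the population
    return max(population, key=evaluate_fitness)

print(ea_nas())
```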

The high search cost barrier in NAS was first broken by Differentiable Architecture Search (DARTS), proposed by Liu, Simonyan, and Yang (2018). DARTS applies a continuous relaxation to transform discrete operation choices into continuously differentiable weights, enabling efficient handling of the bi-level optimization objective in architecture search through gradient descent. To further reduce the memory overhead and improve search speed, Xu et al. (2019) proposed PC-DARTS, which randomly selects only a subset of the channels for computation in the search phase. Although the memory overhead is alleviated, several practitioners (Xie et al. 2018; Chen and Hsieh 2020; Wang et al. 2021a) have raised new questions about DARTS and provided corresponding solutions. For example, Hu et al. (2020b) proposed an angle-based metric that simplifies the original search space by eliminating unpromising candidates, thereby reducing the challenges existing NAS methods face in searching for high-quality architectures; Wang et al. (2021a) proposed a node normalization and decorrelation discretization strategy to improve generality and stability; Xiao et al. (2022b) introduced the Shapley value to evaluate the importance of operations; and Chen et al. (2020) adopted an incremental learning scheme to bridge the gap between search and evaluation. However, the critical issue highlighted by researchers (Chen et al. 2019; Zela et al. 2019) remains: gradient-based methods suffer from premature convergence to local optima, significantly compromising the performance of the architectures obtained during the search stage.

Figure 2: Illustration of the EG-NAS main framework. The improved evolutionary strategy outputs the best of the evolved architectures as a direction for optimizing α. After α is updated, the new architecture is composed and evaluated. EVA is shorthand for evaluation.

Methodology

Preliminaries

Differentiable Architecture Search (DARTS). DARTS (Liu, Simonyan, and Yang 2018) is a gradient-based method widely used in NAS that leverages continuous relaxation and weight-sharing techniques to make the architecture search process differentiable, significantly reducing the computational cost. Following previous research (Pham et al. 2018; Bender et al. 2018), DARTS searches for an optimal cell architecture and constructs a supernet by repeatedly stacking normal and reduction cells. Compared to normal cells, the reduction cells are located at 1/3 and 2/3 of the total depth of the network, and all operations adjacent to their input nodes use a stride of 2. During the search process, each cell is regarded as a directed acyclic graph (DAG) with N nodes and E edges, where each node x^{(i)} is represented by a feature map and each edge (i, j) represents an operation o^{(i,j)} applied to the information flow between nodes. In DARTS, the continuous relaxation is applied to all candidate operations to enable gradient-based search. Specifically, each intermediate node is computed as a softmax-weighted mixture of candidate operations:

    \bar{o}^{(i,j)}(x^{(i)}) = \sum_{o \in \mathcal{O}} \frac{\exp(\alpha_o^{(i,j)})}{\sum_{o' \in \mathcal{O}} \exp(\alpha_{o'}^{(i,j)})} \, o(x^{(i)}),    (1)

where i < j, all candidate operations are stored in O, and \alpha_o^{(i,j)} is the mixing weight of operation o on edge (i, j) in the supernet. Under this relaxation, architecture search optimizes the network weights ω and the architecture parameters α in a differentiable manner, establishing a bi-level optimization model for NAS:

    \min_{\alpha} \; F(\omega^*(\alpha), \alpha) = \mathcal{L}_{val}(\omega^*(\alpha), \alpha)
    \quad \text{s.t.} \quad \omega^*(\alpha) = \arg\min_{\omega} \mathcal{L}_{train}(\omega, \alpha),    (2)

where the optimization variables α and ω are updated via gradient descent. After the search ends, the final architecture is composed of the operation with the largest architecture parameter on each edge, o^{(i,j)} = \arg\max_{o' \in \mathcal{O}} \alpha_{o'}^{(i,j)}.
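As a concrete illustration of Eq. 1 and of the final per-edge argmax discretization, the sketch below shows how a softmax-weighted mixture of candidate operations can be written in PyTorch. It is a minimal sketch under assumed shapes, not the DARTS reference implementation; the candidate operation list is a simplified stand-in for the full DARTS operation space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Simplified candidate set O for one edge (real DARTS uses 8 operations,
# including separable/dilated convolutions and a "none" operation).
def candidate_ops(channels):
    return nn.ModuleList([
        nn.Identity(),                                          # skip connection
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.MaxPool2d(3, stride=1, padding=1),
        nn.AvgPool2d(3, stride=1, padding=1),
    ])

class MixedOp(nn.Module):
    """Continuous relaxation of one edge (i, j), cf. Eq. 1."""
    def __init__(self, channels):
        super().__init__()
        self.ops = candidate_ops(channels)

    def forward(self, x, alpha_edge):
        # alpha_edge holds the architecture parameters alpha^{(i,j)}, one per candidate.
        weights = F.softmax(alpha_edge, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

def discretize(alpha):
    # After the search: keep, on every edge, the operation with the largest alpha,
    # i.e. o^{(i,j)} = argmax_{o'} alpha_{o'}^{(i,j)}.
    return alpha.argmax(dim=-1)

edge = MixedOp(channels=16)
alpha = torch.zeros(4, requires_grad=True)          # alpha^{(i,j)} for 4 candidates
out = edge(torch.randn(2, 16, 8, 8), alpha)         # mixed feature map for node j
print(out.shape, discretize(alpha.detach().unsqueeze(0)))
```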
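For the bi-level problem in Eq. 2, DARTS alternates between a step on the network weights ω (training loss) and a step on the architecture parameters α (validation loss). The sketch below shows this alternation in its first-order approximation; it assumes a supernet whose forward pass takes the architecture parameters as an extra argument, which is a convention of this sketch rather than a fixed API. EG-NAS keeps the ω step but, as described below, replaces the gradient step on α with an evolution-strategy search direction.

```python
import torch

def darts_first_order_step(model, alpha, w_optimizer, alpha_optimizer,
                           train_batch, val_batch, loss_fn):
    """One alternating update for the bi-level problem in Eq. 2
    (first-order approximation: omega and alpha are optimized in turn)."""
    x_tr, y_tr = train_batch
    x_val, y_val = val_batch

    # Inner problem: gradient step on the network weights omega
    # using the training loss L_train(omega, alpha).
    w_optimizer.zero_grad()
    loss_fn(model(x_tr, alpha), y_tr).backward()
    w_optimizer.step()

    # Outer problem: gradient step on the architecture parameters alpha
    # using the validation loss L_val(omega, alpha).
    alpha_optimizer.zero_grad()
    loss_fn(model(x_val, alpha), y_val).backward()
    alpha_optimizer.step()
```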

Explore Search Directions with ES. For gradient-based methods, the most common step in NAS is to compute the gradient information and use it as the search direction, iterating until convergence. Although this is simple and efficient, the search direction is limited by the gradient information, which makes it monotonous and prone to falling into a local dilemma. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) (Hansen 2016; Loshchilov and Hutter 2016) is one of the most appreciated evolutionary algorithms for solving continuous black-box problems. We therefore exploit the strengths of CMA-ES, namely efficient convergence and global search, to explore different search directions and avoid getting trapped in local optima. We begin by sampling N architectures, denoted x_n for n = 1, 2, ..., N, from a Gaussian distribution with α as the mean vector m^0 and the unit matrix I as the covariance matrix C^0; in other words, x_n = α + σy with y ~ N(0, I) for n = 1, 2, ..., N. Subsequently, the sampled architectures x_n are used to initialize N CMA-ES based searches, where each x_n, for n = 1, 2, ..., N, serves as the initial mean vector m^0 of the n-th ES for search direction exploration. More specifically, the initial search population of the n-th ES is sampled as

    z_i^t = m^t + \sigma_i y_i, \quad y_i \sim \mathcal{N}(0, C^t),    (3)

where t, starting at 0, is the iteration counter, C^0 = I, i = 1, ..., λ, and σ is the step size. After that, the mean vector m^{t+1} and the covariance matrix C^{t+1} are optimized by Eq. 4 and Eq. 5, which generate the new individuals z_i^{t+1}:

    m^{t+1} = \sum_{i=1}^{\lfloor \lambda/2 \rfloor} \beta_i z_i^t,    (4)

where β_i is the fitness weight assigned to individual z_i^t, and

    C^{t+1} = (1 - c_1 - c_{\lfloor \lambda/2 \rfloor}) C^t + c_1 (p p^T) + c_{\lfloor \lambda/2 \rfloor} \sum_{i=1}^{\lfloor \lambda/2 \rfloor} \beta_i (z_i^t - m^{t+1})(z_i^t - m^{t+1})^T,    (5)

where p is the evolution path. During the search, each individual is evaluated by the fitness function f(·), and the individuals are sorted by their fitness values. The ⌊λ/2⌋ best solutions are then used to update the search parameters, such as the covariance matrix C^{t+1} and the two learning rates c_1 and c_{⌊λ/2⌋}, for the next iteration of CMA-ES. Considering that the optimal individual may not necessarily be present in the final population, we not only update these variables to optimize the search direction but also keep a record of the best individual z_i^* from each population, storing it in the set P according to its fitness value. Ultimately, in the final iteration phase, we select the individual z*_best with the highest fitness value from the set P as the output, representing the optimal search direction for the n-th sampling.
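The sampling and update rules of Eqs. 3-5 can be sketched as follows. This is a simplified, NumPy-only illustration of a CMA-ES-style exploration that returns the best individual found; the fitness weights, the learning rates c_1 and c_{⌊λ/2⌋}, and the crude evolution-path proxy are assumptions of this sketch, and several refinements of full CMA-ES (e.g. step-size adaptation) are deliberately omitted.

```python
import numpy as np

def es_search_direction(alpha, fitness, sigma=0.1, lam=25, iters=10, rng=None):
    """One ES exploration started near alpha; returns the best individual found."""
    rng = rng if rng is not None else np.random.default_rng(0)
    dim = alpha.size
    mean = alpha + sigma * rng.standard_normal(dim)      # x_n = alpha + sigma * y
    cov = np.eye(dim)                                    # C^0 = I
    mu = lam // 2
    beta = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    beta /= beta.sum()                                   # fitness weights beta_i
    c_1, c_mu = 0.1, 0.3                                 # illustrative learning rates
    best, best_fit = mean.copy(), fitness(mean)          # running record (the set P)

    for _ in range(iters):
        # Sample the population z_i ~ N(m^t, sigma^2 C^t)   (Eq. 3).
        z = rng.multivariate_normal(mean, sigma ** 2 * cov, size=lam)
        fits = np.array([fitness(zi) for zi in z])
        order = np.argsort(-fits)                        # best individuals first
        elite, elite_fits = z[order[:mu]], fits[order[:mu]]
        if elite_fits[0] > best_fit:
            best, best_fit = elite[0].copy(), elite_fits[0]
        new_mean = beta @ elite                          # m^{t+1}   (Eq. 4)
        p = new_mean - mean                              # crude evolution-path proxy
        diff = elite - new_mean
        cov = ((1 - c_1 - c_mu) * cov                    # C^{t+1}   (Eq. 5)
               + c_1 * np.outer(p, p)
               + c_mu * (diff.T * beta) @ diff)
        mean = new_mean
    return best

# Toy usage: this quadratic stands in for the compound fitness of Eq. 7.
print(es_search_direction(np.zeros(8), fitness=lambda x: -np.sum((x - 1.0) ** 2)))
```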
Compound Fitness Function

The proposed fitness function, shown in Eq. 7, adds the consideration of individual similarity (or diversity). In other words, the fitness function f(·) is composed of the cross-entropy loss L_1 and the cosine similarity L_2, as shown in Eq. 6:

    L_1(y_i, \hat{y}_i) = -\sum_{i=1}^{C} y_i \log(\hat{y}_i),
    \qquad
    L_2(\alpha^t, z_i^{t+1}) = \frac{1}{2}\left[\frac{\alpha^t \cdot z_i^{t+1}}{\lVert \alpha^t \rVert \, \lVert z_i^{t+1} \rVert} + 1\right],    (6)

where C is the number of classes and L_1 is the cross-entropy loss used to capture the performance of architecture z_i^{t+1}. In L_1, y_i and \hat{y}_i refer to the labels predicted by architecture z_i^{t+1} and the true labels, respectively. The essence of L_2 is the cosine similarity function, which measures the similarity between the current architecture z_i^{t+1} and the original architecture α^t. Note that, for convenience in subsequent calculations, we apply a simple rescaling so that L_2 ∈ [0, 1]; when L_2 tends to 0, the two architectures are similar, and otherwise they are dissimilar. The compound fitness is then

    f(\alpha^t, z_i^{t+1}) =
    \begin{cases}
        \zeta L_1 - \eta L_2 & \text{if } Acc(\alpha^t) > Acc(z_i^{t+1}) \\
        \zeta L_1 + \eta L_2 & \text{otherwise,}
    \end{cases}    (7)

where ζ and η are the weight coefficients of L_1 and L_2, respectively. When the performance of the currently generated individual is inferior to that of the original individual, we push the L_2 term towards 0, directing the new individual towards more promising performance. Conversely, when the performance is better, we push the L_2 term towards 1, encouraging the new individual to evolve towards increased architectural diversity. Assisted by this composite fitness function f(·), the evolution strategy generates more diverse individuals and evaluates a broader range of search directions, leading to a more effective balance between performance and diversity. Additionally, exploring and evaluating more diverse search directions helps alleviate premature convergence to local optima and yields superior architectures. To confirm the effectiveness of the compound fitness function f(·), we conducted the corresponding ablation studies; the comparative results are presented in Fig. 3, and a detailed analysis can be found in the ablation studies.
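A direct transcription of the compound fitness in Eqs. 6-7 is sketched below as a small PyTorch helper. It is an illustrative sketch rather than the authors' implementation: the accuracy comparison Acc(·) is passed in as precomputed values, and flattening the architecture parameters into vectors for the cosine term is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def compound_fitness(logits, targets, alpha_old, z_new, acc_old, acc_new,
                     zeta=1.0, eta=0.4):
    """Compound fitness f(alpha^t, z_i^{t+1}) from Eqs. 6-7.

    logits/targets   : predictions of candidate z_i^{t+1} on validation data
    alpha_old, z_new : current and candidate architecture parameters
    acc_old, acc_new : Acc(alpha^t) and Acc(z_i^{t+1})
    """
    # L1: cross-entropy loss of the candidate architecture (Eq. 6, first part).
    l1 = F.cross_entropy(logits, targets)
    # L2: cosine similarity rescaled into [0, 1] (Eq. 6, second part).
    l2 = 0.5 * (F.cosine_similarity(alpha_old.flatten(), z_new.flatten(), dim=0) + 1.0)
    # Eq. 7: subtract the similarity term when the old individual is still better,
    # add it otherwise.
    if acc_old > acc_new:
        return zeta * l1 - eta * l2
    return zeta * l1 + eta * l2

# Toy usage with random tensors.
logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
a_old, z = torch.randn(14, 4), torch.randn(14, 4)
print(compound_fitness(logits, targets, a_old, z, acc_old=0.84, acc_new=0.86))
```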
of architecture parameter optimization, EG-NAS has a time
ζL1 − ηL2 if Acc(αt ) > Acc(zt+1

t+1 i ) complexity of N times λ compared to traditional gradient
f (αt , zi ) = , descent methods.
ζL1 + ηL2 else
(7)
where ζ and η are the weight coefficients of L1 and L2 , Experiments and Analysis
respectively. When the performance of the currently gener- In this part, we conduct extensive experiments to evaluate
ated individual is inferior to the original individual, we make our approach, EG-NAS, on the DARTS search space with
the L2 function tend towards 0, directing the new individual CIFAR-10, CIFAR-100, and ImageNet for image classifi-
towards a more promising performance. Conversely, when cation, as well as the NAS-Bench-201 search space with
the performance is better, we make the L2 function tend to- CIFAR-10, CIFAR-100, and ImageNet-16-120. All experi-
wards 1, encouraging the new individual to evolve towards ments were conducted on a single Nvidia RTX 3090, except
increased architectural diversity. Assisted by this composite for the ImageNet experiments, which were conducted on 2
fitness function f (·), the evolution strategy generates more RTX 4090. The ImageNet datasets mentioned in this paper
diverse individuals, evaluating a broader range of search di- refer to either ImageNet 1K or the ILSVRC2012 dataset, ex-
rections, leading to a more effective balance between perfor- cept for ImageNet-16-120 in NAS-Bench-201.
mance and diversity. Additionally, exploring and evaluating
more diverse search directions is beneficial for alleviating Datasets and Implementation Details
the issue of premature convergence to local optima and ob- CIFAR-10 dataset contains 60,000 color images from 10 dif-
taining superior architectures. To confirm the effectiveness ferent categories. CIFAR-100 dataset consists of 100 dif-
of the compound fitness function f (·), we have conducted ferent categories, including some finer-grained classes. The

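Read together, Eqs. 8-9 and Algorithm 1 amount to the high-level loop sketched below. This is a schematic paraphrase in Python, not the released EG-NAS code: grad_step_w, es_search_direction, and validation_accuracy are placeholders standing in for the corresponding components, and the toy usage only demonstrates the control flow.

```python
import numpy as np

def eg_nas_epoch(alpha, w, grad_step_w, es_search_direction, validation_accuracy,
                 n_samples=5, sigma=0.1, xi=0.6, rng=None):
    """One search epoch: gradient step on w, ES exploration for alpha (Eqs. 8-9)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Step 1: update the network weights by gradient descent on the training loss.
    w = grad_step_w(w, alpha)
    # Step 2: launch N ES explorations from perturbed copies of alpha and collect
    # the best individual x*_n of each run (Algorithm 1, lines 4-14).
    candidates = []
    for _ in range(n_samples):
        start = alpha + sigma * rng.standard_normal(alpha.shape)   # x_n = alpha + sigma*y
        candidates.append(es_search_direction(start))
    # Step 3: pick the direction with the best validation accuracy (Eq. 9) ...
    s = max(candidates, key=lambda x: validation_accuracy(w, x))
    # ... and apply it to the architecture parameters (Eq. 8).
    alpha = alpha + xi * s
    return alpha, w

# Toy usage with dummy components.
a1, w1 = eg_nas_epoch(
    np.zeros(8), np.zeros(4),
    grad_step_w=lambda w, a: w - 0.01,
    es_search_direction=lambda start: start + 0.05,
    validation_accuracy=lambda w, x: -np.linalg.norm(x - 1.0),
)
print(a1, w1)
```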

Experiments and Analysis

In this section, we conduct extensive experiments to evaluate our approach, EG-NAS, on the DARTS search space with CIFAR-10, CIFAR-100, and ImageNet for image classification, as well as on the NAS-Bench-201 search space with CIFAR-10, CIFAR-100, and ImageNet-16-120. All experiments were conducted on a single Nvidia RTX 3090, except for the ImageNet experiments, which were conducted on 2 RTX 4090 GPUs. The ImageNet dataset mentioned in this paper refers to ImageNet-1K (the ILSVRC2012 dataset), except for ImageNet-16-120 in NAS-Bench-201.

Datasets and Implementation Details

The CIFAR-10 dataset contains 60,000 color images from 10 different categories. The CIFAR-100 dataset consists of 100 different categories, including some finer-grained classes. The same operation space O as in DARTS is applied, and a partial connection method similar to PC-DARTS is employed to reduce memory overhead. The training set of CIFAR-10 is divided into two parts of equal size, one for optimizing the network parameters by gradient descent and the other for obtaining the search direction. We train the supernet for 50 epochs (the first 15 epochs for warm-up) with a batch size of 256 and retrain the final network from scratch for 600 epochs with a batch size of 96. In the ES, the population size λ is set to 25, and the coefficients ζ and η for L_1 and L_2 are set to 1.0 and 0.4, respectively. The step size ξ, the number of samples N, and the initial channel number are set to 0.6, 5, and 16, respectively. For a fair comparison, the remaining search and evaluation settings are kept consistent with DARTS.

ImageNet consists of 1,000 classes with 1.2 million training images and 50,000 validation images. We train the supernet for 50 epochs (the first 15 epochs for warm-up) with a batch size of 1024 in the search stage and retrain the network from scratch for 250 epochs in the evaluation stage. In the ES algorithm, we set the population size λ to 50. We employ an SGD optimizer with a linearly decayed learning rate initialized at 0.5, a momentum of 0.9, and a weight decay of 3 × 10^{-5}.

NAS-Bench-201 is a significant benchmark containing 15,625 diverse neural network architectures with 5 different operations and 4 nodes per cell. It provides a standardized testing environment for evaluating NAS algorithms on the CIFAR-10, CIFAR-100, and ImageNet16-120 datasets. In this search space, we obtain the performance on the specific tasks by directly evaluating on the dataset, and we derive the mean and standard deviation of the best architectures by conducting independent runs with 4 different random seeds. More details about NAS-Bench-201 can be found in (Dong and Yang 2020).
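For reference, the search-stage hyperparameters reported above can be collected into a single configuration. The dictionary below simply restates the values from this section; the key names are our own and not taken from the released code.

```python
# Search-stage settings reported in the paper, gathered in one place.
EG_NAS_CONFIG = {
    "cifar": {
        "search_epochs": 50,        # first 15 epochs used for warm-up
        "search_batch_size": 256,
        "eval_epochs": 600,         # retraining from scratch
        "eval_batch_size": 96,
        "population_size": 25,      # lambda
        "zeta": 1.0,                # weight of L1 in Eq. 7
        "eta": 0.4,                 # weight of L2 in Eq. 7
        "step_size_xi": 0.6,
        "num_samples": 5,           # N ES explorations per epoch
        "init_channels": 16,
    },
    "imagenet": {
        "search_epochs": 50,        # first 15 epochs used for warm-up
        "search_batch_size": 1024,
        "eval_epochs": 250,
        "population_size": 50,
        "optimizer": "SGD",
        "lr": 0.5,                  # linearly decayed
        "momentum": 0.9,
        "weight_decay": 3e-5,
    },
}
```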

Comparison with State-of-the-art NAS Methods

The performance of EG-NAS is compared with state-of-the-art NAS methods on the CIFAR-10 and CIFAR-100 datasets in Table 1. EG-NAS achieves an impressive test error rate of 2.53% with just 0.1 GPU-days, significantly surpassing the DARTS baseline in both search cost and accuracy. Although ProxylessNAS, P-DARTS, and β-DARTS perform better, EG-NAS explores a distinct search space at a remarkably low search cost while maintaining exceptional performance. Table 2 presents the comparison of EG-NAS with other methods on the ImageNet dataset. We trained the best architecture searched on CIFAR-10 to evaluate its transferability to ImageNet, achieving a competitive top-1 test error rate of 25.6% and confirming the generalizability of EG-NAS. Additionally, we directly searched on ImageNet to evaluate the optimal architecture, obtaining a top-1 test error rate of 24.9%. This not only outperforms most NAS methods in terms of accuracy but is also noteworthy because it was achieved with an extremely low search cost (2.2 GPU-days), surpassing all other NAS methods in search efficiency. On the NAS-Bench-201 benchmark, EG-NAS achieves outstanding performance by conducting the architecture search on CIFAR-10 and evaluating on CIFAR-10, CIFAR-100, and ImageNet-16-120, reaching test accuracies of 93.56%, 70.91%, and 46.13%, respectively, as shown in Table 3. While our method does not outperform every NAS method on every dataset (for example, DARTS- performs better on CIFAR-10 and CIFAR-100), EG-NAS still surpasses most NAS methods. In particular, it achieves an exceptional 46.13% on ImageNet-16-120, confirming the effectiveness of EG-NAS and its excellent generalizability to more complex datasets.

Architecture | Test Error (%) CIFAR-10 | Test Error (%) CIFAR-100 | Params (M) | Search Cost (GPU-Days)
ResNet (He et al. 2016) | 4.61 | 22.1 | 1.7 | -
ENAS + cutout (Pham et al. 2018) | 2.89 | - | 4.6 | 0.5
AmoebaNet-A (Real et al. 2018) | 3.34 | 17.63 | 3.3 | 3150
NSGA-Net (Lu et al. 2018) | 2.75 | 20.74 | 3.3 | 4.0
NSGANetV1-A2 (Lu et al. 2018) | 2.65 | - | 0.9 | 27
EPCNAS-C (Huang et al. 2022) | 3.24 | 18.36 | 1.44 | 1.2
EAEPSO (Yuan et al. 2023) | 2.74 | 16.94 | 2.94 | 2.2
DARTS (1st) (Liu, Simonyan, and Yang 2018) | 3.00 | 17.54 | 3.4 | 0.4
DARTS (2nd) (Liu, Simonyan, and Yang 2018) | 2.76 | - | 3.3 | 1.0
SNAS (moderate) + cutout (Xie et al. 2018) | 2.85 | 17.55 | 2.8 | 1.5
ProxylessNAS + cutout (Cai, Zhu, and Han 2018) | 2.02 | - | - | 4.0
GDAS (Dong and Yang 2019b) | 2.93 | 18.38 | 3.4 | 0.2
BayesNAS (Zhou et al. 2019) | 2.81 | - | 3.4 | 0.2
P-DARTS + cutout (Chen et al. 2019) | 2.50 | 17.49 | 3.4/3.6 | 0.3
PC-DARTS + cutout (Xu et al. 2019) | 2.57 | 16.90 | 3.6 | 0.1
DARTS- (Chu et al. 2020) | 2.59 | 17.51 | 3.4 | 0.4
β-DARTS (Ye et al. 2022) | 2.53 | 16.24 | 3.75/3.80 | 0.4
DrNAS (Chen et al. 2020) | 2.54 | - | 4.0 | 0.4
DARTS+PT (Wang et al. 2021b) | 2.61 | - | 3.0 | 0.8
EG-NAS | 2.53 | 16.22 | 3.2 | 0.1

Table 1: Comparison of EG-NAS with state-of-the-art image classifiers on CIFAR-10 and CIFAR-100. The results of EG-NAS were obtained from repeated experiments with 4 random seeds.

Architecture | Test Error top-1 (%) | Search Cost (GPU-Days) | Params (M)
Inception-v1 | 30.1 | - | 6.6
MobileNet | 29.4 | - | 4.2
DARTS (2nd) | 26.7 | 1.0 | 4.7
SNAS | 27.3 | 1.5 | 2.8
ProxylessNAS† | 24.9 | 8.3 | 7.1
GDAS | 26.0 | 0.3 | 3.4
BayesNAS | 26.5 | 0.2 | 3.9
PC-DARTS | 25.1 | 0.1 | 4.7
DrNAS† | 24.2 | 4.6 | 5.7
DARTS+PT† | 25.5 | 3.4 | 4.7
NASNet-A | 26.0 | 2000 | 3.3
NASNet-B | 27.2 | 2000 | 3.3
NASNet-C | 27.5 | 2000 | 3.3
AmoebaNet-A | 25.5 | 3150 | 3.2
AmoebaNet-B | 26.0 | 3150 | 3.2
AmoebaNet-C | 24.3 | 3150 | 3.2
NSGANetV1-A2 | 25.5 | 27 | 4.1
EAEPSO | 26.9 | 4.0 | 4.9
EPCNAS-C2 | 27.1 | 1.17 | 3.0
EG-NAS | 24.9 | 0.1 | 5.3
EG-NAS† | 25.6 | 2.2 | 5.2

Table 2: Comparison with state-of-the-art image classifiers on ImageNet. † indicates results obtained by searching on ImageNet; otherwise the search was performed on CIFAR-10.

Ablation Study

Impact of L_1 and L_2. To investigate the impact of L_1 and L_2, we conducted experiments with different fitness compositions on CIFAR-10. From Fig. 3, we observe that in the early stages of the experiments, fitness functions based solely on performance achieved relatively good results. However, as the experiments progressed, fitness functions incorporating both L_1 and L_2 showed superior growth and performance, as they explored more directions while still accounting for performance. On the other hand, fitness functions focusing solely on diversity could explore more directions but struggled to converge due to the lack of performance consideration.

Figure 3: During the CIFAR-10 search phase, the impact of L_1 and L_2 on EG-NAS in the evolutionary strategy. The valid acc (%) refers to the metric that evaluates the search direction with the optimal fitness value during the ES search.


Architecture | CIFAR-10 valid | CIFAR-10 test | CIFAR-100 valid | CIFAR-100 test | ImageNet16-120 valid | ImageNet16-120 test
ResNet (He et al. 2016) | 90.83 | 93.97 | 70.42 | 70.86 | 44.53 | 43.63
Random (baseline) | 90.93±0.36 | 93.70±0.36 | 70.60±1.37 | 70.65±1.38 | 42.92±2.00 | 42.96±2.15
ENAS (Pham et al. 2018) | 37.51±3.19 | 53.89±0.58 | 13.37±2.35 | 13.96±2.33 | 15.06±1.95 | 14.57±2.10
RandomNAS (Li and Talwalkar 2020) | 80.42±3.58 | 84.07±3.61 | 52.12±5.55 | 52.31±5.77 | 27.22±3.24 | 26.28±3.09
SETN (Dong and Yang 2019a) | 84.04±0.28 | 87.64±0.00 | 58.86±0.06 | 59.05±0.24 | 33.06±0.02 | 32.52±0.21
GDAS (Dong and Yang 2019b) | 90.01±0.46 | 93.23±0.23 | 24.05±8.12 | 24.20±8.08 | 40.66±0.00 | 41.02±0.00
DSNAS (Hu et al. 2020a) | 89.66±0.29 | 93.08±0.13 | 30.87±16.40 | 31.01±16.38 | 40.61±0.09 | 41.07±0.09
DARTS (1st) (Liu, Simonyan, and Yang 2018) | 39.77±0.00 | 54.30±0.00 | 15.03±0.00 | 15.61±0.00 | 16.43±0.00 | 16.32±0.00
DARTS (2nd) (Liu, Simonyan, and Yang 2018) | 39.77±0.00 | 54.30±0.00 | 15.03±0.00 | 15.61±0.00 | 16.43±0.00 | 16.32±0.00
PC-DARTS (Xu et al. 2019) | 89.96±0.15 | 93.41±0.30 | 67.12±0.39 | 67.48±0.89 | 40.83±0.08 | 41.31±0.22
iDARTS (Wang et al. 2021a) | 89.86±0.60 | 93.58±0.32 | 70.57±0.24 | 70.83±0.48 | 40.38±0.59 | 40.89±0.68
DARTS- (Chu et al. 2020) | 91.03±0.44 | 93.80±0.40 | 71.36±1.51 | 71.53±1.51 | 44.87±1.46 | 45.12±0.82
EG-NAS | 90.12±0.05 | 93.56±0.02 | 70.78±0.12 | 70.91±0.07 | 44.89±0.29 | 46.13±0.46
Optimal | 91.61 | 94.37 | 73.49 | 73.51 | 46.77 | 47.31

Table 3: Performance comparison on the NAS-Bench-201 benchmark. Note that EG-NAS searched only on the CIFAR-10 dataset, yet achieved competitive results on CIFAR-10, CIFAR-100, and ImageNet16-120. The average values are obtained over four independent search runs.

Influence of the coefficients ζ and η of L_1 and L_2. To explore the influence of different values of the coefficients ζ and η in the fitness function f(·) of the ES, we evaluate L_1 and L_2 with different ζ and η assignments. The validation accuracy and model parameter cost are reported in Table 4. As the number of updates increases, the value of the L_2 coefficient η in the fitness function f(·) plays a critical role in the evolution strategy. When η is set too large, the ES tends to prioritize exploring diverse search directions, leading to excessive dispersion and hindered convergence. On the contrary, if η is too small, the ES focuses predominantly on optimizing performance, resulting in overly monotonous search directions. As the L_1 coefficient ζ increases, the evolutionary strategy discovers more stable and superior optimization directions, with ζ = 1.0 achieving the best performance.

ζ | η=1.0 valid acc (%) | η=1.0 params (M) | η=0.8 valid acc (%) | η=0.8 params (M) | η=0.4 valid acc (%) | η=0.4 params (M) | η=0.2 valid acc (%) | η=0.2 params (M)
1.0 | 84.36 | 3.23 | 85.42 | 2.96 | 85.53 | 3.71 | 85.40 | 2.85
0.8 | 85.21 | 2.96 | 84.89 | 3.56 | 84.82 | 3.97 | 83.63 | 3.41
0.4 | 84.36 | 3.23 | 83.80 | 3.85 | 82.81 | 3.0 | 81.25 | 2.85
0.2 | 83.61 | 3.10 | 83.13 | 3.57 | 81.54 | 2.96 | 80.68 | 3.66

Table 4: On CIFAR-10, the effect of different coefficient values ζ and η of the composite fitness function in Eq. 7 on task performance. The valid acc (%) refers to the metric that evaluates the search direction with the optimal fitness value during the ES search.

Effect of the Population Size λ on EG-NAS. In this part, a series of experiments is conducted on the CIFAR-10 dataset to investigate the impact of the population size λ on EG-NAS. Based on the observations in Fig. 4, when the population size λ is set too small, it significantly constrains the diversity of the population, limiting the discovery of high-performance architectures. As the population size λ increases, more individuals become available for selection, increasing the diversity within the population and making it easier to discover high-quality network architectures. However, as λ continues to increase, the performance of the discovered architectures improves only marginally, due to the constraints imposed by the cell search space, while the search cost escalates significantly. To strike an effective balance between search cost and architecture performance, we ultimately chose a population size of λ = 50.

Figure 4: During the CIFAR-10 search phase, the effect of the population size λ on EG-NAS in the evolutionary strategy. The valid acc (%) refers to the metric that evaluates the search direction with the optimal fitness value during the ES search.

Conclusion and Future Work

In this paper, we propose a simple yet efficient compound approach based on gradient descent and an improved evolutionary strategy, termed EG-NAS, for neural architecture search, which alleviates the dilemma of premature convergence to local optima caused by gradient descent. Meanwhile, by reducing the similarity between individuals in the evolutionary strategy, we can effectively explore various search directions and avoid being trapped in local optima. Finally, we evaluate the selected search directions with better fitness values using the validation accuracy, to more accurately capture the relationship between search directions and task performance. In the future, we aim to further investigate the effective integration of different algorithms and to enhance the stability of hybrid algorithms so as to address a wider range of tasks.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 62006044, and in part by the Programme of Science and Technology of Guangzhou under Grant 202201010377.


References

Bender, G.; Kindermans, P.-J.; Zoph, B.; Vasudevan, V.; and Le, Q. 2018. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, 550-559. PMLR.

Cai, H.; Zhu, L.; and Han, S. 2018. ProxylessNAS: Direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332.

Chen, X.; and Hsieh, C.-J. 2020. Stabilizing differentiable architecture search via perturbation-based regularization. In International Conference on Machine Learning, 1554-1565. PMLR.

Chen, X.; Wang, R.; Cheng, M.; Tang, X.; and Hsieh, C.-J. 2020. DrNAS: Dirichlet neural architecture search. arXiv preprint arXiv:2006.10355.

Chen, X.; Xie, L.; Wu, J.; and Tian, Q. 2019. Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 1294-1303.

Chu, X.; Wang, X.; Zhang, B.; Lu, S.; Wei, X.; and Yan, J. 2020. DARTS-: Robustly stepping out of performance collapse without indicators. arXiv preprint arXiv:2009.01027.

Cui, X.; Zhang, W.; Tüske, Z.; and Picheny, M. 2018. Evolutionary stochastic gradient descent for optimization of deep neural networks. Advances in Neural Information Processing Systems, 31.

Dong, X.; and Yang, Y. 2019a. One-shot neural architecture search via self-evaluated template network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 3681-3690.

Dong, X.; and Yang, Y. 2019b. Searching for a robust neural architecture in four GPU hours. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1761-1770.

Dong, X.; and Yang, Y. 2020. NAS-Bench-201: Extending the scope of reproducible neural architecture search. arXiv preprint arXiv:2001.00326.

Hansen, N. 2016. The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772.

Hansen, N.; and Ostermeier, A. 1996. Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation. In Proceedings of IEEE International Conference on Evolutionary Computation, 312-317.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778.

Howard, A. G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; and Adam, H. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

Hu, S.; Xie, S.; Zheng, H.; Liu, C.; Shi, J.; Liu, X.; and Lin, D. 2020a. DSNAS: Direct neural architecture search without parameter retraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12084-12092.

Hu, Y.; Liang, Y.; Guo, Z.; Wan, R.; Zhang, X.; Wei, Y.; Gu, Q.; and Sun, J. 2020b. Angle-based search space shrinking for neural architecture search. In Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XIX, 119-134. Springer.

Huang, J.; Xue, B.; Sun, Y.; Zhang, M.; and Yen, G. G. 2022. Particle Swarm Optimization for Compact Neural Architecture Search for Image Classification. IEEE Transactions on Evolutionary Computation, 1-1.

Li, L.; and Talwalkar, A. 2020. Random search and reproducibility for neural architecture search. In Uncertainty in Artificial Intelligence, 367-377. PMLR.

Liu, H.; Simonyan, K.; Vinyals, O.; Fernando, C.; and Kavukcuoglu, K. 2018. Hierarchical Representations for Efficient Architecture Search. In International Conference on Learning Representations.

Liu, H.; Simonyan, K.; and Yang, Y. 2018. DARTS: Differentiable Architecture Search. arXiv preprint arXiv:1806.09055.

Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; and Xie, S. 2022. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11976-11986.

Loshchilov, I.; and Hutter, F. 2016. CMA-ES for hyperparameter optimization of deep neural networks. arXiv preprint arXiv:1604.07269.

Lu, Z.; Whalen, I.; Boddeti, V.; Dhebar, Y. D.; Deb, K.; Goodman, E. D.; and Banzhaf, W. 2018. NSGA-NET: A Multi-Objective Genetic Algorithm for Neural Architecture Search. CoRR, abs/1810.03522.

Pham, H.; Guan, M. Y.; Zoph, B.; Le, Q. V.; and Dean, J. 2018. Efficient Neural Architecture Search via Parameter Sharing. CoRR, abs/1802.03268.

Real, E.; Aggarwal, A.; Huang, Y.; and Le, Q. V. 2018. Regularized Evolution for Image Classifier Architecture Search. Proceedings of the AAAI Conference on Artificial Intelligence, 33.

Simonyan, K.; and Zisserman, A. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Science.

Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-9.

Wang, H.; Yang, R.; Huang, D.; and Wang, Y. 2021a. iDARTS: Improving DARTS by node normalization and decorrelation discretization. IEEE Transactions on Neural Networks and Learning Systems.

Wang, R.; Cheng, M.; Chen, X.; Tang, X.; and Hsieh, C.-J. 2021b. Rethinking architecture selection in differentiable NAS. arXiv preprint arXiv:2108.04392.

Xiao, H.; Wang, Z.; Zhu, Z.; Zhou, J.; and Lu, J. 2022a. Shapley-NAS: Discovering operation contribution for neural architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11892-11901.

Xiao, H.; Wang, Z.; Zhu, Z.; Zhou, J.; and Lu, J. 2022b. Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11892-11901.

Xie, S.; Zheng, H.; Liu, C.; and Lin, L. 2018. SNAS: Stochastic neural architecture search. arXiv preprint arXiv:1812.09926.

Xu, Y.; Xie, L.; Zhang, X.; Chen, X.; Qi, G.-J.; Tian, Q.; and Xiong, H. 2019. PC-DARTS: Partial channel connections for memory-efficient architecture search. arXiv preprint arXiv:1907.05737.

Xue, K.; Qian, C.; Xu, L.; and Fei, X. 2021. Evolutionary Gradient Descent for Non-convex Optimization. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, 3221-3227.

Ye, P.; Li, B.; Li, Y.; Chen, T.; Fan, J.; and Ouyang, W. 2022. β-DARTS: Beta-decay regularization for differentiable architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10874-10883.

Yuan, G.; Wang, B.; Xue, B.; and Zhang, M. 2023. Particle Swarm Optimization for Efficiently Evolving Deep Convolutional Neural Networks Using an Autoencoder-based Encoding Strategy. IEEE Transactions on Evolutionary Computation, 1-1.

Zela, A.; Elsken, T.; Saikia, T.; Marrakchi, Y.; Brox, T.; and Hutter, F. 2019. Understanding and robustifying differentiable architecture search. arXiv preprint arXiv:1909.09656.

Zhou, H.; Yang, M.; Wang, J.; and Pan, W. 2019. BayesNAS: A Bayesian Approach for Neural Architecture Search. CoRR, abs/1905.04919.

Zoph, B.; Vasudevan, V.; Shlens, J.; and Le, Q. V. 2018. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8697-8710.

