
Neural Architecture Search Basics

Shusen Wang

http://wangshusen.github.io/
• Parameters
• Hyper-parameters

[Figure: the training pipeline. Training data and the hyper-parameters (which include the architecture and the training algorithm) are fed into training, which produces the parameters (aka weights); the trained model is then run on the testing data to measure test accuracy.]
CNN Architectures

• Architectural hyper-parameters of a CNN include


• numbers of conv and dense layers,
• number of filters, size of filters, and stride in each conv layer,
• width of each dense layer.

• Popular CNN architectures are manually designed.


• E.g., ResNet, MobileNet, etc.
• Their architectural hyper-parameters were tuned by hand.
CNN Architectures

[Figure: a CNN with 20 conv layers. Each layer (Conv Layer 1, Conv Layer 2, ⋯, Conv Layer 20) has its own # of filters, size of filters, and stride.]
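To make these hyper-parameters concrete, here is a minimal sketch (assuming PyTorch; the build_cnn helper and the example values are illustrative, not from the slides) that assembles a conv stack from a list of per-layer (num_filters, filter_size, stride) choices:

import torch.nn as nn

def build_cnn(layer_hparams, in_channels=3):
    # layer_hparams: one (num_filters, filter_size, stride) tuple per
    # conv layer -- exactly the architectural hyper-parameters above.
    layers = []
    for num_filters, filter_size, stride in layer_hparams:
        layers.append(nn.Conv2d(in_channels, num_filters,
                                kernel_size=filter_size, stride=stride,
                                padding=filter_size // 2))
        layers.append(nn.ReLU())
        in_channels = num_filters  # next layer consumes this depth
    return nn.Sequential(*layers)

# Example: a 3-layer stack (the slides consider 20 layers).
model = build_cnn([(24, 5, 1), (48, 3, 1), (64, 3, 2)])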


Neural Architecture Search (NAS)

Definition: Neural Architecture Search (NAS).


Find the architecture that leads to the best validation accuracy
(or other metrics, such as efficiency).

• Example: ResNet has better accuracy than VGG.


• Example: MobileNet is more efficient than ResNet, although
MobileNet has lower accuracy.
Search Space

Hyper-parameter      Candidates
# of filters         24, 36, 48, 64
size of filters      3×3, 5×5, 7×7
stride               1, 2

• The candidate set can also be much bigger, e.g.,
  # of filters ∈ {10, 11, 12, 13, ..., 98, 99, 100}.

Search space: the set containing all the possible architectures.

• We want to build a CNN with 20 conv layers.
• Search space:
  {24, 36, 48, 64}^20 × {3×3, 5×5, 7×7}^20 × {1, 2}^20.
• Size of the search space (i.e., number of possible architectures):
  (4×3×2)^20 = 24^20 ≈ 4×10^27.
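To sanity-check these numbers, here is a minimal plain-Python sketch (the constant names and the sample_architecture helper are illustrative, not from the slides) that defines this search space, computes its size, and samples one random architecture:

import random

# Candidate values for the three per-layer hyper-parameters.
NUM_FILTERS = [24, 36, 48, 64]
FILTER_SIZES = [3, 5, 7]        # 3×3, 5×5, 7×7
STRIDES = [1, 2]
NUM_LAYERS = 20

# Size of the search space: (4 × 3 × 2)^20 = 24^20 ≈ 4×10^27.
size = (len(NUM_FILTERS) * len(FILTER_SIZES) * len(STRIDES)) ** NUM_LAYERS
print(f"{size:.2e}")  # prints 4.02e+27

def sample_architecture():
    # One (num_filters, filter_size, stride) tuple per conv layer,
    # each drawn uniformly at random from its candidate set.
    return [(random.choice(NUM_FILTERS),
             random.choice(FILTER_SIZES),
             random.choice(STRIDES))
            for _ in range(NUM_LAYERS)]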
Outcome of NAS?

• For example, this is an outcome of NAS:

                   Layer 1   Layer 2   ⋯   Layer 20
  # of filters     24        48        ⋯   64
  Size of filters  5×5       3×3       ⋯   3×3
  Stride           1         1         ⋯   2
Baseline: Random Search

  Hyper-parameters          Train           Evaluate
  randomly selected    →    CNN model   →   val acc = 82%
  randomly selected    →    CNN model   →   val acc = 94%
  randomly selected    →    CNN model   →   val acc = 91%
  randomly selected    →    CNN model   →   val acc = 88%
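In code, this baseline is just a loop. Below is a minimal sketch (the train_and_evaluate function is a hypothetical placeholder for the expensive train-from-scratch-then-validate step, and sample_architecture is the helper from the earlier sketch):

def random_search(num_trials, train_and_evaluate):
    # Baseline NAS: try random architectures, keep the best one found.
    # train_and_evaluate(arch) is assumed to train the CNN defined by
    # `arch` from scratch and return its validation accuracy; this is
    # the expensive step (hours or days per trial on a single GPU).
    best_arch, best_acc = None, 0.0
    for _ in range(num_trials):
        arch = sample_architecture()        # from the earlier sketch
        val_acc = train_and_evaluate(arch)
        if val_acc > best_acc:
            best_arch, best_acc = arch, val_acc
    return best_arch, best_acc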
Challenges in NAS

Challenge 1: Each trial is expensive.

• Training a CNN from scratch takes hours or days on a single GPU.

Challenge 2: The search space is too big.

• Number of possible architectures:
  (4×3×2)^20 = 24^20 ≈ 4×10^27.
Thank You!

http://wangshusen.github.io/
