OLMP Lab6
Lab 6: OLMP
ANN
• The study of artificial neural networks (ANNs) was inspired by attempts to simulate biological neural
systems (e.g. the human brain).
• Basic structural and functional unit: nerve cells called neurons.
• Working mechanism:
• Neurons are linked together via axons and dendrites.
• When a neuron is excited by a stimulus, it sends chemicals to the connected neurons,
thereby changing the electrical potential within those neurons.
• If the potential of a neuron exceeds a “threshold”, it is activated and sends
chemicals to other neurons.
• A neuron is connected to the axons of other neurons via dendrites, which are
extensions of the neuron’s cell body.
• The contact point between a dendrite and an axon is called a synapse.
• The human brain learns by changing the strength of the synaptic connections between
neurons upon repeated stimulation by the same impulse.
Artificial Neuron Mathematical Model
Superscript: layer index. For a network with two hidden layers (layers 2 and 3) and an output layer (layer 4):

Hidden layer 1:  a_j^(2) = Σ_{i=1}^{d} w_{ji}^(2) x_i − θ_j,   z_j^(2) = f(a_j^(2))

Hidden layer 2:  a_k^(3) = Σ_{j=1}^{q} w_{kj}^(3) z_j^(2) − θ_k,   z_k^(3) = f(a_k^(3))

Output layer:    a^(4) = Σ_{k=1}^{m} w_k^(4) z_k^(3) − θ,   y = f(a^(4))
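The forward pass above can be sketched in NumPy. This is a minimal illustration, not the lab’s reference code; the layer sizes, the sigmoid choice for f, and the random weights are assumptions made for the example. Note that each threshold θ is subtracted before applying f, matching a = Σ w·x − θ.

```python
import numpy as np

def sigmoid(a):
    # a common choice for the activation f; the slides do not fix f
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, t1, W2, t2, W3, t3):
    """Forward pass of a network with two hidden layers.
    Thresholds are subtracted before the activation: z = f(W @ x - theta)."""
    z1 = sigmoid(W1 @ x - t1)    # hidden layer 1
    z2 = sigmoid(W2 @ z1 - t2)   # hidden layer 2
    return sigmoid(W3 @ z2 - t3) # output y

# illustrative shapes: 3 inputs, hidden sizes 4 and 5, 1 output
rng = np.random.default_rng(0)
x = np.array([1.0, 0.1, 0.3])
W1, t1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, t2 = rng.normal(size=(5, 4)), rng.normal(size=5)
W3, t3 = rng.normal(size=(1, 5)), rng.normal(size=1)
y = forward(x, W1, t1, W2, t2, W3, t3)
print(y.shape)
```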
Training of NN
Training samples:

x1    x2    x3   | l1  l2
1.0   0.1   0.3  |  1   0
0.1   1.5   1.2  |  1   0
1.1   1.1   2.0  |  0   1
0.2   0.2   0.3  |  0   1

Error function:

E(W, θ) = ½ [ (o1(W, θ) − y1)² + (o2(W, θ) − y2)² ]

Calculate the gradient with respect to (W, θ), then tune them.
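The “compute the gradient, then tune (W, θ)” step can be illustrated with the four samples above. This sketch uses a single weight layer with a sigmoid output for brevity (an assumption; the slides’ network has hidden layers), and minimizes the squared-error function E by plain gradient descent.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# training samples from the table: inputs X and target labels (l1, l2)
X = np.array([[1.0, 0.1, 0.3],
              [0.1, 1.5, 1.2],
              [1.1, 1.1, 2.0],
              [0.2, 0.2, 0.3]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 3))  # one weight layer for brevity
theta = np.zeros(2)

def error(W, theta):
    O = sigmoid(X @ W.T - theta)        # outputs (o1, o2) per sample
    return 0.5 * np.sum((O - Y) ** 2)

e0 = error(W, theta)
lr = 0.5
for _ in range(2000):
    O = sigmoid(X @ W.T - theta)
    delta = (O - Y) * O * (1 - O)       # chain rule through the sigmoid
    W -= lr * delta.T @ X               # dE/dW
    theta -= lr * (-delta).sum(axis=0)  # dE/dtheta (theta enters as -theta)
print(error(W, theta))
```

The error decreases from its initial value as (W, θ) are tuned along the negative gradient.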
Deep Neural Networks
[Figure: DNN model sizes — e.g. 289 MB and 333 MB models shrink to only 104 MB and 125 MB under ZIP compression, still large for a mobile device such as an iPhone 8 with 2 GB of RAM.]
DNN Compression
Where:
• n_l is the number of neurons in layer l; n_{l+1} is defined similarly.
• W_{i,j}^l denotes the connection weight between the i-th neuron in layer l and the j-th
neuron in layer l + 1.
• W_{i,j}^l = 0 indicates that the corresponding connection does not exist.
[Figure: neuron i in layer l (n_l neurons) connected by weight W_{i,j}^l to neuron j in layer l + 1 (n_{l+1} neurons).]
MP and OLMP
Magnitude-based pruning (MP) [LeCun et al., 1989]: Given a network W and a threshold ε,
magnitude-based pruning is defined as:

MP(W, ε) = { w | |w| ≥ ε, w ∈ W }.

This method prunes the connections whose absolute connection weights are lower than ε.
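A minimal NumPy sketch of MP: keep weights whose magnitude is at least ε and zero out the rest (the example matrix is illustrative).

```python
import numpy as np

def magnitude_prune(W, eps):
    """MP(W, eps): keep weights with |w| >= eps, prune (zero) the rest."""
    return np.where(np.abs(W) >= eps, W, 0.0)

W = np.array([[0.8, -0.05, 0.3],
              [-0.6, 0.02, -0.4]])
P = magnitude_prune(W, 0.1)
print(P)  # the small-magnitude entries -0.05 and 0.02 become 0
```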
Layer-wise magnitude-based pruning (LMP) [Guo et al., 2016; Han et al., 2015]: Instead of applying MP to
the whole network, LMP applies it to each layer separately:

LMP(W, {ε_1, ε_2, …, ε_L}) = ⋃_{l=1}^{L} MP(W, l, ε_l),   (2)

where MP(W, l, ε) = { W_{i,j}^l | |W_{i,j}^l| ≥ ε, 1 ≤ i ≤ n_l, 1 ≤ j ≤ n_{l+1} }.
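Eq. (2) can be sketched as applying MP per layer, each with its own threshold ε_l. The layer shapes and thresholds below are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(W, eps):
    return np.where(np.abs(W) >= eps, W, 0.0)

def layerwise_magnitude_prune(layers, thresholds):
    """LMP: prune each layer's weight matrix with its own threshold eps_l."""
    return [magnitude_prune(W, eps) for W, eps in zip(layers, thresholds)]

rng = np.random.default_rng(0)
layers = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
thresholds = [0.5, 1.0]
pruned = layerwise_magnitude_prune(layers, thresholds)
for W in pruned:
    print(np.count_nonzero(W), "/", W.size, "weights kept")
```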
Derivative-free optimization methods [Goldberg, 1989; Brochu et al., 2010; Qian et al., 2015; Yu et al., 2016]: these do not require
the problem to be either continuous or differentiable. In our paper, we use negatively
correlated search (NCS) [Tang et al., 2016] to solve Eq. (3).

Negatively correlated search (NCS) [Tang et al., 2016]: it uses negative correlation to increase diversity
among solutions and to encourage them to search different areas of the solution space.
Eq. (1): W = { W_{i,j}^l | W_{i,j}^l ≠ 0, 1 ≤ l ≤ L, 1 ≤ i ≤ n_l, 1 ≤ j ≤ n_{l+1} }.
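To make the derivative-free setting concrete, here is a toy sketch of searching the per-layer thresholds without gradients. It uses plain random search as a simplified stand-in for NCS (NCS itself maintains a population with negative correlation), and a made-up “damage” proxy in place of the real accuracy constraint — both are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.normal(size=(8, 6)), rng.normal(size=(4, 8))]

def pruned_fraction(eps):
    """Objective: fraction of weights removed by per-layer thresholds eps."""
    total = sum(W.size for W in layers)
    kept = sum(np.count_nonzero(np.abs(W) >= e) for W, e in zip(layers, eps))
    return 1.0 - kept / total

def damage(eps):
    # toy proxy for accuracy loss: energy of the pruned weights
    # (the real method evaluates the pruned network's accuracy instead)
    return sum(np.sum(W[np.abs(W) < e] ** 2) for W, e in zip(layers, eps))

budget = 5.0                 # allowed "damage" (stands in for an accuracy bound)
best_eps, best_sp = None, -1.0
for _ in range(500):         # derivative-free: only evaluates the objective
    eps = rng.uniform(0.0, 2.0, size=len(layers))
    if damage(eps) <= budget and pruned_fraction(eps) > best_sp:
        best_eps, best_sp = eps, pruned_fraction(eps)
print(best_sp)
```

The search maximizes compression subject to the budget using objective evaluations only — no continuity or differentiability is needed, which is why NCS fits this problem.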