
Computational Complexity

Chittagong University of Engineering & Technology

Submitted To:
Dr. Kaushik Deb
Professor
Dept. of CSE, CUET

Submitted By:
Md. Al-Mamun Provath
22MCSE004
Contents
 Model Parameters

 Model Size

 FLOPs

 FLOPS

 MACs

 Inference Time



Model Parameters
 Model parameters
 configuration settings that determine the model's behavior and predictive capabilities
 are learned from the training data
 are adjusted during training to minimize the loss function

 Learnable Parameters
 the weights and biases of the model

 Model parameters influence the model's ability to generalize and make accurate predictions
 When the parameters are set optimally, the model fits the training data well and generalizes effectively to unseen data
 If the parameters are poorly chosen, the model may overfit or underfit



Model Parameters
 The ideal number of parameters depends on several factors:

 Data availability
 The more data available, the more complex a model can be used
 Insufficient data combined with a complex model leads to overfitting

 Model Complexity
 Simple problems can be addressed by less complex models
 Complex problems require a large number of parameters

 Computational Resources
 Training models with a large number of parameters is computationally expensive
 With limited computational resources, smaller models must be used



Calculating Model Parameters
 Feed Forward Neural Network (Dense Layer )
For one hidden layer,

#Parameters = connections between layers + biases in every layer

Example: a network with 3 inputs, 5 hidden units, and 2 outputs (see the sketch below)

Between the input layer and the hidden layer:

3 × 5 + 5 = 20

Between the hidden layer and the output layer:

5 × 2 + 2 = 12

Total: 20 + 12 = 32 parameters
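A minimal Python sketch of this count; the 3-5-2 layer sizes are taken from the example above, and the helper name is just illustrative:

def dense_params(layer_sizes):
    """Count weights + biases for a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + one bias per output unit
    return total

print(dense_params([3, 5, 2]))  # (3*5 + 5) + (5*2 + 2) = 32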



Calculating Model Parameters
 CNNs

#Parameters = (Filter height × Filter width × Input channels + 1) × Number of filters

Example: an RGB image (3 input channels) with a 2×2 filter and a 1-channel output:

(2 × 2 × 3 + 1) × 1 = 13 parameters (see the sketch below)
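A small sketch of the same count, assuming a single 2×2 filter over a 3-channel input as in the example:

def conv_params(kernel_h, kernel_w, in_channels, num_filters):
    """Weights plus one bias per filter for a standard convolutional layer."""
    return (kernel_h * kernel_w * in_channels + 1) * num_filters

print(conv_params(2, 2, 3, 1))  # (2*2*3 + 1) * 1 = 13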





Model Size
 Model size measures the storage for the weights of the given neural network
 The common units for model size are: MB (megabyte), KB (kilobyte), bits.
 In general, if the whole neural network uses the same data type (e.g., floating-point),

Model Size = #Parameters × Bit Width

 Example: AlexNet has 61M parameters

 If all weights are stored as 32-bit numbers, total storage will be about
61M × 4 Bytes (32 bits) ≈ 244 MB (244 × 10⁶ Bytes)

 If all weights are stored as 8-bit numbers, total storage will be about
61M × 1 Byte (8 bits) ≈ 61 MB
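The same arithmetic as a quick sketch, using the AlexNet figures above:

def model_size_mb(num_params, bit_width):
    """Model storage in megabytes when every weight uses the same bit width."""
    return num_params * bit_width / 8 / 1e6

print(model_size_mb(61e6, 32))  # ~244 MB with 32-bit weights
print(model_size_mb(61e6, 8))   # ~61 MB with 8-bit weights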



FLOPs, FLOPS, MACs
 FLOPs
 Floating Point Operations
 the total number of computations the model will have to perform
 addition, subtraction, division, multiplication, or any other operation
that involves a floating point value
 FLOPS
 Floating Point Operations per Second
 tells us how fast the hardware is
 The more operations per second we can do, the faster the inference will be
 MACs
 Multiply-Accumulate Computations
 A MAC is an operation that does an addition and a multiplication,
so 2 operations
 Generally, 1 MAC ≈ 2 FLOPs



FLOPs, FLOPS, MACs
General idea

 We want a low number of FLOPs in our model, while keeping it complex enough to remain accurate

 We want a high number of FLOPS in our hardware

 Our role will be to optimize Deep Learning models to have a low number of FLOPs



Calculating FLOPs
Let's take the following model that performs classification on the MNIST dataset

 The Input Image is of size 28x28x1 (grayscale)

 We run 2 Convolutions of 5 kernels of size (3x3)

 We run a Fully Connected Layer of 128 Neurons

 We finish with a Fully Connected Layer of 10 Neurons: 1 per digit.
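A rough sketch of how these FLOPs can be tallied, assuming stride-1 convolutions without padding and counting 1 MAC as 2 FLOPs; padding and bias conventions vary, so the exact total may differ slightly from the figure used on the inference-time slide:

def conv_flops(k, c_in, c_out, h_out, w_out):
    """FLOPs for a standard convolution: 2 * MACs."""
    return 2 * k * k * c_in * c_out * h_out * w_out

def dense_flops(n_in, n_out):
    """FLOPs for a fully connected layer: 2 * MACs."""
    return 2 * n_in * n_out

total = (
    conv_flops(3, 1, 5, 26, 26)      # conv1: 28x28x1 -> 26x26x5
    + conv_flops(3, 5, 5, 24, 24)    # conv2: 26x26x5 -> 24x24x5
    + dense_flops(24 * 24 * 5, 128)  # flatten -> 128 neurons
    + dense_flops(128, 10)           # 128 -> 10 classes
)
print(total)  # ~1.06 million FLOPs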





Calculating MACs

For a convolutional layer:

MACs = k × k × (C_in / g) × C_out × h_o × w_o

Where, C = number of channels, k = kernel size, h_o = height of output,
w_o = width of output, g = number of groups

• Example, for a layer with 96 filters of size 11×11, 3 input channels, and a 55×55 output (g = 1):
MACs = 96 × 3 × 11 × 11 × 55 × 55 = 105,415,200
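A minimal helper that reproduces this count (the group count defaults to 1, matching the example; the function name is illustrative):

def conv_macs(k, c_in, c_out, h_out, w_out, groups=1):
    """Multiply-accumulate count for a (possibly grouped) k x k convolution."""
    return k * k * (c_in // groups) * c_out * h_out * w_out

print(conv_macs(11, 3, 96, 55, 55))  # 105,415,200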



Inference Time
 How long it takes to run one forward propagation

The inference time will be FLOPs/FLOPS

 Suppose, FLOPs = 1,060,400

 The CPU performs 1 GFLOPS (10⁹ FLOPS)

 Inference time = 1,060,400 / 1,000,000,000 ≈ 0.001 s, or about 1 ms
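The same division as a one-line sketch, with the hardware throughput assumed to be 1 GFLOPS as above:

flops = 1_060_400                # model cost from the example above
flops_per_second = 1e9           # assumed 1 GFLOPS CPU
print(flops / flops_per_second)  # ~0.00106 s, about 1 ms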



Model Optimization
 2 main ways to optimize a neural network:
1. Reducing model size
2. Reducing number of operations

 Reducing number of operations:

 Pooling
-subsampling layers

 Separable Convolutions
-split a standard convolution into a depthwise convolution (which doesn't change the depth) followed by a pointwise convolution, reducing the number of FLOPs
-a pointwise convolution is a 1x1 convolution (see the comparison sketch after this list)

 Model Pruning
-redundant network parameters are removed
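A small sketch comparing the MACs of a standard convolution with a depthwise-separable one; the layer shape (3x3 kernel, 32 input channels, 64 output channels, 56x56 output) is a hypothetical example, not taken from the slides:

def standard_conv_macs(k, c_in, c_out, h, w):
    return k * k * c_in * c_out * h * w

def separable_conv_macs(k, c_in, c_out, h, w):
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1x1 convolution mixes the channels
    return depthwise + pointwise

std = standard_conv_macs(3, 32, 64, 56, 56)
sep = separable_conv_macs(3, 32, 64, 56, 56)
print(std, sep, round(std / sep, 1))  # the separable version is roughly 8x cheaper here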



Model Optimization

 Reducing model size:


 Quantization
 mapping values from a larger set to a smaller one
 quantization can be done on weights and on activations
 both reduce memory use and the complexity of computations (see the sketch after this list)

 Weight Sharing
 share weights between neurons
 so there are fewer of them to store
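A minimal sketch of symmetric linear quantization of weights to 8-bit integers; this is one common scheme, used here only as an illustration:

import numpy as np

def quantize_int8(weights):
    """Map float weights to int8 with a single scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())  # small reconstruction error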



Model Optimization
 Knowledge Distillation
 try to transfer the knowledge learned by a large, accurate model (the teacher model) to
a smaller and computationally less expensive model (the student model)
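A minimal sketch of a typical distillation loss (softened teacher targets with a temperature T blended with the usual cross-entropy); PyTorch is assumed, and alpha and T are hypothetical hyperparameters, not values from the slides:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend the soft-target KL term (teacher -> student) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard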



The End

Thank You

