
4b. Image Processing

COMP9444 Week 4b

Sonit Singh
School of Computer Science and Engineering
Faculty of Engineering
The University of New South Wales, Sydney, Australia
[email protected]
Agenda
Ø Convolutional Neural Networks
Ø Why is training Deep Neural Networks hard?
Ø DNN training strategy
Ø Transfer Learning
Ø Overfitting and Underfitting
Ø Methods to avoid overfitting
Ø Data Augmentation
Ø Regularization
Ø Data Preprocessing
Ø Batch Normalization
Ø Choice of optimizers
Ø Tuning DNN hyperparameters
Ø Neural Style Transfer

2
Convolutional Neural Networks (CNNs)
Ø A class of deep neural networks suited to processing 2D/3D data, e.g., images and videos
Ø CNNs can capture high-level representations of images/videos which can be used for end-tasks such as classification, object detection, segmentation, etc.
Ø A range of CNN architectures has been developed over the years, each improving on its predecessors

Source: Convolutional Neural Networks. https://medium.com/@rajat.k.91/convolutional-neural-networks-why-what-and-how-f8f6dbebb2f9


3
History
Ø ImageNet (2009)
Ø Consists of 14 million images, more than 21,000 classes, and about 1 million images have
bounding box annotations
Ø Annotated by humans using the crowdsourcing platform “Amazon Mechanical Turk”

Ø ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)


Ø An annual competition to foster the development and benchmarking of state-of-the-art algorithms in Computer Vision
Ø Led to improvements in architectures and techniques at the intersection of CV and DL
Image Credit: Synced. https://syncedreview.com/2020/06/23/google-deepmind-researchers-revamp-imagenet/
4
LeNet
Ø First developed by Yann LeCun in 1989 for digit recognition
Ø First time backpropagation was used to automatically learn visual features
Ø Two convolutional layers, three fully connected layers (32 x 32 input, 6 and 12 feature maps, 5 x 5 filters)
Ø Stride = 2 is used to reduce image dimensions
Ø Scaled tanh activation function
Ø Uniform random weight initialization

Source: LeCun et al. (1998). Gradient-based learning applied to document recognition.
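To make the architecture concrete, here is a minimal PyTorch sketch of a LeNet-style network matching the bullet points above (32 x 32 input, 6 and 12 feature maps, 5 x 5 filters, stride 2 downsampling, tanh). The fully connected sizes (120, 84, 10) are assumptions for illustration, not the exact 1989 configuration.

import torch
import torch.nn as nn

class LeNetStyle(nn.Module):
    """LeNet-style CNN: two conv layers with stride-2 downsampling, three FC layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=2),   # 1 x 32 x 32 -> 6 x 14 x 14
            nn.Tanh(),
            nn.Conv2d(6, 12, kernel_size=5, stride=2),  # 6 x 14 x 14 -> 12 x 5 x 5
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(12 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

logits = LeNetStyle()(torch.randn(1, 1, 32, 32))   # shape: (1, 10)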


5
CNN Architectures

Ø AlexNet, 8 layers (2012)


Ø VGG, 19 layers (2014)
Ø GoogLeNet, 22 layers (2014)
Ø ResNets, 152 layers (2015)
Ø DenseNet, 201 layers (2017)
Ø EfficientNet (2019)
Ø EfficientNetV2 (2021)

6
AlexNet
Ø 650K neurons
Ø 630M connections
Ø 60M parameters

Ø more parameters than training images → danger of overfitting

7
Enhancements
Ø Rectified Linear Units (ReLUs)
Ø Overlapping pooling (Width = 3, stride = 2)
Ø Stochastic gradient descent with momentum and weight decay
Ø Data augmentation to reduce overfitting
Ø 50% dropout in the fully connected layers

8
Dealing with Deep Networks
Ø > 10 layers
Ø weight initialization
Ø batch normalization

Ø > 30 layers
Ø skip connections

Ø > 100 layers


Ø identity skip connections

Slide Credit: Alan Blair


9
Statistics Example: Coin Tossing

Slide Credit: Alan Blair


10
Statistics

Slide Credit: Alan Blair


11
Weight Initialization

Slide Credit: Alan Blair


12
Weight Initialization

Slide Credit: Alan Blair


13
Weight Initialization

Slide Credit: Alan Blair


14
Weight Initialization

Slide Credit: Alan Blair


15
Batch Normalization

Slide Credit: Alan Blair


16
Going Deeper

Ø If we simply stack additional layers, it can lead to higher training error as well as higher test
error

Slide Credit: Alan Blair


17
Residual Networks

Ø Idea: Take any two consecutive stacked layers in a deep network and add a “skip”
connection which bypasses these layers and is added to their output.

Slide Credit: Alan Blair


18
Residual Networks

Ø The preceding layers attempt to do the “whole” job, making x as close as possible to the target output of the entire network

Ø F(x) is a residual component which corrects the errors from previous layers, or provides additional details which the previous layers were not powerful enough to compute

Ø With skip connections, both training and test error drop as you add more layers

Ø With more than 100 layers, we need to apply ReLU before adding the residual instead of afterwards. This is called an identity skip connection. (A minimal residual block is sketched below.)
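A minimal PyTorch sketch of the basic residual idea: the block computes F(x) and adds it to the unchanged input x via a skip connection. The channel count, batch norm placement, and 3 x 3 kernels are illustrative assumptions, not the exact ResNet recipe.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """y = ReLU(x + F(x)): the skip connection bypasses two conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        # add the input back; the >100-layer identity variant applies ReLU before the addition
        return F.relu(x + residual)

out = ResidualBlock(64)(torch.randn(2, 64, 56, 56))   # same shape as the input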

19
Dense Networks

Ø Good results have been achieved using networks with densely connected blocks, within
which each layer is connected by shortcut connections to all the preceding layers.

20
VGG
Ø Developed at Visual Geometry Group (Oxford) by Simonyan and Zisserman
Ø 1st runner-up (classification) and winner (localization) of the ILSVRC 2014 competition
Ø VGG-16 comprises 138 million parameters
Ø VGG-19 comprises 144 million parameters

Image Credit: Medium. https://medium.com/coinmonks/paper-review-of-vggnet-1st-runner-up-of-ilsvlc-2014-image-classification-d02355543a11


21
GoogLeNet
Ø A 22-layer CNN developed by researchers at Google
Ø Deeper networks are prone to overfitting and suffer from the exploding or vanishing gradient problem
Ø Core idea: the “Inception module”
Ø Adds an auxiliary loss as extra supervision
Ø Winner of 2014 ILSVRC Challenge

Source: Convolutional Neural Networks. https://medium.com/@rajat.k.91/convolutional-neural-networks-why-what-and-how-f8f6dbebb2f9


22
ResNet
Ø Developed by researchers at Microsoft
Ø Core idea: “residual connections” that preserve the gradient
Ø The identity mapping carries the input data forward, avoiding the loss of information (the vanishing signal problem)

Image Credit: Medium. https://medium.com/@pierre_guillou/understand-how-works-resnet-without-talking-about-residual-64698f157e0c


23
DenseNet
Ø In a DenseNet architecture, each layer is connected to every other layer within a dense block, hence the name Densely Connected Convolutional Network
Ø For each layer, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs to all subsequent layers
Ø DenseNets have several compelling advantages:
Ø alleviate the vanishing-gradient problem
Ø strengthen feature propagation
Ø encourage feature reuse, and
Ø substantially reduce the number of parameters.

Image Credit: https://pytorch.org/hub/pytorch_vision_densenet/
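A minimal sketch of the dense-connectivity idea (not torchvision's DenseNet implementation): each layer receives the concatenation of all preceding feature maps and contributes its own maps to every later layer. The growth rate and number of layers are assumptions for illustration.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of all earlier feature maps."""
    def __init__(self, in_channels, growth_rate=12, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate, 3, padding=1),
            )
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # reuse all earlier maps
        return torch.cat(features, dim=1)

out = DenseBlock(16)(torch.randn(1, 16, 32, 32))   # 16 + 3 * 12 = 52 output channels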


24
SENet (Squeeze-and-Excitation Network)
Ø CNNs fuse spatial and channel information to extract the features needed to solve the task
Ø Before SENet, a network weighted each of its channels equally when creating its output feature maps
Ø SENet adds a content-aware mechanism that weights each channel adaptively
Ø The SE block improves the representational power of the network, better modelling channel dependencies with access to global information

Source: Convolutional Neural Networks. https://medium.com/@rajat.k.91/convolutional-neural-networks-why-what-and-how-f8f6dbebb2f9
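A minimal sketch of a squeeze-and-excitation block: global average pooling "squeezes" each channel to a single number, and a small bottleneck MLP with a sigmoid produces per-channel weights that rescale the feature maps. The reduction ratio of 16 is an assumption taken from common practice.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Adaptively re-weights channels using global (content-aware) information."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        squeeze = x.mean(dim=(2, 3))                  # global average pool -> (b, c)
        weights = self.fc(squeeze).view(b, c, 1, 1)   # per-channel weights in (0, 1)
        return x * weights                            # re-scale each channel

out = SEBlock(64)(torch.randn(2, 64, 32, 32))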


25
Why is training Deep Neural Networks hard?

Credit: https://itechindia.co/blog/machine-learning-are-companies-in-india-ready-for-it/
26
Why is training Deep Neural Networks hard?

Credit: Adrian Rosebrock, PyImageSearch, https://www.pyimagesearch.com/2019/10/14/why-is-my-validation-loss-lower-than-my-training-loss/


27
Training Methodology
Ø Typical steps in training and evaluating a CNN

Source: Yamashita R. et al. (2018) Convolutional neural networks: an overview and applications in radiology
28
Transfer Learning
Ø Transfer learning aims to leverage knowledge learned from a resource-rich domain/task to help learn a task with insufficient training data
Ø Sometimes referred to as domain adaptation

Ø The resource-rich domain is known as the source and the low-resource task is known as the target

Ø Transfer learning works best if the features learned from the source task are general (i.e., domain-independent)

Credit: Mohammadreza Ebrahimi, An Introduction to Deep Transfer Learning
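A hedged PyTorch/torchvision sketch of the usual recipe: load an ImageNet-pretrained backbone (the source), freeze its features, and replace the classifier head for the target task. The choice of ResNet-18, the 5-class target, and the learning rate are assumptions for illustration; the weights argument assumes a recent torchvision version.

import torch
import torch.nn as nn
from torchvision import models

# Source: ImageNet-pretrained ResNet-18; target: an assumed 5-class task with little data
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

for param in model.parameters():       # freeze the pretrained feature extractor
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 5)   # new head, trained from scratch

# Optimize only the parameters of the new head
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

If the target dataset is larger, the whole network can instead be fine-tuned with a small learning rate rather than frozen.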


29
Transfer Learning with CNNs

Slide Credit: Stanford CS231n Course


30
Transfer Learning with CNNs

Slide Credit: Stanford CS231n Course


31
Transfer Learning is common in all applications

Slide Credit: Stanford CS231n Course


32
Overfitting and Underfitting
Ø Monitor the loss on the training and validation sets during training
Ø If the model performs poorly on both the training and validation sets: underfitting
Ø If the model performs well on the training set but poorly on the validation set: overfitting

Source: Yamashita R. et al. (2018) Convolutional neural networks: an overview and applications in radiology
33
Common methods to mitigate overfitting
Ø More training data
Ø Early Stopping
Ø Data Augmentation
Ø Regularization (weight decay, dropout)
Ø Batch normalization

Source: Yamashita R. et al. (2018) Convolutional neural networks: an overview and applications in radiology
34
Image Credit: Hyper-parameters tuning practices: learning rate, batch size, momentum, and weight decay. Medium
More training data
Ø Costly
Ø Time consuming
Ø Need experts for
specialized domains

Source: Fast Annotation Net: A framework for active learning in 2018. https://medium.com/diffgram/fast-annotation-net-a-framework-for-active-learning-in-2018-1c75d6b4af92
35
Image Datasets — ImageNet, PASCAL, TinyImage, ESP and LabelMe — what do they offer ? Medium Blog
Early Stopping
Ø Training too little means the model will underfit the training and test sets
Ø Training too much means the model will overfit the training set and hence perform poorly on the test set
Ø Early Stopping:
Ø Stop training at the point when performance on a validation set starts to degrade
Ø The idea is to stop training when the generalization error starts to increase
Ø How to use early stopping:
Ø Monitor model performance: use an evaluation metric to track the model's performance during training
Ø Trigger to stop training:
Ø No change in the metric over a given number of epochs
Ø A decrease in performance observed over a number of epochs
Ø Some delay or “patience” before stopping is usually beneficial (a sketch follows below)
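A minimal sketch of an early-stopping loop with patience; train_one_epoch and evaluate are assumed placeholders for your own training and validation code.

import copy

def fit_with_early_stopping(model, train_one_epoch, evaluate, max_epochs=100, patience=5):
    """Stop when validation loss has not improved for `patience` consecutive epochs."""
    best_loss, best_state, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)          # placeholder: one pass over the training data
        val_loss = evaluate(model)      # placeholder: loss on the validation set
        if val_loss < best_loss:
            best_loss, best_state = val_loss, copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:   # trigger: no improvement
                break
    model.load_state_dict(best_state)   # restore the best model seen
    return model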

Source: Machine Learning Mastery: A Gentle Introduction to Early Stopping to Avoid Overtraining Neural Networks.
URL: https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/
36
Data Augmentation
Ø Data augmentation artificially generates modified versions of the real data to increase dataset size
Ø We use data augmentation to handle data scarcity and insufficient data diversity
Ø Data augmentation helps to improve the performance of deep neural networks

Ø Common augmentation techniques (a sketch follows this list):


Ø Adding noise
Ø Cropping
Ø Flipping
Ø Rotation
Ø Scaling
Ø Translation
Ø Brightness
Ø Contrast
Ø Saturation
Ø Generative Adversarial Networks (GANs)

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/
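A hedged torchvision sketch covering several of the listed techniques (crop/scale, flip, rotation, translation, brightness/contrast/saturation); the specific parameter values are arbitrary assumptions and should be chosen with domain knowledge, as noted in the key takeaways.

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                          # cropping + scaling
    transforms.RandomHorizontalFlip(p=0.5),                     # flipping
    transforms.RandomRotation(degrees=15),                      # rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # translation
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # photometric
    transforms.ToTensor(),
])
# Applied on the fly, e.g. datasets.ImageFolder("train/", transform=train_transform)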


37
Data Augmentation
Ø Adding noise

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


38
Data Augmentation
Ø Cropping

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


39
Data Augmentation
Ø Flipping

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


40
Data Augmentation
Ø Rotation

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


41
Data Augmentation
Ø Scaling

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


42
Data Augmentation
Ø Translation

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


43
https://nanonets.com/blog/data-augmentation-how-to-use-deep-learning-when-you-have-limited-data-part-2/
Data Augmentation
Ø Brightness

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


44
Data Augmentation
Ø Contrast

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


45
Data Augmentation
Ø Generative Adversarial Networks (GANs) for data augmentation

Source: Zhao et al., Differential Augmentation for Data-Efficient GAN Training, NeurIPS, 2020
46
Regularization: Weight Decay
Ø Weight decay adds a penalty term to the training loss to reduce the complexity of the learned model
Ø Popular choices for the penalty:
Ø L1: adds λ Σ |wᵢ| to the loss, penalizing the absolute values of the weights

Ø L2: adds (λ/2) Σ wᵢ² to the loss, penalizing the squared magnitudes of the weights
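A minimal PyTorch sketch: the L2 penalty is usually applied through the optimizer's weight_decay argument, while an L1 penalty can be added to the loss explicitly. The model and coefficient values are arbitrary assumptions.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()

# L2: most optimizers accept a weight_decay coefficient directly
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# L1: add the penalty to the loss explicitly
def loss_with_l1(outputs, targets, l1_lambda=1e-5):
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    return criterion(outputs, targets) + l1_lambda * l1_penalty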

Credit: 5 Techniques to Prevent Overfitting in Neural Networks. https://www.kdnuggets.com/2019/12/5-techniques-prevent-overfitting-neural-networks.html


47
Regularization: Dropout
Ø L1 and L2 reduce overfitting by modifying the cost function
Ø Dropout modifies the network itself, randomly dropping neurons during training
Ø Dropout is an efficient way of approximately averaging many large neural networks
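A minimal sketch of dropout between fully connected layers; the 50% rate follows the AlexNet slide earlier, while the layer sizes are assumptions. model.train() enables dropout and model.eval() disables it at test time.

import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(4096, 1024), nn.ReLU(),
    nn.Dropout(p=0.5),          # each unit dropped with probability 0.5 during training
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(1024, 10),
)
classifier.train()   # dropout active
classifier.eval()    # dropout disabled at evaluation time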

Credit: Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR, 2014
48
https://colab.research.google.com/github/d2l-ai/d2l-en-colab/blob/master/chapter_multilayer-perceptrons/dropout.ipynb
Data Preprocessing
Ø The pixel values in images must be scaled prior to being given as input to deep neural networks for training or evaluation
Ø Three main types of pixel scaling (sketched below):
Ø Pixel Normalization: scale pixel values to the range 0-1
Ø Pixel Centering: scale pixel values to have a zero mean
Ø Pixel Standardization: scale pixel values to have a zero mean and unit variance
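A minimal NumPy sketch of the three schemes, assuming an 8-bit image with values 0-255. In practice the mean and standard deviation are often computed per channel over the whole training set rather than per image.

import numpy as np

image = np.random.randint(0, 256, size=(224, 224, 3)).astype(np.float32)  # dummy 8-bit image

normalized   = image / 255.0                          # normalization: range [0, 1]
centered     = image - image.mean()                   # centering: zero mean
standardized = (image - image.mean()) / image.std()   # standardization: zero mean, unit variance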

Credit: Stanford CS231n course slides.


49
Machine Learning Mastery: How to Normalize, Center, and Standardize Image Pixels in Keras
Batch Normalization
Ø Enables stable training
Ø Reduces the internal covariate shift (ICS)
Ø Accelerates the training process
Ø Reduces the dependence of gradients on the scale of the parameters

Source: LearnOpenCV: Batch Normalization in Deep Networks. https://learnopencv.com/batch-normalization-in-deep-networks/
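A minimal sketch of where a batch normalization layer typically sits in a convolutional block (conv, then batch norm, then ReLU); the channel count and batch size are arbitrary assumptions.

import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(32),   # normalizes each channel over the mini-batch, then scales and shifts
    nn.ReLU(),
)
out = block(torch.randn(8, 3, 64, 64))   # statistics computed over the batch of 8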


50
Choice of Optimizers
Ø Choosing the right optimizer helps update the model parameters and reduce the loss with much less effort
Ø Most DL frameworks support a variety of optimizers:
Ø Stochastic Gradient Descent (SGD)
Ø Momentum
Ø Nesterov Accelerated Gradient
Ø AdaGrad
Ø AdaDelta
Ø Adam
Ø RMSProp

Source: Towards Data Science. Various Optimization Algorithms For Training Neural Network. https://towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6
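A hedged sketch of how the listed optimizers are instantiated in PyTorch; the learning rates and momentum values are arbitrary assumptions, not recommendations.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)

sgd      = torch.optim.SGD(model.parameters(), lr=0.1)
momentum = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
nesterov = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)
adagrad  = torch.optim.Adagrad(model.parameters(), lr=0.01)
adadelta = torch.optim.Adadelta(model.parameters())
rmsprop  = torch.optim.RMSprop(model.parameters(), lr=0.001)
adam     = torch.optim.Adam(model.parameters(), lr=0.001)

# The training step is identical regardless of the choice:
# optimizer.zero_grad(); loss.backward(); optimizer.step()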
51
Tuning Hyperparameters
Ø Hyperparameters are the parameters set by the user before training starts
Ø Hyperparameters are like the knobs or dials of the network (model)
Ø An optimization problem: we aim to find the combination of values that minimizes (e.g., loss) or maximizes (e.g., accuracy) the objective of interest
Ø Many hyperparameters to tune:
Ø Learning rate
Ø No. of epochs
Ø Dropout rate
Ø Batch size
Ø No. of hidden layers and units
Ø Activation function
Ø Weight initialization
Ø…

Source: KDnuggets: Practical Hyperparameter Optimization. https://www.kdnuggets.com/2020/02/practical-hyperparameter-optimization.html


52
Tuning Hyperparameters strategies
Ø Random Guess
Ø Simply use values from similar work
Ø Rely on your experience
Ø Training DNNs is part art, part science
Ø With experience, you develop a sense of what works and what doesn't
Ø Still a chance of being incorrect (suboptimal performance)
Ø Grid Search
Ø Set up a grid of hyperparameters and train/evaluate the model on each possible combination (a sketch follows below)
Ø Automated hyperparameter tuning
Ø Use of Bayesian optimization and Evolutionary Algorithms
Ø Hyperopt: Distributed Asynchronous Hyperparameter Optimization

Source: KDnuggets: Practical Hyperparameter Optimization. https://www.kdnuggets.com/2020/02/practical-hyperparameter-optimization.html
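A minimal grid-search sketch; train_and_validate is an assumed placeholder that trains a model with the given hyperparameters and returns a validation score, and the grid values are arbitrary assumptions.

import itertools

grid = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 64],
    "dropout": [0.3, 0.5],
}

def grid_search(train_and_validate):
    best_score, best_config = float("-inf"), None
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):   # every combination
        config = dict(zip(keys, values))
        score = train_and_validate(**config)    # placeholder: returns validation accuracy
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score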


53
Deep Learning Frameworks

Source: Nguyen et al., (2019). ML and DL frameworks and libraries for large-scale data mining: a survey.
54
Texture
Ø Texture is a repeating pattern of local variations in image intensity
Ø Texture provides information in the spatial arrangement of colors or intensities in an image.
Ø Texture is characterized by the spatial distribution of intensity levels in a neighborhood.

Source: https://www.mathworks.com/help/images/texture-segmentation-using-texture-filters.html
55
Texture Synthesis

56
Neural Texture Synthesis

Slide Credit: Alan Blair


57
Neural Texture Synthesis

Slide Credit: Alan Blair


58
Neural Style Transfer

Content + Style → New image

Slide Credit: Alan Blair
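A hedged sketch of the losses behind Gatys-style neural style transfer: the content loss compares feature maps of the generated and content images directly, while the style loss compares their Gram matrices (channel correlations). Extracting the feature maps from a pretrained CNN such as VGG, and the relative loss weights, are assumptions left out here.

import torch

def gram_matrix(features):
    """Channel-by-channel correlations of a (B, C, H, W) feature map."""
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def content_loss(generated_feat, content_feat):
    return torch.mean((generated_feat - content_feat) ** 2)

def style_loss(generated_feat, style_feat):
    return torch.mean((gram_matrix(generated_feat) - gram_matrix(style_feat)) ** 2)

# Total loss (weights are assumptions): alpha * content_loss + beta * style_loss,
# minimized with respect to the pixels of the generated image.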


59
Neural Style Transfer

Slide Credit: Alan Blair


60
Neural Style Transfer

Slide Credit: Alan Blair


61
Key takeaways
Ø Continuous improvement in CNN architectures and heuristics (tips and tricks)
Ø always check literature to find state-of-the-art methods
Ø Training methodology
Ø Split data into training (70%), validation (10%), and testing (20%) sets
Ø Take care of data leakage (e.g., multiple samples from the same patient should be in the same set; see the sketch below)
Ø Check the distribution of classes; ideally work with balanced datasets
Ø Tune hyperparameters on the validation set. Save the best model and run inference on the test set (once)
Ø Don't use off-the-shelf models blindly. Do ablation studies to understand how they work
Ø Data augmentation techniques are not standardized
Ø Get input from domain experts to know which data augmentations make sense in the domain
Ø e.g., in chest X-rays we don't want vertical flipping
Ø Results
Ø Use multiple metrics rather than a single metric to report results (they are often complementary)
Ø Show both qualitative and quantitative results (e.g., image segmentation)
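A hedged sketch of a leakage-safe split using scikit-learn's GroupShuffleSplit, so that all samples from one patient land in the same set; the 70/10/20 ratios follow the bullet above, and the dummy data and patient_ids array are assumptions.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.arange(1000).reshape(-1, 1)            # dummy samples
patient_ids = np.repeat(np.arange(100), 10)   # assumed: 10 samples per patient

# Hold out 20% of patients for the test set
gss = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=0)
trainval_idx, test_idx = next(gss.split(X, groups=patient_ids))

# Split the remaining patients into train (70% overall) and validation (10% overall)
gss_val = GroupShuffleSplit(n_splits=1, test_size=0.125, random_state=0)  # 0.125 * 0.8 = 0.1
train_idx, val_idx = next(gss_val.split(X[trainval_idx], groups=patient_ids[trainval_idx]))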

62
Questions?

If you have any questions, post them on the Ed forum
