
4b. Image Processing

COMP9444 Week 4b

Sonit Singh
School of Computer Science and Engineering
Faculty of Engineering
The University of New South Wales, Sydney, Australia
[email protected]
Agenda
Ø Convolutional Neural Networks
Ø Why is training Deep Neural Networks hard?
Ø DNN training strategy
Ø Transfer Learning
Ø Overfitting and Underfitting
Ø Methods to avoid overfitting
Ø Data Augmentation
Ø Regularization
Ø Data Preprocessing
Ø Batch Normalization
Ø Choice of optimizers
Ø Tuning DNN hyperparameters
Ø Neural Style Transfer

2
Convolutional Neural Networks (CNNs)
Ø A class of deep neural networks suited to processing 2D/3D data, e.g., images and videos
Ø CNNs capture high-level representations of images/videos which can be used for end tasks
such as classification, object detection, segmentation, etc.
Ø A range of CNN architectures has improved steadily over the years

Source: Convolutional Neural Networks. https://medium.com/@rajat.k.91/convolutional-neural-networks-why-what-and-how-f8f6dbebb2f9


3
History
Ø ImageNet (2009)
Ø Consists of 14 million images in more than 21,000 classes; about 1 million images have
bounding box annotations
Ø Annotated by humans using crowdsourcing platform “Amazon Mechanical Turk”

Ø ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)


Ø An annual competition to foster the development and benchmarking of state-of-the-art algorithms
in Computer Vision
Ø Led to improvements in architectures and techniques at the intersection of CV and DL
Image Credit: Synced. https://syncedreview.com/2020/06/23/google-deepmind-researchers-revamp-imagenet/
4
LeNet
Ø First developed by Yann LeCun in 1989 for digit recognition
Ø First time backpropagation was used to automatically learn visual features
Ø Two convolutional layers, three fully connected layers (32 x 32 input, 6 and 12 feature maps, 5 x 5 filters)
Ø Stride = 2 is used to reduce image dimensions
Ø Scaled tanh activation function
Ø Uniform random weight initialization

Source: LeCun et al. (1998). Gradient-based learning applied to document recognition.


5
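As a minimal PyTorch sketch of a LeNet-style network following the slide's layer sizes (32 x 32 input, 6 and 12 feature maps, 5 x 5 filters, stride-2 subsampling, tanh); this is an illustrative approximation, not the exact original 1989/1998 architecture:

```python
import torch
import torch.nn as nn

class LeNetSketch(nn.Module):
    """Rough LeNet-style network: two conv layers, three fully connected layers."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),          # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),   # 28x28 -> 14x14 (stride-2 subsampling)
            nn.Conv2d(6, 12, kernel_size=5),         # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),   # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(12 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# e.g. a batch of four 32x32 grayscale digit images
logits = LeNetSketch()(torch.randn(4, 1, 32, 32))  # shape: (4, 10)
```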
CNN Architectures

Ø AlexNet, 8 layers (2012)


Ø VGG, 19 layers (2014)
Ø GoogLeNet, 22 layers (2014)
Ø ResNets, 152 layers (2015)
Ø DenseNet, 201 layers (2017)
Ø EfficientNet (2019)
Ø EfficientNetV2 (2021)

6
AlexNet
Ø 650K neurons
Ø 630M connections
Ø 60M parameters

Ø more parameters than images → danger of overfitting

7
Enhancements
Ø Rectified Linear Units (ReLUs)
Ø Overlapping pooling (Width = 3, stride = 2)
Ø Stochastic gradient descent with momentum and weight decay
Ø Data augmentation to reduce overfitting
Ø 50% dropout in the fully connected layers

8
Dealing with Deep Networks
Ø > 10 layers
Ø weight initialization
Ø batch normalization

Ø > 30 layers
Ø skip connections

Ø > 100 layers


Ø identity skip connections

Slide Credit: Alan Blair


9
Statistics Example: Coin Tossing

Slide Credit: Alan Blair


10
Statistics

Slide Credit: Alan Blair


11
Weight Initialization

Slide Credit: Alan Blair


12
Weight Initialization

Slide Credit: Alan Blair


13
Weight Initialization

Slide Credit: Alan Blair


14
Weight Initialization

Slide Credit: Alan Blair


15
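As a minimal sketch of how common initialization schemes (Kaiming/He for layers feeding ReLUs, Xavier/Glorot otherwise) are applied in practice, assuming PyTorch; the specific scheme discussed in the slides may differ:

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Kaiming (He) init for conv layers feeding ReLUs, Xavier (Glorot) for linear layers."""
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

# toy model assuming 32x32 RGB inputs; apply() visits every submodule
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 30 * 30, 10))
model.apply(init_weights)
```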
Batch Normalization

Slide Credit: Alan Blair


16
Going Deeper

Ø If we simply stack additional layers, it can lead to higher training error as well as higher test
error

Slide Credit: Alan Blair


17
Residual Networks

Ø Idea: Take any two consecutive stacked layers in a deep network and add a “skip”
connection which bypasses these layers and is added to their output.

Slide Credit: Alan Blair


18
Residual Networks

Ø the preceding layers attempt to do the “whole” job, making x as close as possible to the
target output of the entire network

Ø F(x) is a residual component which corrects the errors from previous layers, or provides
additional details which the previous layers were not powerful enough to compute

Ø With skip connections, both training and test error drop as you add more layers

Ø With more than 100 layers, we need to apply ReLU before adding the residual instead of
afterwards. This is called an identity skip connection.

19
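A minimal PyTorch sketch of a residual block (post-activation variant, assuming matching input and output shapes so the skip connection can be a plain addition):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = ReLU(F(x) + x): F(x) only has to learn a correction (residual)
    on top of the identity path carried by the skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.f(x) + x)  # skip connection adds x to F(x)

x = torch.randn(2, 64, 56, 56)
assert ResidualBlock(64)(x).shape == x.shape
```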
Dense Networks

Ø Good results have been achieved using networks with densely connected blocks, within
which each layer is connected by shortcut connections to all the preceding layers.

20
VGG
Ø Developed at the Visual Geometry Group (Oxford) by Simonyan and Zisserman
Ø 1st runner-up (classification) and winner (localization) of the ILSVRC 2014 competition
Ø VGG-16 comprises 138 million parameters
Ø VGG-19 comprises 144 million parameters

Image Credit: Medium. https://medium.com/coinmonks/paper-review-of-vggnet-1st-runner-up-of-ilsvlc-2014-image-classification-d02355543a11


21
GoogLeNet
Ø A 22-layer CNN developed by researchers at Google
Ø Deeper networks are prone to overfitting and suffer
from the exploding or vanishing gradient problem
Ø Core idea: the “Inception module”
Ø Adds auxiliary losses as extra supervision
Ø Winner of the 2014 ILSVRC challenge

Source: Convolutional Neural Networks. https://medium.com/@rajat.k.91/convolutional-neural-networks-why-what-and-how-f8f6dbebb2f9


22
ResNet
Ø Developed by researchers at Microsoft
Ø Core idea: “residual connections” to preserve the gradient
Ø The identity connection passes the input data forward, which
avoids the loss of information (the vanishing problem)

Image Credit: Medium. https://medium.com/@pierre_guillou/understand-how-works-resnet-without-talking-about-residual-64698f157e0c


23
DenseNet
Ø In a DenseNet architecture, each layer is connected to every other layer, hence the name
Densely Connected Convolutional Network
Ø For each layer, the feature maps of all preceding layers are used as inputs, and its own
feature maps are used as inputs for all subsequent layers
Ø DenseNets have several compelling advantages:
Ø alleviate the vanishing-gradient problem
Ø strengthen feature propagation
Ø encourage feature reuse, and
Ø substantially reduce the number of parameters.

Image Credit: https://pytorch.org/hub/pytorch_vision_densenet/


24
SENet (Squeeze-and-Excitation Network)
Ø CNNs fuse spatial and channel information to extract features for the task at hand
Ø Before SENet, a network weighted each of its channels equally when creating the output feature
maps
Ø SENets added a content-aware mechanism to weight each channel adaptively
Ø The SE block improves the representational power of the network by better modelling
channel dependencies, using access to global information

Source: Convolutional Neural Networks. https://medium.com/@rajat.k.91/convolutional-neural-networks-why-what-and-how-f8f6dbebb2f9


25
Why is training Deep Neural Networks hard?

Credit: https://itechindia.co/blog/machine-learning-are-companies-in-india-ready-for-it/
26
Why is training Deep Neural Networks hard?

Credit: Adrian Rosebrock, PyImageSearch, https://www.pyimagesearch.com/2019/10/14/why-is-my-validation-loss-lower-than-my-training-loss/


27
Training Methodology
Ø Typical steps in training a CNN

Source: Yamashita R. et al. (2018) Convolutional neural networks: an overview and applications in radiology
28
Transfer Learning
Ø Transfer learning aims to leverage knowledge learned on a resource-rich domain/task
to help learn a task with insufficient training data
Ø Sometimes referred to as domain adaptation

Ø The resource-rich domain is known as the source and the low-resource task is known as the
target

Ø Transfer learning works best if the features the model learned on the source task are
general (i.e., domain-independent); a fine-tuning sketch follows after this slide

Credit: Mahammadreza Ebrahimi An Introduction to Deep Transfer Learning


29
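A minimal transfer-learning sketch with torchvision, assuming an ImageNet-pretrained ResNet-18 as the source model and a hypothetical target task with 5 classes: freeze the pretrained backbone and train only a new classification head (for full fine-tuning, leave the backbone trainable and use a small learning rate).

```python
import torch
import torch.nn as nn
from torchvision import models

# Source model pretrained on ImageNet
# (newer torchvision versions use the `weights` argument; older ones use pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the backbone so only the new head is trained (feature extraction)
for param in model.parameters():
    param.requires_grad = False

num_target_classes = 5  # hypothetical low-resource target task
model.fc = nn.Linear(model.fc.in_features, num_target_classes)  # new trainable head

# The optimizer only needs the parameters that are still trainable
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```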
Transfer Learning with CNNs

Slide Credit: Stanford CS231n Course


30
Transfer Learning with CNNs

Slide Credit: Stanford CS231n Course


31
Transfer Learning is common in all applications

Slide Credit: Stanford CS231n Course


32
Overfitting and Underfitting
Ø Monitor the loss on the training and validation sets during training.
Ø If the model performs poorly on both the training and validation sets: underfitting
Ø If the model performs well on the training set but poorly on the validation set: overfitting

Source: Yamashita R. et al. (2018) Convolutional neural networks: an overview and applications in radiology
33
Common methods to mitigate overfitting
Ø More training data
Ø Early Stopping
Ø Data Augmentation
Ø Regularization (weight decay, dropout)
Ø Batch normalization

Source: Yamashita R. et al. (2018) Convolutional neural networks: an overview and applications in radiology
34
Image Credit: Hyper-parameters tuning practices: learning rate, batch size, momentum, and weight decay. Medium
More training data
Ø Costly
Ø Time consuming
Ø Need experts for
specialized domains

Source: Fast Annotation Net: A framework for active learning in 2018. https://medium.com/diffgram/fast-annotation-net-a-framework-for-active-learning-in-2018-1c75d6b4af92
35
Image Datasets — ImageNet, PASCAL, TinyImage, ESP and LabelMe — what do they offer ? Medium Blog
Early Stopping
Ø Training too little means the model will underfit the training and test sets
Ø Training too much means the model will overfit the training set and hence perform poorly
on the test set
Ø Early Stopping:
Ø Stop training at the point when performance on a validation set starts to degrade
Ø The idea is to stop training when the generalization error increases
Ø How to use Early Stopping
Ø Monitor model performance: use a metric to track the performance of the model
during training
Ø Trigger to stop training:
Ø No change in the metric over a given number of epochs
Ø A decrease in performance observed over a number of epochs
Ø Some delay or “patience” is good for early stopping (a minimal training-loop sketch follows after this slide)

Source: Machine Learning Mastery: A Gentle Introduction to Early Stopping to Avoid Overtraining Neural Networks
36
URL: https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/
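A minimal sketch of an early-stopping loop with a patience counter, assuming hypothetical `train_one_epoch` and `evaluate` helpers (the latter returning the validation loss):

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate,
                              max_epochs=100, patience=5):
    """Stop when the validation loss has not improved for `patience` epochs."""
    best_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)        # one pass over the training set (hypothetical helper)
        val_loss = evaluate(model)    # loss on the held-out validation set (hypothetical helper)

        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())  # checkpoint the best model
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                print(f"Early stopping at epoch {epoch}")
                break

    model.load_state_dict(best_state)  # restore the best checkpoint
    return model
```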
Data Augmentation
Ø Data augmentation artificially generates different versions of a real dataset to increase its size
Ø We use data augmentation to handle data scarcity and insufficient data diversity
Ø Data augmentation helps improve the performance of deep neural networks (a combined torchvision sketch follows after this slide)

Ø Common augmentation techniques:


Ø Adding noise
Ø Cropping
Ø Flipping
Ø Rotation
Ø Scaling
Ø Translation
Ø Brightness
Ø Contrast
Ø Saturation
Ø Generative Adversarial Networks (GANs)

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


37
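A minimal sketch of several of the augmentations listed above using torchvision transforms; the parameter values are illustrative, not tuned:

```python
from torchvision import transforms

# Applied on the fly to each training image; validation/test data
# should only be resized and normalized, not augmented.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),      # cropping + scaling
    transforms.RandomHorizontalFlip(p=0.5),                   # flipping
    transforms.RandomRotation(degrees=15),                    # rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)), # translation
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2),                   # brightness/contrast/saturation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],          # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
```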
Data Augmentation
Ø Adding noise

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


38
Data Augmentation
Ø Cropping

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


39
Data Augmentation
Ø Flipping

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


40
Data Augmentation
Ø Rotation

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


41
Data Augmentation
Ø Scaling

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


42
Data Augmentation
Ø Translation

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


43
https://nanonets.com/blog/data-augmentation-how-to-use-deep-learning-when-you-have-limited-data-part-2/
Data Augmentation
Ø Brightness

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


44
Data Augmentation
Ø Contrast

Source: 13 Data Augmentation Techniques. https://research.aimultiple.com/data-augmentation-techniques/


45
Data Augmentation
Ø Generative Adversarial Networks (GANs) for data augmentation

Source: Zhao et al., Differential Augmentation for Data-Efficient GAN Training, NeurIPS, 2020
46
Regularization: Weight Decay
Ø Weight decay adds a penalty term to the loss function on the training set to reduce the
complexity of the learned model (a code sketch follows after this slide)
Ø Popular choices for weight decay:
Ø L1: the L1 penalty minimizes the absolute values of the weights

Ø L2: the L2 penalty minimizes the squared magnitudes of the weights

Credit: 5 Techniques to Prevent Overfitting in Neural Networks. https://www.kdnuggets.com/2019/12/5-techniques-prevent-overfitting-neural-networks.html


47
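A minimal sketch, assuming PyTorch and a toy model: the L2 penalty is typically supplied directly to the optimizer via `weight_decay`, while an L1 penalty can be added to the loss by hand.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                       # toy model
criterion = nn.CrossEntropyLoss()
inputs, targets = torch.randn(8, 10), torch.randint(0, 2, (8,))

# L2 penalty (weight decay) built into the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)

# L1 penalty added manually to the training loss
l1_lambda = 1e-5
loss = criterion(model(inputs), targets)
loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())

optimizer.zero_grad()
loss.backward()
optimizer.step()
```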
Regularization: Dropout
Ø L1 and L2 reduce overfitting by modifying the cost function
Ø Dropout instead modifies the network itself, by randomly dropping neurons during
training
Ø Dropout is an efficient way of approximately averaging many large neural networks (a sketch follows after this slide)

Credit: Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR, 2014
48
https://colab.research.google.com/github/d2l-ai/d2l-en-colab/blob/master/chapter_multilayer-perceptrons/dropout.ipynb
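A minimal sketch of dropout in a fully connected classifier, assuming PyTorch; p = 0.5 matches the dropout rate used in AlexNet's fully connected layers:

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of activations during training
    nn.Linear(128, 10),
)

x = torch.randn(4, 256)
classifier.train()            # dropout active: neurons dropped at random
train_out = classifier(x)
classifier.eval()             # dropout disabled: full network used (implicit averaging)
eval_out = classifier(x)
```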
Data Preprocessing
Ø Pixel values in images must be scaled before being given as input to deep neural networks for
training or evaluation
Ø Three main types of pixel scaling (a NumPy sketch follows after this slide):
Ø Pixel Normalization: scale pixel values to the range [0, 1]
Ø Pixel Centering: scale pixel values to have zero mean
Ø Pixel Standardization: scale pixel values to have zero mean and unit variance

Credit: Stanford CS231n course slides.


49
Machine Learning Mastery: How to Normalize, Center, and Standardize Image Pixels in Keras
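A minimal NumPy sketch of the three pixel-scaling schemes, assuming 8-bit images with values in 0-255; the mean and standard deviation are computed per image here, but dataset-wide statistics are also common.

```python
import numpy as np

image = np.random.randint(0, 256, size=(224, 224, 3)).astype(np.float32)

# Pixel normalization: scale values into [0, 1]
normalized = image / 255.0

# Pixel centering: zero mean
centered = image - image.mean()

# Pixel standardization: zero mean and unit variance
standardized = (image - image.mean()) / image.std()
```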
Batch Normalization
Ø Enables stable training
Ø Reduces the internal covariate shift (ICS)
Ø Accelerates the training process
Ø Reduces the dependence of gradients on the scale of the parameters

Source: LearnOpenCV: Batch Normalization in Deep Networks. https://learnopencv.com/batch-normalization-in-deep-networks/


50
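A minimal sketch of how batch normalization is typically inserted between a convolution and its activation, assuming PyTorch:

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(16),   # normalizes each channel over the batch, then rescales (gamma, beta)
    nn.ReLU(inplace=True),
)

x = torch.randn(8, 3, 32, 32)   # batch statistics are computed over these 8 images
out = block(x)                  # at eval time, running mean/variance are used instead
```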
Choice of Optimizers
Ø Choosing the right optimizer helps update the model parameters and reduce the loss with
much less effort
Ø Most DL frameworks support various optimizers (a construction sketch follows after this slide):
Ø Stochastic Gradient Descent (SGD)
Ø Momentum
Ø Nesterov Accelerated Gradient
Ø AdaGrad
Ø AdaDelta
Ø Adam
Ø RMSProp

Source: Towards Data Science. Various Optimization Algorithms For Training Neural Network https://towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6
51
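A minimal sketch of constructing several of these optimizers in PyTorch for a toy model; the hyperparameter values are illustrative defaults, not recommendations:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # toy model

sgd      = torch.optim.SGD(model.parameters(), lr=0.01)
momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
nesterov = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)
adagrad  = torch.optim.Adagrad(model.parameters(), lr=0.01)
adadelta = torch.optim.Adadelta(model.parameters())
rmsprop  = torch.optim.RMSprop(model.parameters(), lr=0.001)
adam     = torch.optim.Adam(model.parameters(), lr=0.001)
```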
Tuning Hyperparameters
Ø Hyperparameters are all parameters that can be set by the user before training starts
Ø Hyperparameters are like the knobs or dials of the network (model)
Ø Tuning is an optimization problem: we aim to find the combination of values that minimizes
(e.g., loss) or maximizes (e.g., accuracy) some objective function
Ø Many hyperparameters to tune:
Ø Learning rate
Ø No. of epochs
Ø Dropout rate
Ø Batch size
Ø No. of hidden layers and units
Ø Activation function
Ø Weight initialization
Ø…

Source: KDnuggets: Practical Hyperparameter Optimization. https://www.kdnuggets.com/2020/02/practical-hyperparameter-optimization.html


52
Tuning Hyperparameters strategies
Ø Random Guess
Ø Simply use values from similar work
Ø Rely on your experience
Ø Training DNNs is part art, part science
Ø With experience you develop a sense of what works
and what doesn’t
Ø Still a chance of being incorrect (suboptimal performance)
Ø Grid Search
Ø Set up a grid of hyperparameters and train/test the model on each possible combination (a sketch follows after this slide)
Ø Automated hyperparameter tuning
Ø Use of Bayesian optimization and Evolutionary Algorithms
Ø Hyperopt: Distributed Asynchronous Hyperparameter Optimization

Source: KDnuggets: Practical Hyperparameter Optimization. https://www.kdnuggets.com/2020/02/practical-hyperparameter-optimization.html


53
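A minimal grid-search sketch over two hyperparameters, assuming a hypothetical `train_and_validate` helper that trains a model with the given settings and returns validation accuracy:

```python
from itertools import product

learning_rates = [1e-2, 1e-3, 1e-4]
batch_sizes = [32, 64, 128]

best_config, best_acc = None, 0.0
for lr, batch_size in product(learning_rates, batch_sizes):
    acc = train_and_validate(lr=lr, batch_size=batch_size)  # hypothetical helper
    if acc > best_acc:
        best_config, best_acc = (lr, batch_size), acc

print(f"Best config: lr={best_config[0]}, batch_size={best_config[1]} "
      f"(val accuracy {best_acc:.3f})")
```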
Deep Learning Frameworks

Source: Nguyen et al., (2019). ML and DL frameworks and libraries for large-scale data mining: a survey.
54
Texture
Ø Texture is a repeating pattern of local variations in image intensity
Ø Texture provides information about the spatial arrangement of colors or intensities in an image.
Ø Texture is characterized by the spatial distribution of intensity levels in a neighborhood.

Source: https://www.mathworks.com/help/images/texture-segmentation-using-texture-filters.html
55
Texture Synthesis

56
Neural Texture Synthesis

Slide Credit: Alan Blair


57
Neural Texture Synthesis

Slide Credit: Alan Blair


58
Neural Style Transfer

Content + Style → New image

Slide Credit: Alan Blair


59
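Style transfer optimizes a new image so that its features match the content image while its Gram-matrix feature statistics match the style image. A minimal sketch of the Gram matrix and a style loss, assuming feature maps extracted from a pretrained CNN (a hedged illustration, not necessarily the exact formulation used in the slides):

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-by-channel correlations of a feature map of shape (B, C, H, W)."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # shape (B, C, C)

def style_loss(generated_feats, style_feats):
    return torch.mean((gram_matrix(generated_feats) - gram_matrix(style_feats)) ** 2)

# e.g. feature maps from one layer of a pretrained CNN (random stand-ins here)
loss = style_loss(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```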
Neural Style Transfer

Slide Credit: Alan Blair


60
Neural Style Transfer

Slide Credit: Alan Blair


61
Key takeaways
Ø Continuous improvement in CNN architectures and heuristics (tips and tricks)
Ø always check the literature to find state-of-the-art methods
Ø Training methodology
Ø Split data into training (70%), validation (10%), and testing (20%)
Ø Take care of data leakage (e.g., multiple samples from the same patient should be in the same set; a grouped-split sketch follows after this slide)
Ø Check the distribution of classes; work on balanced datasets (ideally)
Ø Tune hyperparameters on the validation set. Save the best model and run inference on the test set (once)
Ø Don’t use off-the-shelf models blindly. Do ablation studies to understand how they work
Ø Data augmentation techniques are not standardized
Ø Get input from experts to know which data augmentations make sense in the domain
Ø e.g., for chest X-rays we don’t want vertical flipping
Ø Results
Ø Use multiple metrics rather than a single metric to report results (they are often complementary)
Ø Show both qualitative and quantitative results (e.g., for image segmentation)

62
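A minimal sketch of a leakage-safe split, assuming scikit-learn and a hypothetical `patient_ids` array: all samples from the same patient land in the same subset (a validation set can be carved out of the training portion with a second grouped split).

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# hypothetical data: 1000 samples, each tagged with a patient id
X = np.random.rand(1000, 32)
y = np.random.randint(0, 2, size=1000)
patient_ids = np.random.randint(0, 200, size=1000)

# 80/20 split that never puts the same patient on both sides
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

assert set(patient_ids[train_idx]).isdisjoint(set(patient_ids[test_idx]))
```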
Questions?

If you have any questions, post them on the Ed forum
