4b Image Processing
Image Processing
COMP9444 Week 4b
Sonit Singh
School of Computer Science and Engineering
Faculty of Engineering
The University of New South Wales, Sydney, Australia
[email protected]
Agenda
Ø Convolutional Neural Networks
Ø Why is training Deep Neural Networks hard?
Ø DNN training strategy
Ø Transfer Learning
Ø Overfitting and Underfitting
Ø Methods to avoid overfitting
Ø Data Augmentation
Ø Regularization
Ø Data Preprocessing
Ø Batch Normalization
Ø Choice of optimizers
Ø Tuning DNNs hyperparameters
Ø Neural Style Transfer
2
Convolutional Neural Networks (CNNs)
Ø A class of deep neural networks suited to processing 2D/3D data, e.g., images and videos
Ø CNNs can capture high-level representations of images/videos which can be used for end tasks such as classification, object detection, segmentation, etc.
Ø A range of CNN architectures has improved steadily over the years
6
AlexNet
Ø 650K neurons
Ø 630M connections
Ø 60M parameters
7
Enhancements
Ø Rectified Linear Units (ReLUs)
Ø Overlapping pooling (Width = 3, stride = 2)
Ø Stochastic gradient descent with momentum and weight decay
Ø Data augmentation to reduce overfitting
Ø 50% dropout in the fully connected layers (see the code sketch below)
8
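As a rough illustration (not taken from the original slides), here is a minimal PyTorch-style sketch of how the dropout and optimizer choices above are typically written: an AlexNet-like fully connected head with 50% dropout, trained with SGD using momentum and weight decay. The layer sizes and hyperparameter values are illustrative assumptions.

import torch
import torch.nn as nn

# AlexNet-style fully connected head with 50% dropout (sizes are illustrative)
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),  # 1000 output classes, as in ImageNet
)

# Stochastic gradient descent with momentum and weight decay (values are assumptions)
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)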
Dealing with Deep Networks
Ø > 10 layers
    Ø weight initialization
    Ø batch normalization
Ø > 30 layers
    Ø skip connections
Ø If we simply stack additional layers, it can lead to higher training error as well as higher test
error
Ø Idea: take any two consecutive stacked layers in a deep network and add a "skip" connection that bypasses these layers, so that the input x is added directly to their output (sketched in code after this slide)
Ø The preceding layers attempt to do the "whole" job, making x as close as possible to the target output of the entire network
Ø F(x) is a residual component which corrects the errors from previous layers, or provides
additional details which the previous layers were not powerful enough to compute
Ø With skip connections, both training and test error drop as you add more layers
Ø With more than 100 layers, we need to apply ReLU before adding the residual rather than afterwards. This is called an identity skip connection.
19
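A minimal PyTorch-style sketch of the idea (an illustration, not the exact ResNet architecture): two convolutional layers compute the residual F(x), and the skip connection adds the input x back to their output.

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two conv layers compute the residual F(x); the skip connection adds x back."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(x + residual)  # skip connection: output = F(x) + x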
Dense Networks
Ø Good results have been achieved using networks with densely connected blocks, within which each layer is connected by shortcut connections to all the preceding layers (see the sketch after this slide)
20
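A minimal sketch of such a densely connected block, assuming a PyTorch-style API: each layer receives the concatenation of the block input and all preceding layers' feature maps. The growth rate and number of layers are assumptions for illustration.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of all preceding layers' outputs."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                      kernel_size=3, padding=1)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = torch.relu(layer(torch.cat(features, dim=1)))
            features.append(out)  # shortcut to every later layer
        return torch.cat(features, dim=1)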
VGG
Ø Developed at Visual Geometry Group (Oxford) by Simonyan and Zisserman
Ø 1st runner-up (classification) and winner (localization) of the ILSVRC 2014 competition
Ø VGG-16 has 138 million parameters
Ø VGG-19 has 144 million parameters
Credit: https://itechindia.co/blog/machine-learning-are-companies-in-india-ready-for-it/
26
Why is training Deep Neural Networks hard?
Source: Yamashita R. et al. (2018) Convolutional neural networks: an overview and applications in radiology
28
Transfer Learning
Ø Transfer learning aims to leverage the knowledge learned on a resource-rich domain/task to help learn a task that lacks sufficient training data
Ø Sometimes referred to as domain adaptation
Ø The resource-rich domain is known as the source and the low-resource task is known as the
target.
Ø Transfer learning works best if the features learned from the source task are general (i.e., domain-independent); a fine-tuning sketch follows this slide
Source: Yamashita R. et al. (2018) Convolutional neural networks: an overview and applications in radiology
33
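A common way to apply this in practice (a sketch, not a recipe prescribed by the slides): take a backbone pretrained on the resource-rich source domain, freeze its general features, and replace the classifier head for the low-resource target task. The ResNet-18 backbone and the 10 target classes below are assumptions.

import torch.nn as nn
from torchvision import models

# Backbone pretrained on the source domain (ImageNet); torchvision >= 0.13 API
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the general, domain-independent features learned on the source task
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head for the target task (10 classes is an assumption)
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head's parameters are then trained on the small target dataset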
Common methods to mitigate overfitting
Ø More training data
Ø Early Stopping
Ø Data Augmentation
Ø Regularization (weight decay, dropout)
Ø Batch normalization
Source: Yamashita R. et al. (2018) Convolutional neural networks: an overview and applications in radiology
34
Image Credit: Hyper-parameters tuning practices: learning rate, batch size, momentum, and weight decay. Medium
More training data
Ø Costly
Ø Time consuming
Ø Need experts for
specialized domains
Source: Fast Annotation Net: A framework for active learning in 2018. https://medium.com/diffgram/fast-annotation-net-a-framework-for-active-learning-in-2018-1c75d6b4af92
35
Image Datasets — ImageNet, PASCAL, TinyImage, ESP and LabelMe — what do they offer ? Medium Blog
Early Stopping
Ø Training too little means the model will underfit both the training and test sets
Ø Training too much means the model will overfit the training set and hence perform poorly on the test set
Ø Early Stopping:
Ø To stop training at the point when performance on a validation set starts to degrade.
Ø Idea is to stop training when generalization error increases
Ø How to use Early Stopping
Ø Monitoring model performance: Using metric to evaluate to monitor performance of the model
during training
Ø Trigger to stop training:
Ø No change in metric over a given number of epochs
Ø A decrease in performance observed over a number of epochs
Ø Some delay or "patience" is good for early stopping (see the sketch below)
Source: Machine Learning Mastery: A Gentle Introduction to Early Stopping to Avoid Overtraining Neural Networks
36
URL: https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/
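A minimal sketch of the trigger described above, assuming hypothetical train_one_epoch and evaluate helpers: training stops once the validation loss has failed to improve for "patience" consecutive epochs.

best_val_loss = float("inf")
patience, epochs_without_improvement = 5, 0   # "patience" delays the stop

for epoch in range(100):
    train_one_epoch(model, train_loader)      # hypothetical helpers
    val_loss = evaluate(model, val_loader)

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
        # save the best weights here
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Early stopping at epoch {epoch}")
            break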
Data Augmentation
Ø Data augmentation artificially generates different versions of a real dataset to increase its size
Ø We use data augmentation to handle data scarcity and insufficient data diversity
Ø Data augmentation helps to increase the performance of deep neural networks (see the sketch below)
Source: Zhao et al., Differential Augmentation for Data-Efficient GAN Training, NeurIPS, 2020
46
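A minimal sketch using torchvision transforms (an assumption; the slide does not name a library): each epoch sees a randomly cropped, flipped, and colour-jittered version of every training image, artificially enlarging the effective dataset.

from torchvision import transforms

# Each epoch sees a randomly perturbed version of every training image
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])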
Regularization: Weight Decay
Ø Weight decay adds a penalty term to the loss function on the training set to reduce the complexity of the learned model
Ø Popular choice for weight decay:
Ø L1: The L1 penalty aims to minimize the absolute value of the weights
Ø L2: The L2 penalty aims to minimize the squared magnitude of the weights (both penalties are written out below)
Credit: Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR, 2014
48
https://colab.research.google.com/github/d2l-ai/d2l-en-colab/blob/master/chapter_multilayer-perceptrons/dropout.ipynb
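Written out, the two penalized training objectives are (λ controls the strength of the penalty):

L1: \mathcal{L}(\mathbf{w}) = \mathcal{L}_{\text{data}}(\mathbf{w}) + \lambda \sum_i |w_i|
L2: \mathcal{L}(\mathbf{w}) = \mathcal{L}_{\text{data}}(\mathbf{w}) + \lambda \sum_i w_i^2

In PyTorch, for example, the L2 form is usually applied by passing a weight_decay argument to the optimizer rather than by modifying the loss by hand.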
Data Preprocessing
Ø The pixel values in images must be scaled before being given as input to deep neural networks for training or evaluation
Ø Three main types of pixel scaling:
Ø Pixel Normalization: scale pixel values to the range 0-1
Ø Pixel Centering: scale pixel values to have a zero mean
Ø Pixel Standardization: scale pixel values to have zero mean and unit variance (all three are sketched below)
Source: Towards Data Science. Various Optimization Algorithms For Training Neural Network https://towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6
51
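A NumPy sketch of the three schemes, assuming an 8-bit image; a random array stands in for a real image here.

import numpy as np

# Assume an 8-bit image; a random array stands in for a real image
image = np.random.randint(0, 256, size=(224, 224, 3)).astype(np.float32)

normalized   = image / 255.0                          # Pixel Normalization: range 0-1
centered     = image - image.mean()                   # Pixel Centering: zero mean
standardized = (image - image.mean()) / image.std()   # Pixel Standardization: zero mean, unit variance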
Tuning Hyperparameters
Ø Hyperparameters are all the parameters that can be set by the user before training starts (they are not learned from the data)
Ø Hyperparameters are like knobs or dials of the network (model)
Ø An optimization problem: we aim to find the combination of values that minimizes (e.g., loss) or maximizes (e.g., accuracy) an objective function (a simple grid-search sketch follows this slide)
Ø Many hyperparameters to tune:
Ø Learning rate
Ø No. of epochs
Ø Dropout rate
Ø Batch size
Ø No. of hidden layers and units
Ø Activation function
Ø Weight initialization
Ø…
Source: Nguyen et al., (2019). ML and DL frameworks and libraries for large-scale data mining: a survey.
54
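A minimal grid-search sketch over two of the hyperparameters listed above; train_and_validate is a hypothetical helper that trains a model with the given settings and returns its validation accuracy.

import itertools

learning_rates = [1e-1, 1e-2, 1e-3]
batch_sizes = [32, 64, 128]

best_config, best_acc = None, -float("inf")
for lr, bs in itertools.product(learning_rates, batch_sizes):
    acc = train_and_validate(lr=lr, batch_size=bs)   # hypothetical helper
    if acc > best_acc:
        best_config, best_acc = (lr, bs), acc

print("Best hyperparameters:", best_config, "validation accuracy:", best_acc)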
Texture
Ø Texture is a repeating pattern of local variations in image intensity
Ø Texture provides information about the spatial arrangement of colors or intensities in an image.
Ø Texture is characterized by the spatial distribution of intensity levels in a neighborhood.
Source: https://www.mathworks.com/help/images/texture-segmentation-using-texture-filters.html
55
Texture Synthesis
56
Neural Texture Synthesis
62
Questions?