
Deep learning for image processing.

Convolutional neural networks

Ruxandra Stoean
Further bibliography
- Ian Goodfellow et al., Deep Learning (Adaptive Computation and Machine Learning series), MIT Press, 2016, http://www.deeplearningbook.org/
- John Kelleher, Deep Learning (The MIT Press Essential Knowledge series), MIT Press, 2019
- Charu Aggarwal, Neural Networks and Deep Learning: A Textbook, 2nd ed., Springer, 2023
- Aston Zhang et al., Dive into Deep Learning, Cambridge University Press, 2023
- Simon Prince, Understanding Deep Learning, The MIT Press, 2023
https://www.kdnuggets.com/2017/08/first-steps-learning-deep-learning-image-classification-keras.html
Definitions
- “For most flavors of the old generations of learning algorithms … performance will plateau. … deep learning … is the first class of algorithms … that is scalable. … performance just keeps getting better as you feed them more data.” – Andrew Ng
- “The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.” – Ian Goodfellow
- “Deep learning [is] … a pipeline of modules all of which are trainable. … deep because [it has] multiple stages in the process of recognizing an object and all of those stages are part of the training” – Yann LeCun
- “At which problem depth does Shallow Learning end, and Deep Learning begin? Discussions with DL experts have not yet yielded a conclusive response to this question. […], let me just define for the purposes of this overview: problems of depth > 10 require Very Deep Learning.” – Jürgen Schmidhuber
https://machinelearningmastery.com/what-is-deep-learning/
“The more I read, the more I acquire,
the more certain I am that I know nothing.”
― Voltaire
Frameworks for implementing deep learning

TensorFlow:
- Developed by Google
- Visualization with TensorBoard
- Simple high-level API
- Mature library

PyTorch:
- Developed by Facebook
- Efficient memory usage
- Python low-level coding
- Young library, but increasing in popularity
Tensorflow vs. Pytorch
https://paperswithcode.com/trends
https://trends.google.com/trends/explore?date=2018-08-30%202023-09-30&q=pytorch,tensorflow&hl=en
Working with Tensorflow/Keras under R
- Installation:
  - Install Rtools4 from https://cran.r-project.org/bin/windows/Rtools/rtools40.html
  - install.packages("dplyr")
  - library(devtools)
  - install_github("rstudio/reticulate")
  - install.packages("keras") for Tensorflow and Keras (one-time backend setup sketched below)
- Use RStudio
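After the packages are installed, the TensorFlow backend itself still has to be set up once. A minimal sketch of that one-time step with the keras package (versions and options left at their defaults):

# One-time setup of the TensorFlow backend for the R keras package
library(keras)
install_keras()  # installs TensorFlow and Keras into a Python environment via reticulate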
Image processing. Convolutional neural networks

Convolutional neural networks (CNN)
https://www.quora.com/What-are-some-good-data-science-statistics-machine-learning-jokes

CNN
- Automatic feature learning: from low-level to high-level
- Important applications in computer vision:
  - Classification – There is a building in this image
  - Semantic Segmentation – These are the pixels of the buildings
  - Object Detection – There are buildings in this image
  - Instance Segmentation – There are several instances of buildings in this image, and these are the pixels of each
Architecture
- Layers (see the code sketch below):
  - Feature learning
    - Convolution
    - ReLU
    - Pooling
  - Classification
    - Fully connected
https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
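The same convolution -> ReLU -> pooling -> fully connected pattern can be written directly in Keras for R. A minimal sketch (the filter count, kernel size and 28x28x1 input are illustrative choices, not taken from the slides):

# Minimal sketch of the convolution -> ReLU -> pooling -> fully connected pattern
library(keras)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%   # convolution + ReLU (feature learning)
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # pooling
  layer_flatten() %>%
  layer_dense(units = 10, activation = "softmax") # fully connected (classification)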
Convolution
- Input: a volume of Width x Height x Depth
- Kernel (or filter): a shared set of weights
- Forward pass: convolution between each filter and the input volume
- Result: KD activation maps (one per filter) stacked into a new volume; each map has spatial size (W - KS + 2P)/S + 1 for input width W (worked example below)
- Four hyperparameters:
  - Kernel size (KS)
  - Kernel depth (number of filters) (KD)
  - Stride (S)
  - Zero-padding (P)
http://cs231n.github.io/convolutional-networks/
https://www.youtube.com/watch?v=AQirPKrAyDg
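A quick worked example of the output-size formula, written as a tiny R helper (the input sizes below are illustrative):

# Output spatial size of a convolution: out = (W - KS + 2*P)/S + 1
conv_out_size <- function(W, KS, S = 1, P = 0) {
  (W - KS + 2 * P) / S + 1
}
conv_out_size(32, KS = 5, P = 2)  # = 32: 5x5 kernels with padding 2 preserve a 32x32 input
conv_out_size(32, KS = 5)         # = 28: the same kernels with no padding shrink it to 28x28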
ReLU & Pooling
- Rectified Linear Unit – transfer layer for nonlinearity
- (Max) Pooling – downsamples the volume by taking the (maximum) value in each window (illustrated below)
- Hyperparameters:
  - Window sizes
  - Stride
http://cs231n.github.io/convolutional-networks/
https://www.youtube.com/watch?v=AQirPKrAyDg
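A small numeric illustration of both operations in base R (the 4x4 input and 2x2 window are arbitrary):

# ReLU is an elementwise max with zero
relu <- function(x) pmax(0, x)

# Max pooling a 4x4 map with a 2x2 window and stride 2 gives a 2x2 output
x <- matrix(c( 1,  -2,  3,   4,
               5,   6, -7,   8,
               9,  10, 11, -12,
              13, -14, 15,  16), nrow = 4, byrow = TRUE)
pool <- function(m) {
  sapply(seq(1, ncol(m), by = 2), function(j)
    sapply(seq(1, nrow(m), by = 2), function(i)
      max(m[i:(i + 1), j:(j + 1)])))   # maximum of each 2x2 window
}
pool(relu(x))  # 2x2 result: 6, 8 / 13, 16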
The practice
- Choice of the proper architecture
- Parameter dependence on the problem
- Large runtime
- Great computing power required
- Small sample size in real-world applications
- Overfitting
- Difficult model interpretation
- Not plug & play!
(Meme: Machine Learning Memes for Convolutional Teens)
https://medium.com/@elmira/deep-learning-frustration-4c26d69609f2
Parametrization
- Convolutional kernels: sizes, depths, strides
- Pooling layers: window sizes, strides
- Dropout rates
- Batch size
- Number of epochs
- Initial weights
- Optimizers
- Learning rate
- Activation functions
- Number of units in the fully connected layers
- Topology
- Tuning: manual or automatic (a few of these settings are shown in code below)
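A few of these knobs as they appear when compiling a Keras model in R (the Adam optimizer and the 1e-3 learning rate are illustrative defaults, not recommendations from the slides; the argument is named learning_rate in recent versions of the package):

# Setting optimizer, learning rate, loss and metric explicitly
model %>% compile(
  optimizer = optimizer_adam(learning_rate = 1e-3),  # optimizer + learning rate
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)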
Overfitting
- When the model is too complex for the data
- Working with real-world problems
- Small sample size
- Ways to combat it (several are sketched in code after this list):
  - Data augmentation
    - Flip, rotate, scale, crop, translate, Gaussian noise
    - Generative adversarial networks (GAN): one network generates, the other evaluates
  - Dropout layer
  - Regularization
    - Weight penalty (decay), L1 and L2
  - Early stopping
    - Use checkpoints to save the model each epoch
    - Pick the best candidate from the validation results after the last epoch
(Meme: Machine Learning Memes for Convolutional Teens)
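A minimal sketch of dropout, L2 weight decay, checkpointing and early stopping in Keras for R (the rates, penalty and patience values are illustrative):

# Dropout and L2 weight decay inside a model definition:
#   layer_dropout(rate = 0.5)
#   layer_dense(units = 64, activation = "relu",
#               kernel_regularizer = regularizer_l2(l = 0.01))

library(keras)
callbacks <- list(
  # save a checkpoint whenever the validation loss improves
  callback_model_checkpoint("best_model.h5", monitor = "val_loss",
                            save_best_only = TRUE),
  # stop training once validation loss has not improved for 5 epochs
  callback_early_stopping(monitor = "val_loss", patience = 5)
)
# model %>% fit(x, y, epochs = 50, validation_split = 0.2, callbacks = callbacks)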
Image classification
MNIST digits classification in Keras/TF under R
- A 2-layer CNN for recognizing the MNIST digits (a sketch of such a model follows)
- http://yann.lecun.com/exdb/mnist/
(Slides show the real-time training history, the plotted training history, visualized test predictions, and a final accuracy of 98.79%.)
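A minimal sketch of the kind of 2-layer CNN the slides describe, in Keras for R; the exact architecture behind the reported 98.79% is not given, so the layer sizes here are illustrative:

library(keras)

# Load and prepare MNIST (28x28 grayscale digits, 10 classes)
mnist <- dataset_mnist()
x_train <- array_reshape(mnist$train$x / 255, c(60000, 28, 28, 1))
y_train <- to_categorical(mnist$train$y, 10)

# Two convolution + pooling stages, then a softmax classifier
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 10, activation = "softmax")

model %>% compile(optimizer = "adam", loss = "categorical_crossentropy",
                  metrics = "accuracy")
model %>% fit(x_train, y_train, epochs = 5, validation_split = 0.1)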
CNN for CIFAR-10 in Tensorflow/Keras
- https://www.cs.toronto.edu/~kriz/cifar.html
- A data set of 60000 images, of size 32x32
- Real-world pictures
- 10 classes, 6000 images per class
- 50000 training data, 10000 test data
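The data set ships with the keras package; a short sketch of loading it in R:

library(keras)

# CIFAR-10: 50000 training and 10000 test colour images of size 32x32x3
cifar <- dataset_cifar10()
dim(cifar$train$x)  # 50000 32 32 3
dim(cifar$test$x)   # 10000 32 32 3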
TensorBoard
- Real-time and storable measurements/visualizations during the machine learning flow
- Note: delete the contents of "\AppData\Local\Temp\.tensorboard-info" to be able to launch a new TB session
- TB can also be viewed directly in the notebook (see the sketch below)
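A sketch of logging a Keras run in R and opening TensorBoard on it (the log directory name logs/run1 is an arbitrary choice):

library(keras)

# Log metrics for TensorBoard while training
tb <- callback_tensorboard(log_dir = "logs/run1")
# model %>% fit(x, y, epochs = 10, callbacks = list(tb))

# Launch TensorBoard on the logged run
tensorboard("logs/run1")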
Prediction on the test set

DeepExplainer
- Determines SHAP values via an improved version of the DeepLIFT algorithm
- DeepLIFT backpropagates the neuron contributions to the input features
  - Contribution (positive or negative) = difference between the neuron activation and a reference activation
- The expectations are estimated on the basis of selected background examples
  - 100 samples give a good estimate; 1000 a very good one, but costly
- Interpretation:
  - Red pixels (positive) denote the features whose presence is significant for the class
  - Blue pixels (negative) show the features whose absence is significant for the class
Transfer learning
- Deep learning needs:
  - Big data
  - Big computing resources
- Take the parameters from a network already trained on a large data set
  - Data: ImageNet, CIFAR
  - Pre-trained models: VGG, Inception, AlexNet, ResNet
- Initial layers have learnt general features
- Train the final layers for the problem at hand
  - Learn the specific features of the current data
- Hence:
  - Solves problems with scarce available data
  - Fewer parameters to train
VGG-19
- Common choice for transfer learning (a code sketch follows)
- 19 layers (16 convolutional + 3 feed-forward)
- Top layers are the last layers added to the network
(Diagram: the feature-learning convolutional blocks, followed by the "top" classification layers.)
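A minimal sketch of this recipe in Keras for R, using the pre-trained VGG19 application; freezing the whole base and the 256-unit, 10-class head are illustrative choices:

library(keras)

# Pre-trained VGG19 convolutional base, without its original top layers
base <- application_vgg19(weights = "imagenet", include_top = FALSE,
                          input_shape = c(224, 224, 3))
freeze_weights(base)  # keep the general features learnt on ImageNet fixed

# New top layers, trained for the problem at hand
model <- keras_model_sequential() %>%
  base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")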
Semantic segmentation
Competing architectures

U-Net:
- Fully convolutional network
- Encoder-decoder architecture
- Skip connections for direct transfer of features from encoder to decoder
- Designed for fine-grained, detailed biomedical segmentation

Mask R-CNN:
- Extension of Faster R-CNN (Faster Regional CNN)
- Object detection in found region proposals
- Mask prediction in parallel to detection, through a CNN
- General-purpose image segmentation
U-Net
- Addresses pixel-wise representations of the images
  - by downsampling (encoder)
  - and upsampling (decoder)
- Skip connections – send information at the same size (see the sketch below)
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
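One encoder-decoder level with a skip connection, sketched with the Keras functional API in R; the 128x128x3 input and filter counts are arbitrary, and this illustrates the idea rather than the fastai/PyTorch implementation used below:

library(keras)

inp  <- layer_input(shape = c(128, 128, 3))
enc  <- inp %>% layer_conv_2d(32, c(3, 3), padding = "same", activation = "relu")
down <- enc %>% layer_max_pooling_2d(c(2, 2))                             # downsampling (encoder)
up   <- down %>% layer_conv_2d_transpose(32, c(2, 2), strides = c(2, 2))  # upsampling (decoder)
skip <- layer_concatenate(list(up, enc))            # skip connection: features at the same size
out  <- skip %>% layer_conv_2d(1, c(1, 1), activation = "sigmoid")        # per-pixel prediction
model <- keras_model(inp, out)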
U-Net with ResNet34 on CamVid in Pytorch
- CamVid – frames from video of cars and pedestrians
- https://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/
- Batch normalization
  - follows the convolutional layers and precedes the ReLU layers
  - allows higher learning rates and reduces the strong dependency on initialization
- Self-attention layers
- Pixel accuracy as performance metric
- Mish activation function, Ranger optimizer
- First time:
  - !pip install fastai wwf -q --upgrade
Homework
- Implement a direct convolutional neural network for Fashion MNIST (a loading starter is sketched below)
  https://keras.rstudio.com/reference/dataset_fashion_mnist.html
- Implement a pretrained architecture (e.g. ResNet, Inception) for CIFAR-100
  https://keras.io/api/datasets/cifar100/
- (Optional) Investigate and create a YOLO model for object detection on a data set of your choice
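A starter sketch for the first item, loading Fashion MNIST in R (the model itself is left as the exercise):

library(keras)

# Fashion MNIST: 60000 training and 10000 test 28x28 grayscale images, 10 clothing classes
fashion <- dataset_fashion_mnist()
str(fashion$train$x)  # 60000 x 28 x 28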
