8 Deep Learning CNN
Ruxandra Stoean
Further bibliography
Ian Goodfellow et al., Deep Learning (Adaptive Computation and Machine Learning series), MIT Press, 2016, http://www.deeplearningbook.org/
John Kelleher, Deep Learning (The MIT Press Essential Knowledge series), 2019
Charu Aggarwal, Neural Networks and Deep Learning: A Textbook, 2nd ed., Springer, 2023
Aston Zhang et al., Dive into Deep Learning, Cambridge University Press, 2023
Simon Prince, Understanding Deep Learning, The MIT Press, 2023
https://www.kdnuggets.com/2017/08/first-steps-learning-deep-learning-image-classification-keras.html
Definitions
“For most flavors of the old generations of learning algorithms … performance will plateau. … deep
learning … is the first class of algorithms … that is scalable. … performance just keeps getting better as
you feed them more data.” - Andrew Ng
“The hierarchy of concepts allows the computer to learn complicated concepts by building them out of
simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep,
with many layers. For this reason, we call this approach to AI deep learning.” – Ian Goodfellow
“Deep learning [is] … a pipeline of modules all of which are trainable. … deep because [has] multiple
stages in the process of recognizing an object and all of those stages are part of the training” -Yann LeCun
“At which problem depth does Shallow Learning end, and Deep Learning begin? Discussions with DL experts
have not yet yielded a conclusive response to this question. […], let me just define for the purposes of this
overview: problems of depth > 10 require Very Deep Learning.” - Jürgen Schmidhuber
https://machinelearningmastery.com/what-is-deep-learning/
“The more I read, the more I acquire,
the more certain I am that I know nothing.”
― Voltaire
Frameworks for implementing deep learning
TensorFlow, PyTorch
https://paperswithcode.com/trends
TensorFlow vs. PyTorch
https://trends.google.com/trends/explore?date=2018-08-30%202023-09-30&q=pytorch,tensorflow&hl=en
Working with TensorFlow/Keras under R
Installation:
Install Rtools4 from https://cran.r-project.org/bin/windows/Rtools/rtools40.html
install.packages("dplyr")
install.packages("devtools")  # needed for library(devtools) below
library(devtools)
install_github("rstudio/reticulate")
install.packages("keras")  # for TensorFlow and Keras
Use RStudio
Image processing. Convolutional neural networks
Convolutional neural networks (CNN)
https://www.quora.com/What-are-some-good-data-science-statistics-machine-learning-jokes
CNN
Automatic feature learning: from low-level to high-level
Important applications in computer vision
Classification – There is a building in this image
Semantic Segmentation – These are the pixels of the buildings
Object Detection – There are buildings in this image, at these locations (bounding boxes)
Instance Segmentation – There are several instances of buildings in this image and these are the pixels of each
Architecture
Layers
Feature learning: Convolution, ReLU, Pooling
Classification: Fully connected
https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
Convolution
Input
Volume of Width x Height x Depth
Kernel (or filter) - a shared set of weights
Forward pass – convolution between each filter and the input volume
Result: KD activation maps, one per filter, stacked in a new volume
Size of each map along one dimension: (W - KS + 2P)/S + 1, for input size W, stride S and zero-padding P
Four hyperparameters:
Kernel size (KS)
Kernel depth (number of filters) (KD)
Stride
Zero-padding
http://cs231n.github.io/convolutional-networks/
https://www.youtube.com/watch?v=AQirPKrAyDg
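The forward pass of a convolutional layer can be sketched in plain Python. This is a minimal illustration, not an efficient implementation: it assumes a single-channel input (depth 1) and square kernels, and the names conv_output_size and conv2d_single, as well as the toy image and filters, are made up for the example.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Size of an activation map along one dimension: (W - KS + 2P) / S + 1."""
    span = input_size - kernel_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("kernel, stride and padding do not tile the input")
    return span // stride + 1


def conv2d_single(image, kernel, stride=1, padding=0):
    """Slide one square kernel over a 2-D image and return one activation map."""
    ks = len(kernel)
    # Zero-pad the image on all four sides.
    width = len(image[0]) + 2 * padding
    padded = [[0.0] * width for _ in range(padding)]
    padded += [[0.0] * padding + list(row) + [0.0] * padding for row in image]
    padded += [[0.0] * width for _ in range(padding)]

    out_h = conv_output_size(len(image), ks, stride, padding)
    out_w = conv_output_size(len(image[0]), ks, stride, padding)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the shared weights and the current window.
            total = 0.0
            for a in range(ks):
                for b in range(ks):
                    total += kernel[a][b] * padded[i * stride + a][j * stride + b]
            row.append(total)
        out.append(row)
    return out


# Toy 4x4 input (illustrative values only).
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
# Two filters (KD = 2) of size KS = 3 give two activation maps.
kernels = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # identity: copies the centre pixel
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # box filter: sums the 3x3 window
]
volume = [conv2d_single(image, k, stride=1, padding=1) for k in kernels]
```

With KS = 3, stride 1 and zero-padding 1, each map keeps the 4x4 size of the input, and the output volume has depth KD = 2.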
ReLU & Pooling
Rectified Linear Unit – activation layer that introduces nonlinearity: f(x) = max(0, x)
(Max) Pooling – downsamples the volume by keeping the (maximum) value in each window
Pooling hyperparameters: window size, stride
http://cs231n.github.io/convolutional-networks/
https://www.youtube.com/watch?v=AQirPKrAyDg
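A minimal sketch of the two operations on a single activation map, in plain Python. The function names and the sample values are illustrative assumptions; real frameworks apply the same operations to whole volumes at once.

```python
def relu(feature_map):
    """Element-wise max(0, x) over a 2-D activation map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, window=2, stride=2):
    """Downsample by keeping the maximum of each window."""
    out_h = (len(feature_map) - window) // stride + 1
    out_w = (len(feature_map[0]) - window) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(window)
                for b in range(window)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Illustrative 4x4 activation map with negative and positive responses.
activation = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.0, 0.0],
    [-0.5, -0.5, -1.0, 6.0],
    [0.0, 3.0, 2.0, -4.0],
]
pooled = max_pool(relu(activation), window=2, stride=2)
print(pooled)  # [[4.0, 1.0], [3.0, 6.0]]
```

A 2x2 window with stride 2 halves the width and height, so the 4x4 map becomes 2x2 while the strongest response in each region is kept.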
The practice
Choice of the proper architecture
Parameter dependence on the problem
Large runtime
Great computing power required
Small sample size in real-world
Overfitting
Difficult model interpretation
Not plug & play!
https://medium.com/@elmira/deep-learning-frustration-4c26d69609f2
Parametrization
Convolutional kernels: sizes, depths, strides
Pooling layers: window sizes, strides
Dropout rates
Batch size
Number of epochs
Initial weights
Optimizers
Learning rate
Activation functions
Number of units in the fully connected layers
Topology
Tuning
Manual
Automatic
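One simple form of automatic tuning is random search over the hyperparameters. The sketch below is only an illustration: the search space values are invented, and validation_score is a placeholder with a made-up formula standing in for "train the CNN with this configuration and return its validation accuracy".

```python
import random

# Hypothetical search space; names and candidate values are illustrative only.
SEARCH_SPACE = {
    "kernel_size": [3, 5, 7],
    "num_filters": [16, 32, 64],
    "dropout_rate": [0.2, 0.3, 0.5],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 64, 128],
}


def sample_config(rng):
    """Draw one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def validation_score(config):
    """Placeholder objective (assumption): replace with real training + validation."""
    return 1.0 - abs(config["dropout_rate"] - 0.3) - abs(config["kernel_size"] - 5) / 10


def random_search(n_trials, seed=0):
    """Evaluate n_trials random configurations and return the best one and all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((validation_score(config), config))
    best = max(trials, key=lambda trial: trial[0])
    return best, trials


best, trials = random_search(n_trials=20)
best_score, best_config = best
```

Manual tuning follows the same loop, with a person choosing the next configuration instead of the random generator.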
Overfitting
When the model is too complex for the data
Working with real-world problems
Small sample size
Ways to combat:
Data augmentation
Flip, rotate, scale, crop, translate, Gaussian noise
Generative adversarial networks (GAN): one network generates, the other evaluates
Dropout layer
Regularization
Weight penalty (decay) L1 and L2
Early stopping
Use checkpoints to save the model after each epoch
After the last epoch, pick the checkpoint with the best validation result
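Early stopping with checkpoints can be sketched as follows. This is a simplified illustration under stated assumptions: the list of validation losses is made up and stands in for the value measured after each real training epoch, a "checkpoint" is reduced to the stored loss, and the patience parameter is an assumption not mentioned above.

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return (best_epoch, stop_epoch, checkpoints) for a sequence of validation losses."""
    checkpoints = {}  # epoch -> saved model (here just its validation loss)
    best_epoch, best_loss = None, float("inf")
    epochs_without_improvement = 0
    stop_epoch = None
    for epoch, loss in enumerate(val_losses, start=1):
        checkpoints[epoch] = loss  # save a checkpoint after every epoch
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        stop_epoch = epoch
        if epochs_without_improvement >= patience:
            break  # validation loss has stopped improving
    return best_epoch, stop_epoch, checkpoints


# Made-up validation losses: improving first, then rising as the model overfits.
history = [0.90, 0.62, 0.48, 0.45, 0.47, 0.52, 0.60]
best_epoch, stop_epoch, checkpoints = train_with_early_stopping(history, patience=2)
print(best_epoch, stop_epoch)  # 4 6
```

Training stops at epoch 6, after two epochs without improvement, and the model restored for testing is the checkpoint from epoch 4.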
Image classification
MNIST digits classification in Keras/TF under R
A 2-layer CNN for recognizing MNIST digits
http://yann.lecun.com/exdb/mnist/
Real-time training history
Plot training history
Visualize test predictions
Accuracy: 98.79%
CNN for CIFAR-10 in TensorFlow/Keras
https://www.cs.toronto.edu/~kriz/cifar.html
A data set of 60000 colour images of size 32x32
Real-world pictures
10 classes, 6000 images per class
50000 training images, 10000 test images
TensorBoard
Real-time and storable measurements/visualizations during the machine learning workflow
Interpretation:
Red pixels (positive) denote the features whose presence is significant for the class
Blue pixels (negative) show the features whose absence is significant for the class
Transfer learning
Deep learning needs
Big data
Big computing resources
Take the parameters from a network already trained on a large data set
Data: ImageNet, CIFAR
Pre-trained models: VGG, Inception, AlexNet, ResNet
Initial layers have learnt general features
Train the final layers for the problem at hand
Learn the specific features of the current data
Hence
Addresses the problem of scarce available data
Fewer parameters to train
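The effect of freezing the initial layers on the number of parameters to train can be sketched with a toy model in plain Python. Everything here is hypothetical: the Layer class, the helper functions and the layer sizes are made up for illustration and do not correspond to VGG, Inception, AlexNet or ResNet.

```python
class Layer:
    """A layer reduced to its name, parameter count and trainable flag."""

    def __init__(self, name, num_params, trainable=True):
        self.name = name
        self.num_params = num_params
        self.trainable = trainable


def conv_params(kernel_size, in_channels, out_channels):
    """Kernel weights plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


# Hypothetical small network; in transfer learning the conv layers
# would arrive with weights pre-trained on a large data set.
model = [
    Layer("conv1", conv_params(3, 3, 32)),
    Layer("conv2", conv_params(3, 32, 64)),
    Layer("conv3", conv_params(3, 64, 128)),
    Layer("fc", dense_params(128, 256)),
    Layer("output", dense_params(256, 10)),
]


def freeze_feature_layers(layers, num_trainable_at_top):
    """Freeze everything except the last num_trainable_at_top layers."""
    cutoff = len(layers) - num_trainable_at_top
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff


def trainable_params(layers):
    return sum(layer.num_params for layer in layers if layer.trainable)


total_before = trainable_params(model)
freeze_feature_layers(model, num_trainable_at_top=2)
total_after = trainable_params(model)
print(total_before, total_after)  # 128842 35594
```

Only the final layers are updated on the new data, so the scarce samples are spent on the problem-specific features rather than on relearning general ones.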
[Figure: VGG-19 architecture, with the feature learning layers followed by the top (classification) layers]
Semantic segmentation
Competing architectures
U-Net, Mask R-CNN
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
U-Net with ResNet34 on CamVid in PyTorch
CamVid – frames from video of cars and pedestrians
https://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/
Batch normalization
placed after the convolutional layers and before the ReLU layers
allows higher learning rates and reduces the strong dependency on initialization
Self attention layers
Pixel accuracy as performance metric
Mish activation function, Ranger optimizer
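Three of the ingredients listed above have compact definitions that can be sketched in plain Python. This is a simplified illustration, not the fastai implementation: batch normalization is shown for a flat list of activations with fixed scale and shift (gamma, beta), and the class masks and their labels are invented for the example.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def mish(x):
    """Mish activation: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    return x * math.tanh(math.log1p(math.exp(x)))


def pixel_accuracy(predicted, target):
    """Fraction of pixels whose predicted class equals the ground-truth class."""
    correct = total = 0
    for pred_row, true_row in zip(predicted, target):
        for p, t in zip(pred_row, true_row):
            correct += int(p == t)
            total += 1
    return correct / total


# Batch normalization first, then the nonlinearity.
normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
activated = [mish(v) for v in normalized]

# Hypothetical 3x3 class masks (0 = road, 1 = car, 2 = pedestrian).
target = [[0, 0, 1], [0, 1, 1], [2, 2, 0]]
predicted = [[0, 0, 1], [0, 0, 1], [2, 2, 2]]
accuracy = pixel_accuracy(predicted, target)  # 7 of 9 pixels match
```

Pixel accuracy treats every pixel equally, so large classes dominate the score; that is worth keeping in mind when small objects matter.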
First time:
!pip install fastai wwf -q --upgrade
Homework
Implement a direct convolutional neural network for Fashion MNIST
https://keras.rstudio.com/reference/dataset_fashion_mnist.html
(Optional) Investigate and create a YOLO model for object detection on the data set of your choice