CC511 Week 7 - Deep Learning

Deep learning uses neural networks to learn representations of data. Convolutional neural networks (CNNs) are commonly used for visual data. A CNN contains convolutional layers that learn hierarchical representations through local connections and weight sharing. It also uses pooling layers for downsampling, fully connected layers for classification, and may employ techniques like dropout to prevent overfitting. CNNs have achieved human-level performance on image recognition tasks.


CC 511 Artificial Intelligence

Deep Learning

Dr. Karma Fathalla


Lecture Highlights
• Deep Learning Applications
• Current Architectures
• Convolutional Neural Networks
DL and Data Science
• Scalability of neural networks - results get better
with more data and larger models, which in turn
require more computation to train.
DL and Feature Engineering
• Automated Feature Learning - ability to perform
automatic feature extraction from raw data.
• Hierarchical Feature Learning - ability to provide
different levels of abstraction of the data.
Traditional Recognition
Convolutional Neural Networks (CNN)
• A class of deep, feed-forward artificial neural
networks that are applied to analyzing visual
imagery.
Convolution Layer
• The # of output feature maps is usually larger than
the # of input feature maps.
Related terms
• Filter: a mask/window that holds the learned weights that
are convolved with the image. Its size specifies the patch or
receptive field of the image.
• Feature Map: the output of one filter applied to the
previous layer.
• Stride: the distance (number of rows and columns) that the
filter is moved across the input from its previous location.
• Padding: inventing mock inputs for the receptive field for
the filter to read, in case the filter is attempting to read off
the edge of the input feature map.
Spatial Dimensions
• 7x7 input (spatially) assume 3x3
filter => 5x5 output
• 7x7 input (spatially) assume 3x3
filter applied with stride 2 => 3x3
output!
• 7x7 input (spatially) assume 3x3
filter applied with stride 3? doesn’t
fit! cannot apply 3x3 filter on 7x7
input with stride 3.
Spatial Dimensions

• Output size: (N - F) / stride + 1
• e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 (does not fit)
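A quick check of the formula for the three examples above, using a small hypothetical helper (not part of the lecture code):

```python
# Output size of a convolution without padding: (N - F) / stride + 1.
def conv_output_size(N, F, stride):
    if (N - F) % stride != 0:
        raise ValueError(f"{F}x{F} filter does not fit a {N}x{N} input with stride {stride}")
    return (N - F) // stride + 1

print(conv_output_size(7, 3, 1))   # 5
print(conv_output_size(7, 3, 2))   # 3
# conv_output_size(7, 3, 3) -> ValueError: the filter does not fit
```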
Padding

• Input 7x7 and 3x3 filter, applied with stride 1, pad with a
1-pixel border => what is the output?
• 7x7 output!
• In general, it is common to see CONV layers with
stride 1, filters of size FxF, and zero-padding
with (F-1)/2. (will preserve size spatially)
F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3
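Extending the helper above to include padding (a standard generalization, not shown explicitly on the slide): with P pixels of zero padding the output size becomes (N - F + 2P) / stride + 1, so P = (F - 1)/2 with stride 1 preserves the spatial size.

```python
# Output size with zero padding; (F - 1) // 2 keeps a 7x7 input at 7x7.
def conv_output_size_padded(N, F, stride, pad):
    return (N - F + 2 * pad) // stride + 1

for F in (3, 5, 7):
    pad = (F - 1) // 2
    print(F, pad, conv_output_size_padded(7, F, 1, pad))   # always 7
```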
Weight Sharing
• Is the concept by which the CNN achieves translation
invariance.
• Based on the assumption that if one feature is useful to
compute at some spatial position (x1,y1), then it should also be
useful to compute at a different position (x2,y2).
• Is to constrain the neurons in each depth slice to use the same
weights and bias across the whole image.
• However, it is possible to relax the parameter sharing scheme,
and instead simply call the layer a Locally-Connected Layer.
Weight Sharing
• In practice, the weight update is performed
concurrently through parallelization algorithms and
special hardware called the Graphics Processing Unit
(GPU).
• GPUs have hundreds of simpler cores and thousands of
hardware threads that are applied to image regions
at the same time.
Number of parameters
• Input volume: 32x32x3; 10 5x5 filters with stride 1,
pad 2
• Number of parameters in this layer? Each filter has
5*5*3 + 1 = 76 params (+1 for bias) => 76*10 = 760
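The same count reproduced as a short Python calculation:

```python
# 10 filters of size 5x5 over a 3-channel input, each with one bias term.
num_filters, F, in_channels = 10, 5, 3
params_per_filter = F * F * in_channels + 1     # +1 for the bias
total_params = params_per_filter * num_filters
print(params_per_filter, total_params)          # 76 760
```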
Hierarchy of Convolution Layers
Activation Layer
• After each conv layer, it is conventional to apply a nonlinear function.
• In the past, nonlinear functions like tanh and sigmoid were used, but
researchers found that ReLU layers work far better because the
network is able to train a lot faster (because of the computational
efficiency) without making a significant difference to the accuracy. It also
helps to alleviate the vanishing gradient problem.
• Generalization would not be possible with a linear mapping, as in that case
a high level of abstraction/generalization could not be reached. Hence, to
map a class of images into a manifold of feature vectors, we need an
activation; without it, it would be really difficult to generalize, as pictures in
a class can have too much intra-class variation.
Activation Layer
• ReLU (Rectified Linear Unit): f(x) = max(0, x)
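A one-line NumPy sketch of ReLU: negative activations are clipped to zero, positive ones pass through unchanged.

```python
import numpy as np

def relu(x):
    # element-wise max(0, x)
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0. 0. 0. 1.5 3.]
```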
Pooling Layer
• It down-samples the previous layer’s feature map.
• Pooling layers follow a sequence of one or more convolutional layers.
• It may be considered as a technique to compress or generalize
feature representations and generally reduce the overfitting of the
training data by the model.
• They too have a receptive field, often much smaller than the
convolutional layer. Also, the stride or number of inputs that the
receptive field is moved for each activation is often equal to the size
of the receptive field to avoid any overlap.
• Pooling layers are often very simple, taking the average or the
maximum of the input values in order to form the new feature map.
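A minimal 2x2 max-pooling sketch with the stride equal to the pooling size, as described above (a NumPy illustration, assuming the input dimensions are divisible by 2):

```python
import numpy as np

def max_pool2x2(feature_map):
    # group the map into non-overlapping 2x2 blocks and keep the max of each
    H, W = feature_map.shape
    blocks = feature_map.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2x2(x))        # [[ 5.  7.]
                             #  [13. 15.]]
```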
Dropout Layer
• Probabilistically dropping out or ignoring nodes in the network is a
simple and effective regularization method.
• It offers a very computationally cheap and remarkably effective
regularization method to reduce overfitting and improve
generalization error in deep neural networks of all kinds.
• Dropout has the effect of making the training process noisy, forcing
nodes within a layer to probabilistically take on more or less
responsibility for the inputs.
• It encourages the network to actually learn a sparse representation.
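A sketch of (inverted) dropout at training time, one common formulation: each node is kept with probability keep_prob and the survivors are rescaled so the expected activation stays the same.

```python
import numpy as np

def dropout(activations, keep_prob=0.5, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(activations.shape) < keep_prob   # random keep/drop mask
    return activations * mask / keep_prob              # rescale the kept nodes

a = np.ones(8)
print(dropout(a, keep_prob=0.5))   # roughly half the entries are zeroed
```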
Fully Connected Layer
• The normal flat feed-forward neural network layer.
• It is preceded by a flattening procedure.
• Contains neurons that connect to the entire input volume, as in ordinary
Neural Networks.
• Spatial information is lost at this stage.
• These layers may have a non-linear activation function or a softmax
activation in order to output probabilities of class predictions.
• Fully connected layers are used at the end of the network after feature
extraction and consolidation has been performed by the convolutional and
pooling layers.
• They are used to create final non-linear combinations of features and for
making predictions by the network.
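A flatten + fully connected layer is just an affine map on the vectorized feature maps; the NumPy sketch below uses made-up shapes for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((8, 4, 4))   # 8 pooled feature maps of size 4x4
x = feature_maps.reshape(-1)           # flatten -> vector of length 128 (spatial layout lost)
W = rng.random((10, x.size))           # 10 output neurons, each connected to the whole input
b = rng.random(10)
logits = W @ x + b                     # one score per class
print(logits.shape)                    # (10,)
```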
Softmax Layer
• A softmax function is a type of squashing function that limits its
output to the range 0 to 1.
• This allows the output to be interpreted directly as a probability.
Softmax functions are multi-class sigmoids, meaning they
are used to determine the probabilities of multiple classes at once.
• Since the outputs of a softmax function can be interpreted as
probabilities, a softmax layer is typically the final layer of the neural
network.
• It is important to note that a softmax layer must have the same
number of nodes as the output layer.
• It allows for the calculation of the error.
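A numerically stable softmax in NumPy: subtracting the maximum logit does not change the result but avoids overflow in the exponentials.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())                 # probabilities in (0, 1) that sum to 1
```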
Transfer Learning

• A technique which reuses a finished Deep Learning
model in another, more specific task.
• A pretrained CNN is used to process data from a different
dataset than the one it was trained on.
• The learned parameters are used as they are.
• Sometimes, some further training to fine-tune the CNN is
used. Also, some adaptation of the architecture might be
involved.
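A hedged transfer-learning sketch, assuming TensorFlow/Keras is available (not part of the lecture code): a CNN pretrained on ImageNet is reused as a frozen feature extractor and only a new classification head is trained on the target dataset.

```python
import tensorflow as tf

num_classes = 5                                   # assumed number of target-task classes
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False                            # keep the learned parameters as they are

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # new head for the new task
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Optionally unfreeze some top layers afterwards to fine-tune the CNN.
```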
Data Augmentation
• Artificially making the dataset larger
• By using a collection of simple image transformations
on the already included images to yield new ones,
such as: grayscale conversion, horizontal flips, vertical flips,
random crops, color jitter, translations, rotations.
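A few of the listed transformations written as plain NumPy operations (a sketch, assuming images are H x W x 3 arrays; function names are illustrative):

```python
import numpy as np

def horizontal_flip(img):
    return img[:, ::-1, :]                 # mirror left-right

def vertical_flip(img):
    return img[::-1, :, :]                 # mirror top-bottom

def random_crop(img, size, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    H, W, _ = img.shape
    top = rng.integers(0, H - size + 1)
    left = rng.integers(0, W - size + 1)
    return img[top:top + size, left:left + size, :]

def grayscale(img):
    return img.mean(axis=2, keepdims=True).repeat(3, axis=2)

img = np.random.rand(32, 32, 3)
augmented = [horizontal_flip(img), vertical_flip(img),
             random_crop(img, 24), grayscale(img)]
```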
Shortcomings of CNNs
• A black box: operates in the paradigm of non-
explainable AI, with the exception of visualization of
output structures at intermediate levels
• The application of CNNs in unsupervised settings is still
lagging behind
• Limitations to context reasoning
• Not invariant to other affine and non-affine
transformations
Famous CNNs Listing
• LeNet. The first successful applications of Convolutional Networks were developed by Yann LeCun in the 1990s.
• AlexNet. The first work that popularized Convolutional Networks in Computer Vision. The AlexNet was submitted
to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the runner-up (top-5 error of
16% compared to the runner-up with 26% error). The network had a very similar architecture to LeNet, but was
deeper, bigger, and featured Convolutional Layers stacked on top of each other.
• ZF Net. The ILSVRC 2013 winner. It was an improvement on AlexNet by tweaking the architecture hyperparameters, in
particular by expanding the size of the middle convolutional layers and making the stride and filter size on the first
layer smaller.
• GoogLeNet. The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. from Google. Its main
contribution was the development of an Inception Module that dramatically reduced the number of parameters in
the network (4M, compared to AlexNet with 60M). Additionally, this paper uses Average Pooling instead of Fully
Connected layers at the top of the ConvNet, eliminating a large amount of parameters that do not seem to matter
much.
• VGGNet. The runner-up in ILSVRC 2014. Its main contribution was in showing that the depth of the network is a critical
component for good performance. Their final best network contains 16 CONV/FC layers and, appealingly, features
an extremely homogeneous architecture that only performs 3x3 convolutions and 2x2 pooling from the beginning
to the end.
• ResNet. Residual Network was the winner of ILSVRC 2015. It features special skip connections and a heavy use of
batch normalization. The architecture is also missing fully connected layers at the end of the network.
