Deep learning uses neural networks to learn representations of data. Convolutional neural networks (CNNs) are commonly used for visual data. A CNN contains convolutional layers that learn hierarchical representations through local connections and weight sharing. It also uses pooling layers for downsampling, fully connected layers for classification, and may employ techniques like dropout to prevent overfitting. CNNs have achieved human-level performance on image recognition tasks.
CC 511 Artificial Intelligence
Deep Learning
Dr. Karma Fathalla
Lecture Highlights
• Deep Learning Applications
• Current Architectures
• Convolutional Neural Networks

DL and Data Science
• Scalability of neural networks - results get better with more data and larger models, which in turn require more computation to train.

DL and Feature Engineering
• Automated Feature Learning - the ability to perform automatic feature extraction from raw data.
• Hierarchical Feature Learning - the ability to provide different levels of abstraction of the data.

Traditional Recognition

Convolutional Neural Networks (CNN)
• A class of deep, feed-forward artificial neural networks that is applied to analyzing visual imagery.

Convolution Layer
• The number of output feature maps is usually larger than the number of input feature maps.

Related Terms
• Filter: a mask/window that holds the learned weights that are convolved with the image. Its size specifies the patch, or receptive field, of the image.
• Feature Map: the output of one filter applied to the previous layer.
• Stride: the distance (number of rows and columns) that the filter is moved across the input from its previous location.
• Padding: inventing mock inputs for the receptive field for the filter to read, in case the filter would otherwise read off the edge of the input feature map.

Spatial Dimensions
• 7x7 input (spatially), 3x3 filter => 5x5 output
• 7x7 input (spatially), 3x3 filter applied with stride 2 => 3x3 output!
• 7x7 input (spatially), 3x3 filter applied with stride 3? Doesn't fit! A 3x3 filter cannot be applied to a 7x7 input with stride 3.
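The output sizes above (and the padded case on the next slide) all follow one formula: output = (W - F + 2P)/S + 1, where W is the input size, F the filter size, P the zero-padding and S the stride; the filter only fits when (W - F + 2P) is divisible by S. A minimal sketch in plain Python (the helper name conv_output_size is just for illustration):

def conv_output_size(W, F, S=1, P=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1.
    Returns None when the filter does not fit with the given stride."""
    span = W - F + 2 * P
    if span < 0 or span % S != 0:
        return None  # the filter cannot be applied cleanly
    return span // S + 1

# The 7x7 examples from the slides:
print(conv_output_size(7, 3, S=1))        # 5    -> 5x5 output
print(conv_output_size(7, 3, S=2))        # 3    -> 3x3 output
print(conv_output_size(7, 3, S=3))        # None -> doesn't fit
print(conv_output_size(7, 3, S=1, P=1))   # 7    -> padding preserves the size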
Spatial Dimensions (continued)
• 7x7 input (spatially), 3x3 filter applied with stride 1 and a 1-pixel zero-padding border => what is the output? 7x7 output!
• In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding of (F-1)/2, which preserves the spatial size:
  F = 3 => zero pad with 1
  F = 5 => zero pad with 2
  F = 7 => zero pad with 3

Weight Sharing
• The concept by which the CNN achieves translation invariance.
• Based on the assumption that if one feature is useful to compute at some spatial position (x, y), then it should also be useful to compute at a different position (x2, y2).
• Constrains the neurons in each depth slice to use the same weights and bias across the whole image.
• However, it is possible to relax the parameter-sharing scheme, in which case the layer is simply called a Locally-Connected Layer.
• In practice, the weight updates are performed concurrently through parallelization algorithms and special hardware, the Graphics Processing Unit (GPU).
• GPUs provide hundreds of simpler cores and thousands of hardware threads that are applied to image regions at the same time.

Number of Parameters
• Input volume: 32x32x3, with 10 5x5 filters at stride 1, pad 2.
• Number of parameters in this layer? Each filter has 5*5*3 + 1 = 76 params (+1 for the bias) => 76*10 = 760.

Hierarchy of Convolution Layers

Activation Layer
• After each conv layer, it is conventional to apply a nonlinear function.
• In the past, nonlinear functions like tanh and sigmoid were used, but researchers found that ReLU layers work far better: the network trains a lot faster (because of the computational efficiency) without a significant difference in accuracy, and ReLU also helps to alleviate the vanishing gradient problem.
• Generalization would not be possible with a purely linear mapping, since a high level of abstraction could not be reached. Hence, to map a class of images into a manifold of feature vectors, we need activation functions; without them it would be very difficult to generalize, because pictures in a class can have too much intra-class variation.
• ReLU (REctified Linear Unit)

Pooling Layer
• Down-samples the previous layer's feature map.
• Pooling layers follow a sequence of one or more convolutional layers.
• May be considered a technique to compress or generalize feature representations, and generally reduces overfitting of the training data by the model.
• Pooling layers also have a receptive field, often much smaller than that of the convolutional layer. The stride (the number of inputs the receptive field is moved for each activation) is often equal to the size of the receptive field, to avoid any overlap.
• Pooling layers are often very simple, taking the average or the maximum of the input values in order to build the new feature map.

Dropout Layer
• Probabilistically dropping out, or ignoring, nodes in the network is a simple and effective regularization method.
• It offers a very computationally cheap and remarkably effective way to reduce overfitting and improve generalization error in deep neural networks of all kinds.
• Dropout makes the training process noisy, forcing nodes within a layer to probabilistically take on more or less responsibility for the inputs.
• It encourages the network to learn a sparse representation.
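As a concrete illustration of the layers above, here is a minimal sketch assuming PyTorch's torch.nn as the framework (the lecture itself does not prescribe one). It reproduces the 760-parameter count from the Number of Parameters slide and stacks a typical convolution / ReLU / max-pooling / dropout block:

import torch
import torch.nn as nn

# The parameter-count example: 32x32x3 input, 10 filters of size 5x5, stride 1, pad 2.
conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, stride=1, padding=2)
print(sum(p.numel() for p in conv.parameters()))  # 760 = 10 * (5*5*3 + 1), as on the slide

# A typical block: convolution -> ReLU activation -> max pooling -> dropout.
block = nn.Sequential(
    conv,
    nn.ReLU(),                              # nonlinearity after the conv layer
    nn.MaxPool2d(kernel_size=2, stride=2),  # stride equals the receptive field: no overlap
    nn.Dropout(p=0.25),                     # probabilistically ignore activations while training
)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
print(block(x).shape)           # torch.Size([1, 10, 16, 16]): padding kept 32x32, pooling halved it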
Fully Connected Layer
• The normal flat feed-forward neural network layer.
• It is preceded by a flatten procedure.
• Contains neurons that connect to the entire input volume, as in ordinary Neural Networks.
• Spatial information is lost at this phase.
• These layers may have a non-linear activation function or a softmax activation in order to output probabilities of class predictions.
• Fully connected layers are used at the end of the network, after feature extraction and consolidation have been performed by the convolutional and pooling layers.
• They are used to create final non-linear combinations of features and for making predictions by the network.

Softmax Layer
• A softmax function is a type of squashing function that limits its output to the range 0 to 1.
• This allows the output to be interpreted directly as a probability. Softmax functions are multi-class sigmoids, meaning they are used to determine the probability of multiple classes at once.
• Since the outputs of a softmax function can be interpreted as probabilities, a softmax layer is typically the final layer of a neural network.
• It is important to note that a softmax layer must have the same number of nodes as the output layer.
• It allows for the calculation of the error.
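Putting the pieces together, a hedged end-to-end sketch (again assuming PyTorch; the class name TinyCNN and all layer sizes are illustrative, not from the lecture): convolution and pooling for feature extraction, a flatten step, fully connected layers, and a final softmax with one node per class.

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                        # flattening precedes the fully connected layers
            nn.Linear(32 * 8 * 8, 128), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, num_classes),         # same number of nodes as classes
        )

    def forward(self, x):
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)      # probabilities over the classes

probs = TinyCNN()(torch.randn(4, 3, 32, 32))     # four 32x32 RGB images
print(probs.shape, probs.sum(dim=1))             # (4, 10); each row sums to 1

In practice, frameworks such as PyTorch usually keep the raw logits and fold the softmax into the loss (e.g. nn.CrossEntropyLoss); the explicit softmax above simply mirrors the layer described in the slides.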
Transfer Learning
• A technique which reuses a finished Deep Learning model in another, more specific task.
• A pretrained CNN is used to process data from a different dataset than the one it was trained on.
• The learned parameters are used as they are.
• Sometimes some further training is applied to fine-tune the CNN. Some adaptation of the architecture might also be involved.

Data Augmentation
• Artificially making the dataset larger.
• Achieved by applying a collection of simple image transformations to the already included images, yielding new ones, such as: grayscales, horizontal flips, vertical flips, random crops, color jitters, translations, and rotations.
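A hedged sketch of both ideas, assuming torchvision (version 0.13 or later for the weights API); the 5-class head, crop size, and jitter values are arbitrary illustrations. A pretrained ResNet-18 (ResNet appears in the listing below) is reused with its learned parameters frozen and only the classification head replaced, and a Compose pipeline applies the kinds of augmentation transformations listed above.

import torch.nn as nn
from torchvision import models, transforms

# Transfer learning: reuse a pretrained ResNet-18 for a hypothetical 5-class task.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                      # keep the learned parameters as they are
model.fc = nn.Linear(model.fc.in_features, 5)    # adapt the architecture: new classification head

# Data augmentation: simple transformations applied on the fly to enlarge the dataset.
augment = transforms.Compose([
    transforms.RandomGrayscale(p=0.1),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomCrop(224, padding=8),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1)),  # rotations and translations
    transforms.ToTensor(),
])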
Shortcomings of CNNs
• A black box: it operates in the paradigm of non-explainable AI, with the exception of visualization of output structures at intermediate levels.
• The application of CNNs in unsupervised settings is still lagging behind.
• Limitations to context reasoning.
• Not invariant to other affine and non-affine transformations.

Famous CNNs Listing
• LeNet. The first successful application of Convolutional Networks, developed by Yann LeCun in the 1990s.
• AlexNet. The first work that popularized Convolutional Networks in Computer Vision. AlexNet was submitted to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the runner-up (top-5 error of 16% compared to the runner-up's 26%). The network had a very similar architecture to LeNet, but was deeper, bigger, and featured Convolutional Layers stacked on top of each other.
• ZF Net. The ILSVRC 2013 winner. It was an improvement on AlexNet obtained by tweaking the architecture hyperparameters, in particular by expanding the size of the middle convolutional layers and making the stride and filter size on the first layer smaller.
• GoogLeNet. The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. at Google. Its main contribution was the development of an Inception Module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M). Additionally, this work uses Average Pooling instead of Fully Connected layers at the top of the ConvNet, eliminating a large number of parameters that do not seem to matter much.
• VGGNet. The runner-up in ILSVRC 2014. Its main contribution was showing that the depth of the network is a critical component for good performance. Their final best network contains 16 CONV/FC layers and, appealingly, features an extremely homogeneous architecture that only performs 3x3 convolutions and 2x2 pooling from beginning to end.
• ResNet. Residual Network was the winner of ILSVRC 2015. It features special skip connections and heavy use of batch normalization. The architecture also omits fully connected layers at the end of the network.