DL Highlights
knowledge of conditions that might be related to that event. It is a further case of conditional probability.
What is Bayes’ Theorem?
Bayes’ theorem is also known as Bayes’ Rule or Bayes’ Law. It is used to determine the conditional probability of an event:
P(A|B) = P(B|A)P(A) / P(B)
Bayes Theorem Statement
Bayes’ Theorem for a set of n events
P(Ei|A) = P(Ei)P(A|Ei) / ∑ P(Ek)P(A|Ek), with the sum taken over k = 1, …, n
Terms Related to Bayes Theorem
Conditional Probability P(A|B)
Joint Probability P(A∩B)
Random Variables
Bayes Theorem Formula
K-Nearest Neighbor (K-NN) Algorithm for Machine Learning
*One of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
*Assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
*Stores all the available data and classifies a new data point based on similarity.
*The K-NN algorithm can be used for Regression as well as for Classification.
*K-NN is a non-parametric algorithm.
*It is also called a lazy learner algorithm.
Why do we need a K-NN Algorithm?
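One way to picture what K-NN is for: given stored, labeled points, decide which category a new point belongs to. The following is a minimal sketch, not a reference implementation. It assumes Euclidean distance and a simple majority vote; the function name `knn_classify` and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k stored points nearest to query."""
    # Lazy learner: there is no training step, the stored data is used directly.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance so the k most similar cases come first.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the k nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two categories, "A" and "B".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, query=(2.0, 2.0), k=3))  # prints "A"
```

Because nothing is fitted in advance, all of the work happens at query time, which is why K-NN is described as a lazy learner.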
Sparse Autoencoders: controlled by changing the number of nodes at each hidden layer.
Contractive Autoencoders: the input is passed through a bottleneck and then reconstructed in the decoder.
Denoising Autoencoders: similar to regular autoencoders in that they take an input and produce an output.
Variational Autoencoders: variational autoencoders (VAEs) are models that address a specific problem with standard autoencoders.
Encoder-Decoder Model
•Encoder •Hidden Vector •Decoder
The encoder converts the input sequence into a single-dimensional vector (hidden vector). The decoder converts the hidden vector into the output sequence.
Encoder-Decoder models are jointly trained to maximize the conditional probability of the target sequence given the input sequence.
Encoder
*Multiple RNN cells can be stacked together to form the encoder. The RNN reads each input sequentially.
*For every timestep (each input) t, the hidden state h is updated according to the input at that timestep X[t].
*After all the inputs are read by the encoder, the final hidden state represents the context/summary of the whole input sequence.
Decoder
• The decoder generates the output sequence by predicting the next output yt given the hidden state ht.
• The input for the decoder is the final hidden vector obtained at the end of the encoder.
• Each layer has three inputs: the hidden vector from the previous layer ht-1, the previous layer output yt-1, and the original hidden vector h.
Output Layer
Applications
It has many applications, such as
• Google’s Machine Translation
• Question answering chatbots
• Speech recognition
• Time series applications, etc.
Convolution Layers
There are three types of layers that make up the CNN: convolutional layers, pooling layers, and fully-connected (FC) layers.
1. Convolutional Layer
2. Pooling Layer
3. Fully Connected Layer
5. Activation Functions
CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and more…
A Convolutional Neural Network (CNN, or ConvNet) is a special kind of multi-layer neural network, designed to recognize visual patterns directly from pixel images with minimal preprocessing.
LeNet-5 (1998)
AlexNet (2012)
ZFNet (2013)
GoogLeNet/Inception (2014)
VGGNet (2014)
ResNet (2015)
TRANSFER LEARNING
Transfer learning is a technique in machine learning where a model trained on one task is used as the starting point for a model on a second task.
Advantages of transfer learning:
*Speeds up the training process *Better performance *Handles small datasets
Disadvantages:
*Domain mismatch *Overfitting *Complexity
DEEP LEARNING CHALLENGES
1. Lots and lots of data 2. Overfitting in neural networks 3. Hyperparameter Optimization 4. Requires high-performance hardware 5. Neural networks are essentially a black box 6. Lack of Flexibility and Multitasking
Different Normalization Layers in Deep Learning
Deep Learning is currently revolutionizing many subfields such as natural language processing, computer vision, robotics, etc.
*Batch Normalization *Weight Normalization *Layer Normalization *Group Normalization *Weight Standardization
Facial Recognition Using Deep Learning
Convolutional Neural Networks allow us to extract a wide range of features from images. It turns out we can use this idea of feature extraction for face recognition too! That’s what we are going to explore in this tutorial, using deep conv nets for face recognition.
What is the difference between zero-shot, one-shot, and few-shot learning models?
Apart from one-shot learning, there exist other models that require just several examples (few-shot learning) or no examples at all (zero-shot learning).
Few-shot learning is simply a variation of the one-shot learning model with several training images available.
The goal of zero-shot learning is to categorize unknown classes without any training data.
Applications
face recognition and signature verification,
computer vision,
cross-lingual word recognition
What is Image Segmentation?
One of the most important operations in Computer Vision is Segmentation. Image segmentation is the task of clustering parts of an image together that belong to the same object class. This process is also called pixel-level classification. In other words, it involves partitioning images (or video frames) into multiple segments or objects.
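To make "pixel-level classification" concrete, here is a small sketch. It assumes some model has already produced a list of per-class scores for every pixel; that model, the class names, and the helper `label_map` are hypothetical and only illustrate the idea of assigning one class per pixel.

```python
def label_map(scores):
    """scores[row][col] is a list of per-class scores for that pixel.

    Returns a 2D grid holding the index of the best-scoring class per pixel.
    """
    return [
        [max(range(len(pixel)), key=lambda c: pixel[c]) for pixel in row]
        for row in scores
    ]


# Hypothetical classes and scores for a tiny 2 x 3 image.
CLASSES = ["sky", "tree", "car"]
scores = [
    [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.2, 0.7, 0.1]],
    [[0.1, 0.2, 0.7], [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]],
]

segmentation = label_map(scores)
for row in segmentation:
    # Each pixel gets its own label, unlike image classification,
    # which would give one label for the whole image.
    print(" ".join(CLASSES[c] for c in row))
```

Pixels that share a label form one segment, which is the partitioning described above.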
Semantic vs. Instance Segmentation
Image segmentation can be formulated as a classification problem of pixels with semantic labels (semantic segmentation) or partitioning of individual objects (instance segmentation). Semantic segmentation performs pixel-level labeling with a set of object categories (for example, people, trees, sky, cars) for all image pixels. It is generally a more difficult undertaking than image classification, which predicts a single label for the entire image or frame. Instance segmentation extends the scope of semantic segmentation further by detecting and delineating all the objects of interest in an image.
What's the KL Divergence?
The Kullback-Leibler divergence (hereafter written as KL divergence) is a measure of how a probability distribution differs from another probability distribution. Classically, in Bayesian theory, there is some true distribution P(X)
GAN
Generative adversarial networks (GANs) are an exciting recent innovation in machine learning. GANs are generative models: they create new data instances that resemble your training data. For example, GANs can create images that look like photographs of human faces, even though the faces don't belong to any real person.
Other use cases of GAN could be:
Text-to-Image Translation. Clothing Translation.
Face Frontal View Generation. Photo Inpainting.
Generate New Human Poses. Photos to Emojis.
Face Aging. Super Resolution.
What is modeling in deep learning?
A computer model learns to perform classification tasks directly from images, text, or sound.
What is preprocessing?
Preprocessing data is a common first step in the deep learning workflow to prepare raw data in a format that the network can accept.
What is Feature extraction?
Feature extraction for machine learning and deep learning. Feature extraction refers to the process of transforming raw data into numerical features that can be processed while preserving the information in the original data set.
Advantages and disadvantages of ADAGRAD
Advantages:
*The learning rate is automatically updated. There is no need to manually update the learning rate for each feature.
*Gives better results than simple SGD if we have both sparse and dense features.
Disadvantages:
*A squared gradient term is added to the accumulator at each iteration. Since it is never negative, the effective learning rate keeps decreasing and can become vanishingly small.
*Less efficient than some other optimization algorithms like AdaDelta and Adam.
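The AdaGrad behavior described above (a per-feature learning rate that only shrinks) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a library implementation: the function name `adagrad_minimize`, the default values, and the toy quadratic objective are all hypothetical.

```python
import math


def adagrad_minimize(grad_fn, x0, lr=0.5, steps=50, eps=1e-8):
    """Minimize a function with AdaGrad, given a function returning its gradient."""
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradients, one per feature
    lr_history = []         # effective learning rate of feature 0 at each step
    for _ in range(steps):
        g = grad_fn(x)
        for i in range(len(x)):
            # The squared term is never negative, so accum only grows.
            accum[i] += g[i] ** 2
            effective_lr = lr / (math.sqrt(accum[i]) + eps)
            x[i] -= effective_lr * g[i]
        lr_history.append(lr / (math.sqrt(accum[0]) + eps))
    return x, lr_history


def grad(p):
    """Gradient of the toy objective f(x, y) = x^2 + 10 * y^2."""
    return [2 * p[0], 20 * p[1]]


x_final, lr_history = adagrad_minimize(grad, [3.0, 2.0])
print(x_final)
print(lr_history[0], lr_history[-1])  # the effective rate only goes down
```

Each feature gets its own effective rate without manual tuning, which is the advantage listed above; the steadily shrinking `lr_history` values show the corresponding disadvantage.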