DEEP LEARNING-Last Minute Notes


Feature vector
A feature vector is an ordered list of numerical properties of an observed phenomenon. It represents the input features to a machine learning model that makes a prediction.
Feature vectors for text classification:
* Bag-of-words model
* One-hot encoding
* Tf-idf (term frequency-inverse document frequency)
* Word embeddings

BOUNDARY DESCRIPTORS
Simple descriptors: length of a contour, boundary diameter
Boundary descriptors: curvature, shape numbers
Regional descriptors: area, perimeter and compactness, where compactness = (perimeter)^2 / area
Topological descriptors: rubber-sheet distortions, Euler number E = C - H (connected components minus holes)
Machine Learning vs Deep Learning
* Machine Learning is a superset of Deep Learning; Deep Learning is a subset of Machine Learning.
* Machine Learning typically works with structured data; Deep Learning learns its own data representations using artificial neural networks (ANNs).
* Machine Learning is an evolution of AI; Deep Learning is an evolution of Machine Learning (essentially "how deep" the learning goes).
* ML models take less time to train because of their smaller size; Deep Learning training takes much longer because of the very large amount of data.
* In ML, humans explicitly perform feature engineering; in Deep Learning, feature engineering is largely unnecessary because important features are detected automatically by the network.
* ML applications are simpler and can run on standard computers; Deep Learning systems need much more powerful hardware and resources.
* The results of an ML model are easy to explain; the results of a deep learning model are difficult to explain.
* ML models suit straightforward or moderately challenging problems; Deep Learning models are appropriate for resolving very challenging problems.

What Are Discriminative Models?
A discriminative model is a class of model used in statistical classification, mainly for supervised machine learning. These models are also known as conditional models, since they learn the boundaries between classes or labels in a dataset.
The Mathematics of Discriminative Models
The analysis involves estimating a function f: X -> Y, or the conditional probability P(Y|X). We assume some functional form for this probability and, with the help of training data, estimate the parameters of P(Y|X).
Examples of discriminative models: logistic regression, support vector machines (SVMs), traditional neural networks, nearest neighbour, Conditional Random Fields (CRFs), decision trees and random forests.

What Are Generative Models?
Generative models are a class of statistical models that can generate new data instances. They are used in unsupervised machine learning to perform tasks such as probability and likelihood estimation, modelling data points, describing the phenomenon in the data, and distinguishing between classes based on these probabilities.
The Mathematics of Generative Models
Instead of modelling P(Y|X) directly, a generative model estimates the prior P(Y) and the likelihood P(X|Y), from which the posterior P(Y|X) can be derived.
Examples of generative models: Naive Bayes, Bayesian networks, Markov random fields, Hidden Markov Models (HMMs), Latent Dirichlet Allocation, Generative Adversarial Networks (GANs).

What is Deep Learning?
Deep learning is a subset of machine learning, which is essentially a neural network with three or more layers. These neural networks attempt to simulate the behavior of the human brain (albeit far from matching its ability), allowing them to "learn" from large amounts of data.
Advantages of Deep Learning:
1. Automatic feature learning 2. Handling large and complex data 3. Improved performance 4. Handling non-linear relationships 5. Handling structured and unstructured data 6. Predictive modeling 7. Handling missing data 8. Handling sequential data 9. Scalability 10. Generalization
Disadvantages of Deep Learning:
1. High computational cost 2. Overfitting 3. Lack of interpretability 4. Dependence on data quality 5. Data privacy and security concerns 6. Lack of domain expertise 7. Unforeseen consequences 8. Limited to the data it is trained on 9. Black-box models
Applications of Deep Learning:
Computer vision, natural language processing, speech recognition, predictive analytics, recommender systems, healthcare, finance, marketing, gaming, robotics, cybersecurity.

Bayes' Theorem
Bayes' Theorem is named after Reverend Thomas Bayes. It is a very important theorem in mathematics, used to find the probability of an event based on prior knowledge of conditions that might be related to that event. It is a further case of conditional probability.
What is Bayes' Theorem?
Bayes' theorem is also known as the Bayes Rule or Bayes Law. It is used to determine the conditional probability of an event:
P(A|B) = P(B|A) P(A) / P(B)
Bayes' theorem for a set of n events E1, ..., En:
P(Ei|A) = P(Ei) P(A|Ei) / ∑k P(Ek) P(A|Ek)
Terms related to Bayes' theorem: conditional probability P(A|B), joint probability P(A∩B), random variables.
Difference between Conditional Probability and Bayes' Theorem
* Bayes' Theorem is derived using the definition of conditional probability; it is used to find the "reverse" probability. Formula: P(A|B) = P(B|A) P(A) / P(B)
* Conditional Probability is the probability of event A when event B has already occurred. Formula: P(A|B) = P(A∩B) / P(B)
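To make the formula concrete, here is a minimal worked example in Python; the spam-filter prior and likelihood numbers are invented purely for illustration.

```python
# Worked example of Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
# The numbers below are illustrative only (a toy spam-filter setting).

p_spam = 0.2                 # prior P(A): probability an email is spam
p_word_given_spam = 0.6      # likelihood P(B|A): word appears in spam
p_word_given_ham = 0.05      # P(B|not A): word appears in non-spam

# Total probability P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(A|B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.75
```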
Applications of the Naive Bayes Classifier:
* Credit scoring
* Medical data classification
* Real-time predictions
* Spam filtering and sentiment analysis
What is the Bayes minimum risk classifier?
The minimum conditional risk R(αi|x) is called the Bayes risk. The conditional risk is R(αi|x) = ∑j λij P(ωj|x); with the zero-one loss this reduces to R(αi|x) = 1 − P(ωi|x), so R(αi|x) is minimised by the decision i for which the posterior P(ωi|x) is maximum. This is the same decision rule as the Bayes classifier.
What is the minimum error-rate Bayes classifier?
Minimum error-rate classification uses the same relation, R(αi|x) = 1 − P(ωi|x): the risk R(αi|x) is minimum for the decision i whose posterior P(ωi|x) is maximum.
Linear discriminant analysis (LDA)
LDA is a classical technique to predict the group membership of samples. It is a supervised technique and needs prior knowledge of the groups.
Quadratic discriminant analysis (QDA)
QDA is a probabilistic parametric classification technique which represents an evolution of LDA for nonlinear class separations.
Minimum Distance Classifier
The minimum distance classifier assigns unknown image data to the class that minimises the distance between the image data and the class in multi-feature space. Common distance measures:
(1) Euclidean distance
(2) Normalized Euclidean distance
(3) Mahalanobis distance

K-Nearest Neighbour (KNN) Algorithm for Machine Learning
* One of the simplest machine learning algorithms, based on the supervised learning technique.
* It assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
* It stores all the available data and classifies a new data point based on similarity.
* K-NN can be used for regression as well as for classification.
* K-NN is a non-parametric algorithm.
* It is also called a lazy learner algorithm.
How does K-NN work?
The working of K-NN can be explained with the algorithm below (a minimal implementation follows the steps):
Step 1: Select the number K of neighbours.
Step 2: Calculate the Euclidean distance from the query point to the stored points.
Step 3: Take the K nearest neighbours according to the calculated Euclidean distance.
Step 4: Among these K neighbours, count the number of data points in each category.
Step 5: Assign the new data point to the category for which the number of neighbours is maximum.
Step 6: Our model is ready.
Advantages of KNN:
* Simple to implement. * Robust to noisy training data. * It can be more effective if the training data is large.
Disadvantages of KNN:
* The value of K always needs to be chosen, which may be complex. * The computation cost is high.
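The sketch below follows the six steps above with plain NumPy; the dataset, variable names (X_train, y_train, query), and k are illustrative only.

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Classify one query point with the plain K-NN procedure described above."""
    # Step 2: Euclidean distance from the query to every stored training point
    dists = np.linalg.norm(X_train - query, axis=1)
    # Step 3: take the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: count labels among the neighbours and return the majority class
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Tiny illustrative dataset (two classes in 2-D)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [4.1, 3.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # -> 0
```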
SVM Algorithm as a Maximum Margin Classifier
The Support Vector Machine (SVM) is one of the popular machine learning algorithms and is used for solving classification problems. Take a 2-dimensional problem space where a point can be classified as one class or the other based on the values of the two dimensions (independent variables, say X1 and X2); the SVM chooses the separating boundary that maximises the margin between the two classes.
Advantages of support vector machines: robustness to noise, generalization, versatility, sparse solution, regularization.
Disadvantages of support vector machines: *Computationally expensive *Choice of kernel *Memory-intensive *Limited to two-class problems
Applications of support vector machines:
*Face detection *Text and hypertext categorisation *Image classification *Handwriting recognition *Bioinformatics *Generalized predictive control (GPC)

Hinge Loss
The hinge loss is a specific type of cost function that incorporates a margin, or distance from the classification boundary, into the cost calculation.
Hinge Loss Formula
The loss is defined by the following formula, where t is the actual outcome (either 1 or -1) and y is the output of the classifier:
l(y) = max(0, 1 − t·y)
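A direct NumPy translation of that formula; the labels and scores are made-up examples.

```python
import numpy as np

def hinge_loss(t, y):
    """l(y) = max(0, 1 - t*y): t is the true label (+1 or -1), y the raw classifier score."""
    return np.maximum(0.0, 1.0 - t * y)

t = np.array([1, -1, 1])
y = np.array([2.0, 0.5, 0.3])
print(hinge_loss(t, y))  # [0.  1.5 0.7] -> correct and confident, wrong, inside the margin
```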
LOSS FUNCTION
The loss function is very important in machine learning and deep learning: say you are working on a problem, have trained a model on the dataset, and are ready to put it in front of your client. In mathematical optimization and decision theory, a loss or cost function (sometimes also called an error function) is a function that maps an event, or the values of one or more variables, onto a real number intuitively representing some "cost" associated with the event.
Loss function vs cost function
Most people confuse the two terms. They are often used interchangeably as if they were synonymous, but they refer to different things:
Loss function: a loss function (error function) is computed for a single training example/input.
Cost function: a cost function, on the other hand, is the average loss over the entire training dataset.
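A tiny numeric illustration of that distinction, using squared error; the targets and predictions are made up.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 4.0, 2.0])

# Loss: error of a single training example (squared error here)
per_example_loss = (y_true - y_pred) ** 2      # [0.25, 1.0, 0.0]

# Cost: average of the per-example losses over the whole training set (MSE)
cost = per_example_loss.mean()                 # ~0.4167
print(per_example_loss, cost)
```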
Loss functions in Deep Learning
1. Regression: MSE (Mean Squared Error), MAE (Mean Absolute Error), Huber loss
2. Classification: binary cross-entropy, categorical cross-entropy
3. Autoencoders: KL divergence
4. GANs: discriminator loss, minimax GAN loss
5. Object detection: focal loss
6. Word embeddings: triplet loss

What is Regularization?
Regularization is a technique which makes slight modifications to the learning algorithm such that the model generalizes better. This in turn improves the model's performance on unseen data.
Different Regularization Techniques in Deep Learning
L1 & L2 regularization
L1 and L2 are the most common types of regularization. They update the general cost function by adding another term known as the regularization term:
Cost function = Loss (say, binary cross-entropy) + Regularization term
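A minimal sketch of that regularized cost, assuming a binary cross-entropy loss and an L2 penalty; the labels, predictions, weights, and the strength lam are illustrative (an L1 penalty would use the sum of absolute weights instead).

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def regularized_cost(y_true, y_pred, weights, lam=0.01):
    # Cost = loss (binary cross-entropy here) + L2 regularization term
    l2_term = lam * np.sum(weights ** 2)
    return binary_cross_entropy(y_true, y_pred) + l2_term

y_true  = np.array([1, 0, 1])
y_pred  = np.array([0.9, 0.2, 0.7])
weights = np.array([0.5, -1.2, 3.0])
print(regularized_cost(y_true, y_pred, weights))
```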
Dropout
This is one of the most interesting regularization techniques. It produces very good results and is consequently one of the most frequently used regularization techniques in deep learning: at each training step, a random subset of neurons is temporarily dropped.
Data Augmentation
The simplest way to reduce overfitting is to increase the size of the training data. In classical machine learning we often could not increase the amount of training data because labelled data was too costly; augmentation instead generates new training examples by transforming existing ones.
Early stopping
Early stopping is a kind of cross-validation strategy where we keep one part of the training set as the validation set. When we see that the performance on the validation set is getting worse, we immediately stop training the model. This is known as early stopping.
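A toy illustration of the early-stopping rule; the sequence of validation losses and the patience of two epochs are invented for the example.

```python
# Early stopping on a toy sequence of validation losses (illustrative numbers only).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]  # starts to worsen after epoch 3

best, patience, bad = float("inf"), 2, 0
for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, bad = loss, 0          # validation improved: keep training
    else:
        bad += 1                     # validation got worse
        if bad >= patience:
            print(f"stop training at epoch {epoch}, best val loss {best}")
            break
```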
Local Minima and Global Minima
The point at which a function takes its minimum value is called the global minimum. Points which appear to be minima but are not the point where the function actually takes its minimum value are called local minima. Machine learning algorithms such as gradient descent may get stuck in local minima during training.
Techniques for dealing with local minima problems:
* Careful selection of hand-crafted features
* Careful learning-rate schedules
* Using a different number of steps

Optimization Techniques Popularly Used in Deep Learning
Commonly used optimizers include the following (a basic gradient-descent loop is sketched after the list):
* Gradient Descent
* Stochastic Gradient Descent (SGD)
* Mini-Batch Stochastic Gradient Descent (MB-SGD)
* SGD with Momentum
* Nesterov Accelerated Gradient (NAG)
* Adaptive Gradient (AdaGrad)
* AdaDelta
* RMSProp
* Adam
* Nadam
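A minimal sketch of plain (batch) gradient descent on a least-squares problem; the synthetic data and the learning rate are arbitrary. SGD would use one randomly chosen example (or a mini-batch) per update instead of the full dataset.

```python
import numpy as np

# Plain (batch) gradient descent: minimize the mean squared error of Xw - y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w, lr = np.zeros(3), 0.05
for step in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad                          # step along the negative gradient
print(w)  # close to [2.0, -1.0, 0.5]
```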
Underfitting and Overfitting
* Bias: assumptions made by a model to make a function easier to learn; it is effectively the error rate on the training data.
* Variance: the difference between the error rate on the training data and on the testing data.
Underfitting: a statistical model or machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data.
Reasons for underfitting: *High bias and low variance *The model is too simple *The training dataset is not large enough.
Techniques to reduce underfitting: *Increase model complexity *Remove noise from the data *Increase the number of features, e.g. by feature engineering.
Overfitting: a statistical model is said to be overfitted when it does not make accurate predictions on testing data. When a model is trained too closely on its data, it starts learning from the noise and inaccurate entries in the data set.
Reasons for overfitting: *High variance and low bias *The model is too complex *The size of the training data is too small.
Techniques to reduce overfitting: *Increase training data *Reduce model complexity.
Good Fit in a Statistical Model: ideally, the case where the model makes predictions with zero error is said to be a good fit on the data. In practice this is achievable at a spot between overfitting and underfitting.
LINEAR REGRESSION vs LOGISTIC REGRESSION
* Linear regression is one of the simplest machine learning algorithms; it comes under the supervised learning technique and is used for solving regression problems. Logistic regression is one of the most popular machine learning algorithms; it also comes under supervised learning and can be used for classification as well as regression, but is mainly used for classification problems.
* Linear regression is used to predict a continuous dependent variable with the help of the independent variables. Logistic regression is used to predict a categorical dependent variable with the help of the independent variables.
* The goal of linear regression is to find the best-fit line that can accurately predict the output for the continuous dependent variable. The output of logistic regression can only lie between 0 and 1.
Softmax classifier
The SVM is one of two commonly seen classifiers. The other popular choice is the Softmax classifier, which has a different loss function. If you have heard of the binary logistic regression classifier before, the Softmax classifier is its generalization to multiple classes. Unlike the SVM, which treats the outputs f(x_i, W) as uncalibrated class scores, the Softmax classifier interprets them as unnormalised log-probabilities and uses the cross-entropy loss
L_i = −log( e^(f_yi) / ∑_j e^(f_j) ), or equivalently L_i = −f_yi + log ∑_j e^(f_j)
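The second (numerically friendlier) form of the loss in NumPy; the score vector and class indices are made up for illustration.

```python
import numpy as np

def softmax_loss(f, y):
    """L_i = -f_{y_i} + log(sum_j e^{f_j}) for one example's class scores f."""
    f = f - np.max(f)                       # shift scores for numerical stability
    return -f[y] + np.log(np.sum(np.exp(f)))

scores = np.array([2.0, 1.0, 0.1])          # unnormalised class scores f(x_i, W)
print(softmax_loss(scores, y=0))            # ~0.417: true class already dominant
print(softmax_loss(scores, y=2))            # ~2.317: low score for the true class
```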
Why is non-linearity important in deep learning?
A non-linear activation function allows the stacking of multiple layers of neurons to create a deep neural network, which is required to learn complex data sets with high accuracy; without it, stacked linear layers collapse into a single linear transformation.

What is a Neuron in Deep Learning?
Neurons in deep learning models are nodes through which data and computations flow. Neurons work like this:
* They receive one or more input signals. These input signals can come from either the raw data set or from neurons positioned at a previous layer of the neural net.
* They perform some calculations.
* They send output signals on to neurons deeper in the network.
Backpropagation
Backpropagation is one of the important concepts of a neural network. Our task is to classify our data as well as possible; for this we have to update the weight and bias parameters, but how can we do that in a deep neural network? In the linear regression model we use gradient descent to optimize the parameters. Similarly, here we also use the gradient descent algorithm, with backpropagation supplying the gradients layer by layer.

Multi-layer Perceptron
The multi-layer perceptron is also known as the MLP. It consists of fully connected dense layers, which transform any input dimension to the desired dimension. A multi-layer perceptron is a neural network that has multiple layers: to create it we combine neurons so that the outputs of some neurons are inputs of other neurons.

What is Entropy in ML?
Entropy is the number of bits required to transmit a randomly selected event from a probability distribution. A skewed distribution has a low entropy, whereas a distribution where events have equal probability has a larger entropy. There are binary cross-entropy loss and multi-class cross-entropy loss.
Cross-entropy builds upon the idea of entropy from information theory and calculates the number of bits required to represent or transmit an average event from one distribution compared to another distribution.
When do we use it? Cross-entropy loss is used when adjusting model weights during training.
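A small numerical check of the entropy/cross-entropy statements above; the two distributions are made up.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Bits needed to code events from true distribution p using distribution q."""
    q = np.clip(q, eps, 1.0)
    return -np.sum(p * np.log2(q))

p = np.array([0.5, 0.5])             # true distribution (fair coin)
print(cross_entropy(p, p))           # 1.0 bit: this is just the entropy of p
print(cross_entropy(p, [0.9, 0.1]))  # ~1.74 bits: extra cost of a mismatched model
```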
What Are Autoencoders?
Autoencoders are very useful in the field of unsupervised machine learning. You can use them to compress data and reduce its dimensionality.
Architecture
Three parts: *Encoder *Code *Decoder
Types of Autoencoders
* Under-complete autoencoders: an unsupervised neural network that you can use to generate a compressed version of the input data.
* Sparse autoencoders: regularised by limiting (or penalising) how many hidden nodes are active at a time.
* Contractive autoencoders: the input is passed through a bottleneck and then reconstructed in the decoder, with a penalty that makes the learned representation insensitive to small input changes.
* Denoising autoencoders: similar to regular autoencoders in that they take an input and produce an output, but they are trained to reconstruct a clean input from a corrupted version.
* Variational autoencoders (VAEs): models that address a specific problem with standard autoencoders, namely that the learned code space is not structured for generating new samples.
Use Cases
Autoencoders have various use cases (a small architecture sketch follows the list), such as:
* Anomaly detection
* Data denoising (image and audio)
* Image inpainting
* Information retrieval
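A minimal under-complete autoencoder sketch in PyTorch; the layer sizes (flattened 28x28 inputs, a 32-unit code) are arbitrary choices for illustration.

```python
import torch
from torch import nn

# Encoder -> code (bottleneck) -> decoder, trained with a reconstruction loss.
class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim), nn.Sigmoid())

    def forward(self, x):
        code = self.encoder(x)          # compressed representation
        return self.decoder(code)       # reconstruction of the input

model = AutoEncoder()
x = torch.rand(16, 784)                          # a dummy batch
loss = nn.functional.mse_loss(model(x), x)       # reconstruction loss
print(loss.item())
```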
Encoder-Decoder Model
Components: *Encoder *Hidden vector *Decoder
The encoder converts the input sequence into a single fixed-length vector (the hidden vector). The decoder converts the hidden vector into the output sequence. Encoder-decoder models are jointly trained to maximize the conditional probability of the target sequence given the input sequence.
Encoder
* Multiple RNN cells can be stacked together to form the encoder; the RNN reads each input sequentially.
* For every timestep t, the hidden state h is updated according to the input at that timestep, X[t].
* After all the inputs are read by the encoder model, the final hidden state represents the context/summary of the whole input sequence.
Decoder
* The decoder generates the output sequence by predicting the next output Y_t given its hidden state h_t.
* The input for the decoder is the final hidden vector obtained at the end of the encoder model.
* Each step has three inputs: the hidden vector from the previous step h_(t-1), the previous output y_(t-1), and the original context vector h.
* An output layer (for example a softmax over the vocabulary) maps each decoder state to an output symbol.
Applications
Encoder-decoder models have many applications (a bare-bones sketch follows the list), such as:
* Google's machine translation
* Question-answering chatbots
* Speech recognition
* Time series applications, etc.
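A bare-bones GRU encoder-decoder in PyTorch, illustrating how the encoder's final hidden state initialises the decoder; the vocabulary size, dimensions, and teacher-forced target input are illustrative, not a full seq2seq implementation.

```python
import torch
from torch import nn

class EncoderDecoder(nn.Module):
    def __init__(self, vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.embed(src))           # final hidden state = context vector
        dec_out, _ = self.decoder(self.embed(tgt), h)  # decoder starts from that context
        return self.out(dec_out)                       # scores for each output token

model = EncoderDecoder()
src = torch.randint(0, 1000, (8, 12))   # batch of source sequences
tgt = torch.randint(0, 1000, (8, 10))   # batch of target sequences (teacher forcing)
print(model(src, tgt).shape)            # torch.Size([8, 10, 1000])
```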
Convolution Layers
There are three main types of layers that make up a CNN, plus activation functions applied between them (a small example network is sketched below):
1. Convolutional layer
2. Pooling layer
3. Fully connected (FC) layer
4. Activation functions
CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and more
A Convolutional Neural Network (CNN, or ConvNet) is a special kind of multi-layer neural network, designed to recognize visual patterns directly from pixel images with minimal preprocessing. Landmark architectures include:
LeNet-5 (1998), AlexNet (2012), ZFNet (2013), GoogLeNet/Inception (2014), VGGNet (2014), ResNet (2015)
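A small PyTorch stack showing the three layer types plus activations; the channel counts and the assumed 28x28 grayscale input are arbitrary.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                   # activation
    nn.MaxPool2d(2),                             # pooling layer -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # fully connected layer (10 classes)
)
x = torch.rand(4, 1, 28, 28)
print(model(x).shape)  # torch.Size([4, 10])
```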
TRANSFER LEARNING
Transfer learning is a technique in machine learning where a model trained on one task is used as the starting point for a model on a second task.
Advantages of transfer learning: *Speeds up the training process *Better performance *Handles small datasets
Disadvantages: *Domain mismatch *Overfitting *Complexity (a short fine-tuning sketch follows)
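One common transfer-learning recipe, sketched with torchvision: freeze a pretrained backbone and train only a new head. The 5-class head and the weights argument (which assumes a recent torchvision and downloads pretrained weights on first use) are illustrative.

```python
import torch
from torch import nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")       # model pretrained on task 1
for p in backbone.parameters():
    p.requires_grad = False                                # freeze pretrained weights
backbone.fc = nn.Linear(backbone.fc.in_features, 5)       # new head for the second task

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)  # train only the head
print(backbone(torch.rand(2, 3, 224, 224)).shape)          # torch.Size([2, 5])
```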
DEEP LEARNING CHALLENGES
1. Lots and lots of data 2. Overfitting in neural networks 3. Hyperparameter optimization 4. Requires high-performance hardware 5. Neural networks are essentially a black box 6. Lack of flexibility and multitasking
Different Normalization Layers in Deep Learning
Deep learning has been revolutionizing many subfields such as natural language processing, computer vision, and robotics. Commonly used normalization layers:
*Batch Normalization *Weight Normalization *Layer Normalization *Group Normalization *Weight Standardization
Facial Recognition Using Deep Learning
Convolutional Neural Networks allow us to extract a wide range of features from images. The same idea of feature extraction can be used for face recognition, using deep conv nets.
What is the difference between zero-shot, one-shot, and few-shot learning models?
Apart from one-shot learning, there are models that require just a few examples (few-shot learning) or no examples at all (zero-shot learning). Few-shot learning is simply a variation of one-shot learning with several training images available. The goal of zero-shot learning is to categorize unknown classes without any training data.
Applications: face recognition and signature verification, computer vision, cross-lingual word recognition.
What is Image Segmentation?
One of the most important operations in computer vision is segmentation. Image segmentation is the task of clustering parts of an image together that belong to the same object class; this process is also called pixel-level classification. In other words, it involves partitioning images (or video frames) into multiple segments or objects.
Semantic vs. Instance Segmentation
Image segmentation can be formulated as a classification problem of pixels with semantic labels (semantic segmentation) or as a partitioning of individual objects (instance segmentation). Semantic segmentation performs pixel-level labeling with a set of object categories (for example, people, trees, sky, cars) for all image pixels. It is generally a more difficult undertaking than image classification, which predicts a single label for the entire image or frame. Instance segmentation extends the scope of semantic segmentation further by detecting and delineating each individual object of interest in an image.

What is the KL Divergence?
The Kullback-Leibler divergence (hereafter written as KL divergence) is a measure of how one probability distribution differs from another probability distribution. Classically, in Bayesian theory, there is some true distribution P(X), and the KL divergence quantifies how far an approximating distribution Q(X) is from it.
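A small numerical example of KL(P || Q); the two distributions are made up, and the helper assumes P has no zero entries.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q): how much distribution Q diverges from the true distribution P."""
    p = np.asarray(p, dtype=float)                     # assumes no zero entries in p
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return np.sum(p * np.log(p / q))

p = [0.4, 0.6]
print(kl_divergence(p, p))           # 0.0: identical distributions
print(kl_divergence(p, [0.9, 0.1]))  # ~0.75 nats: Q is a poor approximation of P
```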

GAN
Generative adversarial networks (GANs) are an exciting, relatively recent innovation in machine learning. GANs are generative models: they create new data instances that resemble your training data. For example, GANs can create images that look like photographs of human faces, even though the faces don't belong to any real person.
Other use cases of GANs include: text-to-image translation, clothing translation, face frontal-view generation, photo inpainting, generating new human poses, photos to emojis, face aging, and super-resolution.
What is modeling in deep learning?
A computer model learns to perform classification tasks directly from images, text, or sound.
What is preprocessing?
Preprocessing data is a common first step in the deep learning workflow, used to prepare raw data in a format that the network can accept.
What is feature extraction?
Feature extraction refers to the process of transforming raw data into numerical features that can be processed while preserving the information in the original data set.
Advantages and Disadvantages of AdaGrad
Advantages:
* The learning rate is updated automatically; there is no need to manually tune the learning rate for each feature.
* Gives better results than plain SGD if we have both sparse and dense features.
Disadvantages (an update-rule sketch follows the list):
* A squared term is added at each iteration; since it is always positive, the accumulated sum only grows, so the learning rate constantly decreases and can become vanishingly small.
* Less efficient than some other optimization algorithms like AdaDelta and Adam.
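A minimal AdaGrad update sketch in NumPy showing the accumulated squared gradients behind both the advantage and the drawback above; the toy objective, learning rate, and starting point are illustrative.

```python
import numpy as np

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    cache += grad ** 2                        # running sum of squared gradients
    w -= lr * grad / (np.sqrt(cache) + eps)   # per-parameter scaled step
    return w, cache

w, cache = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(3):
    grad = 2 * w                              # gradient of f(w) = w1^2 + w2^2
    w, cache = adagrad_step(w, grad, cache)
print(w)  # both coordinates move toward 0; steps shrink as cache grows
```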
