
Deep Learning

What is Multilayer Perceptron in neural network?


A multilayer perceptron (MLP) is a fully connected feedforward artificial neural network (ANN). The term MLP is used ambiguously: sometimes loosely to mean any feedforward ANN, and sometimes strictly to refer to networks composed of multiple layers of perceptrons (with threshold activation).

What is a Feed Forward Neural Network?

A feed forward neural network is an artificial neural network in which the connections between nodes do not form a cycle. The opposite of a feed forward neural network is a recurrent neural network, in which certain pathways are cycled. The feed forward model is the simplest form of neural network, as information is only processed in one direction. While the data may pass through multiple hidden nodes, it always moves forward and never backwards.
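As a minimal sketch of this one-directional flow, the NumPy snippet below (with made-up layer sizes and random weights) pushes an input through one hidden layer and an output layer, never feeding anything backwards.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical layer sizes: 4 inputs -> 8 hidden units -> 3 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(x):
    # Information flows strictly forward: input -> hidden -> output
    h = relu(x @ W1 + b1)
    return h @ W2 + b2

print(forward(rng.normal(size=(1, 4))).shape)  # (1, 3)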

What is Backpropagation?

Backpropagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural network based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the weights reduces error rates and makes the model reliable by improving its generalization.

Backpropagation is short for "backward propagation of errors." It is a standard method of training artificial neural networks. This method computes the gradient of a loss function with respect to all the weights in the network.
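As a minimal illustration, assume a tiny one-hidden-layer network with a mean squared error loss; the gradients below are obtained by the chain rule, propagating the error from the output back to each weight matrix.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))               # mini-batch of inputs
y = rng.normal(size=(16, 1))               # targets
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 1))

# Forward pass
h = np.tanh(x @ W1)
y_hat = h @ W2
loss = np.mean((y_hat - y) ** 2)

# Backward pass: propagate the error from the output back to each weight
d_yhat = 2 * (y_hat - y) / y.shape[0]      # dL/dy_hat
dW2 = h.T @ d_yhat                         # dL/dW2
d_h = d_yhat @ W2.T                        # dL/dh
dW1 = x.T @ (d_h * (1 - h ** 2))           # dL/dW1, using tanh' = 1 - tanh^2

# One gradient descent step on each weight matrix
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2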

Gradient descent variants: Gradient Descent (GD), Momentum-based GD, Nesterov Accelerated GD, Stochastic GD

What is AdaGrad?
Adaptive Gradient Algorithm (Adagrad) is an algorithm for gradient-based optimization. The learning
rate is adapted component-wise to the parameters by incorporating knowledge of past observations.
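A sketch of the AdaGrad update rule (the learning rate and epsilon are common but assumed defaults): the running cache of squared gradients is what makes the effective step size per-parameter.

import numpy as np

def adagrad_step(w, grad, cache, lr=0.01, eps=1e-8):
    # Accumulate squared gradients observed so far for each parameter
    cache += grad ** 2
    # Frequently updated components get a smaller effective learning rate
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache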

RMSProp

RMSProp is an effective extension of gradient descent and one of the preferred approaches for fitting deep learning neural networks. Empirically, it has proven to be a practical optimization algorithm for deep neural networks.
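For comparison, a sketch of the RMSProp update: instead of summing all past squared gradients as AdaGrad does, it keeps an exponentially decaying average, so the effective learning rate does not keep shrinking toward zero. The decay value of 0.9 is a common but assumed choice.

import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    # Exponentially decaying average of squared gradients (not a full sum)
    cache = decay * cache + (1 - decay) * grad ** 2
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache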
Eigenvalues and Eigenvectors

Eigenvectors are often referred to as right vectors, which simply means a column vector (as opposed to a row vector or a left vector). A right vector is a vector as we usually understand it. Eigenvalues are the coefficients applied to eigenvectors that give the vectors their length or magnitude. Picking the features which represent the data and eliminating less useful features is an example of dimensionality reduction. We can use eigenvalues and eigenvectors to identify the dimensions which are most useful and prioritize our computational resources toward them.
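A small NumPy sketch of that idea on made-up data: decompose the covariance matrix, sort the directions by eigenvalue, and keep only the directions that carry the most variance.

import numpy as np

X = np.random.default_rng(0).normal(size=(200, 5))   # hypothetical data
Xc = X - X.mean(axis=0)                              # centre the features
cov = np.cov(Xc, rowvar=False)                       # 5x5 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)               # symmetric matrix -> eigh
order = np.argsort(eigvals)[::-1]                    # largest eigenvalues first

# Keep the 2 directions with the largest eigenvalues (most variance)
top2 = eigvecs[:, order[:2]]
X_reduced = Xc @ top2                                # shape (200, 2)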

Principal Component Analysis

Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning. It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation. These new transformed features are called the principal components. It is one of the popular tools used for exploratory data analysis and predictive modeling. It is a technique for drawing strong patterns from the given dataset by reducing its dimensionality while retaining as much variance as possible.

PCA generally tries to find the lower-dimensional surface to project the high-dimensional data.

PCA works by considering the variance of each attribute, because an attribute with high variance shows a good split between the classes, and hence it reduces the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it keeps the important variables and drops the least important ones.
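A minimal example using scikit-learn's PCA on hypothetical data; keeping 3 components is an arbitrary choice for illustration.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(1).normal(size=(200, 10))  # hypothetical feature matrix

pca = PCA(n_components=3)              # keep the 3 principal components
X_pca = pca.fit_transform(X)           # project onto the new orthogonal axes

print(pca.explained_variance_ratio_)   # fraction of variance captured by each component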

The Singular Value Decomposition

The Singular Value Decomposition (SVD) of a matrix is a factorization of that matrix into three matrices. It
has some interesting algebraic properties and conveys important geometrical and theoretical insights about
linear transformations. It also has some important applications in data science. In this article, I will try to
explain the mathematical intuition behind SVD and its geometrical meaning.

Mathematics behind SVD

The SVD of an m x n matrix A is given by the formula:

A = U W V^{T}

where:

 U: m x n matrix of the orthonormal eigenvectors of AA^{T}.
 V^{T}: transpose of an n x n matrix containing the orthonormal eigenvectors of A^{T}A.
 W: an n x n diagonal matrix of the singular values, which are the square roots of the eigenvalues of A^{T}A.
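A quick NumPy check of the factorization on a made-up matrix: np.linalg.svd returns the singular values as a vector, which can be placed on a diagonal to rebuild A.

import numpy as np

A = np.random.default_rng(2).normal(size=(6, 4))   # hypothetical m x n matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
A_rebuilt = U @ np.diag(s) @ Vt

print(np.allclose(A, A_rebuilt))                   # True: the factorization reproduces A
print(s)                                           # singular values (square roots of eigenvalues of A^T A)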
PCA is essentially a linear transformation, but autoencoders are capable of modelling complex non-linear functions. PCA features are totally linearly uncorrelated with each other, since they are projections onto an orthogonal basis.

Regularization

Regularization helps with the effects of out-of-control parameters by using different methods to minimize
parameter size over time.

In mathematical notation, we see regularization represented by the coefficient lambda, controlling the trade-
off between finding a good fit and keeping the value of certain feature weights low as the exponents on
features increase.

L1 and L2 regularization penalties help fight overfitting by making certain weights smaller. Smaller-
valued weights lead to simpler hypotheses, which are the most generalizable. Unregularized weights with
several higher-order polynomials in the feature set tend to overfit the training set.

As the input training set size grows, the effect of regularization decreases, and the parameters tend to
increase in magnitude. This is appropriate because an excess of features relative to training set examples
leads to overfitting in the first place. Bigger data is the ultimate regularizer.
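A sketch of how an L2 penalty can be folded into a loss function; the lambda value here is an arbitrary placeholder.

import numpy as np

def l2_regularized_loss(y_hat, y, weights, lam=0.01):
    # Data-fit term plus a penalty that grows with the squared weight magnitudes
    mse = np.mean((y_hat - y) ** 2)
    penalty = lam * np.sum(weights ** 2)
    return mse + penalty

# The gradient of the penalty adds 2 * lam * w to each weight's gradient,
# which nudges every weight toward smaller values on each update (weight decay).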

Regularized autoencoders

There are other ways to constrain the reconstruction of an autoencoder than imposing a hidden layer of smaller dimension than the input. Regularized autoencoders use a loss function that encourages the model to have properties other than simply copying the input to the output. We generally find two types of regularized autoencoder: the denoising autoencoder and the sparse autoencoder.

Denoising autoencoder

One way we can modify the autoencoder to learn useful features is by changing the inputs: we add random noise to the input and ask the network to recover the original form by removing that noise. This prevents the autoencoder from simply copying the data from input to output, because the input contains random noise. We ask it to subtract the noise and produce the meaningful underlying data. This is called a denoising autoencoder.
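A minimal Keras-style sketch of a denoising autoencoder, assuming flattened 784-dimensional inputs, an arbitrary 64-unit bottleneck, and synthetic data: noisy inputs go in, and the loss compares the reconstruction against the clean data.

import numpy as np
from tensorflow.keras import layers, Model

# Hypothetical clean data in [0, 1], corrupted with Gaussian noise
x_clean = np.random.rand(1000, 784).astype("float32")
x_noisy = np.clip(x_clean + 0.3 * np.random.normal(size=x_clean.shape), 0.0, 1.0).astype("float32")

inputs = layers.Input(shape=(784,))
encoded = layers.Dense(64, activation="relu")(inputs)       # bottleneck
decoded = layers.Dense(784, activation="sigmoid")(encoded)  # reconstruction
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

# Train to recover the clean signal from the noisy input
autoencoder.fit(x_noisy, x_clean, epochs=5, batch_size=64)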

Bias and Variance

Introduction

You must have heard of the terms bias and variance even if you're new to the domain, but it's common for budding data scientists to confuse the two. It's essential to understand that no machine learning model can be 100% accurate. As a matter of fact, it's not even supposed to be. There are always going to be some prediction errors - bias and variance. Understanding the bias-variance tradeoff is an integral part of a data scientist's learning path.
Bias

Bias is the skewness in a machine learning model occurring due to incorrect assumptions in the machine learning process. Bias can be defined as the error between model predictions and the actual results. Essentially, it describes how well the model captures the trends in the training data set.

 A model which doesn’t capture the trends in training data set well is said to show high bias.
 A model with low bias resembles the trends in the data set.

Characteristics of a high bias model include:

 Failure to capture proper data trends


 Likely to underfit
 Gives an overly simplified view of the data

Variance

Practically, variance can be defined as the model's sensitivity to changes in the data set, or how robust the model is.

It is the variability in the model prediction - how much the learned function adjusts to changes in the data set. More complex models lead to high variance. Models having high bias have low variance, and vice versa.

Characteristics of a high variance model include:

 Noisy dataset
 Likely to overfit
 Non-generalised/ complex model
 Accounting for outliers
What is supervised greedy layer-wise pre-training?

Greedy layer-wise pretraining provides a way to develop deep multi-layered neural networks whilst
only ever training shallow networks. Pretraining can be used to iteratively deepen a supervised model or
an unsupervised model that can be repurposed as a supervised model.

What is Normalization vs Batch Normalization?

Normalization is a procedure to change the values of numeric variables in the dataset to a common scale, without distorting differences in the ranges of values.

Batch normalization is a technique for training very deep neural networks that normalizes the inputs to a layer for every mini-batch. This has the effect of stabilizing the learning process and drastically reducing the number of training epochs required to train deep neural networks.
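A sketch of what batch normalization computes for one mini-batch, using NumPy; gamma and beta are the learnable scale and shift parameters, here simply set to ones and zeros.

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch, then rescale and shift
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.random.default_rng(3).normal(loc=5.0, scale=2.0, size=(32, 10))
out = batch_norm(batch, gamma=np.ones(10), beta=np.zeros(10))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # roughly 0 and 1 per feature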
What is a vector representation in deep learning?
A vector is often represented as a 1-dimensional array of numbers, referred to as components, and is displayed either in column form or row form. Represented geometrically, vectors typically represent coordinates within an n-dimensional space, where n is the number of dimensions.

CNN

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural network
(ANN), most commonly applied to analyze visual imagery.[1] CNNs are also known as Shift Invariant or
Space Invariant Artificial Neural Networks (SIANN), based on the shared-weight architecture of the
convolution kernels or filters that slide along input features and provide translation-equivariant responses
known as feature maps.[2][3] Counter-intuitively, most convolutional neural networks are not invariant to
translation, due to the downsampling operation they apply to the input.[4] They have applications in image
and video recognition, recommender systems,[5] image classification, image segmentation, medical image
analysis, natural language processing,[6] brain–computer interfaces,[7] and financial time series.[8]
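A naive NumPy sketch of the sliding-kernel operation that produces a feature map; the image size and the hand-written edge kernel are made up for illustration, and in a real CNN the kernel values are learned.

import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image; each position yields one feature-map value
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(4).normal(size=(8, 8))
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)    # simple vertical-edge filter
print(conv2d(image, edge_kernel).shape)           # (6, 6) feature map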

CNN types

LeNet, AlexNet, ZF-Net, VGGNet, GoogLeNet, ResNet

What is a recurrent neural network in deep learning?


A recurrent neural network (RNN) is a type of artificial neural network which uses sequential data or
time series data.
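A minimal NumPy sketch of a recurrent forward pass over a hypothetical sequence: the same weight matrices are reused at every time step, and the hidden state carries information forward through the sequence.

import numpy as np

rng = np.random.default_rng(5)
Wx, Wh, b = rng.normal(size=(4, 8)), rng.normal(size=(8, 8)), np.zeros(8)

def rnn_forward(inputs):
    # inputs: sequence of 4-dimensional vectors, processed one step at a time
    h = np.zeros(8)
    for x_t in inputs:
        h = np.tanh(x_t @ Wx + h @ Wh + b)   # new state depends on input and previous state
    return h

sequence = rng.normal(size=(10, 4))          # hypothetical sequence of length 10
print(rnn_forward(sequence).shape)           # (8,) final hidden state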

What is BPTT in deep learning?

Backpropagation through time (BPTT) is a gradient-based technique for training certain types of
recurrent neural networks. It can be used to train Elman networks. The algorithm was independently
derived by numerous researchers.

What is vanishing and exploding gradients?

Vanishing gradients occur when the derivatives become smaller and smaller as we go backward through the layers during backpropagation, so the earliest layers receive almost no learning signal. Exploding gradients are the exact opposite: the derivatives or slopes grow larger and larger as we go backward with every layer. The exploding-gradient problem happens because of the weights, not because of the activation function.
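A tiny numeric illustration of why this happens: backpropagation multiplies many local derivatives together, so a factor slightly below or above 1 shrinks or blows up exponentially with depth.

# 100 layers (or time steps), each contributing one multiplicative factor
for factor in (0.9, 1.1):
    grad = 1.0
    for _ in range(100):
        grad *= factor
    print(factor, grad)   # 0.9 -> ~2.7e-05 (vanishing), 1.1 -> ~1.4e+04 (exploding)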
What is an encoder decoder model?

The best way to understand the concept of an encoder-decoder model is by playing Pictionary. The rules of the game are very simple: player 1 randomly picks a word from a list and needs to sketch its meaning in a drawing. The role of the second player in the team is to analyse the drawing and identify the word it describes. In this example we have three important elements: player 1 (the person that converts the word into a drawing), the drawing (a rabbit), and player 2 (the person that guesses the word the drawing represents). This is all we need to understand an encoder-decoder model; below we will build a comparison between the Pictionary game and an encoder-decoder model for translating Spanish to English.

[Figure: Pictionary game analogy, image by the author]

If we translate the above diagram into machine learning concepts, we get the corresponding encoder-decoder diagram. In the following sections we will go through each component.
