Deep Learning

1) Introduction to Deep Learning:-

What is Deep Learning?


Deep learning is a branch of machine learning based entirely on artificial
neural networks. Because a neural network mimics the human brain, deep
learning can also be seen as a kind of mimicry of the human brain. In deep
learning, we don't need to explicitly program everything. The concept of
deep learning is not new; it has been around for decades. It is in the
spotlight now because earlier we did not have much processing power or much
data. As processing power has grown exponentially over the last 20 years,
deep learning and machine learning have come into the picture.

A formal definition of deep learning is:

Deep learning is a particular kind of machine learning that achieves great
power and flexibility by learning to represent the world as a nested
hierarchy of concepts, with each concept defined in relation to simpler
concepts, and more abstract representations computed in terms of less
abstract ones.

The human brain contains approximately 100 billion neurons, and each neuron
is connected to thousands of its neighbours.

The question is how we can recreate these neurons in a computer. We create
an artificial structure called an artificial neural net, made up of nodes
or neurons. Some neurons hold input values and some hold output values, and
in between there may be many interconnected neurons in the hidden layers.
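
As a rough illustration (not part of the original notes), here is a minimal
NumPy sketch of such a structure: a tiny network with three input neurons,
one hidden layer of four neurons, and one output neuron. All sizes, weights
and the sigmoid activation are arbitrary choices for the example.

import numpy as np

def sigmoid(z):
    # squashes any value into (0, 1); a common activation function
    return 1.0 / (1.0 + np.exp(-z))

# illustrative sizes: 3 input neurons, 4 hidden neurons, 1 output neuron
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))   # hidden -> output weights
b2 = np.zeros(1)

x = np.array([0.5, -1.2, 3.0])        # one input example
hidden = sigmoid(W1 @ x + b1)         # activations of the hidden layer
output = sigmoid(W2 @ hidden + b2)    # value produced by the output neuron
print(output)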

Architectures :

1. Deep Neural Network – A neural network with a certain level of
complexity, having multiple hidden layers between the input and output
layers. Deep neural networks are capable of modeling and processing
non-linear relationships.
2. Deep Belief Network (DBN) – A class of deep neural network; it is a
multi-layer belief network.
Steps for training a DBN (see the sketch after this list):
a. Learn a layer of features from the visible units using the Contrastive
Divergence algorithm.
b. Treat the activations of the previously trained features as visible
units and learn features of features.
c. Finally, the whole DBN is trained when learning for the final hidden
layer is complete.
3. Recurrent Neural Network (RNN) – Performs the same task for every
element of a sequence, carrying information forward through a hidden
state, somewhat like the brain's large feedback network of connected
neurons. RNNs can remember important things about the input they have
received, which enables them to be more precise.
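
The Contrastive Divergence step mentioned in 2(a) can be sketched as
follows. This is an illustrative CD-1 update for a single restricted
Boltzmann machine layer, not code from the notes; the layer sizes, learning
rate and omission of bias terms are simplifications for the example.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, v0, rng, lr=0.1):
    # one CD-1 step for a restricted Boltzmann machine (biases omitted)
    # up pass: hidden probabilities and a binary sample, given the data
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # down pass: reconstruct the visible units, then re-infer the hiddens
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)
    # update: difference between data and reconstruction correlations
    W += lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))
    return W

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(6, 3))        # 6 visible units, 3 hidden units
v0 = rng.integers(0, 2, size=6).astype(float)  # one binary training pattern
W = cd1_update(W, v0, rng)

Stacking such layers, with each trained layer's activations serving as the
next layer's visible units, is what steps (b) and (c) describe.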

Difference between Machine Learning and Deep Learning :

• Data: machine learning can work on small datasets with good accuracy;
deep learning needs a large amount of data.
• Hardware: machine learning can run on low-end machines; deep learning is
heavily dependent on high-end machines.
• Approach: machine learning divides the task into sub-tasks, solves them
individually and finally combines the results; deep learning solves the
problem end to end.
• Training time: machine learning takes less time to train; deep learning
takes a long time to train.
• Testing time: machine learning may take longer to test; deep learning
takes less time to test the data.


Working :

First, identify the actual problem and understand it, and check whether
deep learning is a feasible fit for it. Second, identify the relevant data
corresponding to the problem and prepare it accordingly. Third, choose a
deep learning algorithm appropriately. Fourth, train the algorithm on the
dataset. Fifth, perform final testing on the dataset.
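
A minimal sketch of steps 2-5 in code, assuming scikit-learn (the notes
name no particular library) and a synthetic dataset standing in for the
problem's real data:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# step 2: identify and prepare the relevant data (synthetic here, for illustration)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# step 3: choose an algorithm -- a small multi-layer neural network
model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)

# step 4: train it on the training portion of the dataset
model.fit(X_train, y_train)

# step 5: final testing on the held-out portion
print("test accuracy:", model.score(X_test, y_test))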

Limitations :

1. Learning through observations only.


2. The issue of biases.

Advantages :

1. Best-in-class performance on many problems.
2. Reduces the need for feature engineering.
3. Eliminates unnecessary costs.
4. Easily identifies defects that are difficult to detect.

Disadvantages :

1. Large amount of data required.


2. Computationally expensive to train.
3. No strong theoretical foundation.

Applications :

1. Automatic Text Generation – A corpus of text is learned, and new text is
generated from the model word-by-word or character-by-character. The
model can learn how to spell, punctuate and form sentences, and it may
even capture the style of the corpus (see the sketch after this list).
2. Healthcare – Helps in diagnosing various diseases and treating it.
3. Automatic Machine Translation – Words, sentences or phrases in one
language are transformed into another language; deep learning is
achieving top results for both text and images.
4. Image Recognition – Recognizes and identifies people and objects in
images, and helps understand their content and context. This is already
used in gaming, retail, tourism, etc.
5. Predicting Earthquakes – Teaches a computer to perform
viscoelastic computations which are used in predicting earthquakes.
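
To make application 1 concrete, here is a toy character-by-character
generator in plain Python. Real systems learn a neural network rather than
bigram counts, and the corpus string here is made up, but the
sample-the-next-character loop is the same idea.

import random
from collections import defaultdict

corpus = "deep learning learns deep representations of data"  # toy corpus

# "learn" the corpus: record which character follows which
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

# generate new text character-by-character from the learned counts
random.seed(0)
ch, out = "d", ["d"]
for _ in range(40):
    ch = random.choice(follows[ch])
    out.append(ch)
print("".join(out))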

2) Bayesian Learning:-

A learning technique that determines model parameters (such as network
weights) by maximizing the posterior probability of the parameters given
the training data. The idea is that some parameter values are more
consistent with the observed data than others. By Bayes' rule, the
posterior is proportional to the likelihood, the conditional probability of
the training data given the parameters, multiplied by the prior over the
parameters, so maximizing the posterior amounts to maximizing this product.
The required quantities can often be approximated by a closed formula or by
an update rule. Bayesian techniques make it possible to use all the data
for training instead of reserving patterns for cross-validation of
parameters. Bayesian learning methods are able to provide useful practical
solutions and forecasting features for solving complicated problems.
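
In symbols (a standard formulation, added here for clarity rather than
taken from the notes), the maximum a posteriori (MAP) choice of hypothesis
or parameters h, given data D, is

h_MAP = argmax_{h in H} P(h|D)
      = argmax_{h in H} P(D|h) P(h) / P(D)
      = argmax_{h in H} P(D|h) P(h)

where P(D) can be dropped because it does not depend on h.
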
Features of Bayesian learning methods:
• Each observed training example can incrementally decrease or increase
the estimated probability that a hypothesis is correct.
– This provides a more flexible approach to learning than algorithms that
completely eliminate a hypothesis if it is found to be inconsistent with any
single example.
• Prior knowledge can be combined with observed data to determine the
final probability of a hypothesis. In Bayesian learning, prior knowledge is
provided by asserting
– a prior probability for each candidate hypothesis, and
– a probability distribution over observed data for each possible hypothesis.
• Bayesian methods can accommodate hypotheses that make probabilistic
predictions.
• New instances can be classified by combining the predictions of multiple
hypotheses, weighted by their posterior probabilities (see the sketch after
this list).
• Even in cases where Bayesian methods prove computationally
intractable, they can provide a standard of optimal decision making against
which other practical methods can be measured.
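
The point about combining hypotheses can be made concrete with a tiny
sketch (the numbers are hypothetical): each hypothesis votes for a label,
weighted by its posterior probability, which is the Bayes optimal way to
classify a new instance.

# hypothetical posteriors P(h|D) for three candidate hypotheses
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
# probability each hypothesis assigns to the label "positive" for a new instance
p_positive = {"h1": 1.0, "h2": 0.0, "h3": 0.0}

# weight each hypothesis's prediction by its posterior, then pick the best label
p_pos = sum(posteriors[h] * p_positive[h] for h in posteriors)
label = "positive" if p_pos > 0.5 else "negative"
print(p_pos, label)  # 0.4 -> "negative", even though the single most probable
                     # hypothesis h1 would have said "positive"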

Difficulties with Bayesian Methods


• Require initial knowledge of many probabilities
– When these probabilities are not known in advance they are often
estimated based on background knowledge, previously available data, and
assumptions about the form of the underlying distributions.
• Significant computational cost is required to determine the Bayes optimal
hypothesis in the general case (linear in the number of candidate
hypotheses).
– In certain specialized situations, this computational cost can be
significantly reduced.

Bayes Theorem
• In machine learning, we try to determine the best hypothesis from some
hypothesis space H, given the observed training data D.
• In Bayesian learning, the best hypothesis means the most probable
hypothesis, given the data D plus any initial knowledge about the prior
probabilities of the various hypotheses in H.
• Bayes theorem provides a way to calculate the probability of a
hypothesis based on its prior probability, the probabilities of observing
various data given the hypothesis, and the observed data itself.
P(h) is prior probability of hypothesis h
– P(h) to denote the initial probability that hypothesis h holds, before
observing training data.
– P(h) may reflect any background knowledge we have about the chance
that h is correct. If we have no such prior knowledge, then each candidate
hypothesis might simply get the same prior probability.

P(D) is prior probability of training data D


– The probability of D given no knowledge about which hypothesis holds

P(h|D) is posterior probability of h given D


– P(h|D) is called the posterior probability of h, because it reflects our
confidence that h holds after we have seen the training data D.
– The posterior probability P(h|D) reflects the influence of the training data
D, in contrast to the prior probability P(h), which is independent of D.

P(D|h) is the probability of D given h (the likelihood)

– The probability of observing data D given some world in which hypothesis
h holds.
– Generally, we write P(x|y) to denote the probability of event x given
event y.
• In ML problems, we are interested in the probability P(h|D) that h
holds given the observed training data D.
• Bayes theorem provides a way to calculate the posterior probability
P(h|D) from the prior probability P(h), together with P(D) and P(D|h):

Bayes Theorem: P(h|D) = P(D|h) P(h) / P(D)

• P(h|D) increases with P(h) and P(D|h) according to Bayes theorem.


• P(h|D) decreases as P(D) increases, because the more probable it is
that D will be observed independent of h, the less evidence D provides
in support of h.
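
A small worked example in code (the numbers are illustrative, not from the
notes): a diagnostic test for a rare condition, where h is "the condition
is present" and D is "the test came back positive".

# hypothetical probabilities
p_h = 0.008             # prior P(h): the condition is rare
p_d_given_h = 0.98      # P(D|h): test is positive when the condition is present
p_d_given_not_h = 0.03  # P(D|not h): false positive rate

# P(D) by total probability, then Bayes theorem for P(h|D)
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
p_h_given_d = p_d_given_h * p_h / p_d
print(p_h_given_d)  # ~0.21: even after a positive test, h remains fairly
                    # improbable, because P(D) is dominated by false
                    # positives on the healthy majority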

3) Decision Surfaces:-

A decision surface is a (hyper)surface in a multidimensional state space
that partitions the space into different regions. Data lying on one side of
a decision surface are defined as belonging to a different class from those
lying on the other. Decision surfaces may be created or modified as a
result of a learning process, and they are frequently used in machine
learning, pattern recognition and classification systems.

Classification in machine learning means training a model on our data so
that it can assign labels to inputs.

Each input feature defines an axis of the feature space. With two features
the feature space is a plane, with dots representing input points; with
three input variables it would be a three-dimensional volume.

The aim of the classification model is to separate the feature space so
that we can decide the class label for points in the feature space with
minimum error.

This separation is done by a decision surface or boundary, which also works
as a demonstrative tool for visualizing the model on a classification
predictive modeling task.

The data points lying on one side of the decision surface belong to a
different class label from those lying on the other side. Through the
model's learning process, decision boundaries can be created or modified.

Although the word 'surface' suggests a two-dimensional feature space, these
methods can still be used with more than two features by creating a
decision surface for each pair of input features.
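
A common way to draw a decision surface, sketched here with scikit-learn
and matplotlib (neither is named in the notes): train a model on two
features, evaluate it on a grid covering the feature space, and colour each
region by the predicted class.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# two input features -> a 2-D feature space we can plot
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=0)
model = LogisticRegression().fit(X, y)

# evaluate the model on a grid of points covering the feature space
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# the boundary between the coloured regions is the decision surface
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()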

4) Linear Classifiers:-

Linear classifiers assign labels based on a linear combination of the
input features. They separate the data with a line, a plane, or a
hyperplane (the analogue of a plane in more than two dimensions). On their
own they can only correctly classify data that is linearly separable, but
they can be modified (for example, by transforming the features) to handle
non-linearly separable data.
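
A minimal sketch of one classic linear classifier, the perceptron, in NumPy
(the notes do not prescribe an algorithm, and the toy data here is
generated to be linearly separable): it learns a weight vector w and bias b
so that the sign of w.x + b gives the class.

import numpy as np

# toy linearly separable data: label +1 if x0 + x1 > 1, else -1
rng = np.random.default_rng(0)
X = rng.uniform(-1, 2, size=(100, 2))
y = np.where(X.sum(axis=1) > 1, 1, -1)

# perceptron rule: nudge w and b whenever a point is misclassified
w = np.zeros(2)
b = 0.0
for _ in range(20):                 # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:  # point is on the wrong side of the hyperplane
            w += yi * xi
            b += yi
print(w, b)                         # the learned line w.x + b = 0 separates the classes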
