
Probabilistic Theory of Deep Learning
This course explores probabilistic deep learning, a robust approach that
embraces uncertainty for more powerful and insightful models.

PRESENTED BY:

S.RANJANI-B.E.(CSE)

IIIrd year
Fundamentals of Probability Theory
Probability theory is the foundation of probabilistic deep learning. It helps us
understand uncertainty and make informed decisions using data. We'll explore
key ideas like probability distributions, random variables, and Bayes' theorem.

1. Probability Distributions
Probability distributions show how likely different outcomes are for a random event. We'll learn about common distributions, such as the Gaussian, Bernoulli, and Poisson.

2. Random Variables
Random variables represent quantities with uncertain values. We'll explore different types, including discrete variables that take only whole-number values and continuous variables that can take any value.

3. Conditional Probability
Conditional probability helps us understand the chance of one event happening given that another event has already occurred. This is important for making predictions and understanding data.

4. Bayes' Theorem
Bayes' theorem helps us update our beliefs about something based on new information. It is a key part of probabilistic modeling and is used heavily in Bayesian deep learning.
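To make these ideas concrete, here is a minimal Python sketch (NumPy only) that draws samples from the distributions named above and estimates a conditional probability from those samples; the parameter values are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Common distributions (parameters are illustrative assumptions).
gaussian = rng.normal(loc=0.0, scale=1.0, size=100_000)   # continuous random variable
bernoulli = rng.binomial(n=1, p=0.3, size=100_000)        # discrete: 0 or 1
poisson = rng.poisson(lam=4.0, size=100_000)              # discrete counts

# Conditional probability estimated from samples:
# P(X > 1 | X > 0) = P(X > 1 and X > 0) / P(X > 0) for the Gaussian variable.
p_gt0 = (gaussian > 0).mean()
p_gt1 = (gaussian > 1).mean()
print("P(X > 1 | X > 0) is approximately", p_gt1 / p_gt0)
```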
Understanding Uncertainty in Deep Learning
Uncertainty in deep learning (DL) refers to the model’s confidence or lack of
certainty about its predictions.

Uncertainty helps us understand the reliability of deep learning models and how confident we can be in their results. We'll explore various techniques to quantify uncertainty, including Bayesian methods and model ensembles.

Epistemic Uncertainty
This type of uncertainty arises from limited data or model imperfections. It can be reduced by gathering more data or improving the model.

Aleatoric Uncertainty
Aleatoric uncertainty arises from the inherent randomness of a system or its data and cannot be explained away. It can be characterized as the "noise" in the data and is irreducible: it cannot be reduced with additional information.
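As a rough illustration, here is a minimal NumPy sketch of how these two kinds of uncertainty are often estimated in practice: the spread of predictions across a hypothetical bootstrap ensemble of simple models stands in for epistemic uncertainty, while each model's estimated output noise stands in for aleatoric uncertainty. The data-generating process and ensemble size are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy data: y = 2x + noise; the noise term is the aleatoric part.
x = rng.uniform(0, 1, size=30)
y = 2 * x + rng.normal(0, 0.3, size=30)

# Hypothetical ensemble: fit several linear models on bootstrap resamples.
coefs, noise_estimates = [], []
for _ in range(20):
    idx = rng.integers(0, len(x), size=len(x))
    coef = np.polyfit(x[idx], y[idx], deg=1)            # slope and intercept
    residuals = y[idx] - np.polyval(coef, x[idx])
    coefs.append(coef)
    noise_estimates.append(residuals.std())             # per-model noise estimate

x_new = 0.5
preds = np.array([np.polyval(c, x_new) for c in coefs])
print("epistemic uncertainty (spread across the ensemble):", preds.std())
print("aleatoric uncertainty (average estimated noise):", np.mean(noise_estimates))
```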
Maximum Likelihood Estimation
MLE is a key method for estimating model parameters in probabilistic deep
learning. Instead of single predictions, probabilistic models produce probability
distributions over outcomes. MLE aims to find the parameter values that
maximize the likelihood of the observed data within the model's distribution.

Steps of Maximum Likelihood in Deep Learning

1. Define the Model

2. Write the Likelihood Function

3. Take the Log of the Likelihood (Log-Likelihood)

4. Maximize the Log-Likelihood

5. Estimate the Parameter


Let's break it down with a small example:
You have a coin, and you're trying to figure out the best guess for the probability of heads (Y) based on the observed data. You flip the coin 3 times and record the results.

Let’s say:

( X ) = outcome of the coin flip (either heads or tails)


( Y ) = probability of heads (we're trying to estimate this)

Data (Observed ( X ))

You flip the coin 3 times and get the following outcomes:

Flip 1: Heads (H)

Flip 2: Tails (T)


Flip 3: Heads (H)

1. Define the Model

The probability distribution is:

P(Heads) = Y
P(Tails) = 1 - Y

2. Likelihood Function

The likelihood function is a fundamental concept in statistics, especially in


the context of maximum likelihood estimation (MLE). It measures how well a
particular set of parameters explains the observed data.

The likelihood of observing the outcomes (H, T, H) is the product of the


probabilities for each outcome. Since we got heads, tails, and heads, the
likelihood is:

Likelihood = P(Heads) x P(Tails) x P(Heads) = Y x (1 - Y) x Y = Y^2 x (1 - Y)


3. Log-Likelihood
Since multiplying probabilities can get messy, we take the log of the likelihood
(log-likelihood):

Log-Likelihood = log(Y^2 x (1 - Y))

= log(Y^2) + log(1 - Y) (product rule)

= 2 log(Y) + log(1 - Y) (power rule)

4. Maximizing the Log-Likelihood

Let’s try guessing different values of ( Y ) (the probability of heads) and see
which gives us the highest log-likelihood.

Guess 1: ( Y = 0.5 ) (Fair Coin)

Log-Likelihood= 2 log(0.5) + log(1 - 0.5) = 2(-0.693) + (-0.693) = -2.079

Guess 2: ( Y = 0.8 ) (Biased towards Heads)

Log-Likelihood = 2 log(0.8) + log(1 - 0.8) = 2(-0.223) + (-1.609) = -2.055

Guess 3: ( Y = 0.6 ) (Biased slightly towards Heads)

Log-Likelihood = 2 log(0.6) + log(1 - 0.6) = 2(-0.511) + (-0.916) = -1.938

5. Estimate the Parameter

From the calculations, we see that the log-likelihood is highest at ( Y = 0.6 ) among our three guesses, so the coin is estimated to have roughly a 60% chance of landing heads based on the observed data. (The exact maximizer is the observed fraction of heads, Y = 2/3 ≈ 0.67, which this guess approximates.)
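The whole calculation can be checked numerically. Below is a minimal Python sketch (NumPy only) that evaluates the log-likelihood over a grid of candidate values of Y for the observed flips H, T, H and reports the best one; the grid resolution is an arbitrary illustrative choice.

```python
import numpy as np

# Observed data: H, T, H encoded as 1 = heads, 0 = tails.
flips = np.array([1, 0, 1])
heads, tails = flips.sum(), len(flips) - flips.sum()

# Bernoulli log-likelihood: heads * log(Y) + tails * log(1 - Y).
candidates = np.linspace(0.01, 0.99, 99)
log_likelihood = heads * np.log(candidates) + tails * np.log(1 - candidates)

best = candidates[np.argmax(log_likelihood)]
print("MLE estimate of Y:", best)   # close to 2/3, the observed fraction of heads
```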
Bayesian Inference
Bayesian inference helps us learn from data by updating our beliefs about
things we don't know. It's a powerful tool in deep learning, allowing us to build
models that are more reliable and can handle uncertainty better. We'll explore
how Bayesian inference works and how it's used in deep learning, including
Bayesian neural networks and Bayesian optimization.

1 Prior Distribution
The prior distribution represents our initial guess about the
unknown things before we look at any data. It reflects what we
already know or believe about the problem.

2 Likelihood Function
The likelihood function tells us how likely the data is, given a
particular guess about the unknowns. It measures how well
our model fits the observed data.

3 Posterior Distribution
The posterior distribution combines our initial guess (prior)
and the information from the data (likelihood) to give us a
more accurate belief about the unknowns after looking at the
data.

Steps in Bayesian inference:

1. Define the Prior Probability

2. Define the Likelihood


3. Calculate the Total Probability
4. Apply Bayes' Theorem
5. Interpretation
Example:
Problem Setup:

Imagine you have a coin that might be biased. You want to determine the
probability that the coin is biased toward heads after flipping it a few times.

Step 1: Define the Prior Probability

Before flipping the coin, you have a belief about whether it’s fair or biased.
Let's say:

Prior belief: You think there’s a 50% chance the coin is fair (50% heads)
and a 50% chance it is biased (70% heads).

So we define:

P(Fair)=0.5
P(Biased)=0.5

Step 2: Define the Likelihood

Next, you flip the coin 3 times and get 2 heads and 1 tail. Now you need to
calculate how likely it is to get this result under both hypotheses.

The binomial probability formula is:

P(k | X) = n! / (k! (n − k)!) × p^k × (1 − p)^(n − k)

X: the hypothesis (Fair or Biased)
p: probability of getting heads under that hypothesis
1 − p: probability of getting tails
n: total number of coin flips (3)
k: number of heads whose probability we want to calculate (2)

Likelihood if the coin is fair:

p=0.5

P(2 heads | Fair) = 3! / (2! (3 − 2)!) × (0.5)^2 × (1 − 0.5)^1 = 0.375

Likelihood if the coin is biased:

p=0.7

P(2 heads | Biased) = 3! / (2! (3 − 2)!) × (0.7)^2 × (1 − 0.7)^1 = 0.441


Step 3: Calculate the Total Probability of Getting 2 Heads
Now we need the total probability of getting 2 heads, P(2 heads):

P(E)=P(E∣H1) P(H1)+P(E∣H2) P(H2)+…+P(E∣Hn) P(Hn)

E=Event H=Hypotheses

P(2heads)=P(2heads∣Fair)×P(Fair)+P(2heads∣Biased)×P(Biased)

P(2heads)=(0.375×0.5)+(0.441×0.5)

=0.1875+0.2205=0.408

Step 4: Apply Bayes' Theorem

Now we can use Bayes' Theorem to find the posterior probability that the coin
is biased given that we observed 2 heads:

The formula for Bayes' Theorem is:

P(A∣B)=P(B∣A) P(A)/P(B)

P(A∣B): The posterior probability


P(B∣A): The likelihood
P(A): The prior probability
P(B): The total probability

P(Biased | 2 heads) = P(2 heads | Biased) × P(Biased) / P(2 heads)

P(Biased | 2 heads) = (0.441 × 0.5) / 0.408

= 0.2205 / 0.408 ≈ 0.540

Step 5: Interpretation

After observing 2 heads in 3 flips, the probability that the coin is biased is
approximately 54%. This means your belief has shifted from an initial 50%
chance to a 54% chance that the coin is biased towards heads.
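The same posterior can be reproduced in a few lines. Here is a minimal Python sketch using scipy.stats for the binomial likelihood; the priors and head probabilities (0.5 for a fair coin, 0.7 for the biased one) are taken directly from the example above.

```python
from scipy.stats import binom

# Hypotheses and priors from the example.
priors = {"Fair": 0.5, "Biased": 0.5}
p_heads = {"Fair": 0.5, "Biased": 0.7}

# Likelihood of observing 2 heads in 3 flips under each hypothesis.
n, k = 3, 2
likelihoods = {h: binom.pmf(k, n, p_heads[h]) for h in priors}    # 0.375 and 0.441

# Total probability of the evidence, then Bayes' theorem.
evidence = sum(likelihoods[h] * priors[h] for h in priors)        # 0.408
posterior_biased = likelihoods["Biased"] * priors["Biased"] / evidence
print(round(posterior_biased, 3))                                 # approximately 0.540
```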
Optimisation and Generalization
Optimization

What it is: Optimization is the process of adjusting the model parameters (like
weights in a neural network) to minimize the difference between the predicted
outputs and the actual outputs (errors).

Why it matters: Effective optimization helps the model learn from the training
data, improving its performance.

In Probabilistic DL:

Optimization often involves minimizing a loss function, such as the negative log-likelihood, which quantifies how well the model predicts outcomes.

Techniques like gradient descent are used to find the parameter values that lower the loss, taking uncertainty into account, as in the sketch below.
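As a concrete illustration, here is a minimal NumPy sketch of gradient descent minimizing the negative log-likelihood of the coin model from the MLE section; the learning rate and number of steps are arbitrary illustrative choices.

```python
import numpy as np

flips = np.array([1, 0, 1])           # H, T, H from the earlier example
heads, tails = flips.sum(), len(flips) - flips.sum()

def neg_log_likelihood(y):
    """Loss: negative log-likelihood of the Bernoulli coin model."""
    return -(heads * np.log(y) + tails * np.log(1 - y))

def gradient(y):
    """Derivative of the loss with respect to the parameter Y."""
    return -(heads / y - tails / (1 - y))

y = 0.5                               # initial guess
learning_rate = 0.05
for _ in range(200):
    y -= learning_rate * gradient(y)  # one gradient descent step
    y = np.clip(y, 1e-6, 1 - 1e-6)    # keep Y inside (0, 1)

print(round(float(y), 3), neg_log_likelihood(y))   # Y converges to about 0.667
```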

Generalization

What it is: Generalization refers to a model's ability to perform well on new, unseen data, not just the data it was trained on.

Why it matters: If a model learns the training data too well (overfitting), it won't be effective when faced with new examples. Good generalization means the model can apply what it learned to different situations.

In Probabilistic DL:

Probabilistic models use uncertainty estimates to help them generalize better. For example, by considering the distribution of possible outputs rather than a single point prediction, a model can be more flexible and robust when making predictions on new data, as sketched below.
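To illustrate, here is a minimal Python sketch (NumPy and scipy.stats) in which a simple model keeps an estimate of its output noise and therefore reports a full predictive distribution on new data instead of a single number; the data-generating process is an illustrative assumption.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Training data with noise.
x_train = rng.uniform(0, 1, size=50)
y_train = 2 * x_train + rng.normal(0, 0.3, size=50)

# A point-estimate model keeps only the fitted line; a probabilistic model
# also keeps an estimate of the output noise, i.e. a distribution over outputs.
coef = np.polyfit(x_train, y_train, deg=1)
sigma = np.std(y_train - np.polyval(coef, x_train))

# On new data the probabilistic model reports a whole predictive distribution.
x_new = 0.8
mean = np.polyval(coef, x_new)
print(f"predictive distribution at x={x_new}: Normal(mean={mean:.2f}, std={sigma:.2f})")
print("probability that the new output exceeds 2.0:", 1 - norm.cdf(2.0, loc=mean, scale=sigma))
```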
Practical Applications and Future Directions
Probabilistic deep learning solves many real-world problems in fields like
computer vision, language understanding, robotics, and healthcare.

Healthcare
Probabilistic deep learning helps analyze medical images, predict diseases, and personalize treatments.

Autonomous Driving
Probabilistic models improve self-driving cars' perception, decision-making, and risk management.

Robotics
Probabilistic approaches enable robots to navigate, manipulate objects, and interact with humans.

Cognitive Science
Probabilistic models help us understand human cognition and decision-making.
