
Parameter Estimation – II and III: Prior and MAP Estimate

This lecture builds on the concept of maximum likelihood estimation (MLE) by introducing
the idea of incorporating prior knowledge into parameter estimation using Bayesian
methods. The key points covered are as follows:
1. Motivation for Incorporating Prior Knowledge:
• Maximum Likelihood Limitation:
o The earlier motivation for using MLE was based on the assumption of no prior
knowledge about the parameters before gathering data. However, if you do
have some prior knowledge, you can incorporate it into the estimation
process.
• Example with Coin Tossing:
o In a coin-tossing experiment, if you have a prior belief (e.g., you believe
the coin is very likely close to fair), you can encode this information as a
prior distribution and use it in the estimation process.
2. Prior Distribution:
• Role of Priors:
o A prior distribution represents your beliefs about the parameters before
observing the data. For example, you might believe the probability of heads
(ρ) in a coin toss is around 0.5, and you can model this belief using a prior
distribution.
• Choice of Prior:
o A Gaussian distribution is mentioned as a possible prior, but it is not
ideal because it places probability mass outside the [0, 1] range in which a
probability must lie. The Beta distribution is introduced as a more suitable
prior for probabilities, since it is defined only on [0, 1] (see the short
sketch after this list).
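As a minimal illustration of this point (the prior parameters below are
assumed for the sketch, not taken from the lecture), the following Python
snippet compares how much probability mass a Beta prior and a Gaussian prior
place inside [0, 1]:

    from scipy import stats

    # Illustrative priors expressing the belief "rho is probably near 0.5".
    beta_prior = stats.beta(a=5, b=5)
    gauss_prior = stats.norm(loc=0.5, scale=0.2)

    # All of the Beta prior's mass lies in [0, 1]; the Gaussian leaks mass outside it.
    print(beta_prior.cdf(1.0) - beta_prior.cdf(0.0))    # 1.0
    print(gauss_prior.cdf(1.0) - gauss_prior.cdf(0.0))  # about 0.988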
3. Maximum A Posteriori (MAP) Estimation:
• MAP vs. MLE:
o MAP estimation is introduced as an alternative to MLE. While MLE maximizes
the likelihood P(data | ρ), MAP maximizes the posterior P(ρ | data), which by
Bayes' rule is proportional to the likelihood multiplied by the prior,
P(data | ρ) · P(ρ).
• Simplifying MAP Estimation:
o The posterior probability can be simplified by focusing on maximizing the
numerator (prior multiplied by the likelihood), as the denominator (evidence)
is constant for all parameters and can be ignored in the optimization process.
• Logarithmic Transformation:
o To handle the product of probabilities, the logarithm of the posterior is
taken, converting the product into a summation and making the optimization
easier (a short numerical sketch follows this list).
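A small numerical sketch of this step (the toss counts and the Beta prior
parameters are assumed for illustration, not values from the lecture),
maximizing the log of the posterior numerator, i.e. log-likelihood plus
log-prior, over a grid of candidate values:

    import numpy as np
    from scipy import stats

    heads, tails = 7, 3   # illustrative data: 10 tosses
    a, b = 5, 5           # illustrative Beta prior centred on rho = 0.5

    def log_posterior(rho):
        # Log of the numerator of Bayes' rule: log-likelihood + log-prior.
        # The evidence P(data) is constant in rho and can be dropped.
        log_lik = heads * np.log(rho) + tails * np.log(1 - rho)
        log_prior = stats.beta(a, b).logpdf(rho)
        return log_lik + log_prior

    grid = np.linspace(0.001, 0.999, 999)
    rho_map = grid[np.argmax(log_posterior(grid))]
    rho_mle = heads / (heads + tails)
    print(rho_mle, rho_map)   # 0.7 vs. about 0.61

With these assumed numbers the MLE is 0.7, while the MAP estimate is pulled
towards the prior mean of 0.5, reflecting the influence of the prior.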
4. Using Priors for Regularization:
• Regularization through Priors:
o Priors can be used to enforce regularization in parameter estimation. A
prior that favors small parameter values prevents overfitting by penalizing
large values: a zero-mean Laplace prior corresponds to Lasso (L1)
regularization and a zero-mean Gaussian prior to Ridge (L2) regularization
(see the sketch after this list).
• Impact of Priors:
o If the prior is well aligned with the true parameters, less data is needed
for accurate estimation. Conversely, if the prior is misaligned, more data is
required to overcome it. An overly strong or dogmatic prior (for example, one
that assigns zero probability to the true parameter value) can lead to biased
estimates that even large amounts of data cannot correct.
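As a hedged sketch of that connection (the linear-Gaussian model and the prior
scale below are assumptions chosen for illustration), the negative log
posterior of a model with a zero-mean Gaussian prior on its weights is the
usual squared-error loss plus a Ridge penalty:

    import numpy as np

    def neg_log_posterior(w, X, y, prior_sigma=1.0):
        # Negative log-likelihood of a linear model with Gaussian noise ...
        residuals = y - X @ w
        nll = 0.5 * np.sum(residuals ** 2)
        # ... plus the negative log of a zero-mean Gaussian prior on w, which
        # is exactly an L2 (Ridge) penalty of strength 1 / (2 * prior_sigma**2).
        neg_log_prior = np.sum(w ** 2) / (2 * prior_sigma ** 2)
        return nll + neg_log_prior

A zero-mean Laplace prior would instead contribute a sum of absolute values,
recovering the Lasso (L1) penalty.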
5. Beta Distribution and Pseudo Counts:
• Beta Distribution:
o The Beta distribution is described as a prior for probabilities, where the
parameters of the distribution (α and β) can be interpreted as "pseudo
counts" of heads and tails in the context of a coin-tossing experiment.
• Pseudo Counts:
o The α parameter acts like extra observed heads and the β parameter like
extra observed tails, shifting the estimated probability ρ. For instance, if
α > β, the prior skews the estimate towards a higher probability of heads
(see the closed-form sketch after this list).
• Reasoning about Priors:
o Understanding the relationship between the prior parameters and the
estimated probabilities allows for reasoning about the effects of different
priors on the final estimates.
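A minimal sketch of the pseudo-count reading (the counts and prior parameters
below are illustrative): with a Beta(α, β) prior and h heads out of n tosses,
the MAP estimate has the closed form (h + α − 1) / (n + α + β − 2).

    def map_estimate(heads, tails, alpha, beta):
        # Closed-form MAP estimate of rho for a Bernoulli likelihood with a
        # Beta(alpha, beta) prior: (alpha - 1) acts like extra "pseudo" heads
        # and (beta - 1) like extra pseudo tails.
        return (heads + alpha - 1) / (heads + tails + alpha + beta - 2)

    print(map_estimate(7, 3, 1, 1))    # 0.7 -- a uniform Beta(1, 1) prior recovers the MLE
    print(map_estimate(7, 3, 10, 2))   # 0.8 -- alpha > beta skews the estimate towards heads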
6. Conclusion:
• The lecture concludes by emphasizing the importance of priors in Bayesian
estimation, their role in regularization, and the need to carefully consider the choice
of prior to avoid biased or incorrect results.
