Siwei Xu
Aug 27 · 6 min read
There has always been a debate between Bayesian and frequentist statistical inference. Frequentists dominated statistical practice during the 20th century, and many common machine learning algorithms, such as linear regression and logistic regression, use frequentist methods to perform statistical inference. Bayesians dominated statistical practice before the 20th century, and in recent years algorithms from the Bayesian school, such as Expectation-Maximization, Bayesian Neural Networks, and Markov Chain Monte Carlo, have gained popularity in machine learning.
In this article, we will talk about their differences and connections in the context of machine learning. We
will also use two algorithms for illustration: linear regression and Bayesian linear regression.
Assumptions
For simplicity, we will use θ to denote the model parameter(s) throughout this article.
Frequentist methods assume the observed data is sampled from some distribution. We call this data distribution the likelihood: P(Data|θ), where θ is treated as a constant and the goal is to find the θ that would maximize the likelihood. For example, in logistic regression the data is assumed to be sampled from a Bernoulli distribution, and in linear regression the data is assumed to be sampled from a Gaussian distribution.
Bayesian methods assume probability distributions for both the data and the hypotheses (the parameters specifying the distribution of the data). In the Bayesian view, θ is a random variable, and the assumptions include a prior distribution over the hypotheses, P(θ), and a likelihood of the data, P(Data|θ). The main critique of Bayesian inference is the subjectivity of the prior, as different priors may lead to different posteriors and conclusions.
Parameter Learning
Frequentists use maximum likelihood estimation (MLE) to obtain a point estimate of the parameters θ.
The log-likelihood is expressed as:
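Assuming the observed data points x₁, …, x_n are independent, the log-likelihood decomposes into a sum over the individual points:

log P(Data | θ) = log ∏ᵢ P(xᵢ | θ) = Σᵢ log P(xᵢ | θ)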
The parameters θ are estimated by maximizing the log-likelihood, or equivalently minimizing the negative log-likelihood (the loss function):
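In symbols:

θ̂_MLE = argmax_θ Σᵢ log P(xᵢ | θ) = argmin_θ ( − Σᵢ log P(xᵢ | θ) )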
Instead of a point estimate, Bayesians estimate a full posterior distribution of the parameters using Bayes’ formula:
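P(θ | Data) = P(Data | θ) · P(θ) / P(Data),   where   P(Data) = ∫ P(Data | θ) P(θ) dθ

Here P(θ) is the prior, P(Data | θ) is the likelihood, and P(θ | Data) is the posterior.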
You might have noticed that computing the denominator can be intractable, because it involves an integral (or a summation, when θ is discrete) over all possible values of θ. You might also wonder whether we can obtain a point estimate of θ, just as MLE does. That’s where Maximum A Posteriori (MAP) estimation comes into play. MAP bypasses the cumbersome computation of the full posterior distribution and instead finds the point estimate of θ that maximizes the posterior distribution.
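Because the denominator P(Data) does not depend on θ, it can be dropped from the maximization:

θ̂_MAP = argmax_θ P(θ | Data) = argmax_θ P(Data | θ) · P(θ)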
Since the logarithm is monotonic, we can rewrite the above equation in log space and decompose it into two parts: maximizing the log-likelihood and maximizing the log-prior:
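θ̂_MAP = argmax_θ [ log P(Data | θ) + log P(θ) ]

The first term is the same log-likelihood that MLE maximizes; the second term, coming from the prior, acts as a regularizer.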
In linear regression, for example, using MLE to maximize the log-likelihood, we can get the point estimate of θ shown below:
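Under the Gaussian-noise assumption mentioned earlier, maximizing the log-likelihood is equivalent to minimizing the squared error, which yields the familiar least-squares solution (here X denotes the design matrix and y the vector of targets, notation introduced just for this example):

θ̂_MLE = argmin_θ ‖y − Xθ‖² = (XᵀX)⁻¹ Xᵀ y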
Once we’ve learned the parameters θ from the training data, we can use them directly to make predictions on new data:
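For linear regression this is simply the plug-in prediction for a new input x_new:

ŷ = θ̂_MLEᵀ x_new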
For Bayesian linear regression, we additionally place a prior distribution on θ (commonly a Gaussian). Using these assumptions and Bayes’ formula, we can get the posterior distribution of the parameters:
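As before, the posterior is proportional to the likelihood times the prior. If, for example, we take a Gaussian prior θ ~ N(0, τ²I) and Gaussian noise with variance σ² (illustrative choices, not the only possible ones), the posterior is itself Gaussian with closed-form mean and covariance:

P(θ | X, y) ∝ P(y | X, θ) · P(θ)
θ | X, y ~ N(μ, Σ),   with   Σ = (XᵀX/σ² + I/τ²)⁻¹   and   μ = Σ Xᵀ y / σ²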
At prediction time, we use the posterior distribution and the likelihood to calculate the posterior predictive
distribution:
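P(y* | x*, Data) = ∫ P(y* | x*, θ) · P(θ | Data) dθ

In words, the prediction averages the likelihood over all plausible values of θ, weighted by their posterior probability.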
Notice that the estimates of both the parameters and the predictions are full distributions. Of course, if we only need a point estimate, we can always use MAP or EAP (the expected a posteriori estimate, i.e. the posterior mean).
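To make the contrast concrete, here is a minimal NumPy sketch that computes the frequentist point estimate and the Bayesian closed-form posterior on the same simulated data (the noise scale σ, the prior scale τ, and the simulated data are illustrative assumptions, not values from this article):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 2.0 * x + Gaussian noise (illustrative values).
n = 50
X = rng.normal(size=(n, 1))
true_theta = np.array([2.0])
sigma = 0.5                      # assumed known noise std (illustrative)
y = X @ true_theta + sigma * rng.normal(size=n)

# Frequentist: MLE under Gaussian noise = ordinary least squares.
theta_mle = np.linalg.solve(X.T @ X, X.T @ y)

# Bayesian: Gaussian prior theta ~ N(0, tau^2 I) gives a Gaussian posterior.
tau = 1.0                        # assumed prior std (illustrative)
post_cov = np.linalg.inv(X.T @ X / sigma**2 + np.eye(1) / tau**2)
post_mean = post_cov @ (X.T @ y) / sigma**2

# Posterior predictive at a new input x*: Gaussian with mean x*ᵀμ and
# variance x*ᵀ Σ x* + σ².
x_new = np.array([1.5])
pred_mean = x_new @ post_mean
pred_var = x_new @ post_cov @ x_new + sigma**2

print("MLE point estimate:   ", theta_mle)
print("Posterior mean / std: ", post_mean, np.sqrt(np.diag(post_cov)))
print("Predictive mean / std:", pred_mean, np.sqrt(pred_var))
```

The frequentist branch returns a single vector; the Bayesian branch returns a full distribution, whose mean plays the role of a point estimate and whose covariance quantifies the remaining uncertainty.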
Conclusions
The main goal of machine learning is to make predictions using the parameters learned from training data.
Whether we should achieve the goal using a frequentist or a Bayesian approach depends on:
1. The type of predictions we want: a point estimate or a distribution over potential values.
2. Whether we have prior knowledge that can be incorporated into the modeling process.
On a side note, we discussed discriminative and generative models earlier. A common misconception is to
label discriminative models as frequentist and generative models as Bayesian. In fact, both frequentist and
Bayesian approaches can be used for discriminative or generative models. You can refer to this post for
more clarification.
I hope you enjoyed reading this article. :)