EEL 6935 Data Analytics: Probability Theory

This document summarizes key concepts from a lecture on probability theory, including: 1) Probability provides a framework for quantifying and manipulating uncertainty from inherent randomness, measurement noise, and finite data sizes. 2) Frequentist probability is based on frequencies of observations, while Bayesian probability allows quantification of both repeatable and non-repeatable events by updating probabilities with evidence. 3) Bayesian and frequentist approaches differ in their interpretations of probability and techniques for model selection, regularization, and accuracy evaluation.

EEL 6935 Data Analytics

Lecture 2: Probability Theory

Jan. 11, 2018


Uncertainty & Probability
• Uncertainty in data:
  • inherent in the observed physical process (e.g., voltage measurements in a power grid, the number of customers in a market)
  • noise in measurement (e.g., hardware/software limitations)
  • finite data size (i.e., lack of access to the entire population)
• Probability:
  • a consistent framework for the quantification and manipulation of uncertainty
  • underpins decision making (much as our brains do implicitly)
• Frequentist probability ~ frequency of observations
  (Illustration: n trials drawing a fruit of color Y ∈ {o, g} from a box X ∈ {r, b}; n_{xy} is the count of joint outcomes, n_x and n_y the marginal counts.)
  • marginal probability: p(Y = o) = n_o / n
  • joint probability: p(X = r, Y = g) = n_{rg} / n
  • conditional probability: p(Y = g | X = b) = n_{bg} / n_b
• Sum rule: p(X) = Σ_Y p(X, Y)
• Product rule: p(X, Y) = p(Y | X) p(X)
• Bayes' theorem: p(X | Y) = p(Y | X) p(X) / p(Y), with p(Y) = Σ_X p(Y | X) p(X)
• Exercise: p(X = b | Y = g) = ?
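The counting rules above can be sketched numerically. The counts below are hypothetical stand-ins for the box/fruit illustration; NumPy is assumed:

```python
import numpy as np

# Hypothetical joint counts n_xy: rows = X in {r, b} (box),
# columns = Y in {o, g} (fruit color), from n draws in total.
counts = np.array([[6, 2],   # n_ro, n_rg
                   [1, 3]])  # n_bo, n_bg
n = counts.sum()

joint = counts / n                   # p(X, Y) = n_xy / n
p_x = joint.sum(axis=1)              # sum rule: p(X) = sum_Y p(X, Y)
p_y = joint.sum(axis=0)              # sum rule: p(Y) = sum_X p(X, Y)
p_y_given_x = joint / p_x[:, None]   # product rule: p(Y|X) = p(X, Y) / p(X)

# Bayes' theorem: p(X = b | Y = g) = p(Y = g | X = b) p(X = b) / p(Y = g)
p_b_given_g = p_y_given_x[1, 1] * p_x[1] / p_y[1]
print(p_b_given_g)   # close to 0.6, and equal to joint[1, 1] / p_y[1]
```

Note that Bayes' theorem here is just the product rule applied in both orders and divided through by p(Y).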
Probability
• Probability density of a continuous variable:
  p(x ∈ (a, b)) = ∫_a^b p(x) dx,   with p(x) ≥ 0 and ∫_{−∞}^{∞} p(x) dx = 1
• Cumulative distribution function: P(y) = ∫_{−∞}^{y} p(x) dx
• Multivariate density p(𝐱): p(𝐱) ≥ 0 and ∫ p(𝐱) d𝐱 = 1
• Sum rule: p(x) = ∫ p(x, y) dy   • Product rule: p(x, y) = p(y|x) p(x)
• Expectation: E[f] = Σ_x f(x) p(x) (discrete),   E[f] = ∫ f(x) p(x) dx (continuous)
• Expectation over one variable: E_x[f(x, y)] = ∫ f(x, y) p(x) dx (a function of y)
• Conditional expectation: E_{x|y}[f | y] = ∫ f(x) p(x|y) dx

• Variance: Var[f] = E[(f(x) − E[f(x)])²] = E[f(x)²] − E[f(x)]²
• Covariance: Cov[x, y] = E_{x,y}[(x − E[x])(y − E[y])] = E_{x,y}[xy] − E[x] E[y]
• Covariance (random vectors): Cov[𝐱, 𝐲] = E_{𝐱,𝐲}[(𝐱 − E[𝐱])(𝐲 − E[𝐲])ᵀ] = E_{𝐱,𝐲}[𝐱𝐲ᵀ] − E[𝐱] E[𝐲]ᵀ
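The two equivalent forms of the variance and covariance can be checked by Monte Carlo. The distributions and sample sizes below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=200_000)            # Var[x] should be near 4
y = 0.5 * x + rng.normal(0.0, 1.0, size=200_000)  # correlated with x

# Var[x] = E[(x - E[x])^2] = E[x^2] - E[x]^2
var_def = np.mean((x - x.mean()) ** 2)
var_alt = np.mean(x ** 2) - x.mean() ** 2

# Cov[x, y] = E[(x - E[x])(y - E[y])] = E[xy] - E[x]E[y]
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
cov_alt = np.mean(x * y) - x.mean() * y.mean()
```

Both pairs agree to floating-point precision, since the second form is just the first with the square (or product) expanded.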
Bayesian Probability
• Classical/frequentist interpretation of probability ~ frequencies of repeatable events
• Bayesian probability ~ a quantification of uncertainty
  • applies to repeatable and non-repeatable events alike, e.g., the probability of a dragon flying through the window
  • is updated with evidence, e.g., on learning that dragons exist in Florida and that small ones can fit through a window
• Bayes' theorem for model parameters 𝐱 given observed data D:

  p(𝐱 | D) = p(D | 𝐱) p(𝐱) / p(D)

  posterior ∝ likelihood × prior
• The prior is not an arbitrary choice: it should reflect common sense (or be deliberately uninformative)

• Challenge: for predictions and model comparison, the required marginalization is typically difficult:

  p(D) = ∫ p(D | 𝐱) p(𝐱) d𝐱
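On a discrete parameter grid the marginalization reduces to a sum, which gives a small sketch of the posterior ∝ likelihood × prior update. The coin-flip setup below (7 heads in 10 flips, uniform prior) is a hypothetical example, not from the lecture:

```python
import numpy as np

# Infer a coin's heads-probability x from data D = 7 heads in 10 flips.
x = np.linspace(0.001, 0.999, 999)       # grid over the parameter
prior = np.ones_like(x) / x.size         # uninformative (uniform) prior p(x)
likelihood = x**7 * (1 - x)**3           # p(D | x), binomial kernel

unnorm = likelihood * prior              # posterior ∝ likelihood × prior
evidence = unnorm.sum()                  # p(D) = Σ_x p(D|x) p(x): the marginalization
posterior = unnorm / evidence            # normalized posterior p(x | D)

print(x[np.argmax(posterior)])           # posterior mode, near 0.7
```

In one dimension this brute-force sum is trivial; the grid grows exponentially with the number of parameters, which is why marginalization is the hard step in general.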
Bayesian vs. Frequentist
                         Bayesian                           Frequentist
Likelihood               fixed data, random parameters      random data, fixed parameters
Model selection          evidence, from training data       cross validation, using training +
(Occam's razor)          alone                              validation data (may be
                                                            computationally cumbersome)
Regularization           naturally provided by the prior    needs an additional penalty term
(prevents overfitting)
Accuracy                 naturally provided by the          needs additional techniques
(quality evaluation)     posterior                          (confidence interval, bootstrap)

• A Bayesian prior may not be realistic, but its effect diminishes as the amount of training data grows

• Advances in computational power, together with techniques for computing the posterior and the marginal likelihood (e.g., sampling methods such as MCMC, and approximate inference such as variational Bayes), have promoted the Bayesian approach and enabled its use on big datasets.
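As an illustration of the sampling techniques mentioned, here is a minimal Metropolis sampler (the simplest MCMC method). It draws from an unnormalized density, so the difficult marginal never has to be computed; the target, step size, and sample count are arbitrary choices for this sketch:

```python
import math
import random

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: sample from exp(log_target), known only
    up to a constant (no normalization / marginalization required)."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        if math.log(rng.random() + 1e-300) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

# Target: standard normal, via its log-density up to a constant.
draws = metropolis(lambda z: -0.5 * z * z, x0=0.0, n_samples=50_000)
mean = sum(draws) / len(draws)                          # near 0
var = sum((d - mean) ** 2 for d in draws) / len(draws)  # near 1
```

Real MCMC use adds burn-in, convergence diagnostics, and tuned proposals; this only shows the core accept/reject idea.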
Gaussian Distribution
• Gaussian mean and variance
• Multivariate Gaussian
• Gaussian parameter estimation
• Likelihood function
• Maximum (log) likelihood
• Properties of the maximum likelihood estimates
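As a brief preview of the estimation topics listed above, the following sketch computes the Gaussian maximum-likelihood estimates on synthetic data (the true parameters are arbitrary choices for illustration):

```python
import numpy as np

# Synthetic data from a Gaussian with (hypothetical) mean 3, std 2.
rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=2.0, size=100_000)

mu_ml = data.mean()                    # ML mean: (1/N) Σ_n x_n
var_ml = np.mean((data - mu_ml)**2)    # ML variance: (1/N) Σ_n (x_n - mu_ml)^2
```

A standard property of these estimates: mu_ml is unbiased, while var_ml underestimates the true variance by a factor (N − 1)/N, which is negligible at this sample size.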
