
CS480/680

Lecture 4: May 15, 2019


Statistical Learning
[RN]: Sec 20.1, 20.2, [M]: Sec. 2.2, 3.2

University of Waterloo, CS480/680 Spring 2019, Pascal Poupart


Statistical Learning

• View: we have uncertain knowledge of the world

• Idea: learning simply reduces this uncertainty



Terminology
• Probability distribution:
– A specification of a probability for each event in
our sample space
– Probabilities must sum to 1
• Assume the world is described by two (or
more) random variables
– Joint probability distribution
• Specification of probabilities for all combinations of
events



Joint distribution

• Given two random variables X and Y:

• Joint distribution:
Pr(X = x ∧ Y = y) for all x, y

• Marginalisation (sumout rule):
Pr(X = x) = Σ_y Pr(X = x ∧ Y = y)
Pr(Y = y) = Σ_x Pr(X = x ∧ Y = y)



Example: Joint Distribution
             sunny                 ~sunny
          cold    ~cold         cold    ~cold
headache  0.108   0.012         0.072   0.008
~headache 0.016   0.064         0.144   0.576

P(headache ∧ sunny ∧ cold) = 0.108
P(~headache ∧ sunny ∧ ~cold) = 0.064
P(headache ∨ sunny) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28
P(headache) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2 (by marginalization)
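A small sketch (not part of the slides) of how the table and queries above can be checked in Python; the `joint` dict encoding and the `prob` helper are made up for illustration:

```python
# Joint distribution over (headache, sunny, cold), as given in the table above
# (illustrative sketch; the dict encoding and helper name are made up).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    """P(event), where event is a predicate over (headache, sunny, cold)."""
    return sum(p for (h, s, c), p in joint.items() if event(h, s, c))

print(prob(lambda h, s, c: h and s and c))             # P(headache ∧ sunny ∧ cold) = 0.108
print(prob(lambda h, s, c: (not h) and s and not c))   # P(~headache ∧ sunny ∧ ~cold) = 0.064
print(prob(lambda h, s, c: h or s))                    # P(headache ∨ sunny) = 0.28
print(prob(lambda h, s, c: h))                         # P(headache) = 0.2 (marginalization)
```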
Conditional Probability
• Pr(H|F): fraction of worlds in which F is true that also have H true

H = "Have headache"
F = "Have flu"

Pr(H) = 1/10
Pr(F) = 1/40
Pr(H|F) = 1/2

Headaches are rare and flu is rarer, but if you have the flu, then there is a 50-50 chance you will have a headache.
Conditional Probability
Pr(H|F) = fraction of flu-afflicted worlds in which you have a headache

= (# worlds with flu and headache) / (# worlds with flu)

= (area of "H and F" region) / (area of "F" region)

= Pr(H ∧ F) / Pr(F)

H = "Have headache"
F = "Have flu"

Pr(H) = 1/10
Pr(F) = 1/40
Pr(H|F) = 1/2



Conditional Probability
• Definition:
Pr(a|b) = Pr(a ∧ b) / Pr(b)

• Chain rule:
Pr(a ∧ b) = Pr(a|b) Pr(b)

Memorize these!



Inference
One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with the flu."

H = "Have headache"
F = "Have flu"

Is your reasoning correct?

Pr(H) = 1/10
Pr(F) = 1/40
Pr(H|F) = 1/2

Pr(F ∧ H) = Pr(H|F) Pr(F) = (1/2)(1/40) = 1/80
Pr(F|H) = Pr(F ∧ H) / Pr(H) = (1/80) / (1/10) = 1/8

No: given a headache, the chance of flu is only 1/8, not 1/2.
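The same arithmetic as a tiny illustrative Python check (the variable names are mine; the numbers are the slide's):

```python
# A quick numeric check of the headache/flu reasoning (sketch; numbers from the slide).
p_h = 1 / 10          # Pr(H): probability of a headache
p_f = 1 / 40          # Pr(F): probability of the flu
p_h_given_f = 1 / 2   # Pr(H | F)

# Chain rule: Pr(F ∧ H) = Pr(H | F) Pr(F)
p_f_and_h = p_h_given_f * p_f       # 1/80 = 0.0125

# Definition of conditional probability: Pr(F | H) = Pr(F ∧ H) / Pr(H)
p_f_given_h = p_f_and_h / p_h       # 1/8 = 0.125, not 1/2
print(p_f_and_h, p_f_given_h)
```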



Example: Joint Distribution
             sunny                 ~sunny
          cold    ~cold         cold    ~cold
headache  0.108   0.012         0.072   0.008
~headache 0.016   0.064         0.144   0.576

Pr(headache ∧ cold | sunny) = 0.108 / 0.2 = 0.54

Pr(headache ∧ cold | ~sunny) = 0.072 / 0.8 = 0.09
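A conditional query is just a ratio of two sums over the joint table. A minimal sketch, assuming the same hypothetical encoding of the table as in the earlier snippet:

```python
# Conditional queries over the same joint table (illustrative sketch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    return sum(p for (h, s, c), p in joint.items() if event(h, s, c))

def cond(event, given):
    """Pr(event | given) = Pr(event ∧ given) / Pr(given)."""
    return prob(lambda h, s, c: event(h, s, c) and given(h, s, c)) / prob(given)

print(cond(lambda h, s, c: h and c, lambda h, s, c: s))      # Pr(headache ∧ cold | sunny)  ≈ 0.54
print(cond(lambda h, s, c: h and c, lambda h, s, c: not s))  # Pr(headache ∧ cold | ~sunny) ≈ 0.09
```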



Bayes Rule
• Note:
Pr(a|b) Pr(b) = Pr(a ∧ b) = Pr(b ∧ a) = Pr(b|a) Pr(a)

• Bayes rule:
Pr(b|a) = Pr(a|b) Pr(b) / Pr(a)

Memorize this!



Using Bayes Rule for inference
• Often we want to form a hypothesis about the world based on what we have observed
• Bayes rule is vitally important when viewed in terms of stating the belief given to hypothesis H, given evidence e:

Pr(H|e) = Pr(e|H) Pr(H) / Pr(e)

where Pr(H|e) is the posterior probability, Pr(e|H) is the likelihood, Pr(H) is the prior probability, and Pr(e) is the normalizing constant.
Bayesian Learning
• Prior: Pr(H)
• Likelihood: Pr(E|H)
• Evidence: E = <e1, e2, …, en>

• Bayesian learning amounts to computing the posterior using Bayes' theorem:
Pr(H|E) = k Pr(E|H) Pr(H)
where k = 1/Pr(E) is a normalizing constant



Bayesian Prediction
• Suppose we want to make a prediction about an
unknown quantity X

• Pr(X|E) = Σ_i Pr(X|E, h_i) Pr(h_i|E)
          = Σ_i Pr(X|h_i) Pr(h_i|E)
  (the second step assumes X is independent of the past evidence E given each hypothesis h_i)

• Predictions are weighted averages of the predictions of the individual hypotheses
• Hypotheses serve as "intermediaries" between raw data and prediction



Candy Example
• Favorite candy sold in two flavors:
– Lime (ugh)
– Cherry (yum)
• Same wrapper for both flavors
• Sold in bags with different ratios:
– 100% cherry
– 75% cherry + 25% lime
– 50% cherry + 50% lime
– 25% cherry + 75% lime
– 100% lime



Candy Example
• You bought a bag of candy but don’t know its flavor
ratio

• After eating k candies:


– What’s the flavor ratio of the bag?
– What will be the flavor of the next candy?



Statistical Learning
• Hypothesis H: probabilistic theory of the world
– ℎ1: 100% cherry
– ℎ2: 75% cherry + 25% lime
– ℎ3: 50% cherry + 50% lime
– ℎ4: 25% cherry + 75% lime
– ℎ5: 100% lime
• Examples E: evidence about the world
– e1: 1st candy is cherry
– e2: 2nd candy is lime
– e3: 3rd candy is lime
– …



Candy Example
• Assume prior Pr(H) = <0.1, 0.2, 0.4, 0.2, 0.1>
• Assume candies are i.i.d. (independent and identically distributed):
Pr(E|h) = Π_i Pr(e_i|h)
• Suppose the first 10 candies all taste lime:
Pr(E|h5) = 1^10 = 1
Pr(E|h3) = 0.5^10 ≈ 0.001
Pr(E|h1) = 0^10 = 0



Posterior

[Figure: posterior probabilities Pr(h_i | e_1, …, e_n) for each hypothesis as the number of observed lime candies grows from 0 to 10]


Prediction

[Figure: probability that the next candy is lime, Pr(lime | e_1, …, e_n), as the number of observed lime candies grows from 0 to 10]
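The posterior and prediction values behind these two figures can be reproduced with a short script. This is an illustrative sketch, not part of the slides; the list names and the update loop are my own:

```python
# Bayesian learning for the candy example (illustrative sketch).
# Hypothesis h_i is identified with Pr(lime | h_i).
lime_prob = [0.0, 0.25, 0.5, 0.75, 1.0]   # h1 .. h5
prior     = [0.1, 0.2, 0.4, 0.2, 0.1]     # Pr(h_i)

posterior = prior[:]
for n in range(1, 11):                    # observe 10 lime candies, one at a time
    # Bayes update: Pr(h_i | e_1..e_n) ∝ Pr(lime | h_i) * Pr(h_i | e_1..e_{n-1})
    posterior = [q * p for q, p in zip(lime_prob, posterior)]
    k = 1.0 / sum(posterior)              # normalizing constant
    posterior = [k * p for p in posterior]

    # Bayesian prediction: Pr(next is lime | E) = sum_i Pr(lime | h_i) Pr(h_i | E)
    pred = sum(q * p for q, p in zip(lime_prob, posterior))
    print(f"after {n:2d} limes: posterior = {[round(p, 3) for p in posterior]}, "
          f"Pr(next lime) = {pred:.3f}")
```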



Bayesian Learning
• Bayesian learning properties:
– Optimal (i.e. given prior, no other prediction is correct
more often than the Bayesian one)
– No overfitting (all hypotheses considered and weighted)

• There is a price to pay:


– When the hypothesis space is large, Bayesian learning may be intractable
– i.e., the sum (or integral) over hypotheses is often intractable
• Solution: approximate Bayesian learning



Maximum a posteriori (MAP)
• Idea: make prediction based on the most probable hypothesis h_MAP:
h_MAP = argmax_{h_i} Pr(h_i | E)
Pr(X|E) ≈ Pr(X|h_MAP)

• In contrast, Bayesian learning makes predictions based on all hypotheses, weighted by their probability
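A minimal sketch of the contrast, reusing the candy numbers (hypothetical variable names; not from the slides): MAP commits to the single most probable hypothesis, while the Bayesian prediction averages over all of them.

```python
# MAP vs. Bayesian prediction for the candy example after 10 lime candies (sketch).
lime_prob = [0.0, 0.25, 0.5, 0.75, 1.0]   # Pr(lime | h_i) for h1..h5
prior     = [0.1, 0.2, 0.4, 0.2, 0.1]     # Pr(h_i)

# Posterior after 10 limes: Pr(h_i | E) ∝ Pr(E | h_i) Pr(h_i) = lime_prob[i]**10 * prior[i]
unnorm    = [q ** 10 * p for q, p in zip(lime_prob, prior)]
posterior = [u / sum(unnorm) for u in unnorm]

# h_MAP = argmax_i Pr(h_i | E); MAP prediction uses that single hypothesis
i_map = max(range(len(posterior)), key=lambda i: posterior[i])
print(f"h_MAP = h{i_map + 1}, Pr(next lime | h_MAP) = {lime_prob[i_map]}")

# Full Bayesian prediction: weighted average over all hypotheses
print(f"Bayesian Pr(next lime | E) = {sum(q * p for q, p in zip(lime_prob, posterior)):.3f}")
```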



MAP properties
• MAP prediction less accurate than Bayesian
prediction since it relies only on one hypothesis h_MAP
• But MAP and Bayesian predictions converge as data
increases
• Controlled overfitting (prior can be used to penalize
complex hypotheses)

• Finding h_MAP may be intractable:
– h_MAP = argmax_h Pr(h|E)
– Optimization may be difficult



Maximum Likelihood (ML)
• Idea: simplify MAP by assuming uniform prior
(i.e., Pr(h_i) = Pr(h_j) for all i, j)
h_MAP = argmax_h Pr(h) Pr(E|h)
h_ML = argmax_h Pr(E|h)

• Make prediction based on h_ML only:
Pr(X|E) ≈ Pr(X|h_ML)



ML properties
• ML prediction less accurate than Bayesian and MAP
predictions since it ignores prior info and relies only
on one hypothesis h_ML
• But ML, MAP and Bayesian predictions converge as
data increases
• Subject to overfitting (no prior to penalize complex
hypotheses that could exploit statistically insignificant
data patterns)

• Finding h_ML is often easier than h_MAP:

h_ML = argmax_h Σ_i log Pr(e_i|h)
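For the candy data, this log-likelihood maximization can be written directly. A sketch under the slide's assumptions (i.i.d. candies, five candidate lime proportions); the helper name is mine:

```python
import math

# Maximum likelihood hypothesis for the candy data (illustrative sketch).
lime_prob = [0.0, 0.25, 0.5, 0.75, 1.0]   # Pr(lime | h_i) for h1..h5
data = ["lime"] * 10                       # first 10 candies all taste lime

def log_likelihood(q):
    """Sum_i log Pr(e_i | h) when Pr(lime|h) = q and Pr(cherry|h) = 1 - q."""
    total = 0.0
    for e in data:
        p = q if e == "lime" else 1.0 - q
        if p == 0.0:
            return float("-inf")           # this hypothesis rules the data out
        total += math.log(p)
    return total

i_ml = max(range(len(lime_prob)), key=lambda i: log_likelihood(lime_prob[i]))
print(f"h_ML = h{i_ml + 1}")               # h5: the 100% lime bag
```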
