Introduction to Machine Learning
Probability Theory for ML
Readings:
Mitchell Ch. 1, 2, 6.1 – 6.3
Murphy Ch. 2
Bishop Ch. 1 - 2
Slides adapted from Matt Gormley, Rob Hall, Zahra Koochak, and Jeremy Irvin
Outline
• Motivation
• Probability Theory
– Sample space, Outcomes, Events
– Kolmogorov’s Axioms of Probability
• Random Variables
– Random variables, Probability mass function (pmf), Probability density function (pdf),
Cumulative distribution function (cdf)
– Examples
– Notation
– Expectation and Variance
– Joint, conditional, marginal probabilities
– Independence
– Bayes’ Rule
• Common Probability Distributions
– Beta, Dirichlet, etc.
• Recap of Decision Trees
– Entropy
– Information Gain
• Probability in ML
Why Probability?
[Figure: Machine Learning sits at the intersection of Computer Science, the Domain of Interest, Optimization, Statistics, Probability, Calculus, Measure Theory, and Linear Algebra.]
PROBABILITY THEORY
Probability Theory: Definitions
Example 1: Flipping a coin
Sample Space {Heads, Tails}
Outcome Example: Heads
Event Example: {Heads}
Probability P({Heads}) = 0.5
P({Tails}) = 0.5
Probability Theory: Definitions
Probability provides a principled framework for
reasoning about uncertain events
Sample Space The set of all possible outcomes
Outcome Possible result of an experiment
Event Any subset of the sample space
Probability The non-negative number assigned
to each event in the sample space
• Each outcome is unique
• Only one outcome can occur per experiment
• An outcome can be in multiple events
• An elementary event consists of exactly one outcome
Probability Theory: Definitions
Example 2: Rolling a 6-sided die
Sample Space {1,2,3,4,5,6}
Outcome Example: 3
Event Example: {3}
(the event “the die came up 3”)
Probability P({3}) = 1/6
P({4}) = 1/6
Probability Theory: Definitions
Example 2: Rolling a 6-sided die
Sample Space {1,2,3,4,5,6}
Outcome Example: 3
Event Example: {2,4,6}
(the event “the roll was even”)
Probability P({2,4,6}) = 0.5
P({1,3,5}) = 0.5
Probability Theory: Definitions
Example 3: Timing how long it takes a monkey to
reproduce Shakespeare
Sample Space [0, +∞)
Outcome Example: 1,433,600 hours
Event Example: [1, 6] hours
Probability P([1,6]) = 0.000000000001
P([1,433,600, +∞)) = 0.99
Kolmogorov’s Axioms
All of probability can be derived from just these axioms!
In words:
1. Each event has non-negative probability.
2. The probability that some outcome in the sample space occurs is one.
3. The probability of the union of countably many disjoint events is the sum of their probabilities.
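Formally (a standard statement of the axioms, for reference):
1. P(E) >= 0 for every event E
2. P(Ω) = 1
3. For any countable sequence of disjoint events E1, E2, …: P(E1 or E2 or …) = P(E1) + P(E2) + …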
Axioms Deriving Probability Theorems
Monotonicity: if A is a subset of B, then P(A) <= P(B)
Proof:
• A is a subset of B ➔ B = A or C, where C = B - A
• A and C are disjoint ➔ P(B) = P(A or C) = P(A) + P(C)
• P(C) >= 0
• So P(B) >= P(A)
Slide adapted from William Cohen (10-601B, Spring 2016)
Probability Theory: Definitions
• The complement of an event E, denoted ~E,
is the event that E does not occur.
Axioms Deriving Probability Theorems
Theorem: P(~A) = 1 - P(A)
Proof:
• P(A or ~A) = P(Ω) = 1
• A and ~A are disjoint ➔ P(A) + P(~A) = P(A or ~A)
➔ P(A) + P(~A) = 1
• Solving for P(~A) gives P(~A) = 1 - P(A)
Slide adapted from William Cohen (10-601B, Spring 2016)
Axioms Deriving Probability Theorems
Theorem: P(A or B) = P(A) + P(B) - P(A and B)
Proof:
• Let E1 = A and ~(A and B), E2 = (A and B), E3 = B and ~(A and B)
• E1 or E2 or E3 = A or B, and E1, E2, E3 are disjoint ➔
P(A or B) = P(E1) + P(E2) + P(E3)
• Further, P(A) = P(E1) + P(E2) and P(B) = P(E3) + P(E2)
• Substituting: P(A or B) = [P(A) - P(E2)] + P(E2) + [P(B) - P(E2)] = P(A) + P(B) - P(A and B)
Slide adapted from William Cohen (10-601B, Spring 2016)
These Axioms are Not to be Trifled With
- Andrew Moore
• There have been many other approaches to understanding “uncertainty”:
– Fuzzy logic, three-valued logic, Dempster-Shafer, non-monotonic reasoning, …
• 40 years ago people in AI argued about these; now they mostly don’t
– Any scheme for combining uncertain information, uncertain “beliefs”, etc. really should obey these axioms to be internally consistent (from Jaynes, 1958; Cox, 1930s)
– If you gamble based on “uncertain beliefs” and your uncertainty formalism violates the axioms, then you can be exploited by an opponent (de Finetti, 1931: the “Dutch book argument”)
RANDOM VARIABLES
Random Variables: Definitions
Random Variable (capital letters): Def 1: a variable whose possible values are the outcomes of a random experiment
Value of a Random Variable (lowercase letters): the value taken by a random variable
Random Variables: Definitions
Random Variable: Def 1: a variable whose possible values are the outcomes of a random experiment. Def 2: a measurable function from the sample space to the real numbers, X : Ω → R
Discrete Random Variable: a random variable whose values come from a countable set (e.g. the natural numbers or {True, False})
Continuous Random Variable: a random variable whose values come from an interval or collection of intervals (e.g. the real numbers or the range (3, 5))
Random Variables: Definitions
Discrete Random Variable: a random variable whose values come from a countable set (e.g. the natural numbers or {True, False})
Probability mass function (pmf): the function giving the probability that discrete r.v. X takes value x: p(x) := P(X = x)
Random Variables: Definitions
Example 2: Rolling a 6-sided die
Sample Space {1,2,3,4,5,6}
Outcome Example: 3
Event Example: {3}
(the event “the die came up 3”)
Probability P({3}) = 1/6
P({4}) = 1/6
Discrete Random Variable: Example: the value on the top face of the die
Prob. Mass Function (pmf): p(3) = 1/6, p(4) = 1/6
Random Variables: Definitions
Example 2: Rolling a 6-sided die
Sample Space {1,2,3,4,5,6}
Outcome Example: 3
Event Example: {2,4,6}
(the event “the roll was even”)
Probability P({2,4,6}) = 0.5
P({1,3,5}) = 0.5
Discrete Random Variable: Example: 1 if the die landed on an even number, 0 otherwise
Prob. Mass Function (pmf): p(1) = 0.5, p(0) = 0.5
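A minimal Python sketch of the two pmfs above (the die value, and the even-roll indicator); the dictionary encoding is an illustrative choice, not anything from the slides:

import random

die_pmf = {x: 1/6 for x in range(1, 7)}        # pmf of the top face
even_pmf = {1: 0.5, 0: 0.5}                    # pmf of the even-roll indicator
assert abs(sum(die_pmf.values()) - 1.0) < 1e-9 # a pmf must sum to one
# Sampling a roll according to the pmf:
roll = random.choices(list(die_pmf), weights=list(die_pmf.values()))[0]
print(roll, int(roll % 2 == 0))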
Random Variables: Definitions
Continuous Random Variable: a random variable whose values come from an interval or collection of intervals (e.g. the real numbers or the range (3, 5))
Probability density function (pdf): a function f(x) returning a nonnegative real indicating the relative likelihood that continuous r.v. X takes value x
• For any continuous random variable: P(X = x) = 0
• Non-zero probabilities are assigned only to intervals:
P(a <= X <= b) = integral from a to b of f(x) dx
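A quick sketch of an interval probability, assuming scipy is available (the exponential distribution here is an arbitrary illustrative choice):

from scipy.stats import expon

# P(1 <= X <= 6) = F(6) - F(1) for a standard exponential r.v.
p = expon.cdf(6) - expon.cdf(1)
print(p)   # about 0.365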
Random Variables: Definitions
Example 3: Timing how long it takes a monkey to
reproduce Shakespeare
Sample Space [0, +∞)
Outcome Example: 1,433,600 hours
Event Example: [1, 6] hours
Probability P([1,6]) = 0.000000000001
P([1,433,600, +∞)) = 0.99
Continuous Random Var.: Example: represents the time to reproduce (a value, not an interval!)
Prob. Density Function: Example: a Gamma distribution
Random Variables: Definitions
“Region”-valued Random Variables
Sample Space Ω: {1, 2, 3, 4, 5}
Events x: the sub-regions 1, 2, 3, 4, or 5
Discrete Random Variable X: represents a random selection of a sub-region
Prob. Mass Fn. P(X=x): proportional to the size of the sub-region
[Figure: a region divided into five sub-regions labeled X=1 through X=5.]
Random Variables: Definitions
“Region”-valued Random Variables
Sample Space Ω: all points in the region
Events x: the sub-regions 1, 2, 3, 4, or 5
Discrete Random Variable X: represents a random selection of a sub-region
Prob. Mass Fn. P(X=x): proportional to the size of the sub-region
Recall that an event is any subset of the sample space, so both definitions of the sample space here are valid.
[Figure: the same region divided into five sub-regions labeled X=1 through X=5.]
Random Variables: Definitions
Cumulative distribution function (cdf): the function giving the probability that a random variable X is less than or equal to x: F(x) = P(X <= x)
• For discrete random variables: F(x) = sum of p(x') over all x' <= x
• For continuous random variables: F(x) = integral from -∞ to x of f(t) dt
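A small sketch of the discrete case, building the cdf of the die by accumulating its pmf (the dictionary encoding is again an illustrative assumption):

die_pmf = {x: 1/6 for x in range(1, 7)}
cdf, total = {}, 0.0
for x in sorted(die_pmf):
    total += die_pmf[x]   # F(x) = sum of p(x') for x' <= x
    cdf[x] = total
print(cdf[3])   # P(X <= 3) = 0.5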
Random Variables and Events
Question: Something seems wrong…
• We defined P(E) (the capital ‘P’) as a function mapping events to probabilities
• So why do we write P(X=x)?
• A good guess: X=x is an event…
Answer: P(X=x) is just shorthand! Recall Def 2: a random variable is a measurable function from the sample space to the real numbers. The set {ω in Ω : X(ω) = x} is an event, and P(X=x) abbreviates P({ω in Ω : X(ω) = x}).
Notational Shortcuts
A convenient shorthand: writing P(A|B) = P(A,B)/P(B) means P(A=a|B=b) = P(A=a, B=b)/P(B=b) for all values a and b.
Expectation and Variance
The expected value of X is E[X]. Also called the mean.
• Discrete random variables: E[X] = sum over x of x p(x)
• Continuous random variables: E[X] = integral of x f(x) dx
Expectation and Variance
The variance of X is Var(X) = E[(X - E[X])^2], a measure of spread around the mean. Writing mu = E[X]:
• Discrete random variables: Var(X) = sum over x of (x - mu)^2 p(x)
• Continuous random variables: Var(X) = integral of (x - mu)^2 f(x) dx
A useful identity: Var(X) = E[X^2] - (E[X])^2
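A minimal sketch computing both quantities for the die pmf from earlier:

die_pmf = {x: 1/6 for x in range(1, 7)}
mean = sum(x * p for x, p in die_pmf.items())             # E[X] = 3.5
var = sum((x - mean)**2 * p for x, p in die_pmf.items())  # Var(X) ≈ 2.917
print(mean, var)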
Multiple Random Variables
• Joint probability
• Marginal probability
• Conditional probability
Joint Probability
Slide from Sam Roweis (MLSS, 2005)
Marginal Probabilities
Slide from Sam Roweis (MLSS, 2005)
Conditional Probability
Slide from Sam Roweis (MLSS, 2005)
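The Roweis slides illustrate these with a joint table; here is a small Python sketch in the same spirit, over a made-up joint distribution of weather W and traffic T (all numbers invented for illustration):

# Joint: P(W, T)
joint = {("sun", "light"): 0.4, ("sun", "heavy"): 0.1,
         ("rain", "light"): 0.2, ("rain", "heavy"): 0.3}
# Marginal: P(W=rain) = sum over t of P(rain, t)
p_rain = sum(p for (w, t), p in joint.items() if w == "rain")   # 0.5
# Conditional: P(T=heavy | W=rain) = P(rain, heavy) / P(rain)
p_heavy_given_rain = joint[("rain", "heavy")] / p_rain          # 0.6
print(p_rain, p_heavy_given_rain)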
Independence and
Conditional Independence
Slide from Sam Roweis (MLSS, 2005)
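For reference, the standard definitions: X and Y are independent if P(X=x, Y=y) = P(X=x) P(Y=y) for all x, y. X and Y are conditionally independent given Z if P(X=x, Y=y | Z=z) = P(X=x | Z=z) P(Y=y | Z=z) for all x, y, z.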
Definition of Conditional
Probability
P(A ^ B)
P(A|B) = -----------
P(B)
Corollary: The Chain Rule
P(A ^ B) = P(A|B) P(B)
Slide from William Cohen (10-601B, Spring 2016)
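A standard consequence worth noting: the chain rule extends to any number of events, e.g. P(A ^ B ^ C) = P(A|B,C) P(B|C) P(C).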
BAYES’ RULE
Bayes’ rule relates the posterior P(A|B) to the prior P(A):

           P(B|A) * P(A)
P(A|B) = ---------------        Bayes’ rule
                P(B)

and, symmetrically:

           P(A|B) * P(B)
P(B|A) = ---------------
                P(A)

Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
…by no means merely a curious speculation in the doctrine of chances, but
necessary to be solved in order to a sure foundation for all our reasonings
concerning past facts, and what is likely to be hereafter…. necessary to be
considered by any that would give a clear account of the strength of
analogical or inductive reasoning…
Slide from William Cohen (10-601B, Spring 2016)
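A worked sketch of Bayes’ rule on invented numbers (a diagnostic test with 99% sensitivity, a 5% false-positive rate, and 1% prevalence; every value here is assumed for illustration):

p_d = 0.01         # prior: P(disease)
p_pos_d = 0.99     # likelihood: P(positive | disease)
p_pos_nd = 0.05    # P(positive | no disease)
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)   # marginal: P(positive)
posterior = p_pos_d * p_d / p_pos              # Bayes' rule
print(posterior)   # about 0.167: a positive test leaves only ~17% chance of disease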
COMMON PROBABILITY
DISTRIBUTIONS
Common Probability Distributions
• For Discrete Random Variables:
– Bernoulli
– Binomial
– Multinomial
– Categorical
– Poisson
• For Continuous Random Variables:
– Exponential
– Gamma
– Beta
– Dirichlet
– Laplace
– Gaussian (1D)
– Multivariate Gaussian
Bernoulli Distribution
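For reference, the standard Bernoulli pmf with parameter φ in [0,1]: p(x) = φ^x (1-φ)^(1-x) for x in {0, 1}, i.e. p(1) = φ and p(0) = 1-φ.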
Binomial Distribution
Slide from http://mathworld.wolfram.com/BinomialDistribution.html
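For reference, the standard Binomial pmf gives the probability of k successes in n independent Bernoulli(φ) trials: p(k) = C(n,k) φ^k (1-φ)^(n-k) for k = 0, 1, …, n.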
Multinomial Distribution
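For reference, the standard Multinomial pmf for n draws over K categories with probabilities φ_1, …, φ_K: p(x_1, …, x_K) = n! / (x_1! ⋯ x_K!) * φ_1^x_1 ⋯ φ_K^x_K, where the x_k sum to n. A quick sketch evaluating all three pmfs, assuming scipy is available (parameter values invented for illustration):

from scipy.stats import bernoulli, binom, multinomial

print(bernoulli.pmf(1, 0.3))                 # φ = 0.3
print(binom.pmf(3, 10, 0.5))                 # C(10,3) * 0.5^10 ≈ 0.117
print(multinomial.pmf([2, 3, 5], n=10, p=[0.2, 0.3, 0.5]))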
Continuous Distributions
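Among these, the workhorse is the Gaussian; for reference, its pdf is f(x) = (1 / sqrt(2π σ^2)) exp(-(x - μ)^2 / (2σ^2)), with mean μ and variance σ^2.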
Law of Large Numbers (LLN)
Informally (a standard statement): as the number of i.i.d. samples n grows, the sample mean (1/n) Σ X_i converges to the true mean E[X].
Central Limit Theorem (CLT)
Informally (a standard statement): for i.i.d. samples with mean μ and variance σ^2, the standardized sample mean sqrt(n) (X̄ - μ) / σ converges in distribution to a standard Gaussian N(0, 1).
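A quick simulation sketch of both results for die rolls, assuming numpy is available:

import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=(10000, 100))  # 10000 experiments of n=100 rolls
means = rolls.mean(axis=1)                     # one sample mean per experiment
print(means.mean())  # LLN: close to the true mean 3.5
print(means.std())   # CLT: close to σ/sqrt(n) = 1.708/10 ≈ 0.171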
Oh, the Places You’ll Use Probability!
Supervised Classification
• Naïve Bayes
• Logistic regression
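Both are probabilistic at heart: Naïve Bayes models the joint P(X, Y), while logistic regression models the class posterior directly, P(Y=1 | x) = 1 / (1 + exp(-(w·x + b))).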
Oh, the Places You’ll Use Probability!
ML Theory
(Example: Sample Complexity)
Oh, the Places You’ll Use Probability!
Deep Learning
(Example: Deep Bi-directional RNN)
[Figure: a deep bi-directional RNN with inputs x1…x4, forward and backward hidden layers h1…h4, and outputs y1…y4.]
Oh, the Places You’ll Use Probability!
Graphical Models
• Hidden Markov Model (HMM)
[Figure: an HMM tagging “time flies like an arrow” with hidden tag sequence <START> n v p d n.]
• Conditional Random Field (CRF)
[Figure: a linear-chain CRF over the same sentence, with potentials ψ0 … ψ9 linking adjacent tags and tags to words.]
Summary
1. Probability theory is rooted in (simple)
axioms
2. Random variables provide an important
tool for modeling the world
3. Our favorite probability distributions are
just functions! (usually with interesting
properties)
4. Probability and Statistics are essential to
Machine Learning