
Essentials of Machine Learning

Lesson 02 - Probability

Thushari Silva, PhD


What is probability?

• Quantification of uncertainty
• Frequentist interpretation: long run frequencies of events
e.g.: The probability of a particular coin landing heads up is 0.5
• Bayesian interpretation: quantify our degrees of belief about something
e.g.: the probability of it raining tomorrow is 0.3
• Not possible to repeat “tomorrow” many times
• Basic rules of probability are the same, no matter which interpretation is
adopted



Random Variables

• A random variable (RV), X, denotes a quantity that is subject to variations due to chance
• May denote the result of an experiment (e.g. flipping a coin) or the
measurement of a real-world fluctuating quantity (e.g. temperature)
• Use capital letters to denote random variables and lower case letters to denote
values that they take, e.g. p(X = x)
• A discrete variable takes on values from a finite or countably infinite set
• Probability mass function p(X = x) for discrete random variables



Random Variables – Examples

• Examples:
• Colour of a car: blue, green, red
• Number of children in a family: 0, 1, 2, 3, 4, 5, 6, > 6
• Toss two coins, let X = (number of heads)². X can take on the values 0, 1 and 4.
• Example: p(Colour = red) = 0.3
• Σ_x P(x) = 1
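
A minimal sketch of the two-coin example as a probability mass function in Python (assuming fair, independent coins; the dictionary name pmf is an illustrative choice):

    # X = (number of heads)^2 for two fair coin tosses
    # heads = 0 with prob 1/4, heads = 1 with prob 1/2, heads = 2 with prob 1/4
    pmf = {0: 0.25, 1: 0.50, 4: 0.25}

    # a valid pmf assigns non-negative mass that sums to 1
    assert abs(sum(pmf.values()) - 1.0) < 1e-12

    print(pmf[1])   # P(X = 1) = 0.5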



Continuous Random Variables

• Continuous RVs take on values that vary continuously within one or more real
intervals
• Probability density function (pdf) p(x) for a continuous random variable X
P(a ≤ X ≤ b) = ∫_a^b p(x) dx
therefore
P(x ≤ X ≤ x + δx) ≅ p(x) δx
• ∫ p(x) dx = 1 (but values of p(x) can be greater than 1)
• Examples (coming soon): Gaussian, Gamma, Exponential, Beta
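
A small numerical sketch of P(a ≤ X ≤ b) = ∫_a^b p(x) dx and of the approximation P(x ≤ X ≤ x + δx) ≅ p(x) δx, using an Exponential(1) density as an illustrative pdf (the choice of distribution and grid size are assumptions, not from the slides):

    import numpy as np

    lam = 1.0                              # rate of an illustrative Exponential(lam) distribution
    p = lambda x: lam * np.exp(-lam * x)   # its probability density function

    a, b = 1.0, 2.0
    xs = np.linspace(a, b, 100_001)
    dx = xs[1] - xs[0]

    approx = np.sum(p(xs)) * dx            # P(a <= X <= b) ~ sum of p(x) * dx over a fine grid
    exact = np.exp(-lam * a) - np.exp(-lam * b)
    print(approx, exact)                   # both close to 0.2325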



Expectation

• Consider a function f(x) mapping from x onto numerical values


• E[f(x)] = Σ_x f(x) P(x)
          = ∫ f(x) p(x) dx

for discrete and continuous variables respectively.


• f(x) = x: we obtain the mean, μ_X
• f(x) = (x − μ_X)²: we obtain the variance
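
A short sketch of E[f(x)] = Σ_x f(x) P(x) for a discrete variable, reusing the two-coin pmf from the earlier example (the helper name expectation is illustrative):

    def expectation(f, pmf):
        # E[f(X)] = sum over x of f(x) * P(x)
        return sum(f(x) * p for x, p in pmf.items())

    pmf = {0: 0.25, 1: 0.50, 4: 0.25}                   # X = (number of heads)^2, two fair coins
    mean = expectation(lambda x: x, pmf)                # f(x) = x gives the mean
    var = expectation(lambda x: (x - mean) ** 2, pmf)   # f(x) = (x - mu)^2 gives the variance
    print(mean, var)                                    # 1.5 and 2.25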



Joint distributions

• Properties of several random variables are important for modelling complex problems
• P(X_1 = x_1, X_2 = x_2, …, X_D = x_D)
• “,” is read as “and”
• Examples about Grade and Intelligence (from Koller and Friedman, 2009)

             Intelligence = low   Intelligence = high
Grade = A          0.07                 0.18
Grade = B          0.28                 0.09
Grade = C          0.35                 0.03
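
The joint table above could be held in code as a simple dictionary keyed by (Grade, Intelligence) pairs; a minimal sketch (the layout is an illustrative choice):

    # joint distribution P(Grade, Intelligence) from the table above
    joint = {
        ('A', 'low'): 0.07, ('A', 'high'): 0.18,
        ('B', 'low'): 0.28, ('B', 'high'): 0.09,
        ('C', 'low'): 0.35, ('C', 'high'): 0.03,
    }

    # the entries of a joint distribution sum to 1
    assert abs(sum(joint.values()) - 1.0) < 1e-12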



Marginal Probability

• The sum rule


P(x) = Σ_y p(x, y)
• p(Grade = A) ?? (worked out below)

• Replace sum by an integral for continuous RVs
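
As a worked answer to the question above, marginalising Intelligence out of the grades table with the sum rule:

p(Grade = A) = p(Grade = A, Intelligence = low) + p(Grade = A, Intelligence = high) = 0.07 + 0.18 = 0.25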



Conditional Probability

• Let X and Y be two disjoint groups of variables, such that p(Y = y) > 0. Then the conditional
probability distribution (CPD) of X given Y = y is given by:
p(X = x | Y = y) = p(x | y) = p(x, y) / p(y)
• Product rule
p(X, Y) = p(X) p(Y | X) = p(Y) p(X | Y)
• Example: In the grades example, what is p(Intelligence = high | Grade = A)? (worked out below)
• Σ_x p(X = x | Y = y) = 1 for all y
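
As a worked answer to the example above, using the marginal p(Grade = A) = 0.25 computed earlier:

p(Intelligence = high | Grade = A) = p(Grade = A, Intelligence = high) / p(Grade = A) = 0.18 / 0.25 = 0.72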



Chain Rule

• The chain rule is derived by repeated application of the product rule


p(X_1, X_2, …, X_D) = p(X_1, X_2, …, X_{D−1}) p(X_D | X_1, X_2, …, X_{D−1})
                    = p(X_1, X_2, …, X_{D−2}) p(X_{D−1} | X_1, X_2, …, X_{D−2}) p(X_D | X_1, X_2, …, X_{D−1})
                    = …
                    = p(X_1) ∏_{i=2}^{D} p(X_i | X_1, X_2, …, X_{i−1})

Exercise: give decompositions of p(x, y, z) using the chain rule (one worked decomposition is shown below)
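
One such decomposition (a worked example; others follow by reordering the variables): p(x, y, z) = p(x) p(y | x) p(z | x, y).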



Bayes' Rule

• From the product rule,

P(X | Y) = P(Y | X) P(X) / P(Y)
         = P(Y | X) P(X) / Σ_X P(Y | X) P(X)



Bayes’ rule example

• Consider the following medical diagnosis problem.


Suppose you decide to have a medical test for cancer. If the test is positive, what is the probability that you have cancer? The test has a sensitivity of 80% and the prior probability of having cancer is 0.004.
Assume that false positives are quite likely, i.e. p(x = 1 | y = 0) = 0.1
p(x = 1 | y = 1) = 0.8, p(y = 1 | x = 1) = ??
p(y = 1 | x = 1) = p(x = 1 | y = 1) p(y = 1) / [p(x = 1 | y = 1) p(y = 1) + p(x = 1 | y = 0) p(y = 0)]
                 = (0.8 × 0.004) / (0.8 × 0.004 + 0.1 × 0.996)
                 ≅ 0.031, i.e. about 3%
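
A small sketch of the same calculation in Python (the variable names are illustrative; the numbers are those given above):

    # sensitivity p(x=1 | y=1), false-positive rate p(x=1 | y=0), prior p(y=1)
    sens, fpr, prior = 0.8, 0.1, 0.004

    # Bayes' rule: p(y=1 | x=1) = p(x=1 | y=1) p(y=1) / p(x=1)
    evidence = sens * prior + fpr * (1 - prior)
    posterior = sens * prior / evidence
    print(round(posterior, 3))   # 0.031, i.e. about 3%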



Probabilistic Inference using Bayes' Rule

• Tuberculosis (TB) and a skin test (Test)


• p(TB = yes) = 0.001 (for subjects who get tested)
• p(Test = yes | TB = yes) = 0.95
• p(Test = no | TB = no) = 0.95

• Person gets a positive test result. What is p(TB = yes |Test = yes)?
P(TB = yes | Test = yes) = P(Test = yes | TB = yes) P(TB = yes) / P(Test = yes)
                         = (0.95 × 0.001) / (0.95 × 0.001 + 0.05 × 0.999)
                         ≅ 0.0187
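
The same pattern can be wrapped in a small helper and applied to the TB numbers; a sketch (the function name posterior is an illustrative choice):

    def posterior(lik_pos, lik_neg, prior):
        # p(H = yes | E = yes) via Bayes' rule for a two-valued hypothesis H
        evidence = lik_pos * prior + lik_neg * (1 - prior)
        return lik_pos * prior / evidence

    # p(Test=yes | TB=yes) = 0.95, p(Test=yes | TB=no) = 1 - 0.95 = 0.05, p(TB=yes) = 0.001
    print(round(posterior(0.95, 0.05, 0.001), 4))   # 0.0187
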
Independence

• Let X and Y be two disjoint groups of variables. Then X is said to be independent of Y if and only if
p(X | Y) = p(X) for all possible values x and y of X and Y;
otherwise X is said to be dependent on Y
• Using the definition of conditional probability, we get an equivalent expression
for the independence condition
p(X, Y) = p(X) p(Y)
• X independent of Y ⇔ Y independent of X (independence is symmetric)
• Independence of a set of variables: X_1, …, X_D are independent iff
p(X_1, X_2, …, X_D) = ∏_{i=1}^{D} p(X_i)
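
As an illustrative check (not from the slides), one can test whether the grades table factorises as p(Grade) p(Intelligence); it does not, so Grade and Intelligence are dependent:

    joint = {
        ('A', 'low'): 0.07, ('A', 'high'): 0.18,
        ('B', 'low'): 0.28, ('B', 'high'): 0.09,
        ('C', 'low'): 0.35, ('C', 'high'): 0.03,
    }

    # marginals via the sum rule
    p_grade = {g: sum(p for (gg, _), p in joint.items() if gg == g) for g in 'ABC'}
    p_intel = {i: sum(p for (_, ii), p in joint.items() if ii == i) for i in ('low', 'high')}

    # independence would require p(g, i) == p(g) * p(i) in every cell
    independent = all(abs(joint[(g, i)] - p_grade[g] * p_intel[i]) < 1e-9
                      for g in 'ABC' for i in ('low', 'high'))
    print(independent)   # False: Grade and Intelligence are dependent
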
Conditional Independence

• Let X, Y and Z be three disjoint groups of variables. X is said to be conditionally independent of Y given Z iff:
p(x | y, z) = p(x | z)
for all possible values of x, y and z.
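
Equivalently, by the product rule, conditional independence can be written as p(x, y | z) = p(x | z) p(y | z) for all possible values of x, y and z.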

