Intro To Probability (Pattern Recognition)
Sarat Saharia
TU
Introduction
Why Learn Probability?
• Nothing in life is certain. In everything we do, we gauge
the chances of successful outcomes, from business to
medicine to the weather
• A probability provides a quantitative description of the
chances or likelihoods associated with various outcomes
• It provides a bridge between descriptive and inferential
statistics
[Diagram: Probability reasons from the Population to a Sample; Statistics reasons from the Sample back to the Population.]
Probability model
• A probability model assumes that the variability in data
is due to chance or random variability. For a random
occurrence with a finite number of possible outcomes, a
probability model can be defined by listing all possible
outcomes and the probability that each one occurs.
• P(x) denotes the probability that a particular value x
occurs
• Example:
1. Flipping a single unbiased coin. The outcomes are head and tail,
with corresponding probabilities
P(head) = ½ and P(tail) = ½.
2. Rolling a fair die. Outcomes are 1, 2, 3, 4, 5, and 6 and
probability of each outcome is 1/6.
Probability Estimate
• Methods for obtaining probability estimates:
– Frequentist approach
– Subjective approach
• Frequentist approach: probability of an event is
estimated by dividing the number of
occurrences of an event by the number of trials.
– Easy to understand but difficult or impossible to
obtain enough samples to get an estimate of the true
probabilities.
– Also, this approach applies only to repeatable events,
that is, events for which the probability is constant
over all trials. This is often difficult to verify in the real
world.
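The frequentist approach can be sketched with a short simulation. This is an illustrative sketch, not from the source; the function name and the coin-flip trial are my own choices.

```python
import random

def frequentist_estimate(event, trial, n):
    """Estimate P(event) as (number of occurrences) / (number of trials)."""
    occurrences = sum(1 for _ in range(n) if event(trial()))
    return occurrences / n

random.seed(0)  # reproducible trials
flip = lambda: random.choice(["head", "tail"])  # one trial: flip a fair coin
p_head = frequentist_estimate(lambda outcome: outcome == "head", flip, 100_000)
# p_head is close to 0.5, but note how many trials are needed for a stable estimate
```

Even with 100,000 trials the estimate only approximates the true probability ½, which illustrates the slide's point about needing many samples.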
Probability Estimate
• Subjective approach: This is the only way to assign
probability measures to various outcomes for events
that are not repeatable. For example, the probability that
a certain candidate will win the next election from a
constituency.
• The notion of a fair bet is one way to quantify the
process of selecting a subjective probability.
– Let P be the probability of the event on which you are betting.
– W is the amount you win if the event occurs and L is the amount
you lose if the event does not occur.
– For a fair bet,
PW = (1 – P)L
or P = L / (L + W).
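The fair-bet relation above can be turned into a one-line computation. A minimal sketch (the function name and the stake amounts are illustrative):

```python
def fair_bet_probability(w, l):
    """Solve P*W = (1 - P)*L for P: the subjective probability implied by a fair bet."""
    return l / (l + w)

# If you would win 300 but lose 100 on a bet you consider fair,
# your implied probability of the event is 100 / (100 + 300) = 0.25.
p = fair_bet_probability(300, 100)
```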
Experiments and Events
• Experiment: a process for which the outcome
is not known with certainty. Examples:
Experiments and Events
Experiment: Rolling a fair six sided die
Events:
– Obtaining a 6
– Obtaining an odd number
Experiment: Randomly choosing 10 transistors from a lot of 1000
new transistors
Events:
– Finding more than three defective transistors
– Finding no defective transistors
Experiment: Selecting a newborn child at a certain hospital
Events:
– The weight of the newborn child selected is above 3 kg.
– The newborn child is a girl
Probabilities of Events
• Sample space: The event containing all possible
outcomes of a statistical experiment is called the
sample space. Examples:
1. For experiment 1 (rolling a die), the sample space consists of
the numbers 1, 2, 3, 4, 5, and 6.
2. For experiment 2 (choosing 10 transistors), the sample space
consists of the numbers 0, 1, 2, …, 10, the possible numbers of
defective transistors.
3. For experiment 3 (selecting a newborn child), the sample space
consists of all numbers that represent the possible weights of a
randomly selected newborn child.
• Venn diagrams can be used to visualize the
relationships among events.
[Venn diagrams: the complement "not A"; two overlapping sets A and B illustrating "A or B" (union) and "A and B" (intersection).]
Joint Event
The event A and B is called a joint event.
Example:
– A: the newborn child is a girl
– B: weight of the newborn child is above 3 kg
– A and B: the newborn child is a girl and her weight
is above 3 kg
Mutually Exclusive events
Two events A and B are called mutually exclusive
if A and B cannot occur simultaneously. Example:
A: observe an odd number when rolling a die
B: observe a 6
A and B are mutually exclusive
[Venn diagram: disjoint sets A and B; the region outside both is (not A) and (not B).]
Conditional Probabilities
• The conditional probability of A occurring,
given that B has occurred, is denoted by P(A|B) (read
as “P of A given B”) and is given by
P(A|B) = P(A and B)/P(B) (1)
This conditional probability is not defined if P(B) = 0.
Similarly,
P(B|A) = P(A and B)/P(A) (2)
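Equation (1) can be checked directly on the die example. A small sketch using exact fractions; the events A (odd number) and B (at most 3) are chosen here for illustration and are not from the source:

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B); not defined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("P(A|B) is not defined when P(B) = 0")
    return p_a_and_b / p_b

outcomes = set(range(1, 7))                       # fair six-sided die
prob = lambda s: Fraction(len(s), len(outcomes))  # equally likely outcomes
A = {1, 3, 5}   # observe an odd number
B = {1, 2, 3}   # observe at most 3
p_a_given_b = conditional(prob(A & B), prob(B))   # (1/3) / (1/2) = 2/3
```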
The Multiplication Rule
• Independent Events: If P(A) is not dependent on whether B has
occurred, then the event A is independent of event B. Then P(A) =
P(A|B). An important consequence of the definition of
independence is the following multiplication rule (whenever A is
independent of B):
P(A and B) = P(A|B)P(B) = P(A) P(B) (6)
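The multiplication rule (6) can be verified on the die. This sketch picks two events that happen to be independent (my choice of events, for illustration only):

```python
from fractions import Fraction

outcomes = set(range(1, 7))
prob = lambda s: Fraction(len(s), len(outcomes))
A = {2, 4, 6}     # observe an even number: P(A) = 1/2
B = {1, 2, 3, 4}  # observe at most 4:      P(B) = 2/3
# A and B are independent, so P(A and B) equals P(A) * P(B):
independent = prob(A & B) == prob(A) * prob(B)  # 1/3 == (1/2)*(2/3)
```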
Random Variables
• A random variable is the outcome of a random process
that outputs a numeric value. The output of a random
variable is called a random number. An example of a
random variable is the process of randomly choosing a
sample from some population and measuring one of its
features.
– often denoted with capital alphabetic symbols (X, Y, etc.)
– a normal random variable may be denoted as X ~ N(µ, σ)
• The probability distribution of a random variable X tells
us what values X can take and how to assign
probabilities to those values
Discrete random variable
• Discrete random variable: a random variable which can take
on a finite number of possible values or a countably infinite
number of values. A discrete random variable is described by
its distribution function, which lists for each outcome x the
probability P(x) of x. If x1, x2, …, xn are all possible outcomes,
then
P(x1) + P(x2) + … + P(xn) = 1
• Example:
– number of pets owned (0, 1, 2, … )
– numerical day of the month (1, 2, …, 31)
– the total number of tails you get if you flip 100 coins
Discrete example: roll of a die
[Plot: p(x) = 1/6 for each x = 1, 2, …, 6.]
Probability Distribution Function (PDF)
x p(x)
1 p(x=1)=1/6
2 p(x=2)=1/6
3 p(x=3)=1/6
4 p(x=4)=1/6
5 p(x=5)=1/6
6 p(x=6)=1/6
Total: 1.0
Cumulative Distribution Function (CDF)
[Step plot: P(x) rises from 1/6 at x = 1 through 1/3, 1/2, 2/3, and 5/6 to 1.0 at x = 6.]
Cumulative Distribution Function
(CDF)
A P(x≤A)
1 P(x≤1)=1/6
2 P(x≤2)=2/6
3 P(x≤3)=3/6
4 P(x≤4)=4/6
5 P(x≤5)=5/6
6 P(x≤6)=6/6
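The CDF table above is just a running sum of the PMF. A sketch with exact fractions (the helper name `cdf` is mine):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

def cdf(pmf):
    """Cumulative distribution: running sum of probabilities in outcome order."""
    out, total = {}, Fraction(0)
    for x in sorted(pmf):
        total += pmf[x]
        out[x] = total
    return out

table = cdf(pmf)  # table[3] == 1/2, table[6] == 1
```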
Examples
• For the die, P(x ≤ 3) = 1/2.
The Poisson Distribution
• The Poisson distribution is used to model random variables
that may have a countably infinite number of outcomes: for
example, the number of automobiles arriving at a tollbooth,
the number of phone calls received by a call center per hour, and
the number of decay events per second from a radioactive
source.
• The distribution function is given by
P(x) = λ^x e^(–λ) / x! ,  x = 0, 1, 2, … (10)
where λ is the mean number of occurrences.
The Poisson Distribution
• Example
– number of automobiles arriving at a tollbooth
– The number of calls coming per minute into a hotel for
booking
– The number of meteorites greater than 1 meter diameter
that strike Earth in a year
– The number of patients arriving in an emergency room
between 10 and 11 pm
Continuous Random Variable
• A continuous random variable is described by a probability
density function. This function is used to obtain the
probability that the value of a continuous random variable is
in a given interval.
• If the random variable is x and its density function is p(x),
then the probability that x lies in an interval [a, b] is the area
under p between a and b:
P(a ≤ x ≤ b) = ∫ from a to b of p(x) dx (11)
• In the case of a continuous random variable, the probability of
any single exact value is zero:
P(x = a) = 0 (12)
• Cumulative distribution:
C(a) = P(x ≤ a)
Probability Density Function (PDF)
• The probability function that accompanies a continuous
random variable is a continuous mathematical function
that integrates to 1.
• The probabilities associated with continuous functions are
just areas under the curve (integrals!).
Continuous Random Variable
• The uniform distribution on [a, b] has cumulative distribution
C(x) = 0 if x < a,
(x – a)/(b – a) if a ≤ x ≤ b, (14)
1 if x > b
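The piecewise CDF (14) translates directly into code. A minimal sketch (function name is mine):

```python
def uniform_cdf(x, a, b):
    """CDF of the uniform distribution on [a, b], per eq. (14)."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)
```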
Continuous Random Variable
• The Exponential density
p(t) = βe^(–βt) ,  t ≥ 0 (15)
has cumulative distribution
C(t) = 1 – e^(–βt) ,  t ≥ 0 (16)
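As a sanity check, numerically integrating the exponential density up to t should recover the CDF, illustrating eq. (11) for this density. A sketch using a midpoint Riemann sum (the parameter values β = 0.5, t = 3 are illustrative):

```python
from math import exp

def exp_pdf(t, beta):
    """Exponential density beta * e^(-beta*t) for t >= 0."""
    return beta * exp(-beta * t) if t >= 0 else 0.0

def exp_cdf(t, beta):
    """C(t) = 1 - e^(-beta*t): the integral of the density from 0 to t."""
    return 1.0 - exp(-beta * t) if t >= 0 else 0.0

# Midpoint-rule integration of the density approximates the CDF:
beta, t, n = 0.5, 3.0, 100_000
dt = t / n
integral = sum(exp_pdf((i + 0.5) * dt, beta) * dt for i in range(n))
```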
The Normal Density Function
• The normal (Gaussian) density with mean µ and standard deviation σ is
p(x) = (1/(σ√(2π))) e^(–(x – µ)²/(2σ²)) ,  –∞ < x < ∞
The Shape of Normal Density
• The normal distribution is bell shaped and symmetrical around µ.
• Why symmetrical? Let µ = 100; the density at x = 110 equals the
density at x = 90, since both lie 10 units from the mean.
Statistical Measures
• Center of the data
– Mean
– Median
• Variation
– Range
– Quartiles
– Variance
– Standard Deviation
– Covariance
– Correlation
Mean or Average or Expectation
• The mean (or expectation) of n data points x1, x2, …, xn is
x̄ = (x1 + x2 + … + xn)/n
Mean or Average
• Data points: (2,4), (3,4), (5,6), (6,5), (5,5), (5,3), (2,1), (4,2), (1,1), (1,2), (3,1)
• Mean = (3.3636, 3.0909)
Median (M)
• A resistant measure of the data’s center
• At least half of the ordered values are less
than or equal to the median value
• At least half of the ordered values are greater
than or equal to the median value
• If n is odd, the median is the middle ordered
value
• If n is even, the median is the average of the
two middle ordered values
Median (M)
Location of the median: L(M) = (n+1)/2 ,
where n = sample size.
Median
• Example 1 data: 2 4 6
Median (M) = 4
• Example 2 data: 2 4 6 8
Median = 5 (average of 4 and 6)
• Example 3 data: 6 2 4
Median ≠ 2
(order the values: 2 4 6 , so Median = 4)
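The odd/even rule above can be sketched as a small function (the name `median` is the obvious choice; the implementation is mine):

```python
def median(values):
    s = sorted(values)                 # order the values first
    n, mid = len(s), len(s) // 2
    if n % 2 == 1:
        return s[mid]                  # n odd: the middle ordered value
    return (s[mid - 1] + s[mid]) / 2   # n even: average of the two middle values
```

It reproduces the three examples: median([2, 4, 6]) is 4, median([2, 4, 6, 8]) is 5, and median([6, 2, 4]) is 4 because the values are ordered first.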
Comparing the Mean & Median
• Computation of the mean is easier.
• Finding the median in higher dimensions is much more complex.
• The mean is more sensitive to noise.
• The mean and median of data from a symmetric distribution
should be close together. The true mean and median of a
symmetric distribution are exactly the same.
Spread or Variability
• If all values are the same, then they all equal the mean.
There is no variability.
– E.g., 2, 2, 2, 2, 2, 2; mean = 2
• Variability exists when some values are
different from (above or below) the mean.
– E.g., 10, 15, –20, –22, 30, 22
• We will discuss the following measures of
spread: range, quartiles, variance, and
standard deviation
Range
• One way to measure spread is to give the
smallest (minimum) and largest (maximum)
values in the data set;
Range = max − min
– E.g., 10, –2, –7, 22, 0, 11; Range = 22 – (–7) = 29
• The range is strongly affected by outliers
Quartiles
• Three numbers which divide the
ordered data into four equal sized
groups.
• Q1 has 25% of the data below it.
• Q2 has 50% of the data below it. (Median)
• Q3 has 75% of the data below it.
Quartiles: Uniform Distribution
[Plot: for a uniform distribution, Q1, Q2, and Q3 divide the range into four equal parts.]
Obtaining the Quartiles
• Order the data.
• For Q2, just find the median.
• For Q1, look at the lower half of the data
values, those to the left of the median
location; find the median of this lower half.
• For Q3, look at the upper half of the data
values, those to the right of the median
location; find the median of this upper half.
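The median-of-halves procedure above can be sketched as follows (a minimal sketch; the helper names are mine, and ties at the median are handled by excluding the median location from both halves):

```python
def quartiles(values):
    """Q1, Q2, Q3 via the median-of-halves rule."""
    s = sorted(values)
    n = len(s)

    def med(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2

    lower = s[: n // 2]        # values left of the median location
    upper = s[(n + 1) // 2:]   # values right of the median location
    return med(lower), med(s), med(upper)

q1, q2, q3 = quartiles([1, 2, 3, 4, 5, 6, 7])  # (2, 4, 6)
```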
Variance and Standard Deviation
• Recall that variability exists when some
values are different from (above or
below) the mean.
• Each data value xi has an associated
deviation from the mean: xi – x̄
Deviations
• what is a typical deviation from the
mean? (standard deviation)
• small values of this typical deviation
indicate small variability in the data
• large values of this typical deviation
indicate large variability in the data
Variance
• The variance is the average of the squared deviations
from the mean:
Variance = (1/n) [ (x1 – x̄)² + (x2 – x̄)² + … + (xn – x̄)² ]
where n is the number of data points and x̄ is the mean.
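The variance (mean squared deviation from the mean) and standard deviation can be computed as a short sketch (function names are mine; this is the population form with n in the denominator, not the n − 1 sample form):

```python
from math import sqrt

def variance(values):
    """Population variance: average squared deviation from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def std_dev(values):
    """Standard deviation: square root of the variance."""
    return sqrt(variance(values))

v = variance([2, 4, 6])  # mean 4, deviations -2, 0, 2 -> variance 8/3
```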
Standard Deviation
• The standard deviation is the square root of the variance.
Variance and Standard Deviation
[Worked example table: observations, their deviations from the mean, and the squared deviations.]
Variance (2D)
Covariance
• Covariance measures how two variables vary together:
Cov(x, y) = (1/n) Σ (xi – x̄)(yi – ȳ)
• Cov(x, y) > 0 indicates a positive relation, Cov(x, y) < 0 a
negative relation, and Cov(x, y) ≈ 0 no relation.
[Scatter plots illustrating positive relation, negative relation, and no relation.]
Covariance
(2 , 1) (-2.4545,
(2 , 2) -2.8182)
(4 , 3) (-2.4545,
(6 , 1) -1.8182)
(8 , 3) (-0.4545,
(1 , 5) -0.8182)
(4 , 6) (1.5455,
(4 , 7) -2.8182)
(6 , 3) (3.5455,
(6 , 5) -0.8182)
(6 , 6) (-3.4545,
1.1818) 79
(4.4545, (0,
(-0.4545,
3.8182) 0)
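The covariance of the eleven example points can be computed directly from the definition. A minimal sketch (variable names are mine):

```python
points = [(2, 1), (2, 2), (4, 3), (6, 1), (8, 3), (1, 5),
          (4, 6), (4, 7), (6, 3), (6, 5), (6, 6)]
n = len(points)
mx = sum(x for x, _ in points) / n  # 49/11 ≈ 4.4545
my = sum(y for _, y in points) / n  # 42/11 ≈ 3.8182
# Average product of the deviations from the mean:
cov = sum((x - mx) * (y - my) for x, y in points) / n  # slightly positive
```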
Covariance Matrix
• For two variables x and y the covariance matrix collects the
variances on the diagonal and the covariances off the diagonal:
Σ = [ Var(x)     Cov(x, y) ]
    [ Cov(x, y)  Var(y)    ]
Correlation
• Correlation is covariance scaled to lie in [–1, 1]:
ρ = Cov(x, y) / (σx σy)
[Scatter plots: positive relation, negative relation, no relation.]
Multivariate Gaussians (or "multinormal distribution"
or "multivariate normal distribution")
Multivariate case:
vector of observations x,
vector of means µ and covariance matrix Σ:
p(x) = (2π)^(–d/2) |Σ|^(–1/2) exp( –½ (x – µ)ᵀ Σ⁻¹ (x – µ) )
where d is the dimension of x and |Σ| is the determinant of Σ.
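For the 2-variate case the multivariate normal density can be written out without any linear-algebra library, since the 2×2 inverse and determinant have closed forms. A minimal sketch (function name and parameter values are mine):

```python
from math import exp, pi, sqrt

def mvn_pdf_2d(x, mu, sigma):
    """2-variate normal density:
    (2*pi)^(-d/2) * |Sigma|^(-1/2) * exp(-0.5 * q), with d = 2 and
    q = (x - mu)^T Sigma^-1 (x - mu)."""
    (a, b), (c, d) = sigma
    det = a * d - b * c                                # 2x2 determinant
    inv = ((d / det, -b / det), (-c / det, a / det))   # 2x2 inverse
    dx = (x[0] - mu[0], x[1] - mu[1])
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return exp(-0.5 * q) / (2.0 * pi * sqrt(det))

# At the mean with Sigma = I, the density is 1 / (2*pi).
peak = mvn_pdf_2d((0, 0), (0, 0), ((1, 0), (0, 1)))
```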
Multivariate Gaussians
• Univariate case:
p(x) = (1/(σ√(2π))) e^(–(x – µ)²/(2σ²))
• Multivariate case:
p(x) = (2π)^(–d/2) |Σ|^(–1/2) exp( –½ (x – µ)ᵀ Σ⁻¹ (x – µ) )
• The normalization constants in front do not depend on x;
the exponent depends on x, and the quadratic form
(x – µ)ᵀ Σ⁻¹ (x – µ) is positive for positive definite Σ.
The mean vector
• µ = E[x]; component m of µ is the mean of feature xm.
Covariance of two random variables
• Recall for two random variables xi, xj:
Cov(xi, xj) = E[(xi – µi)(xj – µj)]
The covariance matrix
• Σ = E[(x – µ)(x – µ)ᵀ]  (ᵀ is the transpose operator)
• The diagonal entries are the variances: Var(xm) = Cov(xm, xm)
An example: 2-variate case
• For d = 2, the determinant of Σ is the 2×2 determinant
|Σ| = Var(x1)Var(x2) – Cov(x1, x2)².
Gaussian Intuitions: Size of Σ
• µ = [0 0], Σ = I (identity matrix);  µ = [0 0], Σ = 0.6I;  µ = [0 0], Σ = 2I
• As Σ becomes larger,
the Gaussian becomes more spread out
Gaussian Intuitions: Off-diagonal and Diagonal
[Contour plots showing how the off-diagonal and diagonal entries of Σ change the shape and orientation of the density.]
Choosing a probability distribution
• The nature of the data source may determine the type of density
function in some cases
• The histogram of the data may suggest a model
• Any assumed distribution can be tested using the chi-squared test
or the Kolmogorov-Smirnov test
• The most frequently used continuous density is the normal density
• Linear functions of a normally distributed feature are also
normally distributed
• Features that are not normally distributed can sometimes be
converted to approximately normal by a suitable transformation
• A popular graphical test for determining whether a data set is
approximately normally distributed is based on the normal probability plot.