Lecture 2 - Probability Concepts and Applications
DATA MODELING
A discrete random variable is one for which the number of possible outcomes can
be counted.
• outcomes of dice rolls
• whether a customer likes or dislikes a product
A continuous random variable can assume any one of an infinite set of values
between a lower limit and an upper limit.
• daily temperature
• time between machine failures
A probability distribution is a characterization of the possible values that a random
variable may assume along with the probability of assuming these values.
Discrete Probability Distributions
For a discrete random variable X, the probability distribution of the
discrete outcomes is called a probability mass function and is
denoted by a mathematical function, f(x).
• The symbol xi represents the ith value of the random variable X, and f(xi) is the
probability of that value.
Example: Probability mass function for rolling two dice
xi = values of the random variable X, which represents the sum of the rolls of two dice
• x1 = 2, x2 = 3, …, x10 = 11, x11 = 12
f(x1) = 1/36 = 0.0278; f(x2) = 2/36 = 0.0556, etc.

Probability Mass Function for Rolling Two Dice
X (Die Sum)   Frequency   f(X)
2             1           0.03
3             2           0.06
4             3           0.08
5             4           0.11
6             5           0.14
7             6           0.17
8             5           0.14
9             4           0.11
10            3           0.08
11            2           0.06
12            1           0.03
Sum           36          1

[Figure: Probability Distribution for Rolling Two Dice; bar chart of f(X), on a scale from 0 to 0.18, against die sums 2 through 12]
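The frequencies in the table can be reproduced by enumerating all 36 equally likely outcomes of two dice. The following is a minimal Python sketch; the names `counts` and `pmf` are illustrative and do not come from the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# f(x) = (number of ways to roll x) / 36, kept as exact fractions.
pmf = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(f"{x:>2}  {counts[x]}  {float(p):.4f}")
```

Exact fractions avoid rounding drift, so the probabilities sum to exactly 1.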
Cumulative Distribution Function
A cumulative distribution function, F(x), specifies the probability that the random
variable X assumes a value less than or equal to a specified value, x; that is,
F(x) = P(X ≤ x)

Outcome   Number of Ways   Prob. f(x)   Cumulative Prob. F(x)
2         1                0.0278       0.0278
3         2                0.0556       0.0833
4         3                0.0833       0.1667
5         4                0.1111       0.2778
6         5                0.1389       0.4167
7         6                0.1667       0.5833
8         5                0.1389       0.7222
9         4                0.1111       0.8333
10        3                0.0833       0.9167
11        2                0.0556       0.9722
12        1                0.0278       1.0000
Total     36               1.0000

Probability of rolling a 6 or less = F(6) = 0.4167
Probability of rolling between 4 and 8:
= P(4 ≤ X ≤ 8) = P(3 < X ≤ 8) = P(X ≤ 8) – P(X ≤ 3)
= 0.7222 – 0.0833 = 0.6389
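As a rough check on the cumulative probabilities, F(x) can be built as a running sum of f(x). This sketch assumes the "number of ways" column from the table; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import accumulate

# Number of ways to roll each sum 2..12 with two dice (from the table).
ways = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
pmf = [Fraction(w, 36) for w in ways]

# F(x) = P(X <= x) is the running total of f(x).
cdf = dict(zip(range(2, 13), accumulate(pmf)))

p_6_or_less = cdf[6]
p_4_to_8 = cdf[8] - cdf[3]      # P(4 <= X <= 8) = F(8) - F(3)

print(f"{float(p_6_or_less):.4f}")
print(f"{float(p_4_to_8):.4f}")
```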
Expected Value of a Discrete Probability Distribution
Expected value is a measure of the central tendency of the distribution; it
corresponds to the notion of the mean, or average, for a sample.
\[ E(X) = \sum_{i=1}^{n} X_i P(X_i) \]
where
Xi = random variable's possible values
P(Xi) = probability of each of the random variable's possible values
Σ = summation sign indicating we are adding all n possible values
E(X) = expected value or mean of the random variable
The Expected Value for Rolling Two Dice
\[ E(X) = 2(1/36) + 3(2/36) + \dots + 12(1/36) = 7 \]
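The expected value is a probability-weighted sum, which can be sketched in a few lines of Python. The function name `expected_value` is a hypothetical helper, and the two-dice probabilities are taken from the earlier table.

```python
from fractions import Fraction

def expected_value(values, probs):
    """Sum of each possible value times its probability."""
    return sum(x * p for x, p in zip(values, probs))

# Two-dice example: sums 2..12 with 1, 2, ..., 6, ..., 2, 1 ways out of 36.
values = list(range(2, 13))
probs = [Fraction(w, 36) for w in [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]]

ev = expected_value(values, probs)
print(ev)  # 7
```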
Expected Value on Television
Deal or No Deal
Contestant had 5 briefcases left with $100, $400, $1000, $50,000 or
$300,000 in them.
Expected value of briefcases is (100 + 400 + 1,000 + 50,000 + 300,000) / 5 = $70,300
Banker offered contestant $80,000 to quit. Is it a good or a bad deal?
Why?
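One way to frame the banker's offer is to compare it with the expected value of the remaining briefcases. This sketch assumes each of the five briefcases is equally likely to be the contestant's; it ignores risk preferences, which may matter as much as the expected value.

```python
# Amounts left in play, from the example.
amounts = [100, 400, 1_000, 50_000, 300_000]

# Assumption: every briefcase is equally likely, so E(X) is the plain average.
expected = sum(amounts) / len(amounts)

offer = 80_000
print(f"expected value = ${expected:,.0f}")
print("offer exceeds expected value:", offer > expected)
```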
Expected Value of a Charitable Raffle
Cost of raffle ticket is $50
1000 raffle tickets are sold.
Winning prize is $25,000
What is the expected value?
Is it worth playing the game?
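The raffle question can be explored with the same expected-value idea. The sketch below assumes a buyer holds one ticket, there is a single winner, and every ticket is equally likely to win; these assumptions are not stated in the lecture.

```python
from fractions import Fraction

ticket_cost = 50
tickets_sold = 1_000
prize = 25_000

# Assumption: one ticket held, one winner, all tickets equally likely.
p_win = Fraction(1, tickets_sold)

# Net outcome: win (prize minus ticket cost) or lose (minus ticket cost).
expected_net = p_win * (prize - ticket_cost) + (1 - p_win) * (-ticket_cost)
print(expected_net)  # -25
```

Under these assumptions a ticket loses $25 on average, so any reason to play would come from supporting the charity rather than from the expected payoff.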
Variance of a Discrete Probability Distribution
\[ \sigma^2 = \sum_{i=1}^{n} [X_i - E(X)]^2 P(X_i) \]
where
Xi = random variable's possible values
E(X) = expected value of the random variable
[Xi – E(X)] = difference between each value of the random variable and
the expected value
P(Xi) = probability of each possible value of the random variable
For rolling two dice
\[ \sigma^2 = (2-7)^2(1/36) + (3-7)^2(2/36) + \dots + (12-7)^2(1/36) = 35/6 \approx 5.83 \]
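The variance is the probability-weighted average of squared deviations from the expected value. A minimal Python sketch for the two-dice case follows; the helper name `variance` is hypothetical.

```python
from fractions import Fraction

def variance(values, probs):
    """Probability-weighted squared deviation from the expected value."""
    mean = sum(x * p for x, p in zip(values, probs))
    return sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Two-dice example: sums 2..12 with 1, 2, ..., 6, ..., 2, 1 ways out of 36.
values = list(range(2, 13))
probs = [Fraction(w, 36) for w in [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]]

var = variance(values, probs)
print(var, float(var))
```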
The Binomial Distribution
Many business experiments can be characterized by the Bernoulli process.
The Bernoulli process is described by the binomial probability distribution
and has these properties:
1. Each trial has only two possible outcomes.
2. The probability of each outcome stays the same from one trial to the next.
3. The trials are statistically independent.
4. The number of trials is a positive integer.
The binomial distribution is used to find the probability of a specific
number of successes in n trials.
We need to know
n = number of trials
p = the probability of success on any single trial
We let
r = number of successes
q = 1 – p = the probability of a failure
The binomial formula is
\[ \text{Probability of } r \text{ successes in } n \text{ trials} = \frac{n!}{r!(n-r)!} p^r q^{n-r} \]
The symbol ! means factorial, and n! = n(n – 1)(n – 2)…(1)
4! = (4)(3)(2)(1) = 24
Also, 1! = 1 and 0! = 1 by definition
Solving Problems with the Binomial Formula
Find the probability of getting 4 heads in 5 tosses of a coin
n = 5, r = 4, p = 0.5, and q = 1 – 0.5 = 0.5
\[ P(4 \text{ successes in } 5 \text{ trials}) = \frac{5!}{4!(5-4)!}(0.5)^4(0.5)^{5-4} = 5(0.0625)(0.5) = 0.15625 \]
Excel function:
=BINOM.DIST(number_s, trials, probability_s, cumulative)
If cumulative = TRUE, the function returns cumulative probabilities;
otherwise it returns the probability mass function, f(x).
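For readers working outside Excel, the same calculation can be sketched in Python. The function `binom_dist` below is a hypothetical helper that only mirrors the argument order of BINOM.DIST; it is not Excel's implementation.

```python
from math import comb

def binom_dist(number_s, trials, probability_s, cumulative):
    """Hypothetical stand-in for Excel's BINOM.DIST (same argument order)."""
    def pmf(r):
        # n! / (r! (n - r)!) * p^r * q^(n - r)
        return comb(trials, r) * probability_s**r * (1 - probability_s)**(trials - r)

    if cumulative:
        # P(X <= number_s): add the pmf for 0, 1, ..., number_s successes.
        return sum(pmf(r) for r in range(number_s + 1))
    return pmf(number_s)

print(binom_dist(4, 5, 0.5, False))  # 0.15625
print(binom_dist(4, 5, 0.5, True))   # 0.96875
```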
Solving Problems with Excel
Suppose 10 individuals receive a telemarketing promotion. Each
individual has a 0.2 probability of making a purchase. Find the
probability that exactly 3 of the 10 individuals make a purchase.
n = 10, p = 0.2, and r = 3
MSA Electronics is experimenting with the manufacture of a new
transistor.
Every hour a random sample of 5 transistors is taken.
The probability of one transistor being defective is 0.15.
What is the probability of finding 3, 4, or 5 defective?
n = 5, p = 0.15, and r = 3, 4, or 5
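Both problems can be worked with the binomial formula. The sketch below is one possible Python version, using a hypothetical helper `binomial_pmf`; the second problem adds the probabilities for r = 3, 4, and 5.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Telemarketing: exactly 3 purchases among 10 individuals, p = 0.2.
p_three_buyers = binomial_pmf(3, 10, 0.2)

# MSA Electronics: 3, 4, or 5 defective among 5 transistors, p = 0.15.
p_defects = sum(binomial_pmf(r, 5, 0.15) for r in (3, 4, 5))

print(round(p_three_buyers, 4))  # 0.2013
print(round(p_defects, 4))       # 0.0266
```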
The Poisson Distribution
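A discrete probability distribution
• Often used in queuing models to describe arrival rates over time
• Probability function given by
\[ P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!} \]
where
λ = mean number of events (arrivals) per interval of time
x = number of events in the interval (0, 1, 2, …)
e ≈ 2.718, the base of natural logarithms
If the number of events occurring during an interval of time has a
Poisson distribution, then the time between events is exponentially
distributed.
For the exponential distribution:
• Mean = μ = 1/λ
• Excel function: =EXPON.DIST(x, lambda, cumulative)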
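The Poisson probability function, in its standard form λ^x e^(-λ) / x!, can be sketched in Python as follows. The rate `lam = 2.0` is an arbitrary illustrative value, not a figure from the lecture.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) when events occur at a mean rate of lam per interval."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0  # hypothetical mean number of arrivals per interval

for x in range(6):
    print(x, round(poisson_pmf(x, lam), 4))
```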
Using the Exponential Distribution
The mean time to failure of a critical engine component is µ = 8,000 hours.
What is the probability of failing before 5,000 hours?
P(X < x) =EXPON.DIST(x, lambda, cumulative)
λ = 1/8000
P(X < 5000) =EXPON.DIST(5000, 1/8000, TRUE)
= 0.4647
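The cumulative exponential probability has the closed form 1 − e^(−λx), so the Excel result can be checked with a short Python sketch. The function name `expon_cdf` is a hypothetical helper.

```python
from math import exp

def expon_cdf(x, lam):
    """P(X < x) for an exponential distribution with rate lam."""
    return 1 - exp(-lam * x)

mean_time_to_failure = 8000          # hours, from the example
lam = 1 / mean_time_to_failure       # rate = 1 / mean

p_fail = expon_cdf(5000, lam)
print(round(p_fail, 4))  # 0.4647
```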
Data Modeling and Distribution Fitting
Using sample data may limit our ability to predict uncertain events that may occur
because potential values outside the range of the sample data are not included.
A remedy is to "fit" a theoretical distribution to the data and verify the goodness of fit
statistically.
• Examine a histogram for clues about the distribution’s shape
• Look at summary statistics such as the mean, median, standard deviation, coefficient of
variation, and skewness
Analytically fit the data to the best type of probability distribution.
Three statistics measure goodness of fit:
• Chi-square
• Kolmogorov-Smirnov
• Anderson-Darling
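Before any formal fitting, the summary statistics listed above can be computed directly. The sketch below uses a small made-up sample purely for illustration (it is not data from the lecture) and the moment coefficient as one common definition of skewness.

```python
from statistics import mean, median, pstdev, stdev

# Hypothetical sample, invented for illustration: hours between machine failures.
sample = [2.1, 0.4, 5.3, 1.2, 0.9, 3.8, 0.3, 7.6, 1.7, 2.5]

m = mean(sample)
med = median(sample)
s = stdev(sample)        # sample standard deviation
cv = s / m               # coefficient of variation

# Moment coefficient of skewness: third central moment / (population std dev)^3.
skew = sum((x - m) ** 3 for x in sample) / len(sample) / pstdev(sample) ** 3

print(f"mean={m:.2f} median={med:.2f} stdev={s:.2f} cv={cv:.2f} skew={skew:.2f}")
```

A mean above the median together with positive skewness suggests a right-skewed shape, which would point toward candidates such as the exponential distribution rather than a symmetric one.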
Chi-Square Goodness of Fit Test
• Example: A researcher claims that the distribution of favorite pizza toppings among teenagers is
as shown below.
With d.f. = 4 and α = 0.01, the critical value is χ²₀ = 13.277 (Excel: =CHISQ.INV.RT(α, df))
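Example continued:
Topping      Observed Frequency   Expected Frequency
Cheese       78                   82
Pepperoni    52                   50
Sausage      30                   30
Mushrooms    25                   20
Onions       15                   18

[Figure: chi-square distribution with the rejection region of area 0.01 to the right of χ²₀ = 13.277]

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(78-82)^2}{82} + \frac{(52-50)^2}{50} + \frac{(30-30)^2}{30} + \frac{(25-20)^2}{20} + \frac{(15-18)^2}{18} \approx 2.025 \]
Because 2.025 < 13.277, the test statistic does not fall in the rejection region.
Fail to reject H0.
There is not enough evidence at the 1% level to reject the researcher's claim.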
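The test statistic can be recomputed from the observed and expected frequencies. This is a minimal Python sketch; the critical value 13.277 is taken from the example rather than computed.

```python
# Observed and expected frequencies from the pizza-topping example.
observed = {"Cheese": 78, "Pepperoni": 52, "Sausage": 30, "Mushrooms": 25, "Onions": 15}
expected = {"Cheese": 82, "Pepperoni": 50, "Sausage": 30, "Mushrooms": 20, "Onions": 18}

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

critical_value = 13.277   # d.f. = 4, alpha = 0.01, as given in the example
reject_h0 = chi_sq > critical_value

print(round(chi_sq, 3))   # 2.025
print(reject_h0)          # False
```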
Chi-Square Goodness of Fit Test