Module 2 - Probability and Distributions
PMF of Weekly Sales at Store A
[Figure: bar chart of probability (axis 0% to 35%) against Sales per Week. Bar labels: 14%, 22%, 30%, 22%, 12% for sales of 1, 2, 3, 4, 5.]
Conditional Probability
P(A|B) = Probability that Event A occurs, GIVEN THAT Event B has occurred.
e.g., with D = (Sales ≤ 2) and C = (Sales = 1, 3, or 5):
P(D|C) = P[(Sales ≤ 2) Given That (Sales = 1, 3, or 5)] = P(Sales = 1) / P(Sales = 1, 3, or 5) = 0.14 / 0.56 = 0.25
With A = (Sales = 5) and B = (Sales ≥ 4):
P(A|B) = P[(Sales = 5) Given That (Sales ≥ 4)] = P(Sales = 5) / P(Sales ≥ 4) = 0.12 / 0.34 = 0.35
P(B|A) = P[(Sales ≥ 4) Given That (Sales = 5)] = P(Sales = 5) / P(Sales = 5) = 0.12 / 0.12 = 1.00
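The conditional probabilities above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the helper names prob and cond_prob are hypothetical, and the PMF values are taken from the worked examples (the 22% for Sales = 2 is inferred from the chart labels and does not affect these three results).

```python
# PMF of weekly sales at Store A (assumed from the worked examples above).
pmf = {1: 0.14, 2: 0.22, 3: 0.30, 4: 0.22, 5: 0.12}

def prob(event):
    """P(event), where event is a set of sales values."""
    return sum(p for x, p in pmf.items() if x in event)

def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B), with events given as sets of values."""
    return prob(a & b) / prob(b)

d = {x for x in pmf if x <= 2}   # D: Sales <= 2
c = {1, 3, 5}                    # C: Sales = 1, 3, or 5
a = {5}                          # A: Sales = 5
b = {x for x in pmf if x >= 4}   # B: Sales >= 4

p_d_given_c = cond_prob(d, c)    # 0.14 / 0.56
p_a_given_b = cond_prob(a, b)    # 0.12 / 0.34
p_b_given_a = cond_prob(b, a)    # 0.12 / 0.12

print(round(p_d_given_c, 2), round(p_a_given_b, 2), round(p_b_given_a, 2))
```

Representing events as sets makes the intersection in the numerator explicit: D and C overlap only at Sales = 1, which is why P(D|C) uses P(Sales = 1).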
Independence
A and B are independent if knowing that B occurred
does not influence the probability of A occurring
P(A | B) = P(A)
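As a quick check of the definition, the sketch below tests whether A = (Sales = 5) and B = (Sales ≥ 4) are independent under the Store A PMF. The PMF values are assumed from the worked examples, and the variable names are illustrative only.

```python
from math import isclose

# PMF of weekly sales at Store A (assumed from the worked examples).
pmf = {1: 0.14, 2: 0.22, 3: 0.30, 4: 0.22, 5: 0.12}

def prob(event):
    """P(event), where event is a set of sales values."""
    return sum(p for x, p in pmf.items() if x in event)

a = {5}       # A: Sales = 5
b = {4, 5}    # B: Sales >= 4

p_a = prob(a)
p_a_given_b = prob(a & b) / prob(b)

# Independence requires P(A|B) == P(A); here 0.35 differs from 0.12.
independent = isclose(p_a_given_b, p_a, abs_tol=1e-9)
print(p_a, round(p_a_given_b, 2), independent)
```

Knowing that sales were at least 4 raises the probability of exactly 5 sales, so these two events are not independent.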
Mode = 3
Median = 3
• Expected Value
  ▪ Notation
• Inner Quartiles – the 75th percentile value minus the 25th percentile value
[Figure: vertical scale marking Min, 25th Percentile, 75th Percentile, and Max]
Range = Max − Min = 5 − 1 = 4
Inner Quartile = 75th Percentile − 25th Percentile = 4 − 2 = 2
Minimum = 1
Range = 4
25th Pct = 2
Inner Quartile = 2
μ=Mean = 2.96
σ² = Variance = 1.48
50th Pct = Median = 3
σ = Standard Deviation = 1.216
Mode = 3
CV = Coefficient of Variation = 0.411
75th Pct = 4
Maximum = 5
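The summary statistics listed above can be recomputed directly from the PMF. The following is a sketch under the assumption that the Store A PMF is 14%, 22%, 30%, 22%, 12% for sales of 1 through 5; the percentile helper is a hypothetical name and uses the "smallest value whose cumulative probability reaches q" convention.

```python
from math import sqrt

# PMF of weekly sales at Store A (assumed from the chart labels and examples).
pmf = {1: 0.14, 2: 0.22, 3: 0.30, 4: 0.22, 5: 0.12}

mean = sum(p * x for x, p in pmf.items())                    # expected value
variance = sum(p * (x - mean) ** 2 for x, p in pmf.items())  # sigma squared
std_dev = sqrt(variance)
cv = std_dev / mean                                          # coefficient of variation
mode = max(pmf, key=pmf.get)                                 # most likely value

def percentile(q):
    """Smallest value x with P(X <= x) >= q."""
    cumulative = 0.0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= q - 1e-12:
            return x
    return max(pmf)

value_range = max(pmf) - min(pmf)
inner_quartile = percentile(0.75) - percentile(0.25)

print(round(mean, 2), round(variance, 2), round(std_dev, 3), round(cv, 3))
print(mode, percentile(0.5), value_range, inner_quartile)
```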
MIT Center for Transportation & Logistics
Discrete Probability Distributions
[Figure: PMFs of Uniform [1,6] and Poisson (mean = 1.5); x-axis: Random Variable X (0 to 10), y-axis: probability (0% to 25%)]
Discrete Uniform Distribution
• Notation: U(a, b)
  ▪ a = Minimum
  ▪ b = Maximum
  ▪ n = # of values = b − a + 1
• Probability Mass Function
\[
P[X = x] = f(x \mid a, b) =
\begin{cases}
\dfrac{1}{n} & \text{for } a \le x \le b \\
0 & \text{otherwise}
\end{cases}
\]
• Metrics
  ▪ Mean = (a + b) / 2
  ▪ Median = (a + b) / 2
  ▪ Mode N/A
  ▪ Variance = ((b − a + 1)² − 1) / 12

PMF Rolling One Die
  i   xi   pi
  1   1    1/6
  2   2    1/6
  3   3    1/6
  4   4    1/6
  5   5    1/6
  6   6    1/6
[Figure: bar chart of the PMF for rolling one die; x-axis: 1 to 6, y-axis: 0% to 20%]
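To illustrate the metrics for the die example, the sketch below builds the U(1, 6) PMF and cross-checks the closed-form mean and variance against the probability-weighted sums. The function name discrete_uniform is hypothetical; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def discrete_uniform(a, b):
    """PMF, mean, and variance of a discrete uniform U(a, b) on integers."""
    n = b - a + 1                                  # number of values
    pmf = {x: Fraction(1, n) for x in range(a, b + 1)}
    mean = Fraction(a + b, 2)
    variance = Fraction(n * n - 1, 12)             # ((b - a + 1)^2 - 1) / 12
    return pmf, mean, variance

pmf, mean, variance = discrete_uniform(1, 6)

# Cross-check the closed forms against the definitions of mean and variance.
mean_check = sum(p * x for x, p in pmf.items())
var_check = sum(p * (x - mean_check) ** 2 for x, p in pmf.items())

print(mean, variance, mean_check, var_check)
```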
[Figure: Poisson PMFs for λ = 0.75, λ = 2, λ = 5, and λ = 10; x-axis: 0 to 20, y-axis: probability (0 to 0.45)]
Note:
• As λ increases, the distribution becomes more symmetric and “bell shaped”
• Value is always an integer ≥ 0
• The value of λ does not need to be an integer
2. What is the probability that 2 or fewer calls will come in over the next minute?
With λ = 2.2:
P[X=0] = (e^(−2.2) λ^0)/(0!) = (0.1108)(1)/1 = 0.11 or 11%
P[X=1] = (e^(−2.2) λ^1)/(1!) = (0.1108)(2.2)/1 = 0.24 or 24%
P[X=2] = (e^(−2.2) λ^2)/(2!) = (0.1108)(4.84)/2 = 0.27 or 27%
P[X≤2] = 0.11 + 0.24 + 0.27 = 0.62 or 62%
3. What is the probability that at least 1 call will come in over the next minute?
P[X>0] = 1 – P[X=0] = 1 – 0.11 = 0.89 or 89%
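The two call-arrival answers can be checked numerically. This is a small sketch assuming λ = 2.2 calls per minute, as used in the arithmetic above; poisson_pmf is an illustrative helper name, not something defined in the slides.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P[X = x] for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.2  # assumed mean number of calls per minute

# Question 2: two or fewer calls in the next minute.
p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))

# Question 3: at least one call is the complement of zero calls.
p_at_least_1 = 1 - poisson_pmf(0, lam)

print(round(p_at_most_2, 2), round(p_at_least_1, 2))
```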
\[
E[X] = \bar{x} = \mu = \sum_{i=1}^{n} p_i x_i
\]
\[
\mathrm{Var}[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2
\]
  ▪ Poisson
Probability Mass Function
\[
P[X = x] = f(x \mid \lambda) =
\begin{cases}
\dfrac{e^{-\lambda} \lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\
0 & \text{otherwise}
\end{cases}
\]
[Figure: example Poisson PMF; x-axis: 0 to 5, y-axis: probability (0 to 0.70)]
Week  Unit Sales    Week  Unit Sales    Week  Unit Sales    Week  Unit Sales    Week  Unit Sales
   1        3595      11        2346      21        3967      31        2898      41        2196
   2        3011      12        2869      22        2844      32        3713      42        3469
   3        2994      13        3450      23        2546      33        2845      43        3570
   4        3576      14        2031      24        2771      34        2866      44        2071
   5        3697      15        3198      25        4084      35        3549      45        3247
   6        2648      16        2939      26        2755      36        2365      46        4740
   7        3747      17        2034      27        2641      37        2462      47        2316
   8        3165      18        2476      28        2875      38        2480      48        2625
   9        3412      19        2339      29        3855      39        3055      49        3973
  10        2750      20        3200      30        2880      40        2453      50        3491
[Figure: histogram; x-axis: 100 unit bins from 1500 to 5000, y-axis: share of weeks (0% to 12%)]
Weekly demand at Eastern DC in 100 unit bins for the last year.
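Grouping observations into 100 unit bins, as in the histogram, can be sketched as follows. The data are the 50 weekly unit sales from the table; whether that table is the same data set as the Eastern DC histogram is not stated, so treat the pairing as an assumption made only for illustration.

```python
from collections import Counter

# Weekly unit sales for weeks 1 to 50, copied from the table.
unit_sales = [
    3595, 3011, 2994, 3576, 3697, 2648, 3747, 3165, 3412, 2750,
    2346, 2869, 3450, 2031, 3198, 2939, 2034, 2476, 2339, 3200,
    3967, 2844, 2546, 2771, 4084, 2755, 2641, 2875, 3855, 2880,
    2898, 3713, 2845, 2866, 3549, 2365, 2462, 2480, 3055, 2453,
    2196, 3469, 3570, 2071, 3247, 4740, 2316, 2625, 3973, 3491,
]

BIN_WIDTH = 100

# Map each observation to the lower edge of its bin, e.g. 2031 -> 2000.
bin_counts = Counter((s // BIN_WIDTH) * BIN_WIDTH for s in unit_sales)

# Convert counts to the share of weeks falling in each bin.
bin_share = {
    low: count / len(unit_sales) for low, count in sorted(bin_counts.items())
}

for low, share in bin_share.items():
    print(f"{low}-{low + BIN_WIDTH - 1}: {share:.0%}")
```

The resulting shares form an empirical distribution; narrower or wider bins trade detail against smoothness.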
Continuous Probability Distributions
• Differences from Discrete Random Variables
  ▪ The probability of any single specific value is not meaningful
  ▪ The probability of a value falling within an interval is more helpful
  ▪ Cannot list all possible outcomes; instead we need to use a function
• Probability Density Function (pdf)
  ▪ Probability that X lies between values a and b is equal to the area under the curve between a and b:
\[
P(a \le X \le b) = \int_a^b f(t)\,dt
\]
  ▪ Total area under the curve equals 1, but P[X=t] = 0!
[Figure: pdf f(t) against t, with the area between a and b shaded as the probability]
• Cumulative probability F(t)
  ▪ P(X≤t) = F(t)
  ▪ P(X>t) = 1 − F(t)
  ▪ P(c≤X≤d) = F(d) − F(c)
  ▪ P(X=t) = 0
[Figure: cumulative probability F(t) against t with F(a) marked; below it, the pdf f(t) with the area to the left of a shaded]
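The relationship between area under the pdf and the cumulative probability can be seen numerically. The sketch below uses a hypothetical density f(t) = 2t on [0, 1] (not from the slides) and a simple trapezoid rule; the function names are illustrative.

```python
def pdf(t):
    """Hypothetical density f(t) = 2t on [0, 1], zero elsewhere."""
    return 2.0 * t if 0.0 <= t <= 1.0 else 0.0

def area(f, lo, hi, steps=10_000):
    """Trapezoid-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

def cdf(t):
    """F(t) = P(X <= t): the area under the pdf to the left of t."""
    return area(pdf, 0.0, min(max(t, 0.0), 1.0))

total_area = area(pdf, 0.0, 1.0)    # should be 1
p_interval = area(pdf, 0.2, 0.5)    # P(0.2 <= X <= 0.5) as an area
p_via_cdf = cdf(0.5) - cdf(0.2)     # same probability as F(d) - F(c)
p_above = 1 - cdf(0.5)              # P(X > 0.5) = 1 - F(0.5)
p_point = area(pdf, 0.5, 0.5)       # zero-width interval has zero area

print(total_area, p_interval, p_via_cdf, p_above, p_point)
```

For this density F(t) = t², so the interval probability is 0.25 − 0.04 = 0.21, and a single point carries no probability.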
Continuous Distributions
\[
f(t \mid a, b) =
\begin{cases}
\dfrac{1}{b - a} & \text{if } a \le t \le b \\
0 & \text{otherwise}
\end{cases}
\]
\[
F(t \mid a, b) =
\begin{cases}
0 & \text{if } t < a \\
\dfrac{t - a}{b - a} & \text{if } a \le t \le b \\
1 & \text{if } t > b
\end{cases}
\]
Mean = (1/2)(a + b)
Median = (1/2)(a + b)
Mode = any value in range [a,b]
Variance = (1/12)(b − a)²
Zippy Bright Transportation I
Zippy Bright has a consumer delivery unit. They distribute
product from a downtown location to all residences and offices in
the city. The deliveries are made on scooters, and each delivery
goes directly to the customer. They found that the distance to each
customer location is ~U(2.75, 6.50) kilometers.
1. What is the average distance, median distance, and CV?
We know that mean = (a+b)/2 = (2.75 + 6.50)/2 = 4.625 km, which is also the median!
CV = σ/μ = √[(1/12)(b−a)²] / [(a+b)/2] = √[(1/12)(6.5 − 2.75)²] / 4.625 = 1.0825 / 4.625 = 0.23
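The same numbers can be reproduced in a few lines. This is a sketch assuming the U(2.75, 6.50) distance distribution from the problem statement; uniform_cdf is a hypothetical helper that follows the piecewise continuous uniform CDF.

```python
from math import sqrt

def uniform_cdf(t, a, b):
    """F(t | a, b) for a continuous uniform distribution on [a, b]."""
    if t < a:
        return 0.0
    if t > b:
        return 1.0
    return (t - a) / (b - a)

a, b = 2.75, 6.50                  # delivery distance bounds in km

mean = (a + b) / 2
median = (a + b) / 2               # symmetric, so median equals mean
std_dev = sqrt((b - a) ** 2 / 12)  # square root of the variance
cv = std_dev / mean

print(mean, median, round(std_dev, 4), round(cv, 2))
print(uniform_cdf(mean, a, b))     # half of the trips are shorter than the mean
```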
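Probability density function of the normal distribution:
\[
f(x \mid \mu, \sigma) = \frac{1}{(2\pi)^{1/2}\,\sigma}\; e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}
\]
[Figure: bell curve with μ and μ + kσ marked on the x-axis]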
Characteristics
• Most commonly used distribution – many analyses assume ~ N
• High point in ‘bell curve’ occurs at mean
• Symmetric about the mean
• The mean ‘shifts’ the distribution – but not the ‘shape’
• The standard deviation changes the ‘shape’ but doesn’t ‘shift’ it
[Figure: normal pdf; x-axis: Value (20 to 70), y-axis: Probability of X]
• P(X w/in 2σ around µ) = 0.9544
• +/- 1.96σ around µ = 0.950
• +/- 2.81σ around µ = 0.995
So, what is 6σ? Error occurs 9.9 × 10^−10 of the time.
[Figure: Normal CDF; x-axis: Value (20 to 70), y-axis: Probability of X (0% to 100%)]
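The coverage figures above can be checked with the error function from Python's standard library. This is a sketch only: coverage and upper_tail are hypothetical helper names, and reading the 6σ figure as a one-sided tail probability is an interpretation that happens to match the quoted value, not something the slides state.

```python
from math import erf, erfc, sqrt

def coverage(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal random variable."""
    return erf(k / sqrt(2))

def upper_tail(k):
    """P(X > mu + k*sigma), the one-sided tail beyond k standard deviations."""
    return 0.5 * erfc(k / sqrt(2))

for k in (1.96, 2.0, 2.81):
    print(f"within {k} sigma: {coverage(k):.4f}")

# One-sided tail beyond 6 sigma (assumed reading of the 6-sigma error rate).
print(f"beyond 6 sigma: {upper_tail(6):.2e}")
```

Because the result depends only on k, the same coverage applies to any mean and standard deviation.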
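\[
f_u(u_0) = \frac{e^{-u_0^{2}/2}}{\sqrt{2\pi}}
\]
[Figure: unit normal pdf f_u(u_0) with z marked on the u_0 axis. The area to the left of z is P[u<z]; the area to the right is P[u≥z] = 1 − P[u<z].]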
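The two areas in the unit normal picture can be computed with statistics.NormalDist from the Python standard library (available since Python 3.8). The value z = 1.5 is a hypothetical choice for illustration.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)  # the unit (standard) normal

z = 1.5                                      # hypothetical cutoff

density_at_zero = unit_normal.pdf(0.0)       # height of the bell at its peak
p_below = unit_normal.cdf(z)                 # area to the left:  P[u < z]
p_above = 1 - p_below                        # area to the right: P[u >= z]

# inv_cdf goes the other way: from a cumulative probability back to z.
z_recovered = unit_normal.inv_cdf(p_below)

print(round(density_at_zero, 4), round(p_below, 4), round(p_above, 4))
```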
Normal Functions for Spreadsheets
Function Microsoft Excel Google Sheets LibreOffice->Calc
We would say, “X follows a triangle distribution with a minimum of a, maximum b, and a mode of c, ~T(a, b, c)”
[Figure: triangle pdf over x, with a, c, and b marked on the x-axis]
Characteristics
• Good way to get a sense of an unknown distribution
• People tend to recall extreme and common values
• Handles asymmetric distributions
Zippy Bright Transportation III
Zippy Bright has a consumer delivery unit. They distribute
product from a downtown location to all residences and ...
[Figure: triangle pdf with peak height 2/(b − a) = 0.25]
We want to find P[X>d] where d = 2, that is, P[X>2] = 1 − P[X≤2]. Because 1 ≤ 2 < 4, we
select the case where a ≤ d < c and plug into the equation: P[X≤2] = (d−a)² / [(b−a)(c−a)] =
(2−1)² / (8 × 3) = 4.2%, so the probability that a delivery is longer than 2 km is
1 − 0.042 = 0.958 or 95.8%
5. What is the distance that 90% of the trips will be shorter than?
We want to find d where the P[X≤d] = 0.90. This is obviously on the right hand side of
the distribution, but we can check by looking at P[X≤c] = (c-a)/(b-a) = 3/8 = 0.375. Since
90% is larger than this, we are on the right hand side.
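Both triangle-distribution questions can be worked through in code. This is a sketch under stated assumptions: the parameters a = 1, c = 4, b = 9 are inferred from the arithmetic in the text (1 ≤ 2 < 4 and (c−a)/(b−a) = 3/8) rather than given explicitly, and the function names are hypothetical. The formulas are the standard triangle CDF and its inverse.

```python
from math import sqrt

def triangle_cdf(d, a, b, c):
    """P[X <= d] for a triangle distribution T(a, b, c) with mode c."""
    if d <= a:
        return 0.0
    if d >= b:
        return 1.0
    if d < c:  # left-hand side of the mode
        return (d - a) ** 2 / ((b - a) * (c - a))
    return 1.0 - (b - d) ** 2 / ((b - a) * (b - c))

def triangle_inverse_cdf(p, a, b, c):
    """Value d such that P[X <= d] = p."""
    if p <= (c - a) / (b - a):  # p falls on the left-hand side
        return a + sqrt(p * (b - a) * (c - a))
    return b - sqrt((1.0 - p) * (b - a) * (b - c))

a, b, c = 1.0, 9.0, 4.0  # assumed: inferred from the worked numbers

p_longer_than_2 = 1 - triangle_cdf(2.0, a, b, c)
p_up_to_mode = triangle_cdf(c, a, b, c)
d_90 = triangle_inverse_cdf(0.90, a, b, c)

print(round(p_longer_than_2, 3), p_up_to_mode, round(d_90, 3))
```

Since 0.90 exceeds P[X≤c] = 0.375, the inverse uses the right-hand branch, matching the reasoning in the text.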
“Uniformly fun!”