Module 2 - Probability and Distributions

Zippy Bright manufactures electric toothbrushes and received weekly sales data for their product XP219 from three stores over a year. The document analyzes the sales data for store A. It presents the weekly sales numbers in a table and graphically as a histogram and probability mass function. It also defines key probability concepts like probability, events, and the four basic probability laws including conditional probability.


Managing Uncertainty I:
Probability and Discrete Distributions

MIT Center for Transportation & Logistics ctl.mit.edu
Zippy Bright
• Zippy Bright manufactures electric toothbrushes that are sold
through large retail outlets. Zippy Bright is concerned with
how variable the sales are at different stores. They requested
and received a year of weekly sales data on their premier
product, the XP219, for three stores from one of their
retailers, Sellco.
• What can we say about the weekly sales in store A?
Week Unit Sales Week Unit Sales Week Unit Sales Week Unit Sales Week Unit Sales
1 1 11 3 21 2 31 1 41 2
2 5 12 2 22 4 32 2 42 3
3 3 13 3 23 4 33 3 43 3
4 2 14 4 24 3 34 4 44 3
5 3 15 2 25 4 35 5 45 4
6 3 16 1 26 1 36 5 46 1
7 3 17 3 27 2 37 1 47 2
8 2 18 4 28 3 38 5 48 3
9 5 19 4 29 4 39 5 49 4
10 2 20 3 30 4 40 1 50 2

Zippy Bright – Graphing it out!

Zippy Bright - Distributions
• A histogram for the weekly sales
  – A graphical representation of the distribution by mutually exclusive and collectively exhaustive “bins” or intervals.
  – Shows the relative probability of each interval.
• The Probability Mass Function (PMF)
  – Probability of each value of the discrete random variable
  – Probabilities sum to 100% or 1.00
[Figures: “Weekly Sales at Store A” histogram (Number of Weeks vs. Sales per Week), “PMF of Weekly Sales at Store A” (Probability vs. Sales per Week), and “Cumulative Distribution of Sales” (Cumulative Probability vs. Sales per Week)]

Probability Table
Value | Probability | Cumulative Probability
1 | 14% | 14%
2 | 22% | 36%
3 | 30% | 66%
4 | 22% | 88%
5 | 12% | 100%
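The probability table can be rebuilt directly from the 50 weeks of Store A data. The Python sketch below is illustrative (the names `store_a_sales`, `empirical_pmf`, and `cumulative` are hypothetical, not from the course materials): it counts how often each sales value occurs, divides by the number of weeks to get the empirical PMF, and keeps a running total for the cumulative probability.

```python
from collections import Counter

# Store A weekly unit sales, weeks 1-50, transcribed from the table above
store_a_sales = [
    1, 5, 3, 2, 3, 3, 3, 2, 5, 2,  # weeks 1-10
    3, 2, 3, 4, 2, 1, 3, 4, 4, 3,  # weeks 11-20
    2, 4, 4, 3, 4, 1, 2, 3, 4, 4,  # weeks 21-30
    1, 2, 3, 4, 5, 5, 1, 5, 5, 1,  # weeks 31-40
    2, 3, 3, 3, 4, 1, 2, 3, 4, 2,  # weeks 41-50
]


def empirical_pmf(data):
    """Relative frequency of each observed value, in increasing order of value."""
    counts = Counter(data)
    n = len(data)
    return {value: counts[value] / n for value in sorted(counts)}


def cumulative(pmf):
    """Running sum of the PMF, i.e., P(X <= value) for each value."""
    total = 0.0
    cdf = {}
    for value, p in pmf.items():
        total += p
        cdf[value] = total
    return cdf


pmf = empirical_pmf(store_a_sales)
cdf = cumulative(pmf)
for value in pmf:
    print(f"{value}  {pmf[value]:.0%}  {cdf[value]:.0%}")
```

Printed as percentages, each row should match the Value / Probability / Cumulative Probability columns of the table.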
Basic Probability Laws 1 & 2

Basic Probability
• Probability Theory
  – Mathematical framework for analyzing random events or experiments.
  – Experiments are events we cannot predict with certainty, e.g., weekly sales!
• Notation
  – P(A) = probability that event A occurs
  – P(A’) = complement of A = probability that A does not occur, so P(A’) = 1 – P(A)

Events:
• A = “Sales = 5 units”
• B = “Sales ≥ 4 units”
• C = “Sales are an odd number”
• D = “Sales are ≤ 2 units”

Value | Probability | Cumulative Probability
1 | .14 | .14
2 | .22 | .36
3 | .30 | .66
4 | .22 | .88
5 | .12 | 1.00

• P(A) = 0.12
• P(B) = P(4) + P(5) = 0.34
• P(C) = P(1) + P(3) + P(5) = .14 + .30 + .12 = 0.56
• P(A’) = 1 – P(A) = 0.88
• P(B’) = 1 – P(B) = 0.66
• P(C’) = 1 – P(C) = 1 – 0.56 = 0.44
• P(B ∪ C) = P[(Sales ≥ 4) OR (Sales = 1, 3, 5)] = P[Sales = 1, 3, 4, or 5] = 0.78
• P(B ∩ C) = P[(Sales ≥ 4) AND (Sales = 1, 3, 5)] = P[Sales = 5] = 0.12
• P(A ∩ D) = P[(Sales = 5) AND (Sales ≤ 2)] = 0
• P(A ∪ A’) = P[(Sales = 5) OR (Sales ≠ 5)] = 1.00
Four Laws of Probability
Events:
• A = “Sales = 5 units”
• B = “Sales ≥ 4 units”
• C = “Sales are an odd number”
• D = “Sales are ≤ 2 units”

Value | Probability | Cumulative Probability
1 | .14 | .14
2 | .22 | .36
3 | .30 | .66
4 | .22 | .88
5 | .12 | 1.00

1. Probability of any event is between 0 and 1: 0 ≤ P(A) ≤ 1
  • P(Sales > 6) = 0
  • P(Sales = 1, 2, 3, or 5) = 0.78
  • P(Sales < 6) = 1
  • P(Sales < 1) = 0

2. If A and B are mutually exclusive events, then P(A or B) = P(A ∪ B) = P(A) + P(B)
  • P(A ∪ D) = P[(Sales = 5) OR (Sales = 1 or 2)] = P(Sales = 5) + P(Sales = 1 or 2) = .12 + .36 = 0.48
  • P(B ∪ C) = P[(Sales ≥ 4) OR (Sales = 1, 3, 5)] = P[Sales = 1, 3, 4, or 5] = 0.78
    ≠ P(Sales ≥ 4) + P(Sales = 1, 3, 5) = .34 + .56 = .90. Why? B and C are not mutually exclusive: both contain Sales = 5, so adding P(B) and P(C) counts P(Sales = 5) = .12 twice (.90 – .12 = .78).
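One way to see why Law 2 fails for B and C is to treat each event as a set of sales values. The sketch below is illustrative (the helper name `prob` is hypothetical; the PMF values come from the probability table) and sums the PMF over each set. The last line applies the general addition rule, P(B ∪ C) = P(B) + P(C) – P(B ∩ C), a standard result that is not one of the four laws on these slides: subtracting the overlap once removes the double count.

```python
# PMF of weekly sales at Store A, from the probability table
pmf = {1: 0.14, 2: 0.22, 3: 0.30, 4: 0.22, 5: 0.12}

# Each event is the set of sales values for which it is true
A = {x for x in pmf if x == 5}       # Sales = 5 units
B = {x for x in pmf if x >= 4}       # Sales >= 4 units
C = {x for x in pmf if x % 2 == 1}   # Sales are an odd number
D = {x for x in pmf if x <= 2}       # Sales are <= 2 units


def prob(event):
    """P(event): sum of the probabilities of the outcomes in the event."""
    return sum(pmf[x] for x in event)


# A and D share no outcomes, so Law 2 applies
print(A & D, f"{prob(A | D):.2f}", f"{prob(A) + prob(D):.2f}")  # set() 0.48 0.48
# B and C share the outcome 5, so simple addition double counts it
print(B & C, f"{prob(B | C):.2f}", f"{prob(B) + prob(C):.2f}")  # {5} 0.78 0.90
# General addition rule: subtract the overlap once
print(f"{prob(B) + prob(C) - prob(B & C):.2f}")                 # 0.78
```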
Basic Probability Laws 3 & 4

Four Laws of Probability
Events:
• A = “Sales = 5 units”
• B = “Sales ≥ 4 units”
• C = “Sales are an odd number”
• D = “Sales are ≤ 2 units”

Value | Probability | Cumulative Probability
1 | .14 | .14
2 | .22 | .36
3 | .30 | .66
4 | .22 | .88
5 | .12 | 1.00

Conditional Probability
P(A|B) = probability that event A occurs, GIVEN THAT event B has occurred.
e.g., P(D|C) = P[(Sales ≤ 2) given that (Sales = 1, 3, or 5)]

3. If A and B are any two events, then
P(A|B) = P(A ∩ B) / P(B)

• P(D|C) = P[(Sales ≤ 2) given that (Sales = 1, 3, or 5)] = P(Sales = 1) / P(Sales = 1, 3, or 5) = .14 / .56 = 0.25
• P(A|B) = P[(Sales = 5) given that (Sales ≥ 4)] = P(Sales = 5) / P(Sales ≥ 4) = .12 / .34 = 0.35
• P(B|A) = P[(Sales ≥ 4) given that (Sales = 5)] = P(Sales = 5) / P(Sales = 5) = .12 / .12 = 1.00
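Law 3 becomes a one-line ratio once events are represented as sets of sales values. A minimal sketch, assuming the Store A PMF from the probability table; `prob` and `cond_prob` are hypothetical helper names:

```python
# PMF of weekly sales at Store A and the events A-D as sets of sales values
pmf = {1: 0.14, 2: 0.22, 3: 0.30, 4: 0.22, 5: 0.12}
A = {5}          # Sales = 5 units
B = {4, 5}       # Sales >= 4 units
C = {1, 3, 5}    # Sales are an odd number
D = {1, 2}       # Sales are <= 2 units


def prob(event):
    """P(event): sum of the probabilities of the outcomes in the event."""
    return sum(pmf[x] for x in event)


def cond_prob(event, given):
    """Law 3: P(event | given) = P(event and given) / P(given)."""
    p_given = prob(given)
    if p_given == 0:
        raise ValueError("cannot condition on an event with zero probability")
    return prob(event & given) / p_given


print(f"P(D|C) = {cond_prob(D, C):.2f}")  # P(D|C) = 0.25
print(f"P(A|B) = {cond_prob(A, B):.2f}")  # P(A|B) = 0.35
print(f"P(B|A) = {cond_prob(B, A):.2f}")  # P(B|A) = 1.00
```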
Four Laws of Probability
Events:
• A = “Sales = 5 units”
• B = “Sales ≥ 4 units”
• C = “Sales are an odd number”
• D = “Sales are ≤ 2 units”

Value | Probability | Cumulative Probability
1 | .14 | .14
2 | .22 | .36
3 | .30 | .66
4 | .22 | .88
5 | .12 | 1.00

Independence
A and B are independent if knowing that B occurred does not influence the probability of A occurring.

4. If A and B are independent events, then
P(A|B) = P(A)

Are events C and A independent? Let’s test it!
• If P(C|A) = P(C), that is, if the probability that sales are odd given that we sold 5 units equals the overall probability that sales are odd, then A and C are independent events.
• P(C|A) = P(C ∩ A) / P(A) = P[(Sales = 1, 3, or 5) and (Sales = 5)] / P(Sales = 5) = P(Sales = 5) / P(Sales = 5) = 1.00
• Since this does not equal P(C) = 0.56, A and C are not independent events.
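The independence test for C and A can be sketched the same way. This is an illustration under the same assumptions (Store A PMF, events as sets of sales values, hypothetical helper names); `math.isclose` is used because floating-point probabilities rarely compare exactly equal. Conditioning on the certain event (every sales value) is included as a contrast, since it never changes a probability.

```python
import math

pmf = {1: 0.14, 2: 0.22, 3: 0.30, 4: 0.22, 5: 0.12}
A = {5}          # Sales = 5 units
C = {1, 3, 5}    # Sales are an odd number


def prob(event):
    """P(event): sum of the probabilities of the outcomes in the event."""
    return sum(pmf[x] for x in event)


def independent(e1, e2):
    """Law 4 check: e1 and e2 are independent if P(e1 | e2) equals P(e1)."""
    p_cond = prob(e1 & e2) / prob(e2)
    # compare with a tolerance because the probabilities are floats
    return math.isclose(p_cond, prob(e1), abs_tol=1e-9)


print(f"P(C|A) = {prob(C & A) / prob(A):.2f}, P(C) = {prob(C):.2f}")  # P(C|A) = 1.00, P(C) = 0.56
print(independent(C, A))          # False
print(independent(C, set(pmf)))   # True: knowing "some sales value occurred" tells us nothing
```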
Key Points

Key Points
• Probability Laws
  1. Probability of any event is between 0 and 1: 0 ≤ P(A) ≤ 1
  2. If A and B are mutually exclusive events, then P(A or B) = P(A ∪ B) = P(A) + P(B)
  3. If A and B are any two events, then P(A|B) = P(A ∩ B) / P(B)
  4. If A and B are independent events, then P(A|B) = P(A)
Characterizing Uncertainty

Characterizing a Distribution
• Several ways to characterize a distribution:
  – Central Tendency: what is the “most likely” value?
  – Spread: how much do the observations “differ”?

Week Unit Sales Week Unit Sales Week Unit Sales Week Unit Sales Week Unit Sales
1 1 11 3 21 2 31 1 41 2
2 5 12 2 22 4 32 2 42 3
3 3 13 3 23 4 33 3 43 3
4 2 14 4 24 3 34 4 44 3
5 3 15 2 25 4 35 5 45 4
6 3 16 1 26 1 36 5 46 1
7 3 17 3 27 2 37 1 47 2
8 2 18 4 28 3 38 5 48 3
9 5 19 4 29 4 39 5 49 4
10 2 20 3 30 4 40 1 50 2

Central Tendency Metrics
• Mode – value that appears most frequently

• Median – value in the “middle” of a distribution


– value separating the lower from the higher half

• Mean – sum of values divided by the total number of observations (average)


– sum of values multiplied by their probability (expected value)

Mode = 3
Median = 3
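Python's standard `statistics` module offers the same three central tendency measures. A sketch using the Store A data (the variable names are illustrative):

```python
import statistics

# Store A weekly unit sales, weeks 1-50, transcribed from the table above
store_a_sales = [
    1, 5, 3, 2, 3, 3, 3, 2, 5, 2,
    3, 2, 3, 4, 2, 1, 3, 4, 4, 3,
    2, 4, 4, 3, 4, 1, 2, 3, 4, 4,
    1, 2, 3, 4, 5, 5, 1, 5, 5, 1,
    2, 3, 3, 3, 4, 1, 2, 3, 4, 2,
]

mode = statistics.mode(store_a_sales)      # most frequent value
median = statistics.median(store_a_sales)  # average of the 25th and 26th sorted values
mean = statistics.mean(store_a_sales)      # sum of values / number of weeks
print(mode, median, mean)  # 3 3.0 2.96
```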

Central Tendency - Mean
• Sum sales and divide by number of observations (weeks)
  – Sum = 148 units sold
  – N = 50 weeks
  – Mean = Average = 148/50 = 2.96 units/week
• Expected Value
  – Notation
    ◦ X = discrete random variable
    ◦ xᵢ = possible values of X, i.e., x₁, x₂, x₃, . . ., xₙ
    ◦ pᵢ = corresponding probabilities, i.e., p₁, p₂, p₃, . . ., pₙ
  – Where P(X = xᵢ) = pᵢ and the probabilities sum to 1, i.e., p₁ + p₂ + p₃ + … + pₙ = 1
  – The expected value of X, E[X], is equal to:

$$E[X] = \bar{x} = \mu = \sum_{i=1}^{n} p_i x_i$$

xᵢ | pᵢ | pᵢxᵢ
1 | .14 | .14
2 | .22 | .44
3 | .30 | .90
4 | .22 | .88
5 | .12 | .60
  |  | Σ = 2.96

[Figure: PMF of weekly sales at Store A (1: 14%, 2: 22%, 3: 30%, 4: 22%, 5: 12%)]
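The expected-value formula is a probability-weighted sum. A minimal sketch, with a hypothetical `expected_value` helper that first checks that the probabilities sum to 1:

```python
def expected_value(pmf):
    """E[X] = sum of p_i * x_i over all possible values x_i of a discrete X."""
    if abs(sum(pmf.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for x, p in pmf.items())


store_a_pmf = {1: 0.14, 2: 0.22, 3: 0.30, 4: 0.22, 5: 0.12}
print(f"E[X] = {expected_value(store_a_pmf):.2f} units/week")  # E[X] = 2.96 units/week
```

Because the PMF was built from the same 50 weeks, E[X] equals the simple average 148/50.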
Spread Metrics
• Range – maximum value minus minimum value
• Inner Quartiles – the 75th percentile value minus the 25th percentile value
• Variance – expectation of the squared deviation around the mean

$$\mathrm{Var}[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$

[Figure: box diagram for Store A weekly sales marking Min, 25th Percentile, 75th Percentile, and Max]
Range = 5 – 1 = 4
Inner Quartile = 4 – 2 = 2
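The range and quartiles can be computed from the raw Store A data with `statistics.quantiles` (Python 3.8+). The choice of `method="inclusive"` is an assumption made so the interpolation follows the spreadsheet `PERCENTILE.INC` rule, which places the p-th percentile at position (n – 1)p in the sorted data; other percentile conventions can give slightly different quartiles.

```python
import statistics

# Store A weekly unit sales, weeks 1-50, transcribed from the table above
store_a_sales = [
    1, 5, 3, 2, 3, 3, 3, 2, 5, 2,
    3, 2, 3, 4, 2, 1, 3, 4, 4, 3,
    2, 4, 4, 3, 4, 1, 2, 3, 4, 4,
    1, 2, 3, 4, 5, 5, 1, 5, 5, 1,
    2, 3, 3, 3, 4, 1, 2, 3, 4, 2,
]

data_range = max(store_a_sales) - min(store_a_sales)
# quartile cut points (25th, 50th, 75th percentiles), interpolated at (n - 1) * p
q1, q2, q3 = statistics.quantiles(store_a_sales, n=4, method="inclusive")
inner_quartile = q3 - q1
print(data_range, q1, q3, inner_quartile)  # 4 2.0 4.0 2.0
```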
Spread - Variance
• Variance – Expectation of the squared deviation around the mean
  – Also called the Second Moment around the mean

$$\mathrm{Var}[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$

xᵢ | pᵢ | pᵢxᵢ | xᵢ – μ | (xᵢ – μ)² | pᵢ(xᵢ – μ)²
1 | .14 | .14 | -1.96 | 3.84 | 0.5376
2 | .22 | .44 | -0.96 | 0.92 | 0.2024
3 | .30 | .90 | 0.04 | 0.0016 | 0.00048
4 | .22 | .88 | 1.04 | 1.08 | 0.2376
5 | .12 | .60 | 2.04 | 4.16 | 0.4992
  |  | μ = 2.96 |  |  | σ² = 1.48

• Standard Deviation – Square root of the variance
  – In the same units as the mean! σ = √1.48 = 1.215 units/week
• Coefficient of Variation – Ratio of standard deviation to the mean
  – Standard measure of variability: CV = σ/μ = 1.215 / 2.96 = 0.411
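The variance table, standard deviation, and CV follow from the PMF in a few lines. This sketch keeps full precision instead of rounding each column, so it reports σ² = 1.4784 and σ ≈ 1.216; the slide's 1.48 and 1.215 differ only through rounding.

```python
import math

store_a_pmf = {1: 0.14, 2: 0.22, 3: 0.30, 4: 0.22, 5: 0.12}

mu = sum(p * x for x, p in store_a_pmf.items())               # mean, E[X]
var = sum(p * (x - mu) ** 2 for x, p in store_a_pmf.items())  # probability-weighted squared deviations
sigma = math.sqrt(var)                                        # same units as the mean
cv = sigma / mu                                               # unitless measure of variability
print(f"mu={mu:.2f} var={var:.4f} sigma={sigma:.3f} cv={cv:.3f}")
# mu=2.96 var=1.4784 sigma=1.216 cv=0.411
```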
Zippy Bright – Summary Statistics

Minimum = 1
25th Pct = 2
50th Pct = Median = 3
Mode = 3
75th Pct = 4
Maximum = 5
Range = 4
Inner Quartile = 2
μ = Mean = 2.96
σ² = Variance = 1.48
σ = Standard Deviation = 1.215
CV = Coefficient of Variation = 0.411
Discrete Probability Distributions

Probability Distributions
• Where do they come from?
  – Empirical: based on actual data
  – Theoretical: based on a mathematical form
• Which is better?
  – It depends on what you are trying to accomplish
  – Empirical distributions follow past history
  – Theoretical distributions can allow for more robust modeling
  – Typically, we look for the theoretical distribution that fits the data
Discrete Theoretical Distributions
• Discrete Uniform Distribution
  – N possible values
  – Each value has equal probability, i.e., pᵢ = 1/N
  – Ex: Rolling a die
• Poisson Distribution
  – Probability of seeing x events within a certain time period
  – Example: Random arrivals to a customer service desk
[Figure: “PMFs of Theoretical Distributions”, Uniform [1,6] and Poisson (mean = 1.5); x-axis: Random Variable X (0 to 10); y-axis: Probability of X]
Discrete Uniform Distribution
• Notation: U(a, b)
  – a = Minimum, b = Maximum
  – n = number of values = b – a + 1
Probability Mass Function
$$P[X = x] = f(x \mid a, b) = \begin{cases} \dfrac{1}{n} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
• Metrics
  – Mean = (a + b) / 2
  – Median = (a + b) / 2
  – Mode: N/A (every value is equally likely)
  – Variance = ((b – a + 1)² – 1) / 12

PMF Rolling One Die
i | xᵢ | pᵢ
1 | 1 | 1/6
2 | 2 | 1/6
3 | 3 | 1/6
4 | 4 | 1/6
5 | 5 | 1/6
6 | 6 | 1/6
[Figure: PMF of rolling one die, 1/6 ≈ 16.7% for each face]

μX = 1/6·1 + 1/6·2 + 1/6·3 + 1/6·4 + 1/6·5 + 1/6·6 = 3.5 = (6 + 1)/2
σ²X = 1/6·(1 – 3.5)² + 1/6·(2 – 3.5)² + 1/6·(3 – 3.5)² + 1/6·(4 – 3.5)² + 1/6·(5 – 3.5)² + 1/6·(6 – 3.5)² = 2.917 = ((6 – 1 + 1)² – 1) / 12
σX = √2.917 = 1.708
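The die example can be checked exactly with rational arithmetic. A sketch using Python's `fractions.Fraction` (the helper name `discrete_uniform_moments` is hypothetical) computes the mean and variance from the PMF and compares them with the closed-form metrics:

```python
from fractions import Fraction


def discrete_uniform_moments(a, b):
    """Mean and variance of U(a, b) over the integers a..b, computed from the PMF."""
    values = range(a, b + 1)
    p = Fraction(1, len(values))  # each of the n = b - a + 1 values has probability 1/n
    mean = sum(p * x for x in values)
    var = sum(p * (x - mean) ** 2 for x in values)
    return mean, var


mean, var = discrete_uniform_moments(1, 6)  # one fair die
print(mean, var)  # 7/2 35/12

# Agrees with the closed forms (a + b)/2 and ((b - a + 1)^2 - 1)/12
assert mean == Fraction(1 + 6, 2)
assert var == Fraction((6 - 1 + 1) ** 2 - 1, 12)
```

35/12 ≈ 2.917, and its square root ≈ 1.708, as on the slide.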
Poisson Distribution

Poisson Distribution
• Widely used to model arrivals, slow-moving inventory, etc.
• Discrete distribution that cannot take negative values
• Notation: P(λ)
  – λ = mean = variance
Probability Mass Function
$$P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
Recall:
  – e = Euler’s number ≅ 2.71828 . . .
  – λ = distribution parameter (mean)
  – x! = factorial of x, e.g., 5! = 5×4×3×2×1 = 120 and 0! = 1

Suppose λ = 0.5:
P[X=0] = (e^(−0.5) · 0.5^0)/(0!) = (0.607)(1)/1 = 0.61
P[X=1] = (e^(−0.5) · 0.5^1)/(1!) = (0.607)(0.5)/1 = 0.30
P[X=2] = (e^(−0.5) · 0.5^2)/(2!) = (0.607)(0.25)/2 = 0.08
P[X=3] = (e^(−0.5) · 0.5^3)/(3!) = (0.607)(0.125)/6 = 0.01
P[X=4] = (e^(−0.5) · 0.5^4)/(4!) = (0.607)(0.0625)/24 ≈ 0.002
P[X=5] = (e^(−0.5) · 0.5^5)/(5!) = (0.607)(0.0312)/120 ≈ 0.0002

xᵢ | pᵢ
0 | 61%
1 | 30%
2 | 8%
3 | 1%
4 | 0.2%
5 | 0.02%
[Figure: PMF of P(0.5) for x = 0 to 5]
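The Poisson PMF is a direct translation of the formula. A minimal sketch (the function name `poisson_pmf` is illustrative):

```python
import math


def poisson_pmf(x, lam):
    """P[X = x] = e^(-lam) * lam^x / x! for x = 0, 1, 2, ...; 0 otherwise."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam ** x / math.factorial(x)


for x in range(6):
    print(x, f"{poisson_pmf(x, 0.5):.4f}")
```

For λ = 0.5 this prints 0.6065, 0.3033, 0.0758, 0.0126, 0.0016, and 0.0002 for x = 0 to 5, which round to the values in the table.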
Poisson Distribution – for different λ values
[Figure: Poisson PMFs for λ = 0.75, 2, 5, and 10, plotted for x = 0 to 20]
Note:
• As λ increases, the distribution becomes more symmetric and “bell shaped”
• Value is always an integer ≥ 0
• The value of λ does not need to be an integer
Poisson Distribution
Probability Mass Function
$$P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
Cumulative Distribution Function
$$P[X \le x] = \sum_{k=0}^{x} \frac{e^{-\lambda}\lambda^{k}}{k!}$$

You are running the customer complaint center for Zippy Bright. Customer complaint calls come in ~P(2.2) per minute.

1. What is the probability that no calls will come in over the next minute?
P[X=0] = (e^(−2.2) · 2.2^0)/(0!) = (0.1108)(1)/1 = 0.11 or 11%

2. What is the probability that 2 or fewer calls will come in over the next minute?
P[X=0] = (e^(−2.2) · 2.2^0)/(0!) = (0.1108)(1)/1 = 0.11 or 11%
P[X=1] = (e^(−2.2) · 2.2^1)/(1!) = (0.1108)(2.2)/1 = 0.24 or 24%
P[X=2] = (e^(−2.2) · 2.2^2)/(2!) = (0.1108)(4.84)/2 = 0.27 or 27%
P[X≤2] = 0.11 + 0.24 + 0.27 = 0.62 or 62%

3. What is the probability that at least 1 call will come in over the next minute?
P[X>0] = 1 – P[X=0] = 1 – 0.11 = 0.89 or 89%

Spreadsheet | Function | Prob 1 | Prob 2
Microsoft Excel | =POISSON.DIST(x, mean, cumulative) | =POISSON.DIST(0, 2.2, 0) | =POISSON.DIST(2, 2.2, 1)
Google Sheets | =POISSON(x, mean, cumulative) | =POISSON(0, 2.2, 0) | =POISSON(2, 2.2, 1)
LibreOffice Calc | =POISSON(Number; Mean; C) | =POISSON(0; 2.2; 0) | =POISSON(2; 2.2; 1)
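The three complaint-center questions can also be answered without a spreadsheet. A sketch in Python, where `poisson_cdf` (a hypothetical helper) plays the role of the cumulative option of `POISSON.DIST`:

```python
import math


def poisson_pmf(x, lam):
    """P[X = x] for X ~ P(lam), x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_cdf(x, lam):
    """P[X <= x]: the PMF summed from k = 0 to x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 2.2                                # complaint calls per minute
p_none = poisson_pmf(0, lam)             # question 1: P[X = 0]
p_two_or_fewer = poisson_cdf(2, lam)     # question 2: P[X <= 2]
p_at_least_one = 1 - p_none              # question 3: P[X >= 1]
print(f"{p_none:.2f} {p_two_or_fewer:.2f} {p_at_least_one:.2f}")  # 0.11 0.62 0.89
```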
Key Points

Key Points
• Characterize a distribution:
  – Central Tendency
    ◦ Mode – value that appears most frequently
    ◦ Median – value in the “middle” of a distribution, separating the lower from the higher half
    ◦ Mean (μ) – sum of values multiplied by their probability (expected value)
  – Spread
    ◦ Range – maximum value minus minimum value
    ◦ Inner Quartiles – 75th percentile value minus the 25th percentile value
    ◦ Variance (σ²) – expectation of the squared deviation around the mean
    ◦ Standard Deviation (σ) – square root of the variance
    ◦ Coefficient of Variation (CV) – standard deviation over the mean = σ/μ

$$E[X] = \bar{x} = \mu = \sum_{i=1}^{n} p_i x_i \qquad \mathrm{Var}[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$
Key Points
• Theoretical Distributions
  – Discrete Uniform
Probability Mass Function
$$P[X = x] = f(x \mid a, b) = \begin{cases} \dfrac{1}{n} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
[Figure: PMF of rolling one die]
  – Poisson
Probability Mass Function
$$P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
[Figure: Poisson PMF for x = 0 to 5]
Questions, Comments, Suggestions?
Use the Discussion Forum!

“Dexter, Brody, and Wilson hoping that the probability of getting the treat is not zero.”
Managing Uncertainty II
Continuous Distributions
Zippy Bright DCs
Zippy Bright manufactures electric toothbrushes that are sold through large
retail outlets. Currently, they distribute one of their premier products, the
XP219, from three Distribution Centers (East, Center, and West) to more than
3500 stores. The weekly demand that the East DC faces is shown in the data
table below.
• What can we say about the weekly demand for this DC?

Week Unit Sales Week Unit Sales Week Unit Sales Week Unit Sales Week Unit Sales
1 3595 11 2346 21 3967 31 2898 41 2196
2 3011 12 2869 22 2844 32 3713 42 3469
3 2994 13 3450 23 2546 33 2845 43 3570
4 3576 14 2031 24 2771 34 2866 44 2071
5 3697 15 3198 25 4084 35 3549 45 3247
6 2648 16 2939 26 2755 36 2365 46 4740
7 3747 17 2034 27 2641 37 2462 47 2316
8 3165 18 2476 28 2875 38 2480 48 2625
9 3412 19 2339 29 3855 39 3055 49 3973
10 2750 20 3200 30 2880 40 2453 50 3491

Summary Statistics for Spreadsheets
Function | Microsoft Excel | Google Sheets | LibreOffice Calc
Minimum | =MIN(array) | =MIN(array) | =MIN(array)
Median | =MEDIAN(array) | =MEDIAN(array) | =MEDIAN(array)
Mode | =MODE(array) | =MODE(array) | =MODE(array)
Mean (μ) | =AVERAGE(array) | =AVERAGE(array) | =AVERAGE(array)
Maximum | =MAX(array) | =MAX(array) | =MAX(array)
Percentile | =PERCENTILE.INC(array, k) | =PERCENTILE(array, percentile) | =PERCENTILE.INC(array, alpha)
Population Variance (σ²) | =VAR.P(array) | =VARP(array) | =VAR.P(array)
Sample Variance (s²) | =VAR.S(array) | =VAR(array) | =VAR.S(array)
Pop. Std Deviation (σ) | =STDEV.P(array) | =STDEVP(array) | =STDEV.P(array)
Sample Std Deviation (s) | =STDEV(array) | =STDEV(array) | =STDEV.S(array)

A Note on Population versus Sample Variance . . .
• In real life, we usually do not know the true mean of the population. Instead, we need to estimate it from a sample.
• An unbiased estimate of the variance is the sample variance, s², which divides by n – 1 instead of N:

$$\sigma^2_{\text{pop}} = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \qquad s^2_{\text{sample}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

• In practice, use the sample variance and standard deviation.
Zippy Bright DC-East: Weekly Demand

Minimum 2,031 | Range 2,709 | 25th Percentile 2,566
Median 2,889 | Inner-Quartile Range 920 | 50th Percentile (Median) 2,889
Mean (μ) 3,022 | Population Variance (σ²) 356,269 (sample variance s² = 363,540) | 75th Percentile 3,486
Maximum 4,740 | Sample Std Deviation (s) 603 | Coefficient of Variation 0.20
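The DC-East summary can be recomputed from the 50 weekly values with Python's `statistics` module. The sketch below (variable names illustrative) reports both the population variance, which divides by N like `VAR.P`, and the sample variance and standard deviation, which divide by n – 1 like `VAR.S` and `STDEV.S`. Using `method="inclusive"` is an assumption chosen to mirror `PERCENTILE.INC`.

```python
import statistics

# East DC weekly demand, weeks 1-50, transcribed from the table above
east_dc = [
    3595, 3011, 2994, 3576, 3697, 2648, 3747, 3165, 3412, 2750,  # weeks 1-10
    2346, 2869, 3450, 2031, 3198, 2939, 2034, 2476, 2339, 3200,  # weeks 11-20
    3967, 2844, 2546, 2771, 4084, 2755, 2641, 2875, 3855, 2880,  # weeks 21-30
    2898, 3713, 2845, 2866, 3549, 2365, 2462, 2480, 3055, 2453,  # weeks 31-40
    2196, 3469, 3570, 2071, 3247, 4740, 2316, 2625, 3973, 3491,  # weeks 41-50
]

q1, median, q3 = statistics.quantiles(east_dc, n=4, method="inclusive")
mean = statistics.mean(east_dc)
pop_var = statistics.pvariance(east_dc)    # divides by N
sample_var = statistics.variance(east_dc)  # divides by n - 1
sample_sd = statistics.stdev(east_dc)      # square root of the sample variance

print(f"min={min(east_dc)} max={max(east_dc)} range={max(east_dc) - min(east_dc)}")
print(f"Q1={q1:.0f} median={median:.0f} Q3={q3:.0f} IQR={q3 - q1:.0f} mean={mean:.0f}")
print(f"pop var={pop_var:,.0f} sample var={sample_var:,.0f} "
      f"sample sd={sample_sd:.0f} CV={sample_sd / mean:.2f}")
# min=2031 max=4740 range=2709
# Q1=2566 median=2889 Q3=3486 IQR=920 mean=3022
# pop var=356,269 sample var=363,540 sample sd=603 CV=0.20
```

Note that √356,269 ≈ 597, so a standard deviation of 603 corresponds to the sample variance, not the population variance.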
Continuous Distributions

Why not just “Discretize” the Data?

Which histogram should we use?
[Figure: three histograms of the same weekly demand data drawn with different bins, with x-axis ranges between roughly 2,500 and 5,000 units]

[Figure: “Weekly Demand at Eastern DC” histogram showing weekly demand at Eastern DC in 100 unit bins for the last year; x-axis 1,500 to 5,000 units, y-axis 0% to 12%]
Continuous Probability Distributions
• Differences from Discrete Random Variables
  – Probabilities of specific values make no sense (any single value has probability 0)
  – Probability of values within an interval is more helpful
  – Cannot list all possible outcomes; instead we need to use a function
• Probability Density Function (pdf)
  – Probability that X lies between values a and b is equal to the area under the curve between a and b:

$$P[a \le X \le b] = \int_a^b f(t)\,dt$$

  – Total area under the curve equals 1, but P[X = t] = 0!
[Figure: pdf f(t) with the area between a and b shaded as the probability]
Continuous vs. Discrete Distributions
Discrete: multiply each value by its probability
$$\mu = E(X) = \sum_{i=1}^{n} p_i x_i \qquad \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$
Continuous: requires integration to calculate µ and σ²
$$\mu = \int_a^b t \cdot f(t)\,dt \qquad \sigma^2 = \int_a^b (t - \mu)^2 \cdot f(t)\,dt$$
[Figure: a discrete pmf (probability bars over t) beside a continuous pdf f(t)]
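When a pdf has no convenient closed form, the two integrals can be approximated numerically. A minimal sketch using the midpoint rule (the helper names and the step count are assumptions, and the midpoint rule is just one simple choice of quadrature), checked against the uniform pdf U(2.75, 6.50), whose exact mean is 4.625 and exact variance is (6.50 – 2.75)²/12 = 1.171875:

```python
def pdf_moments(pdf, a, b, steps=100_000):
    """Midpoint-rule estimates of the area, mean, and variance of a pdf on [a, b]:
    area = integral of f(t), mu = integral of t*f(t), var = integral of (t - mu)^2 * f(t)."""
    h = (b - a) / steps
    midpoints = [a + (i + 0.5) * h for i in range(steps)]
    area = h * sum(pdf(t) for t in midpoints)
    mu = h * sum(t * pdf(t) for t in midpoints)
    var = h * sum((t - mu) ** 2 * pdf(t) for t in midpoints)
    return area, mu, var


def uniform_pdf(t, a=2.75, b=6.50):
    """pdf of U(2.75, 6.50), used here only as a test case with known moments."""
    return 1.0 / (b - a) if a <= t <= b else 0.0


area, mu, var = pdf_moments(uniform_pdf, 2.75, 6.50)
print(f"area={area:.4f} mu={mu:.4f} var={var:.4f}")  # area=1.0000 mu=4.6250 var=1.1719
```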
Continuous Probability Distributions
• Cumulative Distribution Function (cdf)
  – F(t) = P(X ≤ t), the probability that X does not exceed t
  – 0.0 ≤ F(t) ≤ 1.0
  – F(b) ≥ F(a) if b > a: it is non-decreasing
• Simple Rules
  – P(X ≤ t) = F(t)
  – P(X > t) = 1 – F(t)
  – P(c ≤ X ≤ d) = F(d) – F(c)
  – P(X = t) = 0
[Figure: cdf F(t) rising from 0 to 1.0 with F(a) marked at t = a, above the pdf f(t) with the probability to the left of a shaded]
Continuous Distributions

[Figure: overview of continuous distributions. Source: Wikipedia]
Uniform Distribution

Uniform Distribution
We would say, “X is uniformly distributed over the range a to b, or X~U(a,b)”

$$f(t \mid a, b) = \begin{cases} \dfrac{1}{b - a} & \text{if } a \le t \le b \\ 0 & \text{otherwise} \end{cases}$$

$$F(t \mid a, b) = \begin{cases} 0 & \text{if } t < a \\ \dfrac{t - a}{b - a} & \text{if } a \le t \le b \\ 1 & \text{if } t > b \end{cases}$$

• Mean = (1/2)(a + b)
• Median = (1/2)(a + b)
• Mode = any value in range [a, b]
• Variance = (1/12)(b – a)²
Zippy Bright Transportation I
Zippy Bright has a consumer delivery unit. They distribute
product from a downtown location to all residences and offices in
the city. The deliveries are made on scooters and each customer
is delivered to directly. They found that the distance to each
customer location is ~U(2.75,6.50) kilometers.
1. What is the average distance, median distance, and CV?
We know that mean = (a + b)/2 = (2.75 + 6.50)/2 = 4.625 km, which is also the median!
CV = σ/μ = √[(1/12)(b – a)²] / [(a + b)/2] = √[(1/12)(6.5 – 2.75)²] / 4.625 = 1.0825 / 4.625 = 0.23

2. What is the probability that distance > 5 km?
F(t) = P[X ≤ t]; since we want P[X > t], we need 1 – F(t) = 1 – (t – a)/(b – a)
= 1 – (5 – 2.75) / (6.5 – 2.75) = 1 – 0.6 = 0.40 or 40%.

3. What is the probability that distance is within ±1σ of μ?
We know that σ = 1.0825 and μ = 4.625. So, we want the probability that X is between (4.625 – 1.0825) and (4.625 + 1.0825), or [3.5425, 5.7075].
We can find this using the cdf: F(5.7075) – F(3.5425) = 0.789 – 0.211 = 0.577 ≈ 58%
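The three delivery-distance answers can be reproduced with a few lines of Python. A sketch; `uniform_cdf` is a hypothetical helper that implements the piecewise F(t | a, b) from the Uniform Distribution slide:

```python
import math


def uniform_cdf(t, a, b):
    """F(t | a, b) for X ~ U(a, b)."""
    if t < a:
        return 0.0
    if t > b:
        return 1.0
    return (t - a) / (b - a)


a, b = 2.75, 6.50                     # distance to each customer, km
mu = (a + b) / 2                      # mean (also the median)
sigma = math.sqrt((b - a) ** 2 / 12)  # square root of the uniform variance
cv = sigma / mu
p_over_5 = 1 - uniform_cdf(5, a, b)   # question 2: P[X > 5]
p_within_1sd = uniform_cdf(mu + sigma, a, b) - uniform_cdf(mu - sigma, a, b)  # question 3
print(f"mu={mu:.3f} CV={cv:.2f} P[X>5]={p_over_5:.2f} P[within 1 sd]={p_within_1sd:.3f}")
# mu=4.625 CV=0.23 P[X>5]=0.40 P[within 1 sd]=0.577
```

For any uniform distribution the ±1σ probability is 2σ/(b – a) = 1/√3 ≈ 0.577, whatever the values of a and b.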
Normal Distribution

Normal Distribution
We would say, “X is normally distributed with mean µ and standard deviation σ, or X~N(µ, σ)”
Note: mean = median = mode = μ

$$f(x \mid \mu, \sigma) = \frac{1}{(2\pi)^{1/2}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$$

[Figure: normal pdf fx(x0); the area to the left of μ + kσx is P[x < μ + kσx], and the area to the right is P[x ≥ μ + kσx] = 1 – P[x < μ + kσx]]

Characteristics
• Most commonly used distribution – many analyses assume ~N
• High point in ‘bell curve’ occurs at mean
• Symmetric about the mean
• The mean ‘shifts’ the distribution – but not the ‘shape’
• The standard deviation changes the ‘shape’ but doesn’t ‘shift’ it
The Normal Distribution
Common dispersion metrics ~N(µ, σ)
• P(X within 1σ around µ) = .6827
• P(X within 2σ around µ) = .9545
• P(X within 3σ around µ) = .9973
• ±1.65σ around µ = 0.900
• ±1.96σ around µ = 0.950
• ±2.81σ around µ = 0.995
So, what is 6σ? An error (a value more than 6σ above the mean) occurs 9.9 × 10⁻¹⁰ of the time.
[Figures: “Normal Distribution” pdf with ±σ and ±2σ bands marked (Probability of X vs. Value), and “Normal CDF” (Probability of X vs. Value)]
Unit or Standard Normal Distribution
• Standard Normal Distribution (z scores)
  – Z~N(0,1) where Z = (X – μ)/σ
  – Z score gives the number of standard deviations away from the mean
  – Allows for use of standard tables
  – Used extensively in inventory theory for setting safety stock
  – Area under the curve is 1
  – Able to assess the probability of an event
  – A z score can be positive or negative

$$f_u(u_0) = \frac{e^{-u_0^{2}/2}}{\sqrt{2\pi}}$$

[Figure: standard normal pdf fu(u0); the area to the left of z is P[u < z], and the area to the right is P[u ≥ z] = 1 – P[u < z]]
Normal Functions for Spreadsheets
Function | Microsoft Excel | Google Sheets | LibreOffice Calc
cdf of Normal Distribution | =NORM.DIST(X, μ, σ, 1) | =NORMDIST(X, μ, σ, 1) | =NORM.DIST(X, μ, σ, 1)
pdf of Normal Distribution | =NORM.DIST(X, μ, σ, 0) | =NORMDIST(X, μ, σ, 0) | =NORM.DIST(X, μ, σ, 0)
Inverse of Normal cdf | =NORM.INV(Probability, μ, σ) | =NORMINV(Probability, μ, σ) | =NORM.INV(Probability, μ, σ)
Standard Normal cdf | =NORM.S.DIST(z, 1) | =NORMSDIST(z) | =NORM.S.DIST(z, 1)
Inverse Standard Normal cdf | =NORM.S.INV(Probability) | =NORMSINV(Probability) | =NORM.S.INV(Probability)

Examples for ~N(100, 12):
• What is P[X < 85]? =NORM.DIST(85, 100, 12, 1) = 0.106 = 10.6%
• What value of X covers 75% of the probability? =NORM.INV(0.75, 100, 12) = 108.09 ≈ 108
• How many standard deviations does it take to cover 99.99%? =NORM.S.INV(0.9999) = 3.719
• What is the probability that X is less than 1.65 standard deviations above the mean? =NORM.S.DIST(1.65, 1) = 0.95 = 95%
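Python's standard library has a counterpart to these spreadsheet functions: `statistics.NormalDist` (Python 3.8+) provides `cdf` and `inv_cdf`. A sketch of the four examples, with the matching spreadsheet call noted in each comment:

```python
from statistics import NormalDist

x = NormalDist(mu=100, sigma=12)  # X ~ N(100, 12)
z = NormalDist()                  # standard normal Z ~ N(0, 1)

p_below_85 = x.cdf(85)            # like =NORM.DIST(85, 100, 12, 1)
x_75 = x.inv_cdf(0.75)            # like =NORM.INV(0.75, 100, 12)
z_9999 = z.inv_cdf(0.9999)        # like =NORM.S.INV(0.9999)
p_below_165 = z.cdf(1.65)         # like =NORM.S.DIST(1.65, 1)
print(f"{p_below_85:.3f} {x_75:.2f} {z_9999:.3f} {p_below_165:.2f}")  # 0.106 108.09 3.719 0.95
```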
Zippy Bright Transportation II
Zippy Bright has a consumer delivery unit. They distribute
product from a downtown location to all residences and offices in
the city. The deliveries are made on scooters and each customer
is delivered to directly. They found that the distance to each
customer location is ~N(4.6, 1.10) kilometers.
1. What is the average distance, median distance, and CV?
This is trivial since they are all given! Average = median = 4.6 km. CV = σ/μ = 1.1/4.6 = 0.24

2. What is the probability that distance > 5 km?
We want to find P[X > 5] = 1 – P[X ≤ 5] = 1 – NORM.DIST(5, 4.6, 1.1, 1) = 1 – 0.642 = 0.358 or 36%

3. What is the probability that distance is within ±1σ of the mean?
By definition, 68.3%. But we could also use the cdf functions:
P[X ≤ 5.7] – P[X ≤ 3.5] = NORM.DIST(5.7, 4.6, 1.1, 1) – NORM.DIST(3.5, 4.6, 1.1, 1)
= 0.8413 – 0.1587 = 0.683 or 68.3%
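The same `statistics.NormalDist` class covers the delivery-distance questions. A sketch, assuming distances ~N(4.6, 1.10) km as stated:

```python
from statistics import NormalDist

distance = NormalDist(mu=4.6, sigma=1.10)  # km to each customer

cv = distance.stdev / distance.mean
p_over_5 = 1 - distance.cdf(5)             # P[X > 5] = 1 - P[X <= 5]
lo, hi = distance.mean - distance.stdev, distance.mean + distance.stdev
p_within_1sd = distance.cdf(hi) - distance.cdf(lo)
print(f"CV={cv:.2f} P[X>5]={p_over_5:.2f} P[within 1 sd]={p_within_1sd:.3f}")
# CV=0.24 P[X>5]=0.36 P[within 1 sd]=0.683
```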
Triangle Distribution

Triangle Distribution
We would say, “X follows a triangle distribution with a minimum of a, maximum b, and a mode of c, ~T(a, b, c)”
[Figure: triangular pdf rising from 0 at a to a peak of height 2/(b − a) at c and falling back to 0 at b]

PDF
$$f(x) = \begin{cases} 0 & x < a \\ \dfrac{2(x - a)}{(b - a)(c - a)} & a \le x \le c \\ \dfrac{2(b - x)}{(b - a)(b - c)} & c \le x \le b \\ 0 & x > b \end{cases}$$

CDF
$$F(d) = P(x \le d) = \begin{cases} \dfrac{(d - a)^2}{(b - a)(c - a)} & a < d \le c \\ 1 - \dfrac{(b - d)^2}{(b - a)(b - c)} & c \le d < b \end{cases}$$

$$P(x > d) = 1 - P(x \le d)$$

$$d = \begin{cases} a + \sqrt{P(x \le d)\,(b - a)(c - a)} & a < d \le c \\ b - \sqrt{\big(1 - P(x \le d)\big)(b - a)(b - c)} & c \le d < b \end{cases}$$

Characteristics
• Good way to get a sense of an unknown distribution
• People tend to recall extreme and common values
• Handles asymmetric distributions
Zippy Bright Transportation III
Zippy Bright has a consumer delivery unit. They distribute product from a downtown location to all residences and offices in the city. The deliveries are made on scooters and each customer is delivered to directly. No one recalls exactly what the distance to each customer location is, but the consensus is that the shortest is about 1 km, the longest is about 9 km, and the most common is probably 4 km.
[Figure: triangular pdf ~T(1, 9, 4), peaking at 4 km with height 2/(b − a) = 0.25]

1. What is the average distance and CV?
Average = (1 + 9 + 4)/3 = 4.67 km
Var[x] = σ² = (a² + b² + c² – ab – ac – bc)/18 = 49/18 = 2.72, so σ = 1.65 and CV = 1.65/4.67 = 0.35

2. What is the probability that distance ≤ 5 km?
P(x ≤ d) = 1 – (b – d)²/[(b – a)(b – c)] for c ≤ d < b
We want to find P[X ≤ d] where d = 5. Since 4 ≤ 5 < 9, we select the case where c ≤ d < b, plug in the equation, and find P[X ≤ 5] = 1 – 16/40 = 60%. That is, the probability that a delivery is within 5 km is 60%.

3. What is the probability that distance ≥ 5 km?
We want to find P[X > d] where d = 5. We know that P[X > 5] = 1 – P[X ≤ 5]. And we just found that P[X ≤ 5] = 0.60, so P[X > 5] = 1 – 0.60 = 0.40. So the probability that a delivery is longer than 5 km is 40%.
Zippy Bright Transportation III (continued)

4. What is the probability that distance > 2 km?
P(x ≤ d) = (d – a)²/[(b – a)(c – a)] for a < d ≤ c
We want to find P[X > d] where d = 2, or P[X > 2], which is 1 – P[X ≤ 2]. Because 1 ≤ 2 < 4, we select the case where a ≤ d < c, plug in the equation, and find P[X ≤ 2] = 1/24 = 4.2%, so the probability that a delivery is longer than 2 km is 1 – 0.042 = 0.958 or 95.8%.

5. What is the distance that 90% of the trips will be shorter than?
We want to find d where P[X ≤ d] = 0.90. This is clearly on the right-hand side of the distribution, but we can check by looking at P[X ≤ c] = (c – a)/(b – a) = 3/8 = 0.375. Since 90% is larger than this, we are on the right-hand side.
To find d, we select the case where c ≤ d < b and plug in the equation. This gives us d = 9 – √[(0.10)(8)(5)] = 9 – 2 = 7 km. That is, 90% of all deliveries will be less than 7 km in length.
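A sketch of the triangle distribution in Python, with hypothetical helper names. The CDF and its inverse follow the piecewise formulas on the Triangle Distribution slide; the mean (a + b + c)/3 and variance (a² + b² + c² – ab – ac – bc)/18 are the standard results for T(a, b, c).

```python
import math


def tri_cdf(d, a, b, c):
    """P(X <= d) for X ~ T(a, b, c): minimum a, maximum b, mode c."""
    if d <= a:
        return 0.0
    if d >= b:
        return 1.0
    if d <= c:
        return (d - a) ** 2 / ((b - a) * (c - a))
    return 1 - (b - d) ** 2 / ((b - a) * (b - c))


def tri_inv(p, a, b, c):
    """The value d with P(X <= d) = p, for 0 <= p <= 1."""
    if p <= (c - a) / (b - a):  # P(X <= c): d falls at or left of the mode
        return a + math.sqrt(p * (b - a) * (c - a))
    return b - math.sqrt((1 - p) * (b - a) * (b - c))


def tri_mean_var(a, b, c):
    """Mean and variance of T(a, b, c)."""
    mean = (a + b + c) / 3
    var = (a * a + b * b + c * c - a * b - a * c - b * c) / 18
    return mean, var


a, b, c = 1, 9, 4  # km: shortest, longest, most common delivery distance
mean, var = tri_mean_var(a, b, c)
print(f"mean={mean:.2f} CV={math.sqrt(var) / mean:.2f}")   # mean=4.67 CV=0.35
print(f"P[X<=5]={tri_cdf(5, a, b, c):.2f}")                # P[X<=5]=0.60
print(f"P[X>2]={1 - tri_cdf(2, a, b, c):.3f}")             # P[X>2]=0.958
print(f"90th percentile={tri_inv(0.90, a, b, c):.1f} km")  # 90th percentile=7.0 km
```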
Key Points from Lesson

Questions, Comments, Suggestions?
Use the Discussion Forum!

“Uniformly fun!”
