Sasin DECS 434 Session 1 and 2 - Probability and Excel
Excel Review
Professor Brett Saraniti
Module 3, 2022
Write down any REAL number on a piece of paper
No “infinity” or 3+2i
ANY real number you want. Up to you.
You meet people sequentially and you need to make a yes/no
decision BEFORE you consider the next candidate.
You cannot tell someone NO and then find them later and beg.
You know how many candidates there will be, but nothing else
Goal: Pick the best person.
Suggestion: Let the first one go, then pick the first
candidate afterwards who is better than the one
you let go. How often do you “win”?
Arrival order (1 = best)   Win?
1 2 3                      NO
1 3 2                      NO
2 1 3                      YES
2 3 1                      YES
3 1 2                      YES
3 2 1                      NO
Let the first two go, then pick the first one
afterwards who is better than both of the first
two. How often do you “win”?
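The two strategies above can be checked by brute force. This is a minimal sketch, not the course's Excel model: it lists every arrival order, applies the "skip, then take the first candidate better than everyone skipped" rule, and counts wins. The function name `win_rate` and the convention that rank 1 is the best candidate are assumptions made for the illustration.

```python
from itertools import permutations

def win_rate(n, skip):
    """Exact win probability when we let the first `skip` candidates go
    (skip >= 1) and then pick the first one better than all of them.
    Rank 1 is the best candidate (an assumed convention)."""
    wins = 0
    total = 0
    for order in permutations(range(1, n + 1)):
        total += 1
        benchmark = min(order[:skip])      # best rank among those we let go
        pick = None
        for rank in order[skip:]:
            if rank < benchmark:           # first candidate who beats the benchmark
                pick = rank
                break
        if pick == 1:                      # we "win" only if we picked the best
            wins += 1
    return wins / total

print(win_rate(3, 1))   # three candidates, let the first one go
print(win_rate(4, 2))   # four candidates, let the first two go
print(win_rate(5, 2))   # five candidates, let the first two go
```

With three candidates and one skipped, the count is 3 wins out of 6 orders, matching the YES/NO table.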
To Excel…
1 2 3 4 5
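The expected value of the probability that we find “the best one”
$$\frac{1}{n}\cdot 0 + \frac{1}{n}\cdot 0 + \dots + \frac{1}{n}\cdot 0 + \frac{1}{n}\cdot\frac{r}{r} + \frac{1}{n}\cdot\frac{r}{r+1} + \dots + \frac{1}{n}\cdot\frac{r}{n-1}$$
$$= \frac{r}{n}\left(\frac{1}{r} + \frac{1}{r+1} + \frac{1}{r+2} + \dots + \frac{1}{n-1}\right)$$
…from the last slide:
$$\frac{r}{n}\left(\frac{1}{r} + \frac{1}{r+1} + \frac{1}{r+2} + \dots + \frac{1}{n-1}\right) \approx \dots$$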
Recall e = 2.7182818…
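Why recall e? A standard result (stated here as background, not taken from the slides) is that 1/r + 1/(r+1) + … + 1/(n−1) is roughly ln(n/r), so the win probability (r/n)(1/r + … + 1/(n−1)) is roughly (r/n)·ln(n/r), which peaks near r = n/e at a value near 1/e. The sketch below evaluates the exact sum for a hypothetical n = 100 and compares the best r with n/e. The function name `win_probability` is made up for the example.

```python
import math

def win_probability(n, r):
    """(r/n) * (1/r + 1/(r+1) + ... + 1/(n-1)), valid for 1 <= r <= n-1."""
    return (r / n) * sum(1 / k for k in range(r, n))

n = 100   # hypothetical number of candidates
best_r = max(range(1, n), key=lambda r: win_probability(n, r))

print(best_r, round(win_probability(n, best_r), 4))   # exact optimum
print(round(n / math.e, 1), round(1 / math.e, 4))     # n/e and 1/e for comparison
```

For n = 100 the best choice is to let 37 candidates go, which wins about 37% of the time.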
Course readings:
On Sasinware
Types of work:
In-class projects
Group homework assignments
Exam
In Class Exercises: 20 percent
Final Exam: 40 percent
Team Homework: 30 percent
Attendance: 10 percent
What is the 95% worst
case scenario?
Typical Day
µ
NORM.INV(p, µ, σ)
• Example:
• µ =
• σ=
• NORM.INV(0.05, , ) =
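For readers without Excel, the same lookup can be sketched in Python. `NormalDist(mu, sigma).inv_cdf(p)` plays the role of NORM.INV(p, µ, σ). The slide leaves µ and σ blank, so the values below are hypothetical placeholders, not course data.

```python
from statistics import NormalDist

mu = 100.0     # hypothetical mean of a typical day (NOT from the slides)
sigma = 15.0   # hypothetical standard deviation (NOT from the slides)

# Value that the outcome falls below only 5% of the time,
# i.e. the "95% worst case"; same role as NORM.INV(0.05, mu, sigma)
worst_case = NormalDist(mu, sigma).inv_cdf(0.05)
print(round(worst_case, 2))
```

The result sits about 1.645 standard deviations below the mean, whatever µ and σ are.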
95%
0.07
https://www.nytimes.com/interactive/2019/07/08/upshot/nyc-subway-variability-calculator.html
Joint distribution of X and Y:

                             Y = 4    Y = 14    Marginal Distribution of X
X = 20                        .30      .20       .50
X = 60                        .10      .40       .50
Marginal Distribution of Y    .40      .60

E(X) = .50(20) + .50(60) = 40
SD(X) = √400 = 20
Joint
Cross Deviations
Probabilities
of X & Y
$$\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\mathrm{SD}(X)\cdot \mathrm{SD}(Y)}$$
$$-1 \le \mathrm{Corr}(X,Y) \le 1$$
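A short sketch of these definitions applied to the joint table of X and Y. It assumes the two Y outcomes are 4 and 14 (the reading of the table used here); the X values and all probabilities are the ones shown on the slide.

```python
import math

# Joint probabilities P(X = x, Y = y).
# ASSUMPTION: the Y outcomes are 4 and 14.
joint = {(20, 4): 0.30, (20, 14): 0.20,
         (60, 4): 0.10, (60, 14): 0.40}

ex = sum(p * x for (x, y), p in joint.items())               # E(X)
ey = sum(p * y for (x, y), p in joint.items())               # E(Y)
var_x = sum(p * (x - ex) ** 2 for (x, y), p in joint.items())
var_y = sum(p * (y - ey) ** 2 for (x, y), p in joint.items())

# Covariance: joint probability times the cross deviation, summed over all cells
cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
corr = cov / (math.sqrt(var_x) * math.sqrt(var_y))

print(ex, math.sqrt(var_x))      # E(X) and SD(X)
print(cov, round(corr, 3))       # Cov(X,Y) and Corr(X,Y)
```

E(X) and SD(X) come out as 40 and 20, matching the slide; the covariance and correlation depend on the assumed Y values.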
There are FOUR equally likely (X,Y) pairs:

X     Y
0     40
5     35
12    25
25    10

E(X) = 10.5    E(Y) = 27.5

COV(X,Y) = ¼ (0-10.5)(40-27.5) +
           ¼ (5-10.5)(35-27.5) +
           ¼ (12-10.5)(25-27.5) +
           ¼ (25-10.5)(10-27.5)
         = -107.5
If we have a population of data, there are analogous functions to measure
synchrony.
For a set of data, we define the covariance to be the average of the cross
deviation pairs, treating each pair of X and Y as having equal probability.
$$\mathrm{COV}(X,Y) = \frac{1}{N}\sum_{\text{all } N \text{ pairs } (x,y)} [x - E(X)]\,[y - E(Y)]$$
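As a check, here is a minimal sketch of the population covariance applied to the four equally likely pairs (0,40), (5,35), (12,25), (25,10). The function name `population_cov` is made up for the example; in Excel the analogous population function is COVARIANCE.P.

```python
def population_cov(pairs):
    """Average of the cross deviations [x - E(X)][y - E(Y)] over all N pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n

pairs = [(0, 40), (5, 35), (12, 25), (25, 10)]
print(population_cov(pairs))   # -107.5
```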
Now that we learned about Covariance, we can describe the
traits of a linear combination of X & Y…
P = qX + (1-q)Y
Var(P) = q² Var(X) + (1-q)² Var(Y) + 2q(1-q)Cov(X,Y)
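A quick numeric check of the variance formula, using the four equally likely (X,Y) pairs (0,40), (5,35), (12,25), (25,10) and a hypothetical weight q = 0.5 (the weight is not from the slides). The sketch computes Var(P) from the formula and again directly from the four values of P.

```python
pairs = [(0, 40), (5, 35), (12, 25), (25, 10)]   # equally likely (X, Y) pairs
q = 0.5                                           # hypothetical weight on X
n = len(pairs)

def mean(values):
    return sum(values) / len(values)

xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
var_x = mean([(x - mean(xs)) ** 2 for x in xs])
var_y = mean([(y - mean(ys)) ** 2 for y in ys])
cov = mean([(x - mean(xs)) * (y - mean(ys)) for x, y in pairs])

# Formula: q^2 Var(X) + (1-q)^2 Var(Y) + 2q(1-q) Cov(X,Y)
var_formula = q ** 2 * var_x + (1 - q) ** 2 * var_y + 2 * q * (1 - q) * cov

# Direct: variance of P = qX + (1-q)Y over the four outcomes
ps = [q * x + (1 - q) * y for x, y in pairs]
var_direct = mean([(p - mean(ps)) ** 2 for p in ps])

print(var_formula, var_direct)
```

Both routes give 1.125: because X and Y move in opposite directions (negative covariance), the mix is far less variable than either X or Y alone.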
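$$E(A) = E\left(\frac{1}{n}S\right) = \frac{1}{n}E(S) = \frac{1}{n}(n\mu) = \mu$$
$$\mathrm{Var}(A) = \mathrm{Var}\left(\frac{1}{n}S\right) = \frac{1}{n^2}\mathrm{Var}(S) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$
$$\text{So } A \sim \text{Normal}\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$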
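The claim that an average of n independent draws has mean µ and standard deviation σ/√n can be checked by simulation. This is a rough sketch under assumed settings: draws from Uniform(0, 1) (so µ = 0.5 and σ = √(1/12)), n = 25, and 5,000 repetitions with a fixed seed. None of these choices come from the slides.

```python
import math
import random
import statistics

random.seed(0)                 # fixed seed so the run is repeatable
n = 25                         # hypothetical number of draws per average
trials = 5000                  # hypothetical number of repetitions
mu = 0.5                       # mean of Uniform(0, 1)
sigma = math.sqrt(1 / 12)      # SD of Uniform(0, 1)

# A = (X1 + ... + Xn) / n, repeated many times
averages = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

print(round(statistics.fmean(averages), 4), mu)
print(round(statistics.pstdev(averages), 4), round(sigma / math.sqrt(n), 4))
```

The simulated mean and standard deviation of A should land close to µ and σ/√n, even though the individual draws are not normal.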
Galton Podcast on Planet Money, August 7 2015
http://www.npr.org/sections/money/2015/08/07/429720443/17-205-people-guessed-the-weight-of-a-cow-heres-how-they-did
“The distribution of the estimates about their middlemost value was of the usual type, so far that they clustered closely in its neighborhood and became rapidly more sparse as the distance from it increased.”
S = X1 + X2 + … + Xn
A = (X1 + X2 + … + Xn) / n
Most residents of Waikiki have incomes much less than $128,000 BUT the
relevant population is obviously not just the residents. It’s the tourists!!
Census data does NOT have information about the income of tourists in
Waikiki…
To What Extent Do the Traits of the Sample Reflect the Traits of the Population?
Population: 9 Million Tourists in Waikiki
Mean μ
Standard deviation σ
Sample size n
Sample mean X̄
Sample SD s
Assume every tourist in Waikiki has equal probability of being included in our sample.
What does that imply about the sample mean, X̄?
$$\bar{X} \sim \text{Normal}\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
n = 200
x̄ = 139,620     =AVERAGE(A2:A201)
s = 92,700      =STDEV.S(A2:A201)
We know our best guess for tourist income is 139,620
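$$\bar{X} \sim \text{Normal}\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
[Figure: normal curve centered at µ, with µ − 1.96 σ/√n and µ + 1.96 σ/√n marked]
$$P\left(\mu - 1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = .9500$$
$$P\left(-1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) = .9500$$
$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = .9500$$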
Suppose we know the “true” (population) mean. Then there is a 95%
probability that the sample mean would fall into the following range:
$$\left(\mu - 1.96\,\frac{\sigma}{\sqrt{n}},\ \ \mu + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$$
Suppose we know the sample mean. Then we are 95% confident that
the true population mean, μ, falls within the following range:
$$\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$$
Now all we need to do is plug in the values for x̄, σ, and n and we’re done!!
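As an illustration, the interval can be computed for the Waikiki sample (n = 200, x̄ = 139,620, s = 92,700). One caveat, stated as an assumption: σ is not known, so this sketch plugs in the sample standard deviation s in its place and keeps the 1.96 multiplier.

```python
import math

n = 200
x_bar = 139_620
s = 92_700            # ASSUMPTION: sample SD used in place of the unknown sigma

std_error = s / math.sqrt(n)
low = x_bar - 1.96 * std_error
high = x_bar + 1.96 * std_error

print(round(std_error), round(low), round(high))
```

The interval runs from roughly 126,800 to 152,500.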
We had to find:
x̄ = 836
s = 229
n = 40
t = 2.0227
s/√n is called the standard error
t · s/√n is called the margin of error
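Plugging in the numbers listed above (x̄ = 836, s = 229, n = 40, t = 2.0227) gives the standard error, the margin of error, and the resulting interval. This is a small arithmetic sketch; the variable names are made up.

```python
import math

x_bar, s, n, t = 836, 229, 40, 2.0227

std_error = s / math.sqrt(n)           # s / sqrt(n)
margin = t * std_error                 # t * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(round(std_error, 2), round(margin, 2))
print(round(low, 2), round(high, 2))
```

The margin of error is about 73, so the interval is roughly 763 to 909.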
Definition: Inference --
“In all, Mr. Levy received 10 grants from KLA-Tencor and its
predecessor company between 1994 and 2001 -- all
preceding quick run-ups in the share price; an analysis by
The Wall Street Journal found the probability that that
pattern occurred merely by chance is tiny -- around one in
20 million.”
Example: Absolute Cheating
“Of these 93 hands, POTRIPPER won 56 times, in an
average 8.13 players per hand. Assuming all players
are equally skilled you would expect 11.49 wins.
POTRIPPER was 14.03 standard deviations above
expectations. The probability of luck this good or better
is 1 in 1.88 × 10^44. It would be easier to buy a
[lottery ticket] in six different states, and hit the
jackpot all six times”
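The "14.03 standard deviations" figure can be roughly reproduced. The sketch below is a reconstruction under a loud assumption: every hand is treated as an independent trial with the same win probability, 11.49/93, so the win count is binomial. The quoted analysis may well have been done differently.

```python
import math

hands = 93
wins = 56
expected_wins = 11.49

# ASSUMPTION: each hand is won independently with the same probability
p = expected_wins / hands
sd = math.sqrt(hands * p * (1 - p))        # binomial standard deviation
z = (wins - expected_wins) / sd            # standard deviations above expectation

# Normal-approximation probability of doing this well or better
tail = 0.5 * math.erfc(z / math.sqrt(2))

print(round(z, 2), tail)
```

Under this assumption z comes out at about 14.03, and the tail probability is astronomically small.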
We call this surprise probability the “significance” of the test or the “p-value”
The null hypothesis, H0, is our “default” assumption.
We will believe H0 unless the weight of the evidence forces us to reject it.
For Phil (a.k.a. Patek Philippe) we will start out assuming that opening a
store in Waikiki is a bad idea. (i.e. the mean income is less than 128,000)
If the evidence is convincing… we will reject that assumption in favor of the
only alternative: that opening the store is a good idea.
n = 200
x̄ = 139,620     =AVERAGE(A2:A201)
s = 92,700      =STDEV.S(A2:A201)
H0: μ ≤ 128,000    Null Hypothesis
Ha: μ > 128,000    Alternative Hypothesis
x̄ = 139,620
s = 92,700
n = 200
α = 0.05
How “surprising” would it be to get such a high sample mean if, in fact, the
Null Hypothesis (H0) were true?
First calculate the t-statistic
Second ask Excel the probability of getting a value this high (or higher).
= T.DIST.RT(t, n - 1)
= T.DIST.RT(1.7727, 199)
= .0389
This is used to compute the probability in the RIGHT TAIL… For other tests,
we’ll use similar functions (soon.)
IF the null hypothesis were true, the chances of getting a sample mean this
extreme would be .0389.
The p-value or significance of the test is the maximum probability
that we would observe a sample as “extreme” as we did, if the null
hypothesis were true.
In our case, a p-value of 3.89% is generally considered “small enough”
to reject the null hypothesis.
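The two steps can be sketched without Excel. The t-statistic is t = (x̄ − 128,000) / (s/√n). For the right-tail probability, the sketch integrates the t density numerically with Simpson's rule, standing in for T.DIST.RT; the helper names `t_pdf` and `t_right_tail` are made up, and this is an approximation, not Excel's own algorithm.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_right_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the area on [0, t] (Simpson's rule).
    steps must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n, x_bar, s = 200, 139_620, 92_700
mu0, alpha = 128_000, 0.05

t_stat = (x_bar - mu0) / (s / math.sqrt(n))   # step 1: t-statistic
p_value = t_right_tail(t_stat, n - 1)         # step 2: right-tail probability

print(round(t_stat, 4), round(p_value, 4))
print("reject H0" if p_value < alpha else "do not reject H0")
```

The t-statistic comes out at about 1.7727 and the p-value at about .0389, in line with the Excel result.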
The decision rule is always based on p and α:
If p < α we REJECT H0
The structure of the testing process is the same but the specific mechanics of
each style of test will be slightly different.
H0: μ ≤ 60
Ha: μ > 60
X̄ = 82, s = 140, and n = 250. Let’s use α = .05
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{82 - 60}{140/\sqrt{250}} = 2.484$$
p = T.DIST.RT(2.484,249) = .0068
Because this p-value is less than .05, we reject the null hypothesis
and conclude: μ > 60
H0: μ ≥ 350
Ha: μ < 350
We gather data and compute: X̄, s, and n.
Plug them into this formula:
$$t = \frac{\bar{X} - 350}{s/\sqrt{n}}$$
If p < α we REJECT H0
H0: μ ≥ 350
Ha: μ < 350
X̄ = 320, s = 150, and n = 48. Let’s use α = .01
$$t = \frac{320 - 350}{150/\sqrt{48}} = -1.386$$
p = T.DIST(-1.386,47,1) = .086
Because this p-value is greater than .01, we do not reject the null hypothesis.
If p < α we REJECT H0
H0: μ = 0.42
Ha: μ ≠ 0.42
X̄ = .51, s = .92, and n = 350. Let’s use α = .05
$$t = \frac{.51 - 0.42}{.92/\sqrt{350}} = 1.83$$
p = T.DIST.2T(1.83,349) = .0681
Because this p-value is greater than .05, we do not reject the null hypothesis.
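The three styles of test (right tail, left tail, two tails) differ only in which tail area is reported. The sketch below wraps that choice in one hypothetical function, `t_test`, and reruns the three worked examples. The t distribution is integrated numerically (Simpson's rule) as a stand-in for Excel's T.DIST.RT, T.DIST, and T.DIST.2T.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t) via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_test(x_bar, s, n, mu0, tail):
    """Return (t, p). tail is 'right' (Ha: mu > mu0), 'left' (Ha: mu < mu0)
    or 'two' (Ha: mu != mu0)."""
    t = (x_bar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if tail == "right":
        p = 1 - t_cdf(t, df)
    elif tail == "left":
        p = t_cdf(t, df)
    else:
        p = 2 * (1 - t_cdf(abs(t), df))
    return t, p

examples = [
    (82, 140, 250, 60, "right", 0.05),
    (320, 150, 48, 350, "left", 0.01),
    (0.51, 0.92, 350, 0.42, "two", 0.05),
]
for x_bar, s, n, mu0, tail, alpha in examples:
    t, p = t_test(x_bar, s, n, mu0, tail)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(tail, round(t, 3), round(p, 4), decision)
```

Only the first example clears its α; the other two p-values exceed their α, so H0 stands in those cases.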
Half of all Ride Share Drivers drive 800 passengers per month
Half of all Ride Share Drivers drive ZERO passengers per month
What is the sample mean based on surveying all the drivers we ride with?
800
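Why 800 and not 400? Riders only ever meet drivers who carry passengers, so each driver shows up in our "survey" in proportion to the passengers they carry. A minimal sketch with a hypothetical fleet of 100 drivers:

```python
# Hypothetical fleet: half carry 800 passengers per month, half carry zero
drivers = [800, 0] * 50

# True average across ALL drivers
population_mean = sum(drivers) / len(drivers)

# Average over the drivers we ride with: each driver is weighted by the
# number of passengers carried, so zero-passenger drivers are never sampled
rider_sample_mean = sum(x * x for x in drivers) / sum(drivers)

print(population_mean, rider_sample_mean)   # 400.0 800.0
```

The sample mean is double the population mean because the sampling method, not the drivers, is biased.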
This is a problem that has dogged scientists across many
disciplines. There is a natural bias in favour of
reporting statistically significant results…Such results
are more likely to be published in academic journals and
to make the newspaper headlines. But when other
scientists try to replicate the results, the link disappears
because the initial result was a random outlier…
OR
Messing around with variable selection (perhaps unconsciously) until you get
“interesting results”
Check out:
https://fivethirtyeight.com/features/science-isnt-broken/
https://www.youtube.com/watch?v=0Rnq1NpHdmw