Sasin DECS 434 Session 1 and 2 - Probability and Excel

The document discusses probability concepts including the central limit theorem. It explains that as the number of independent trials of any random variable increases, the distribution of the sum of those random variables will approach a normal distribution. It also provides an example of using the central limit theorem to understand the distribution of averages.

Probability &

Excel Review
Professor Brett Saraniti
Module 3, 2022
 Write down any REAL number on a piece of paper
 No “infinity” or 3+2i
 ANY real number you want. Up to you.

 You meet people sequentially and you need to make a yes/no decision BEFORE you consider the next candidate.
 You cannot tell someone NO and then find them later & beg
 You know how many candidates there will be, but nothing else
 Goal: Pick the best person.

 Suggestion: Let the first one go, then pick the first candidate afterwards who is better than the one you let go. How often do you “win”?

 3 out of 6 = 50 percent!!

ORDER     BEST?
1 2 3     NO
1 3 2     NO
2 1 3     YES
2 3 1     YES
3 1 2     YES
3 2 1     NO
 Let the first two go, then pick the first one afterwards who is better than both of the first two. How often do you “win”?

 Trouble -- there are 120 permutations… We can do it in Excel, but is there a better way?

 To Excel…
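Before opening Excel, a quick cross-check of these counts is a simulation. The sketch below is a minimal Python version (not part of the course materials; the function name and trial count are my own choices) of the “let the first r go by” strategy, where rank 1 means the best candidate.

```python
import random

def win_rate(r, n, trials=100_000):
    """Estimate P(win) for: let the first r candidates go, then take the
    first one who is better than everyone you let go. Rank 1 = best."""
    wins = 0
    for _ in range(trials):
        order = random.sample(range(1, n + 1), n)      # random arrival order of ranks
        best_passed = min(order[:r]) if r > 0 else float("inf")
        pick = None
        for rank in order[r:]:
            if rank < best_passed:                     # better than everyone we let go
                pick = rank
                break
        wins += (pick == 1)                            # success only if we got the best
    return wins / trials

# Should land near 3/6 = .50 and 13/30 ≈ .4333 from the slides
print(win_rate(1, 3), win_rate(2, 5))
```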
1 2 3 4 5

Define r = the number you let pass by
       n = the total number of applicants
       i = the position of the best person

In this case, we’ll consider: r = 2 & n = 5

If i = 1 or 2, we lose. We let the best candidate get away!
If i = 3, we win for sure!!
What happens if i = 4 or 5?

i =                          1     2     3     4     5
Probability at location i   1/5   1/5   1/5   1/5   1/5
Chances we pick the best     0     0     1    2/3   1/2

The expected value of the probability that we find “the best one”

1/5(0) + 1/5(0) + 1/5(1) + 1/5(2/3) + 1/5(1/2) = 13/30 = .4333


i =                          1     2    …    r    r+1      r+2     …      n
Prob at location i          1/n   1/n       1/n   1/n      1/n           1/n
Chances we pick the best     0     0         0    r/r    r/(r+1)       r/(n-1)

What is the expected value of our chances?

(1/n)(0) + (1/n)(0) + … + (1/n)(0) + (1/n)(r/r) + (1/n)(r/(r+1)) + … + (1/n)(r/(n-1))

= (r/n) [ 1/r + 1/(r+1) + 1/(r+2) + … + 1/(n-1) ]

(r/n) [ 1/r + 1/(r+1) + 1/(r+2) + … + 1/(n-1) ]  ≈  …from the last slide

(r/n) ∫ from r to n of (1/x) dx = (r/n) × [ln(n) − ln(r)] = (r/n) × ln(n/r) = −(r/n) × ln(r/n)

This function is maximized when r = n/e


At its maximum, it is equal to 1/e REGARDLESS OF “n”

Recall e = 2.7182818…

Since 1/e is about .37, this is called the 37% rule
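
Why n/e? Write p = r/n, so the expression above is f(p) = −p ln p. Setting the derivative f′(p) = −ln p − 1 equal to zero gives p = 1/e, and plugging back in, f(1/e) = 1/e ≈ .37 regardless of n.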


DECS 434
 Class participation:
 Quality, not quantity
 Transformative Action Learning
 Preparation; Cold calling

 Course readings:
 On Sasinware

 Types of work:
 In-class projects
 Group homework assignments
 Exam
In Class Exercises: 20 percent
Final Exam: 40 percent
Team Homework: 30 percent
Attendance: 10 percent
 What is the 95% worst case scenario?
 If we have a really bad day, how much will we lose?

[Chart: a distribution of daily outcomes with a 95% region marked, a “Typical Day” label at the center, and a callout in the left tail asking “What is this outcome here?”]

 NORM.INV(p, µ, σ) gives the value x such that the probability of an outcome smaller than x equals p

• Example:
  • µ = ____
  • σ = ____
  • NORM.INV(0.05, ____, ____) = ____

• On a “really bad day,” the stock price will fall ____% or more.
• 95% of the time, it won’t be that bad.
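
Outside Excel, the same quantile can be computed with scipy. The numbers below are made-up placeholders (the slide leaves µ and σ blank), so treat this as a sketch of the mechanics only.

```python
from scipy.stats import norm

# Hypothetical daily-return parameters -- NOT the values from the slide
mu, sigma = 0.0005, 0.02

# Excel's NORM.INV(p, mu, sigma) is the normal quantile function (ppf)
worst_5pct = norm.ppf(0.05, loc=mu, scale=sigma)
print(worst_5pct)   # ≈ -0.032: on a "really bad day" we lose about 3.2% or more
```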
 For measuring the level of PRECISION for an estimator, we always use a two-sided
confidence interval.
 On rare occasions, people are interested in a one-sided confidence interval
[Chart: approximate distribution of BAC for 160 lb. men after two drinks, with a 95% region marked and a cutoff at 0.07]

https://www.nytimes.com/interactive/2019/07/08/upshot/nyc-subway-variability-calculator.html
                 Y = 4    Y = 14    Marginal Distribution of X
X = 20            .30       .20             .50
X = 60            .10       .40             .50
Marginal of Y     .40       .60

E(X) = .50(20) + .50(60) = 40
E(Y) = .40(4) + .60(14) = 10


 From last slide:
  E(X) = 40
  E(Y) = 10

 Var(X) = .50(20-40)² + .50(60-40)² = 400
 SD(X) = √400 = 20

 Var(Y) = .40(4-10)² + .60(14-10)² = 24
 SD(Y) = √24 ≈ 4.9

ALL of these traits of X & Y are computed from their marginal (i.e. plain old individual) probability distributions
 X & Y both vary.
 But they do not vary independently!
 The way they vary is correlated…

 In this case, the correlation is positive


 When X is below average, Y is also likely to be below average
 When X is above average, Y tends to be above its average

 The way we measure the connections between the two random variables is called the Covariance
                 Y = 4    Y = 14
X = 20            .30       .20
X = 60            .10       .40

 Covariance is the way we measure relationships between variables.

 The Covariance(X,Y) = the sum of P(X=x, Y=y)·[x-E(X)]·[y-E(Y)] for all (x,y) pairs
   (the joint probabilities times the cross deviations of X & Y)

 COV(X,Y) = .30(20 - 40)(4 - 10) +
            .20(20 - 40)(14 - 10) +
            .10(60 - 40)(4 - 10) +
            .40(60 - 40)(14 - 10) = 40
 The covariance can be small for two reasons:
 The two random variables do not tend to move together very much
 The two random variables just don’t vary all that much

 The covariance can be large for two reasons:


 The two random variables generally move together quite closely
 The two random variables vary an awful lot

 For this reason we often prefer a more intuitive concept: correlation


 The correlation between X and Y is defined as follows

Corr(X,Y) = Cov(X,Y) / [SD(X) ∙ SD(Y)]

 Mathematically, we can show the following to be true:

−1 ≤ Corr(X,Y) ≤ 1
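
As a sanity check on the numbers above, here is a small Python sketch (my own, not from the slides) that recomputes E(X), E(Y), Cov(X,Y), the standard deviations, and the correlation directly from the joint probabilities.

```python
import math

# Joint distribution from the slides: P(X=x, Y=y)
joint = {(20, 4): .30, (20, 14): .20, (60, 4): .10, (60, 14): .40}

EX = sum(p * x for (x, y), p in joint.items())                           # 40
EY = sum(p * y for (x, y), p in joint.items())                           # 10
cov = sum(p * (x - EX) * (y - EY) for (x, y), p in joint.items())        # 40
sd_x = math.sqrt(sum(p * (x - EX) ** 2 for (x, y), p in joint.items()))  # 20
sd_y = math.sqrt(sum(p * (y - EY) ** 2 for (x, y), p in joint.items()))  # ≈ 4.9
corr = cov / (sd_x * sd_y)                                               # ≈ 0.41, between -1 and 1
print(EX, EY, cov, sd_x, sd_y, corr)
```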
 There are FOUR equally likely (X,Y) pairs:

    X     Y
    0    40
    5    35
   12    25
   25    10

 E(X) = 10.5    E(Y) = 27.5

 COV(X,Y) = ¼ (0-10.5)(40-27.5) +
            ¼ (5-10.5)(35-27.5) +
            ¼ (12-10.5)(25-27.5) +
            ¼ (25-10.5)(10-27.5)
          = -107.5
 If we have a population of data, there are analogous functions to measure
synchrony.

=COVARIANCE.P(A2:A6,B2:B6) computes the covariance


of X and Y for the data presented.

For a set of data, we define the covariance to be the average of the cross deviation pairs, treating each pair of X and Y as having equal probability.

COV(X,Y) = (1/N) × the sum of [x-E(X)][y-E(Y)] for all N pairs of (x,y)
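
A rough Python analogue of COVARIANCE.P, shown with made-up numbers since the spreadsheet values behind A2:B6 are not reproduced here:

```python
import numpy as np

# Illustrative data only -- not the slide's spreadsheet values
x = np.array([0, 5, 12, 25, 8], dtype=float)
y = np.array([40, 35, 25, 10, 30], dtype=float)

# COVARIANCE.P divides by N: the average of the cross deviations, as defined above
cov_p = np.mean((x - x.mean()) * (y - y.mean()))
# Equivalent: np.cov(x, y, bias=True)[0, 1]
print(cov_p)
```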
 Now that we learned about Covariance, we can describe the
traits of a linear combination of X & Y…

 Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y)


 Var(aX+bY) = a2Var(X) + b2Var(Y) + 2abCov(X,Y)
 Note: When X and Y are independent, Cov(X,Y) = 0

 E(aX+bY) = aE(X) + bE(Y)


 Note: This is true regardless of their independence.
 If we mix our investment between two funds with returns X and Y, the
variance (i.e. RISK) of the portfolio will often be less than the variance (i.e.
RISK) of either of the two individual investments. Why?

 P = qX + (1-q)Y
 Var(P) = q2 Var(X) + (1-q)2 Var(Y) + 2q(1-q)Cov(X,Y)

 The same principle holds true for 3 or 4 or 5 investments…
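
A tiny numerical illustration of the diversification point, using invented risk numbers (assumptions, not course data):

```python
# Two funds with the same variance and a weak positive covariance
var_x, var_y = 0.04, 0.04        # each fund has SD = 20%
cov_xy = 0.004                   # corresponds to a correlation of 0.10
q = 0.5                          # split the portfolio evenly

var_p = q**2 * var_x + (1 - q)**2 * var_y + 2 * q * (1 - q) * cov_xy
print(var_p)                     # 0.022 -- less risk than either fund's 0.04
```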


“I know of scarcely anything so apt to impress the imagination
as the wonderful form of cosmic order expressed by [what you
are about to learn]. [It] would have been personified by the
Greeks if they had known of it. It reigns with serenity and
complete self-effacement amidst the wildest confusion. The
larger the mob, the greater the apparent anarchy, the more
perfect is its sway. It is the supreme law of unreason.”

- Sir Francis Galton 1889


The Central Limit Theorem
For any distribution of X, when n is large, the sum of n independent trials of X is approximately normally distributed.

S = X1 + X2 + … + Xn

S ~ Normal(nµ, √n σ)

Where E(X) = μ and Var(X) = σ²


 S = X1 + X2 + … + Xn

 Knowing the traits of S that we compute here doesn’t require any big deal theorems!!
 It’s the SHAPE that is the big deal!  S ~ Normal(nµ, √n σ)

 E(S) = E(X1 + X2 + … + Xn)
      = E(X1) + E(X2) + … + E(Xn)
      = μ + μ + … + μ
      = nμ

 Var(S) = Var(X1 + X2 + … + Xn)
        = Var(X1) + Var(X2) + … + Var(Xn)
        = σ² + σ² + … + σ²
        = nσ²

 SD(S) = √(nσ²) = √n σ
S is the SUM of n independent random variables: S = X1 + X2 + … + Xn
CLT tells us S ~ Normal(nµ, √n σ)

A is the AVERAGE of n independent random variables:

A = (X1 + X2 + … + Xn)/n = S/n = (1/n) S

E(A) = E[(1/n) S] = (1/n) E(S) = (1/n)(nμ) = μ

Var(A) = Var[(1/n) S] = (1/n)² Var(S) = (1/n)²(nσ²) = σ²/n

So  A ~ Normal(µ, σ/√n)
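
A quick simulation makes the “shape” claim concrete. The sketch below (illustrative only; my own choice of distribution and sample size) averages n = 50 draws from a skewed exponential distribution; the averages come out with mean µ and standard deviation σ/√n, and their histogram is close to bell-shaped.

```python
import numpy as np

rng = np.random.default_rng(0)

# X is deliberately non-normal: exponential with mean 1 (so sigma = 1 as well)
n, reps = 50, 100_000
samples = rng.exponential(scale=1.0, size=(reps, n))

A = samples.mean(axis=1)           # the average of n independent trials, reps times
print(A.mean(), A.std())           # ≈ mu = 1 and sigma / sqrt(n) = 1 / sqrt(50) ≈ 0.141
```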
 Galton Podcast on Planet Money, August 7 2015

 http://www.npr.org/sections/money/2015/08/07/429720443/17-205-people-guessed-the-weight-of-a-cow-heres-how-they-did
“The distribution of the
estimates about their
middlemost value was
of the usual type, so
far that they clustered
closely in its
neighborhood and
became rapidly more
sparse as the distance
from it increased.”
S = X1 + X2 + … + Xn          A = (X1 + X2 + … + Xn)/n

 S, the sum of a large number of independent, identically distributed random variables, can be approximated by a Normal distribution
  with an expected value of nμ
  and a standard deviation of √n σ

 A, the average of a large number of independent, identically distributed random variables, can be approximated by a Normal distribution
  with an expected value of μ
  and a standard deviation of σ/√n
What Does A Luxury Retailer
Want to Know Before They
Open a Store in a New
Location?
 Patek Philippe would only like to open stores in locations with an average
income of $128,000 or higher.

 Most residents of Waikiki have incomes much less than $128,000 BUT the
relevant population is obviously not just the residents. It’s the tourists!!

 Census data does NOT have information about the income of tourists in
Waikiki…
To What Extent Do the
Traits of the Sample
Reflect the Traits of the
Population?
 Population: 9 Million Tourists in Waikiki
  Mean μ
  Standard deviation σ

 Sample size n
  Sample mean X̄
  Sample SD s

 Assume every tourist in Waikiki has equal probability of being included in our sample.
 What does that imply about the sample mean, X̄?

 X̄ = (X1 + X2 + … + Xn)/n

 The income of the next person we choose is a random variable, Xi

 If the selection of each person is “random” then all of the Xi are independent

 So, thanks to the Central Limit Theorem, we know a whole awful lot about the probability distribution of X̄…
X̄ = (X1 + X2 + … + Xn)/n

What can we say about the distribution of “X-Bar”?

X̄ ~ Normal(µ, σ/√n)

Looks like life is going to be SO EASY for us…

…except we don’t know μ or σ!!


 X̄ is an unbiased estimator of μ.
 In English, that means E(X̄) = μ, or “on average, X̄ is correct!”

 X̄ is an excellent “point estimate” of the population mean, μ.

 Also, s is an unbiased estimator of σ!! Maybe we can use them to say some intelligent things about the mean income of the population.
Excel computes these for us:

n = 200
x̄ = 139,620   =average(A2:A201)
s = 92,700    =stdev.s(A2:A201)
 We know our best guess for tourist income is 139,620

 How good is our best guess?
  Is it likely to be close?
  How close?

The Big Question: How CLOSE are X̄ and μ?

 Is it ACTIONABLE?
  Are we comfortable telling our CEO that a store in Waikiki will be profitable?
  How comfortable?
Approximate                                 Precise

[Chart: a normal curve with .0250 in each tail and .9500 in the middle, centered at 0]

X̄ ~ Normal(µ, σ/√n)

[Chart: the sampling distribution of X̄, centered at µ, with .9500 of the probability between μ − 1.96 σ/√n and μ + 1.96 σ/√n and .0250 in each tail]

 P( μ − 1.96 σ/√n  ≤  X̄  ≤  μ + 1.96 σ/√n ) = .9500

 P( −1.96 σ/√n  ≤  X̄ − μ  ≤  1.96 σ/√n ) = .9500

 P( X̄ − 1.96 σ/√n  ≤  μ  ≤  X̄ + 1.96 σ/√n ) = .9500
 Suppose we know the “true” (population) mean. Then there is a 95% probability that the sample mean would fall into the following range:
   [ μ − 1.96 σ/√n , μ + 1.96 σ/√n ]

 Suppose we know the sample mean. Then we are 95% confident that the true population mean, μ, falls within the following range:
   [ x̄ − 1.96 σ/√n , x̄ + 1.96 σ/√n ]

 X̄ is a random variable. x̄ is just a number. Once we take the survey and the randomness is “over with,” it doesn’t make sense to talk about probability anymore… so we talk about confidence.
 Now it looks like we are good to go!!

 We define a 95% Confidence Interval for the population mean to be
   [ x̄ − 1.96 σ/√n , x̄ + 1.96 σ/√n ]

 Now all we need to do is plug in the values for x̄, σ, and n and we’re done!!

 Oh &%@&%#!! We don’t have a value for σ… maybe we can just use s?

 In the early 1900’s, Guinness Brewery’s resident statistician, William Gosset, figured out that we should not remain 95% confident in this interval:
   [ x̄ − 1.96 σ/√n , x̄ + 1.96 σ/√n ]

 A century ago this was a trade secret!


 Gosset published his findings anonymously (“Student”).
 Why? Guinness did not want competitors to understand the degree to
which they were using statistical quality control.
 We know:
   Z = (X̄ − μ) / (σ/√n) ~ Normal(0, 1)

 When we don’t know σ, we substitute s and use the t-distribution:
   T = (X̄ − μ) / (s/√n) ~ t(n − 1)

Hint – Unless you are living inside of a textbook, you will NEVER know σ.
“For the person who is unfamiliar with N-dimensional geometry or
who knows the contributions to modern sampling theory only
from secondhand sources such as textbooks, this concept often
seems almost mystical, with no practical meaning”

--- Walker, H. M. (April 1940). "Degrees of Freedom". Journal of


Educational Psychology 31 (4): 253–269.

So… the best way to conceptualize degrees of freedom


for DECS 430 is very simple: df = n-1
[Chart: a t distribution with the middle 95% shaded; tick marks at −4.3, −2.8, −1.96, 1.96, 2.8, 4.3]

Our 95% confidence interval is:
   [ 139,620 − t · 92,700/√200 , 139,620 + t · 92,700/√200 ]

Where do we get the “t-value” for 95% confidence intervals?
   =T.INV.2T(total tail probability, n - 1)
Our 95% confidence interval is:
   [ 139,620 − 1.972 · 92,700/√200 , 139,620 + 1.972 · 92,700/√200 ]

Excel helped us find:
   x̄ = 139,620
   s = 92,700
   n = 200
   t = 1.972

Conclusion: [126,694, 152,546] OR 139,620 +/- 12,926
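
The same interval can be reproduced outside Excel. This is an illustrative Python sketch using scipy with the sample statistics from the slides; scipy’s t.ppf(0.975, df) returns the same critical value as T.INV.2T(0.05, df).

```python
import math
from scipy import stats

n, xbar, s = 200, 139_620, 92_700

t_crit = stats.t.ppf(0.975, df=n - 1)     # ≈ 1.972, matching T.INV.2T(0.05, 199)
moe = t_crit * s / math.sqrt(n)           # margin of error ≈ 12,926
print(xbar - moe, xbar + moe)             # ≈ 126,694 and 152,546
```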


We are doing some marketing for a top brand
in the Chicago market. Our summer intern has
suggested buying hats & jackets for Lyft & Uber drivers in the
area and paying them to wear our logo.
To estimate the “eyeballs” on our logo, we need to know the
average number of passengers per month that a Lyft or Uber
driver transports.
 We have obtained a random sample of data from 40 drivers.
We are interested in the expected number of passengers each
month across the entire population (all drivers in Chicago.)
 The population mean μ.

We have the following data from our sample:
  The sample mean x̄ = 836
  The sample standard deviation s = 229
  The sample size n = 40
  The t-value (for 39 df) t = 2.0227

Our 95% confidence interval is:
   [ 836 − 2.0227 · 229/√40 , 836 + 2.0227 · 229/√40 ]

We had to find:
   x̄ = 836
   s = 229
   n = 40
   t = 2.0227

Conclusion: [763, 909] OR 836 +/- 73


The formula:   x̄ ± t ∙ s/√n
   x̄ = the sample mean
   s = the sample standard deviation
   n = the sample size
   t = for a 95% CI, something a little bigger than 1.96

 (X̄ − µ) / (s/√n)  is called a t-statistic

 s/√n  is called the standard error

 t ∙ s/√n  is called the margin of error
Definition: Inference --

the process of arriving at some conclusion that, though it is not


logically derivable from the assumed premises, possesses some
degree of probability relative to the premises.
Example: Backdating Options
Matter of Timing: Five More Companies Show Questionable
Options Pattern --- A 20 Million-to-One Shot
By Charles Forelle and James Bandler
22 May 2006
The Wall Street Journal

“In all, Mr. Levy received 10 grants from KLA-Tencor and its
predecessor company between 1994 and 2001 -- all
preceding quick run-ups in the share price; an analysis by
The Wall Street Journal found the probability that that
pattern occurred merely by chance is tiny -- around one in
20 million.”
Example: Absolute Cheating
“Of these 93 hands, POTRIPPER won 56 times, in an
average 8.13 players per hand. Assuming all players
are equally skilled you would expect 11.49 wins.
POTRIPPER was 14.03 standard deviations above
expectations. The probability of luck this good or better
is 1 in 1.88 × 10^44. It would be easier to buy a
[lottery ticket] in six different states, and hit the
jackpot all six times”

*Analysis from wizardofodds.com


Jan 15, 2008
 If the statement is TRUE, how surprising would it be to see data like this?

 Ans: Unbelievably Rare  Either we have to believe this just happens to be


an extremely unlikely result OR maybe our premise is wrong and the
statement is UNTRUE.

 We call this surprise probability the “significance” of the test or the “p-value”
 The null hypothesis, Ho, is our “default” assumption.
 We will believe Ho unless the weight of the evidence forces us to reject it.

 For Phil (a.k.a. Patek Philippe) we will start out assuming that opening a
store in Waikiki is a bad idea. (i.e. the mean income is less than 128,000)
 If the evidence is convincing… we will reject that assumption in favor of the
only alternative: that opening the store is a good idea.

Formally: H0: μ ≤ 128,000  Null Hypothesis


Ha: μ > 128,000  Alternative Hypothesis
 How sure do we need to be?
 We need to accept a (small) probability of making a mistake and rejecting
the null hypothesis when it is actually true. The choice of this value
should depend on the consequences of being wrong.

 We denote this choice as α (or alpha.)


 Most people use .05 but this is quite arbitrary. “Science” is working on a
better rule!
Excel computes these for us:

n = 200
x̄ = 139,620   =average(A2:A201)
s = 92,700    =stdev.s(A2:A201)
H0: μ ≤ 128,000  Null Hypothesis
Ha: μ > 128,000  Alternative Hypothesis

x̄ = 139,620
s = 92,700
n = 200
α = 0.05

How “surprising” would it be to get such a high sample mean if, in fact, the Null Hypothesis (H0) were true?

First calculate the t-statistic:

   t = (X̄ − μ0) / (s/√n) = (139,620 − 128,000) / (92,700/√200) = 1.7727

Second ask Excel the probability of getting a value this high (or higher):
   = T.DIST.RT(t, n - 1)
   = T.DIST.RT(1.7727, 199)
   = .0389
(This is used to compute the probability in the RIGHT TAIL… For other tests, we’ll use similar functions (soon.))

IF the null hypothesis were true, the chances of getting a sample mean this extreme would be .0389.
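
For reference, the same t-statistic and right-tail p-value can be reproduced in Python (an illustrative sketch; scipy’s t.sf plays the role of T.DIST.RT):

```python
import math
from scipy import stats

n, xbar, s, mu0 = 200, 139_620, 92_700, 128_000

t_stat = (xbar - mu0) / (s / math.sqrt(n))   # ≈ 1.7727
p_value = stats.t.sf(t_stat, df=n - 1)       # right-tail area ≈ .0389
print(t_stat, p_value)
```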
 The p-value or significance of the test is the maximum probability
that we would observe a sample as “extreme” as we did, if the null
hypothesis were true.
 In our case, a p-value of 3.89% is generally considered “small enough”
to reject the null hypothesis.
 The decision rule is always based on p and α:

p < α so we reject the null hypothesis and conclude μ > 128,000


 The Factory Manager at Steve’s Footwear in Indonesia needs to
keep their ovens at 375 degrees Fahrenheit throughout the day
in order to melt rubber soles optimally. Sometimes the ovens
are too hot or too cold which will lead to an unacceptable rate of
product defects for Steve’s many international clients.
 She sets up the test:
   Innocent  H0: μ = 375
   Guilty  Ha: μ ≠ 375

 When X̄ can be “surprisingly” large or “surprisingly” small, we are describing a TWO-TAILED hypothesis test.

 We will reject Ho if the sample mean is a lot bigger than 375 OR if it is a lot smaller than 375.

[Chart: the t distribution with both tails beyond −1.74 and 1.74 shaded]

 We compute the relevant information from the raw data:
   x̄ = 391, s = 71.2, and n = 60. Let’s use α = .05

   t = (391 − 375) / (71.2/√60) = 1.74

 The p-value for a two-tailed test is given by the area in BOTH tails:
   p = T.DIST.2T(1.74, 59) = .087
   (For some crazy reason, the T.DIST.2T function insists on a positive input.)

The decision rule is ALWAYS the same –
p > α so we cannot reject the null hypothesis. This data does NOT allow us to prove that the ovens are miscalibrated.
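
The same two-tailed p-value, reproduced as an illustrative Python sketch (doubling the upper-tail area, which is what T.DIST.2T does):

```python
import math
from scipy import stats

n, xbar, s, mu0 = 60, 391, 71.2, 375

t_stat = (xbar - mu0) / (s / math.sqrt(n))        # ≈ 1.74
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # both tails ≈ .087
print(t_stat, p_value)                            # p > .05, so we cannot reject H0
```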
 We’ve just seen three styles of hypothesis tests –

 The structure of the testing process is the same but the specific mechanics of
each style of test will be slightly different.

 The difference is the computation of the p-value in Excel

 The decision rule is always the same:


 If p < α, reject Ho
 If p > α, do not reject Ho
H0 ≤
H0: μ ≤ 60
Ha: μ > 60

We gather data and compute: X̄, s, and n.

Plug them into this formula:  t = (X̄ − μ) / (s/√n)

Use t to compute the p-value:
   p = T.DIST.RT(t, n-1)

If p < α we REJECT H0
H0: μ ≤ 60
Ha: μ > 60
   X̄ = 82, s = 140, and n = 250. Let’s use α = .05

   t = (X̄ − μ) / (s/√n) = (82 − 60) / (140/√250) = 2.484

   p = T.DIST.RT(2.484, 249) = .0068

Because this p-value is less than .05, we reject the null hypothesis and conclude: μ > 60
H0 ≥
H0: μ ≥ 350
Ha: μ < 350

We gather data and compute: X̄, s, and n.

Plug them into this formula:  t = (X̄ − 350) / (s/√n)

Use t to compute the p-value:
   p = T.DIST(t, n-1, 1)

If p < α we REJECT H0
H0: μ ≥ 350
Ha: μ < 350
   X̄ = 320, s = 150, and n = 48. Let’s use α = .01

   t = (320 − 350) / (150/√48) = −1.386

   p = T.DIST(−1.386, 47, 1) = .086

Because p > α we CANNOT REJECT H0.

We have not proven that μ ≥ 350; rather, we have failed to prove that μ < 350. There is evidence to support the alternative hypothesis, but just not enough to be convincing.
H0 =
H0: μ = 0.42
Ha: μ ≠ 0.42

We gather data and compute: X̄, s, and n.

Plug them into this formula:  t = (X̄ − 0.42) / (s/√n)

Use t to compute the p-value:
   p = T.DIST.2T(ABS(t), n-1)

If p < α we REJECT H0
H0: μ = 0.42
Ha: μ ≠ 0.42
   X̄ = .51, s = .92, and n = 350. Let’s use α = .05

   t = (.51 − 0.42) / (.92/√350) = 1.83

   p = T.DIST.2T(1.83, 349) = .0681

Since p > α we cannot REJECT H0
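
The three templates differ only in which tail(s) you measure. As a wrap-up, here is a small Python helper (an illustrative sketch; the function name and argument style are my own) that reproduces the three worked examples above:

```python
import math
from scipy import stats

def t_test_pvalue(xbar, s, n, mu0, alternative):
    """One-sample t-test p-value.
    alternative: 'greater' for H0: mu <= mu0, 'less' for H0: mu >= mu0,
    'two-sided' for H0: mu = mu0."""
    t_stat = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if alternative == "greater":                 # Excel: T.DIST.RT(t, n-1)
        return stats.t.sf(t_stat, df)
    if alternative == "less":                    # Excel: T.DIST(t, n-1, 1)
        return stats.t.cdf(t_stat, df)
    return 2 * stats.t.sf(abs(t_stat), df)       # Excel: T.DIST.2T(ABS(t), n-1)

print(t_test_pvalue(82, 140, 250, 60, "greater"))       # ≈ .0068
print(t_test_pvalue(320, 150, 48, 350, "less"))         # ≈ .086
print(t_test_pvalue(.51, .92, 350, .42, "two-sided"))   # ≈ .068
```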


We make “stuff.” We currently have a number of product lines making a wide variety of stuff. Our amazing R&D team has discovered two new things that we might be interested in developing: Thing A and Thing B.
1. The data analytics team has gathered sample data on potential sales based on some
prototypes.
2. The financial team has projected that the sales volume will need to exceed 300 thousand
units for either thing to be profitable.
3. The operations team has informed you that they have the capacity to launch ZERO or
ONE of the new things BUT not both.
Based on the sample data below, should you launch one of the two Things and if
so, which one? Be sure to explain the logic of your decision.

Sales volume in thousands of units.


A: 330, 325, 330, 280, 305, 320, 315, 290, 305, 320, 315, 290
B: 420, 390, 270, 480, 210, 220, 260, 440, 210
Selection Bias
We are doing some marketing for a top brand
in the Chicago market. Our summer intern has
suggested buying hats & jackets for Lyft & Uber drivers in the
area and paying them to wear our logo.
To estimate the “eyeballs” on our logo, we need to know the
average number of passengers per month that a Lyft or Uber
driver transports.
 We have obtained a random sample of data from 40 drivers.
 WE gather our data by asking every driver we ride with –
“How many passengers do you transport each month?”

 Half of all Ride Share Drivers drive 800 passengers per month
 Half of all Ride Share Drivers drive ZERO passengers per month

 What is the mean of the population?


400

 What is the sample mean based on surveying all the drivers we ride with?

800
This is a problem that has dogged scientists across many
disciplines. There is a natural bias in favour of
reporting statistically significant results…Such results
are more likely to be published in academic journals and
to make the newspaper headlines. But when other
scientists try to replicate the results, the link disappears
because the initial result was a random outlier…

Most financial research applies a two standard deviation


(or “two sigma” in the jargon) test to see if the results are
statistically significant. This is not rigorous enough.
“Most of the empirical research in
finance, whether published in academic
journals or put into production as an
active trading strategy by an investment
manager, is likely false.”
 Researchers looked at the top 100 journal articles in psychology from 2008
 They collaborated with many of the original authors and teams of grad
students & professors
 Each study was replicated using the exact same standards and frameworks
 Of the 100 original papers, 97 had shown positive results. Only 3 showed
negative results.
 How many of the 97 positive results (all with p < 0.05) were reproduced in the
new study?
The driving force is likely to be “Publication Bias”

 When we find that something IS meaningful, we publish it…
 When we find that something IS NOT meaningful, we put it in the drawer
 P-Hacking -- Uncovering patterns in data that can be presented as statistically
significant without first devising a specific hypothesis

 OR

 Messing around with variable selection (perhaps unconsciously) until you get
“interesting results”

 Check out:
  https://fivethirtyeight.com/features/science-isnt-broken/
  https://www.youtube.com/watch?v=0Rnq1NpHdmw
