0% found this document useful (0 votes)

40 views

Probability & Stats Notes

The document discusses key concepts in probability theory and statistics including: 1) Random variables can be discrete or continuous, and are described by probability mass/density functions and cumulative distribution functions. 2) Important properties of random variables include the expected value, variance, skewness and kurtosis. 3) Probability distributions can be univariate or joint, with marginal distributions obtained by summing joint probabilities.

Uploaded by

jerry vera

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

40 views

Probability & Stats Notes

Uploaded by

jerry vera

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 17

Applied Econometrics

Review of probability and statistics

Notions of Probability Theory

Random Variable
• A random variable (r.v.) is a variable whose possible values are numerical outcomes of a random
phenomenon. Example. The Bernoulli r.v. takes values {0,1} and can describe the outcomes of a coin
toss.
• There are two types of random variables, discrete and continuous.
– A discrete r.v. is one which may take only a countable number of distinct values. Examples
include the Friday night attendance at a cinema, the number of patients in a doctor’s surgery, the
number of defective light bulbs in a box of ten.
– A continuous r.v. is one which takes an infinite number of possible values. Continuous random
variables are usually measurements, such as height, weight, the time required to run a mile.
• The probability of an outcome is a positive number measuring the likelihood that an event will occur.
The sum of the probabilities of all the outcomes must be equal to one.

Discrete random variables

• The probability mass function (p.m.f) of a discrete r.v. is a list of probabilities associated with
each of its possible values.
• The cumulative distribution function (c.d.f) is a function giving the probability that the r.v. is
less than a certain value. For a discrete random variable, it is found by summing up the probabilities.
• Example. The r.v. X can take value {1, 2, 3, 4} with probabilities {0.1, 0.3, 0.4, 0.2}. The p.m.f of 3 is
Pr(X = 3) = 0.4. The c.d.f. of 3 is Pr(X ≤ 3) = 0.8. Figure 1 depicts the probability mass function
and the cumulative distribution function for this random variable.

Continuous random Variable

• The probability of a continuous r.v. is described with a continuous function called the probability
density function (p.d.f).
• The cumulative distribution function (c.d.f.) is defined as the integral of the p.d.f from the
beginning of the support up to a certain value.
R1
• Example. The r.v. X has support on (−∞, +∞) and has p.d.f. f (x). The c.d.f. of X ≤ 1 is −∞ f (x)dx.
Figure 2 depicts this situation.

Expected value
• The expected value of a r.v. X, denoted as E(X) or µX , is intuitively the long-run average value of
repetitions of the experiment it represents.

1
Probability Mass Function Cumultative Distribution Function

0.4

1.0
0.8
0.3

0.6
0.2

0.4
0.1

0.2
0.0

0.0
1 2 3 4 1 2 3 4

Figure 1: Discrete random variable.

– If X is a discrete r.v. taking N values with probability p(x), then

N
X
E(x) = xi p(xi )
i=1

– If X is a continuous r.v. with p.d.f. f (x) and support on (−∞, +∞), then
Z +∞
E(x) = xf (x)dx
−∞

• Some useful properties of the expected value are the following:

– E(X + Y ) = E(X) + E(Y )
– E(aX) = aE(X) where a is a constant.

Variance
2
• The variance of a r.v. X, denoted as V ar(X) or σX , measures the dispersion of the probability
distribution around the mean, µX = E(X).
• The variance is defined as the expected value of the squared deviation from the mean, i.e.
V ar(X) = E((X − µX )2 )
– If X is a discrete r.v. taking N values with probability p(x), then
N
X
V ar(x) = (xi − µX )2 p(xi )
i=1

– If X is a continuous r.v. with p.d.f. f (x) and support on (−∞, +∞), then
Z +∞
V ar(x) = (x − µX )2 f (x)dx
−∞

2
Probability Density Function Cumulative Distribution Function

0.4

1.0
0.8
0.3

0.6
0.2

0.4
0.1

0.2
0.0
0.0

−4 −2 0 2 4 −4 −2 0 2 4

Figure 2: Continuous random variable.

p
• A common measure of dispersion is the standard deviation, denoted as σX = V ar(X).
• Some useful properties of the variance are the following:
– V ar(X) > 0.
– V ar(aX) = a2 V ar(X) where a is a constant.
– V ar(aX + bY ) = a2 V ar(X) + b2 V ar(Y ) + 2abCov(X, Y ) where Cov(X, Y ) is the covariance
between X and Y .
• Example. Three Normal p.d.f. with different expected values and variance are depicted in Figure 3.

Skewness and Kurtosis

• The skewness of a r.v. measures the asymmetry of the probability distribution around the mean, and
it is defined as
E((X − µX )3 )
Skew(X) = 3 .
σX
– If a distribution is symmetric then Skew(X) = 0.
– If Skew(X) > 0 then P (X = x) > P (X = −x), and vice versa.
• The kurtosis of a r.v. measures the tailedness of the probability distribution, and it is defined as

E((X − µX )4 )
Kurt(X) = 4 .
σX

– If Kurt(X) > 3, the distribution is leptokurtic. If Kurt(X) < 3, the distribution is plaktikurtic.
– The higher the kurtosis, the higher the likelihood of extreme events or outliers.
• Example. Distributions with different degrees of skewness and kurtosis are depicted in Figure 4.

3
1.0
0.8 E(X)=0,Var(X)=1
E(X)=2,Var(X)=0.5
E(X)=−4,Var(X)=2
0.6
0.4
0.2
0.0

−10 −5 0 5

Figure 3: Examples of different variance and expected values.

Moments
• The r-th moment of a r.v. X is defined as E(X r ). Example. The mean of X is the first moment of X.
• In general, mean, variance, skweness and kurtosis are all functions of the moments of the r.v.
• Example. V ar(X) = E(X 2 ) − E(X)2 .
• Moments are important to establish asymptotic properties.

Joint Probability Distribution

• The joint probability distribution of two or more r.v. describes the probability of joint events. The
sum of the probability of all the possible combinations of events must sum to one.
– Joint probability distributions may involve either continuous or discontinuous r.v. or both.
• Example. An employee living in the countryside travels to its downtown office every morning. The
journey time is affected by the rainfall. These random events can be described by two bernoulli r.v. X
and Y .
## Rain (X=0) No rain (X=1)
## Slow (Y=1) 0.15 0.07
## Fast (Y=0) 0.15 0.63
• The term marginal probability distribution of a r.v. X is used to distinguish the distribution of X
from the joint distribution with other variables. It can be obtained from the joint distribution by
summing (integrating) the probabilities of all the outcomes for which Y takes a specific value.
– For two discrete r.v. with joint probability mass function p(x, y) on a set of N 2 joint outcomes,
we write the marginal mass function of Y as
N
X
p(y) = p(X = xi , Y = y)
i=1

4
0.4
0.3
density

0.2
0.1
0.0

−6 −4 −2 0 2 4 6

support

Figure 4: Normal density (black), leptokurtic (kurt=190) and symmetric density (red), leptokurtic (kurt=12)
and right-asymmetric (skew=1.5) density (green), leptokurtic (kurt=7) and left-asymmetric (skew=-1.5)
density (blue).

– For two continuous r.v. with joint density function f (x, y) on R2 , we write the marginal density of
Y as Z +∞
f (y) = f (x, y)dx
−∞

• Example continued. The marginal probability of the journey time r.v. X is 0.22 and 0.78 for the slow
and fast journey, respectively.

Conditional distribution, expectation and variance

• The conditional distribution of Y given X is the probability that Y takes value y given that X has
value x, and it is denoted as
Pr(Y = y, X = x)
Pr(Y = y|X = x) = .
Pr(X = x)
– The sum of all the conditional probabilities over y given x must sum to one.
• Example continued. The conditional probability of a slow journey time given that it is raining is
0.15/0.30 = 0.5, and it is the same for the fast journey. While unconditionally, a fast journey is three
times more likely, fast and slow journeys are equally likely when it rains.
• The conditional mean is the mean of the conditional distribution of Y given X, i.e.
k
X
E(Y |X = x) = yi Pr(Y = yi |X = x)
i=1
.
• The conditional variance is the variance of the conditional distribution of Y given X, i.e.
k
X
V ar(Y |X = x) = (yi − E(Y |X = x)))2 Pr(Y = yi |X = x)
i=1
.

5
Covariance and correlation
• Two r.v. X and Y are independent if the value of X is not informative of the value of Y , and viceversa.
Their joint distribution distribution is the product of the marginals, i.e.

Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y),

and P r(Y = y|X = x) = P r(y = y).

• Given two r.v. X and Y , the covariance, denoted as Cov(X, Y ), is a measure of linear dependence
between X and Y and is written as

Cov(X, Y ) = E [(X − µX )(Y − µY )]

– For two discrete r.v. with joint probability mass function p(x, y) on a set of N 2 joint outcomes,
we write
XN X N
Cov(X, Y ) = (xi − µX )(yj − µY )p(xi , yj )
i=1 j=1

– For two continuous r.v. with joint density function f (x, y) on R2 , we write
Z +∞ Z +∞
Cov(X, Y ) = (x − µX )(y − µY )f (x, y)dxdy
−∞ −∞

• If X and Y are independent then Cov(X, Y ) = 0, implying that E(XY ) = E(X)E(Y ). The converse is
not true, i.e. zero covariance does not imply independence.
• Note that Cov(X, X) = V ar(X) and Cov(X, Y ) = Cov(Y, X).
• Given two r.v. X and Y , the correlation coefficient is defined in terms of covariance as

Cov(X, Y )
Corr(X, Y ) = p
V ar(X)V ar(Y )

– −1 ≤ Corr(X, Y ) ≤ 1.
– Corr(X, Y ) = 1 means perfect linear positive association.
– Corr(X, Y ) = −1 means perfect linear negative association.
– Corr(X, Y ) = 0 means no linear association.
• Example. Figure 5 depicts four different situations with different degrees of correlation.

6
−0.8 * x + sqrt(1 − 0.8^2) * rnorm(200)
0.8 * x + sqrt(1 − 0.8^2) * rnorm(200)
Corr(X,Y)=0.8 Corr(X,Y)=−0.8

2
2

1
1

0
0

−2
−2

−2 −1 0 1 2 3 −2 −1 0 1 2 3

x x
Corr(X,Y)=0 (Independent) Corr(X,Y)=0 (Nonlinear dependence)

2
1 2

0
rnorm(200)

2 − x^2

−2
−1

−6
−3

−2 −1 0 1 2 3 −2 −1 0 1 2 3

Figure 5: Examples of different different degrees of correlation.

Some continuous distributions

The normal distribution
• The normal or Gaussian distribution with mean µ and variance σ 2 is a r.v. denoted as N (µ, σ 2 ) with
p.d.f.
1 1 x−µ 2
f (x) = √ e− 2 ( σ )
2πσ

• The normal distribution is symmetric around the mean and concentrates the 95% of its probability
mass in the interval {µ − 1.96σ, µ + 1.96σ}.
• If X ∼ N (µ, σ 2 ), then Z = X−µ
σ ∼ N (0, 1), and N (0, 1) is called standard normal distribution, and
its c.d.f. is often denoted with Φ(·).

Pr(X ≤ c1 ) = Pr(Z ≤ d1 ) = Φ(d1 )

Pr(X ≤ c2 ) = Pr(Z ≤ d2 ) = Φ(d2 )
Pr(c1 ≤ X ≤ c2 ) = Pr(d1 ≤ Z ≤ d2 ) = Φ(d2 ) − Φ(d1 )
c1 −µ c2 −µ
where d1 = σ and d2 = σ .

The multivariate normal distribution

• Let X a vector of d r.v. with mean vector µ and covariance matrix Σ. We say that X has multivariate
normal distribution, X ∼ Nd (µ, Σ), if the joint probability density is written as:
1 1 0
f (x) = (2π)−n/2 |Σ|− 2 e− 2 (x−µ) Σ(x−µ)

• If X ∼ Nd (µ, Σ), and A and b are non-random matrix and vector, respectively, then

AX + b ∼ Nd (Aµ + b, AΣA0 )

• If d = 2, we have

7
– X = (X1 , X2 )
– µ = (µ1 , µ2 ) where µ1 = E(X1 ) and µ2 = E(X2 )

σ1 σ12
– Σ= where σ1 = V ar(X1 ), σ2 = V ar(X2 ), σ12 = Cov(X1 , X2 ).
σ21 σ2

The Chi-squared distribution

• Let Xi ∼ N (0, 1) with i = 1, . . . , N be N r.v. identically and independently distributed standard
PN
normal. The r.v. W = i=1 Xi2 has chi-squared distribution with N degrees of freedom and it is
denoted as W ∼ χ2N .
• Example. Figure 6 shows the p.d.f. of the χ2 distribution for three different values of the degrees of
freedom parameter.
0.5
0.4

df=2
df=4
0.3

df=8
p.d.f.

0.2
0.1
0.0

0 5 10 15 20

support

Figure 6: Chi-squared distribution.

The Student’s t distribution

• Let X ∼ N (0, 1) and W ∼ χ2n be two independent r.v. The r.v. Y = √XW has Student’s t distribution
n
with n degrees of freedom, and it is denoted as Y ∼ tn . For n large, the Student’s t converges to the
normal distribution.
• Example. Figure 7 shows the p.d.f. of the Students’t distribution for two different values of the degrees
of freedom parameter along with the Normal distribution.

The Snedecor’s F distribution

W/n
• Let W ∼ χ2n and U χ2m two independent r.v. The r.v. Z = U/m has Snedecor’s F distribution with n
and m degrees of freedom, and it is written as Z ∼ Fn,m .
• Example. Figure 8 shows the p.d.f. of the Snedecor’s F distribution for three different values of the
degrees of freedom parameters.

8
0.4
0.3
N(0,1)
t4
p.d.f.

0.2
t10
0.1
0.0

−4 −2 0 2 4

support

Figure 7: Student’s t distribution.

Review of statistical theory

The bottle factory. You are the CEO of a firm producing handcrafted bottles of glass. You
have only one client, and you know that to satisfy his annual demand of bottles you need to
produce a bottle every ten minutes.
The bottles are handcrafted, so the time the artisan takes to produce one bottle changes every time.
Your concern is to be able to reach the annual target, and therefore you control the production
process.
Statistical question: The production time of a bottle Y is a continuous r.v. Is the population
mean of Y , µY = E(Y ) equal to 10?

• Statistical theory can help us answering these questions, following these steps:
– Sampling
– Estimation
– Hypothesis testing

Sampling
• In statistics, the interest is always on the population distribution. The population is the collection of
all possible entities of interests. It can be made of existing or hypothetical objects and can be thought
as an infinitely large quantity.
• Example: All the bottles in the production process; Men and Women; People with a PhD.
• To make statistical inference on the population distribution, we collect data on a subset of the
population, called sample, that needs to be representative of the population itself.

9
1.0
0.8
0.6
p.d.f.

0.4

F2,4
F8,16
0.2

F4,2
0.0

0 5 10 15

support

Figure 8: Snedecor F distribution.

• In what follows, we assume that the random phenomena of interest is a random variable Y with
population mean µY and variance σY2 .
• We perform simple random sampling: Choose N individuals at random from the population,
Y1 , . . . , YN .
– We can interpret these as N copies of the same r.v. Y , and thus as N different r.v.
– Once sampled, they take a specific value.
• Example: Record N production times randomly.
• Since Y1 , . . . , Yn are sampled indipendently from the same distribution, we say they are independent
and identically distributed or i.i.d.
• Note that a sample is just one specific realization of Y1 , . . . , YN . One can obtain as many samples he
wants just re-sampling.

Estimation
• Once we have a sample Y1 , . . . , YN , we can try to obtain an estimate of the population mean µY , and
to do so we need an estimator.
• An estimator is a function of a random sample that is informative of the quantity of interest, i.e. the
population mean in our case. The estimator is a random variable, and its outcome changes with
the sample. An estimate is the outcome for a specific sample.
• A “good” estimator should satisfy some desirable properties:
– Unbiasdness: The expected value of the estimator is equal to the true quantity.
– Consistency: As the sample size increases, the uncertainty around the true value reduces.
– Efficiency: If the variance of an estimator is lower than all the other estimators, then it is efficient.

10
• The natural estimator of the population mean is the sample mean Ȳ of Y1 , . . . , YN , defined as
N
1 X
Ȳ = Yi .
N i=1

PN
• Ȳ is the least square estimator of µY , i.e. it solves the minimization problem minm i=1 (Yi − m)2 .
• The sample average Ȳ is a natural estimator of µY , but not the only one. Is it the best?
– Ȳ is an unbiased estimator of µY , i.e. E(Ȳ ) = µY .
p
– Ȳ is consistent because of the Law of Large Numbers, i.e. Ȳ −
→ µY .
– Ȳ is the most efficient among all the unbiased linear estimators of µY . Ȳ is the best linear
unbiased estimator (BLUE) of µY .
N
• The Law of Large Numbers (LLN) states that if (Yi )i=1 are i.i.d. and σY2 < ∞, the probability that
Ȳ falls within an arbitrary small interval of the true population value µY tends to one as the sample
size N increases.
• Figure 9 provides an intuitive representation of the LLN.

N=10 N=100 N=1000

50
40

40
30

30
frequency

frequency

frequency
20

20
10

10
0

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0

sample mean sample mean sample mean

Figure 9: Law of Large Numbers. The figure shows how the mean estimates concentrates around the true
value as the number of observations increase from 10 to 1000 through 100.

• Since individuals in the sample are randomly drawn, the sample mean is a r.v. itself, and the distribution
of Ȳ over possible samples is called sampling distribution. This plays a fundamental role in statistical
inference.
• Simple math can be used to show that the mean and variance of the sampling distribution are:
E(Ȳ ) = µY
σ2
V ar(Ȳ ) = NY

where µY = E(Y ) and σY2 = V ar(Y ).

11
• There are two approaches to derive the sampling distribution of Ȳ :
– Exact distribution or finite sample distribution. This describes the distribution of Ȳ for every
N . Unfortunately, this is not easy to obtain in general as it depends on the distribution of Y . The
only exception is when Y is i.i.d and normally distributed, because then Ȳ is normal with
mean µY and variance σY2 /N .
– Asymptotic distribution or large sample distribution. This is an approximation of the exact
distribution that holds when N is large. Derive this approximate distribution is easy using the
Law of Large Numbers and the Central Limit Theorem.
• The Central Limit Theorem establishes that the normalized sum of independent r.v. converges
toward a normal distribution.
• Under certain assumptions on the moments of Y , the sampling distribution of Ȳ is approximated by a
normal distribution as N increases, i.e.
Ȳ − E(Ȳ ) d
q −
→ N (0, 1)
V ar(Ȳ )

• This result holds regardless the distribution of Y , but the speed of convergence, i.e. how much N need
to be large for this approximation to be good depends on it.

The bottle factory. We collect N = 100 observations on the production time of the bottles.
Figure 10 plots the observations, the population mean (black line) and the sample mean (red line).
Production time (minute)
18
16
14
12
10
8
6
4

0 20 40 60 80 100

Observations
Figure 10: Bottels factory example. Sample of recorded production times.

Hyopthesis Testing
• Example continued. Can we conclude that we are under the target production level?
• The hypothesis testing (on the population mean) verifies the statistical validity of a null hypothesis,
H0 , based on the observations, against an alternative hypothesis, H1 . The test can be:

12
– One-sided(>): H0 : E(Y ) = µY,0 vs. H1 : E(Y ) > µY,0
– One-sided(<): H0 : E(Y ) = µY,0 vs. H1 : E(Y ) < µY,0
– Two-sided: H0 : E(Y ) = µY,0 vs. H1 : E(Y ) 6= µY,0
• We can compute the test statistic
Ȳ − µY,0
z= √ .
σY / N
For large N , CLT suggests that z has distribution N (0, 1), and we reject H0 if the value of z is far
enough from 0. How far?
• Hypothesis testing can lead to two types of errors:
– Type I error: Rejecting H0 when this is true.
– Type II error: Not rejecting H0 when this is false.
• We want to reach a decision on H0 controlling the probability of committing a type I error. Decisions
in statistics are never conclusive, are always subject to a significance level.
• The significance level α of the test is a pre-specified probability of incorrectly rejecting the null
hypothesis when the null is true (Type I error).
• The critical value c∗α of the test statistic is the threshold value that allows us to reach a conclusion
on H0 , knowing that we have probability α of committing a type I error.
– If the test statistic exceeds the critical value, it falls in the rejection region, and H0 is rejected.
– If the test statistic does not exceed the critical value, it falls in the acceptance region, and
H0 is not rejected.
• The critical value are quantiles of the sampling distribution that depend on the significance level α
and the alternative hypothesis. In large sample,
– One-sided(>). cv := Φ−1 (1 − α)
– One-sided(<). cv := Φ−1 (α)
– Two-sided. cv := Φ−1 ( α2 ), Φ−1 ( α2 )

• Tables with tabulated critical values are important!

• Example. A graphical representation of the sampling distribution, the significance level, and the critical
values for the two-sided test is given in Figure 11.
Ȳ −µ
• Recall that z = √Y ,
σY / N
in practice σY is unknown. An estimator of the variance of Y is the sample
variance
N
1 X
s2Y = (Yi − Ȳ )2
n − 1 i=1
p
– s2Y −
→ σY2 (LLN).
d
– s2Y exact distribution is χ2N −1 if Y is normal, otherwise s2Y −
→ N (0, 1) (CLT).
• The feasible test statistic thus becomes the t-statistic,

Ȳ − µY,0
t= √
sY / N
– The exact distribution of t is the Student’s t distribution with N − 1 d.o.f, if Y is normal.
– In large samples, the Student’s t is well approximated by the standard normal. We can thus use
the critical value of the latter to decide on H0 .

13
0.4
0.3
dnorm(z)

0.2
0.1
0.0

X X

−4 −2 0 2 4

Figure 11: The sampling distribution N(0,1), the significance level at the 5 percent level (grey area), and the
critical values (red crosses) at +1.96 and -1.96 for the two-sided test.

• In practice, if t exceeds c∗α , the test rejects H0 and we say that µY is statistically different (> or
<) from µY,0 at the significance level α.
• An alternative way to decide on H0 is based on the p-value. This is the probability of drawing a
statistic t at least as adverse to H0 as the value computed with the data, t∗ . In large samples:
∗
∗

– One-sided(>). p-value = PrH0 sȲ −µ /
√Y > Ȳ −µ
N s
√Y
∗ / N = 1 − Φ − σȲ /−µ √Y
N
Y Y Y
∗
∗

– One-sided(<). p-value = PrH0 sȲ −µ √Y < Ȳ −µ
∗
√Y = Φ − Ȳ −µ √Y
Y/ N sY / N σY / N
∗

Ȳ ∗ −µ
– Two-sided. p-value = PrH0 sȲ −µ √Y
/ N
> Ȳ −µ√Y
s∗ / N
= 2Φ − σ / N
√Y
Y Y Y

• Once the significance level is fixed (α = 5% is a common choice), we conclude that H0 is rejected in
favor of the alternative hypothesis if the computed p-value is lower than the significance level.
• Technical issues in hypothesis testing: size and power of the test.
• The size of the test is the probability that the test falsely rejects H0 when this is true.
– The test has correct size if the size of the test is equal to the significance level.
– Remark. When the size is correct, we should commit Type-I error α% of the times at the
significance level α.
– The size of the test must be correct in order to avoid the over-rejection problem, i.e. committing
type I error too often.
– Problem with the size may emerge when N is small or when the data fails to match the
assumptions underlying the asymptotics.
• The power of the test is the probability that the test rejects the null when this is false.
– The most powerful the test, the lower the probability of committing type II errors.

14
The bottle factory. My concern as a CEO of the firm is that the mean production rate stays
on one bottle every ten minute, no less no more. Therefore, I want to test
H0 : E(Y ) = 10 vs. H1 : E(Y ) 6= 10.
I collected N = 100 observations on the production time and computed Ȳ ∗ = 9.62 and s∗Y = 2.92.
The t-statistic becomes
9.62 − 10
t= = −1.27.
2.92/10
Fixing the significance level at α = 0.05, we have that for the two-sided test the critical value is
c∗α = ±1.96. Thus we do not reject H0 . See Figure 12.

9.39−10
We can compute the p-value as 2Φ − 3.05/ √
100
= 0.201143. This is larger than the significance
level, thus we do not reject H0 .
0.4
0.3
dnorm(t)

t*
0.2
0.1

X X
0.0

−4 −2 0 2 4

Figure 12: Bottels factory example. The significance level at the 5 percent level (grey area), and the critical
values (red crosses). The value of the t-statistic (blue line) and the p-value (blue area).

Confidence interval
• A (1 − α)% confidence interval (C.I.) for µY is an interval that contains the true value of µY in
(1 − α)% of the repeated samples.
• Note that the confidence interval will differ from one sample to another, it is a r.v..
• The (1 − α)% C.I. contains all the values of µY that cannot be rejected at the α% level in the
hypothesis testing, given the sample.

sY sY
C.I. := µY : Ȳ − Φ−1 (1 − α) √ , Ȳ + Φ−1 (1 − α) √
N N

• Example. We can obtain a 95% confidence interval for our production process of the bottles, .i.e.
C.I = 9.62 − 1.96 2.92 2.92

10 ; 9.62 + 1.96 10

= {9.2850; 9.9549}

15
The bottle factory. As CEO of the bottles factory you have ascertained that the company is
able to satisfy the client demand of a bottle every 10 minutes. However, you have the suspect
that artisan A is slower than artisan B.
Statistical question: The average production time of artisan A is lower than that of artisan B?
We can formalize this hypothesis as

H0 : µA − µB = 0 H1 : µa − µB < 0

and can verify it using a test for the difference in mean.

Hypothesis testing on difference in mean

• Sampling: Collect N observations for the r.v. YA and YB .
• Estimation: An estimator of µA − µB is ȲA − ȲB .
– By CLT, Ȳi ∼ N (µi , σi2 /Ni ) for large Ni with i ∈ {A, B}.
2 2
– Since YA and YB are independent, ȲA − ȲB ∼ N (µA − µB , σA /NA + σB /NB )
• Hypothesis sampling: Our general test hypothesis is

H0 : µA − µB = d0 µa − µ 6= (><)d0

• We can test H0 using the test statistic

(ȲA − ȲB ) − d0
t= 2 /N + σ 2 /N
σA A B B

s2A s2B
– This is not feasible as the variance is unknown. A consistent estimator is s2Ȳ = NA + NB
A −ȲB

(ȲA −ȲB )−d0

– A feasible test statistic is t = sȲ −Ȳ . This only valid for NA and NB large!.
A B

The bottle factory. We collect N = 100 observations from artisan A and B and compute the
sample means, ȲA = 9.36 and ȲB = 10.41. See the scatterplot in Figure 13.
Our test hyopthesis is H0 : µA − µB = 0 µa − µB < 0.
We can compute the test statistic, t = 9.36−10.41
0.94 0.96 = −5.45. The critical value at level α = 0.05
10 + 10
for the normal distribution in one-side (<) test hypothesis is -1.65. We thus reject the null
hypothesis. See Figure 14.
We can also compute the p-value of the test as
! !
(ȲA − ȲB ) − d0 (Ȳ ∗ − ȲB∗ ) − d0 (ȲA∗ − ȲB∗ ) − d0
t = Pr < A =Φ = 2e − 08
H0 sȲA −ȲB sȲ ∗ −Ȳ ∗ sȲ ∗ −Ȳ ∗
A B A B

16
12
11
Artisan B

10
9

8 9 10 11

Artisan A

Figure 13: Bottels factory example. Scatterplot of the observations from artisans A and B.
0.4
0.3
dnorm(t)

t*
0.2
0.1

X
0.0

−6 −4 −2 0 2 4 6

Figure 14: Bottels factory example. The significance level at the 5 percent level (grey area), and the critical
value (red cross) at -1.65. The value of the t-statistic (blue line) and the p-value (blue area).

Download Complete (Ebook) Introduction to the Practice of Statistics by David S. Moore & George P. McCabe & Bruce A. Craig PDF for All Chapters
100% (6)
Download Complete (Ebook) Introduction to the Practice of Statistics by David S. Moore & George P. McCabe & Bruce A. Craig PDF for All Chapters
67 pages
(eBook PDF) Understanding Business Statistics 1st Edition 2024 scribd download
100% (2)
(eBook PDF) Understanding Business Statistics 1st Edition 2024 scribd download
41 pages
A Brief Course in Mathematical Statistics 1st Edition Tanis Hogg Solution Manual
0% (1)
A Brief Course in Mathematical Statistics 1st Edition Tanis Hogg Solution Manual
8 pages
Correlation and Regression Analysis
No ratings yet
Correlation and Regression Analysis
28 pages
Non-Parametric Methods Using Kernel Density Estimation
No ratings yet
Non-Parametric Methods Using Kernel Density Estimation
1 page
Discrete Random Variables: Integral
No ratings yet
Discrete Random Variables: Integral
7 pages
ECO 201 Lecture 2: Dr. Anomita Ghosh
No ratings yet
ECO 201 Lecture 2: Dr. Anomita Ghosh
46 pages
LEC2&3
No ratings yet
LEC2&3
46 pages
Handout2 BasicsOf Random Variables
No ratings yet
Handout2 BasicsOf Random Variables
3 pages
Random Variables and Probability Distributions
No ratings yet
Random Variables and Probability Distributions
15 pages
Random Variables: Presented by in Stochastic Analysis and Inverse Modelling
100% (1)
Random Variables: Presented by in Stochastic Analysis and Inverse Modelling
21 pages
ST3236_Note3
No ratings yet
ST3236_Note3
17 pages
2 Random Variables and Probability Distributions
No ratings yet
2 Random Variables and Probability Distributions
23 pages
R Variables
No ratings yet
R Variables
9 pages
C0 English
No ratings yet
C0 English
42 pages
Stats Distribution Theory
No ratings yet
Stats Distribution Theory
35 pages
Basic Probability and Statistics: Random Variables Distribution Functions Various Probability Distributions
No ratings yet
Basic Probability and Statistics: Random Variables Distribution Functions Various Probability Distributions
39 pages
Chapter 3 Random Variable and Mathematical Expectation
No ratings yet
Chapter 3 Random Variable and Mathematical Expectation
42 pages
Econometrics1 2 PDF
No ratings yet
Econometrics1 2 PDF
63 pages
2.1 Random Variables 2.1.1 Definition: PX PX X
100% (1)
2.1 Random Variables 2.1.1 Definition: PX PX X
13 pages
Ma8391 Notes
No ratings yet
Ma8391 Notes
60 pages
9 Annotated 5.4 and 5.5 Fall2014
No ratings yet
9 Annotated 5.4 and 5.5 Fall2014
6 pages
AE 248: AI and Data Science: Prabhu Ramachandran 2024-01-01
No ratings yet
AE 248: AI and Data Science: Prabhu Ramachandran 2024-01-01
12 pages
MIT14 381F13 Lec1 PDF
No ratings yet
MIT14 381F13 Lec1 PDF
8 pages
Distributions and Normal Random Variables
No ratings yet
Distributions and Normal Random Variables
8 pages
Notion of Random Variables and Binomial and Poisson Distribution
No ratings yet
Notion of Random Variables and Binomial and Poisson Distribution
36 pages
Ksajlkda#
No ratings yet
Ksajlkda#
36 pages
Lecture Notes
No ratings yet
Lecture Notes
23 pages
Appendix A Probability and Statistics
No ratings yet
Appendix A Probability and Statistics
12 pages
02-Random Variables
No ratings yet
02-Random Variables
77 pages
ProbabilityStatistics_Probability2 (1)
No ratings yet
ProbabilityStatistics_Probability2 (1)
11 pages
Chap2 PDF
No ratings yet
Chap2 PDF
20 pages
Statistics Review 1
No ratings yet
Statistics Review 1
32 pages
Module 2
No ratings yet
Module 2
36 pages
Lecure-3 Probability
No ratings yet
Lecure-3 Probability
80 pages
Chapter 4 Slides
No ratings yet
Chapter 4 Slides
27 pages
02 Random Variables
No ratings yet
02 Random Variables
51 pages
3. Probability
No ratings yet
3. Probability
44 pages
Probability Distributions _ Short Notes __ MHTCET Rankers 2025
No ratings yet
Probability Distributions _ Short Notes __ MHTCET Rankers 2025
3 pages
Statistics Presentation 5
No ratings yet
Statistics Presentation 5
49 pages
Probability
No ratings yet
Probability
12 pages
Continuous Random Variable and Their Properties
No ratings yet
Continuous Random Variable and Their Properties
26 pages
Probability Review Stochastic
No ratings yet
Probability Review Stochastic
23 pages
Chapter 3
No ratings yet
Chapter 3
6 pages
Probability and Statistics - 2
No ratings yet
Probability and Statistics - 2
72 pages
Stats Cheat Sheets
No ratings yet
Stats Cheat Sheets
15 pages
lecture 6. Statistics (1)
No ratings yet
lecture 6. Statistics (1)
28 pages
Advanced Statistics
100% (1)
Advanced Statistics
131 pages
LECT3 Probability Theory
No ratings yet
LECT3 Probability Theory
42 pages
PA Lec 5 2024
No ratings yet
PA Lec 5 2024
36 pages
M3L08
No ratings yet
M3L08
9 pages
PROBABILITY 03 Rv-Dist-Moments 5 8
No ratings yet
PROBABILITY 03 Rv-Dist-Moments 5 8
21 pages
CHAPTER TWO (2) S
No ratings yet
CHAPTER TWO (2) S
69 pages
3 Discrete Random Variables and Probability Distributions
No ratings yet
3 Discrete Random Variables and Probability Distributions
22 pages
8 Random Variable
No ratings yet
8 Random Variable
7 pages
Elements of Probability Theory
No ratings yet
Elements of Probability Theory
6 pages
Joint Distribution: Eral Rvs
No ratings yet
Joint Distribution: Eral Rvs
12 pages
Class Notes 2
No ratings yet
Class Notes 2
18 pages
Topic4 DiscreteRV
No ratings yet
Topic4 DiscreteRV
40 pages
Class Notes 3
No ratings yet
Class Notes 3
18 pages
PA Lec 6 2024
No ratings yet
PA Lec 6 2024
29 pages
Probability Formula Sheet
No ratings yet
Probability Formula Sheet
11 pages
Theory of Approximation
From Everand
Theory of Approximation
N. I. Achieser
No ratings yet
Differential Forms
From Everand
Differential Forms
Henri Cartan
5/5 (2)
Introduction to Differentiable Manifolds
From Everand
Introduction to Differentiable Manifolds
Louis Auslander
4.5/5 (2)
Navidi ch6
No ratings yet
Navidi ch6
82 pages
Forecasting For Business Planning: A Case Study of IBM Product Sales
No ratings yet
Forecasting For Business Planning: A Case Study of IBM Product Sales
17 pages
2024 MS4212 Summary Class Exercise
No ratings yet
2024 MS4212 Summary Class Exercise
4 pages
Instant Download (Ebook) Applied Engineering Statistics by R. Russell Rhinehart, Robert M. Bethea ISBN 9781032119489, 1032119489 PDF All Chapters
100% (10)
Instant Download (Ebook) Applied Engineering Statistics by R. Russell Rhinehart, Robert M. Bethea ISBN 9781032119489, 1032119489 PDF All Chapters
81 pages
01.multiple Linear Regression - Ipynb - Colaboratory
No ratings yet
01.multiple Linear Regression - Ipynb - Colaboratory
10 pages
Where can buy A primer of ecological statistics 2nd edition Edition Ellison ebook with cheap price
100% (6)
Where can buy A primer of ecological statistics 2nd edition Edition Ellison ebook with cheap price
81 pages
hw4 So
100% (2)
hw4 So
18 pages
ReliabilityHw1 R26104047
No ratings yet
ReliabilityHw1 R26104047
4 pages
The Need To Report Effect Size Estimates Revisited. An Overview of Some Recommended Measures of Effect Size
No ratings yet
The Need To Report Effect Size Estimates Revisited. An Overview of Some Recommended Measures of Effect Size
8 pages
One-Way ANOVA Two-Way ANOVA
No ratings yet
One-Way ANOVA Two-Way ANOVA
31 pages
TKKD
No ratings yet
TKKD
7 pages
Chi Square Ass
No ratings yet
Chi Square Ass
2 pages
Losing Control (Group) The Machine Learning Control Method For Counterfactual Forecasting
No ratings yet
Losing Control (Group) The Machine Learning Control Method For Counterfactual Forecasting
44 pages
Real Statistics Examples Distributions
No ratings yet
Real Statistics Examples Distributions
491 pages
Statistics 622: Calibration
No ratings yet
Statistics 622: Calibration
25 pages
M1112SP IIIh 3
No ratings yet
M1112SP IIIh 3
3 pages
(Ebook) Introductory Statistics by OpenStax (hardcover version, full color) by Barbara Illowsky, Susan Dean ISBN 9781938168208, 9781938168291, 9781947172050, 1938168208, 1938168291, 1947172050 - Download the entire ebook instantly and explore every detail
No ratings yet
(Ebook) Introductory Statistics by OpenStax (hardcover version, full color) by Barbara Illowsky, Susan Dean ISBN 9781938168208, 9781938168291, 9781947172050, 1938168208, 1938168291, 1947172050 - Download the entire ebook instantly and explore every detail
48 pages
Chapter 10
No ratings yet
Chapter 10
3 pages
Correlation-T-Test-ANOVA
No ratings yet
Correlation-T-Test-ANOVA
62 pages
03 Model Selection and Train-Validation-Test Sets 12 Min
No ratings yet
03 Model Selection and Train-Validation-Test Sets 12 Min
7 pages
The Bootstrap and The Jackknife
No ratings yet
The Bootstrap and The Jackknife
15 pages
Lecture 6: Classical Normal Linear Regression Model Some Basic Ideas
No ratings yet
Lecture 6: Classical Normal Linear Regression Model Some Basic Ideas
9 pages
Group Assignment Final PDF
100% (1)
Group Assignment Final PDF
13 pages
Stat 240
No ratings yet
Stat 240
3 pages
Shapiro-Wilk Test - Wikipedia
No ratings yet
Shapiro-Wilk Test - Wikipedia
3 pages

Probability & Stats Notes

Uploaded by

Probability & Stats Notes

Uploaded by

Applied Econometrics

Review of probability and statistics

Notions of Probability Theory

Discrete random variables

Continuous random Variable

Figure 1: Discrete random variable.

– If X is a discrete r.v. taking N values with probability p(x), then

• Some useful properties of the expected value are the following:

Figure 2: Continuous random variable.

Skewness and Kurtosis

Figure 3: Examples of different variance and expected values.

Joint Probability Distribution

Conditional distribution, expectation and variance

Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y),

and P r(Y = y|X = x) = P r(y = y).

Cov(X, Y ) = E [(X − µX )(Y − µY )]

Figure 5: Examples of different different degrees of correlation.

Some continuous distributions

Pr(X ≤ c1 ) = Pr(Z ≤ d1 ) = Φ(d1 )

The multivariate normal distribution

The Chi-squared distribution

Figure 6: Chi-squared distribution.

The Student’s t distribution

The Snedecor’s F distribution

Figure 7: Student’s t distribution.

Review of statistical theory

Figure 8: Snedecor F distribution.

N=10 N=100 N=1000

sample mean sample mean sample mean

where µY = E(Y ) and σY2 = V ar(Y ).

• Tables with tabulated critical values are important!

and can verify it using a test for the difference in mean.

Hypothesis testing on difference in mean

• We can test H0 using the test statistic

(ȲA −ȲB )−d0

You might also like