Some Stats Concepts
• Unbiased - said of estimators: an estimator is unbiased when its expected value equals the population parameter. The sample mean is unbiased, E[X̄] = µ, where µ is the population average, provided the data come from a random sample.
o Upward biased: the expected value of the estimator is above the true value of the population parameter, e.g., E[X̄] > µ. Downward biased is the opposite.
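A minimal Python sketch (not part of the original notes; numpy is assumed and the population values are made up) showing the idea: the average of many sample means lands essentially on µ.

```python
# Simulate many random samples and average their sample means:
# the result is close to the population mean, consistent with E[X-bar] = mu.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 3.0, 25                      # made-up population values and sample size

sample_means = [rng.normal(mu, sigma, n).mean() for _ in range(100_000)]
print(np.mean(sample_means))                      # very close to 10
```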
• Variance: the expected squared deviation of a random variable from its mean. If the population mean of a random variable X is μ, the variance is defined as E[(X − μ)²].
o The sample analog of the variance in a dataset, also known as an estimator (see above) of the variance, replaces μ with the sample average, and adds up the data and divides by N − 1: (1/(N−1)) Σ_{i=1}^{N} (X_i − X̄)², where N is again the sample size and i indexes the observations of the dataset. This also happens to be an unbiased (see above) estimator of the variance (whereas dividing by N instead of N − 1 produces a downward biased estimator; see the sketch below).
o It is common to denote the population variance using σ², where σ is the lowercase Greek letter sigma, which represents the standard deviation.
o Standard deviation -- square root of the variance. This also measures how
much variation there is in the data, but it is more useful because it is
measured in units of the original data (as opposed to squared units with the
variance). The standard deviation of a population is often denoted 𝜎.
o Note that the common usage of “the standard deviation” refers to the sample estimator concept in a particular dataset. But in statistics and econometrics there is also a population concept defined in terms of random variables. This is why it is meaningful to talk about things like “the standard deviation of an estimator” even though in practice we typically only have one sample. 2
2 Note that when we do so, we are considering the situation before we have collected the sample; X_i represents what we might get from a random draw from the population, not the actual data.
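A quick sketch of the sample variance and standard deviation in Python (not from the notes; numpy is assumed, data are made up). Dividing by N − 1 corresponds to numpy's ddof=1 option.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])    # made-up data
n = len(x)

var_unbiased = ((x - x.mean()) ** 2).sum() / (n - 1)       # divide by N - 1
print(var_unbiased)
print(x.var(ddof=1))      # same number from numpy
print(x.var(ddof=0))      # divides by N instead: smaller (downward biased)
print(x.std(ddof=1))      # sample standard deviation, in the units of the data
```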
• Covariance: a general measure of relatedness of two random variables, analogous to the variance. If the population mean of a random variable X is μ_X and that of Y is μ_Y, then the covariance is defined as E[(X − μ_X)(Y − μ_Y)].
o The sample estimator of the covariance in a dataset replaces the μ’s with the sample averages, and adds up the data and divides by N − 1: (1/(N−1)) Σ_{i=1}^{N} (X_i − X̄)(Y_i − Ȳ).
o The covariance does not have very meaningful units, and its magnitude is
hard to interpret. But the sign tells you whether X and Y are positively or
negatively related.
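A small Python sketch of the sample covariance (not from the notes; numpy assumed, data made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # made-up data
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
print(cov_xy)                                  # positive: x and y move together
print(np.cov(x, y, ddof=1)[0, 1])              # same number from numpy's covariance matrix
```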
• You may recall that the correlation, a number between -1 and +1, standardizes the covariance by dividing by the standard deviation of each variable. It measures the strength, but not the magnitude (slope), of any linear relationship between X and Y.3 It has no units, and is typically denoted with an “r” or “R” if it is a sample estimate, and a ρ when it is a population concept.
o Only in a bivariate (one Y, one X) linear regression is R² literally the squared correlation between Y and X. In a multivariate regression this interpretation does not hold.
3 The linear distinction is important: in extreme cases two variables could even be perfectly related but have zero correlation (if that relationship was nonlinear)! (For example, if y = x², y and x would be perfectly related. However, y and x would have zero correlation: a straight line fitted between y and x would have zero slope. Draw a picture to see why.)
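A Python sketch (not from the notes; numpy assumed, data simulated) of the correlation, and of the fact that in a bivariate regression R² equals the squared correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(size=200)       # made-up linear relationship plus noise

r = np.corrcoef(x, y)[0, 1]                    # sample correlation, between -1 and +1

# R-squared from the fitted bivariate regression y = b0 + b1*x
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
r2 = 1 - resid.var() / y.var()
print(r ** 2, r2)                              # the two numbers agree
```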
• In general, for random variables X and Y and numbers a, b, and c,
Var(aX + bY + c) = a²Var(X) + b²Var(Y) + 2abCov(X,Y).
Note that:
o Since c is just a number, it does not affect the variance. If you gave everybody
in the room $5, it would raise mean wealth in the room, but not affect the
variation in wealth.
o If X and Y are independent, as is assumed for different observations in a random sample (in a simple random sample, the data are “independent and identically distributed” or “iid”), then the covariance term disappears, so Var(aX + bY + c) = a²Var(X) + b²Var(Y).
o Why are the constants a and b squared? Recall that the variance is the
expected value of the squared deviations. So if you multiply all the data by 2,
the variance goes up by a factor of four, not 2. The standard deviation goes
up by a factor of 2.
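A simulation sketch in Python (not from the notes; numpy assumed, all numbers made up) checking the variance formula for a linear combination:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(0, 1, n)
y = 0.6 * x + rng.normal(0, 1, n)              # built to be correlated with x
a, b, c = 2.0, -3.0, 5.0

lhs = np.var(a * x + b * y + c, ddof=1)
rhs = (a**2 * np.var(x, ddof=1) + b**2 * np.var(y, ddof=1)
       + 2 * a * b * np.cov(x, y, ddof=1)[0, 1])
print(lhs, rhs)                                # nearly identical; the constant c plays no role
```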
• The covariance of linear combinations of random variables:
o You probably don’t need to know this, but FYI,
Cov(aX1 + bX2, cY1 + dY2) = acCov(X1, Y1) + adCov(X1, Y2) + bcCov(X2, Y1) + bdCov(X2, Y2)
for random variables X1, X2, Y1, Y2 and constants a, b, c, d. This can be derived from the definition of covariance.
o Also, note from the definition of covariance that the covariance of a variable
with itself is the variance: Cov(X,X) = Var(X).
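A numerical check of the expansion above, in Python (not from the notes; numpy assumed, the covariance matrix is made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
# Draw four correlated variables from a made-up joint normal distribution.
x1, x2, y1, y2 = rng.multivariate_normal(
    mean=[0, 0, 0, 0],
    cov=[[1.0, 0.2, 0.3, 0.1],
         [0.2, 1.0, 0.1, 0.4],
         [0.3, 0.1, 1.0, 0.2],
         [0.1, 0.4, 0.2, 1.0]],
    size=n).T
a, b, c, d = 1.5, -2.0, 0.5, 3.0

def cov(u, v):
    return np.cov(u, v, ddof=1)[0, 1]

lhs = cov(a * x1 + b * x2, c * y1 + d * y2)
rhs = (a*c*cov(x1, y1) + a*d*cov(x1, y2)
       + b*c*cov(x2, y1) + b*d*cov(x2, y2))
print(lhs, rhs)                                # approximately equal
```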
• Standardizing a random variable means subtracting off the mean and dividing by the standard deviation; if X has a mean of μ and a standard deviation of σ, then standardized X is (X − μ)/σ.
o This results in a new random variable which has a mean of zero and a standard deviation of 1.
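A short Python sketch of standardizing a variable in a dataset (not from the notes; numpy assumed, data made up):

```python
import numpy as np

x = np.array([3.0, 7.0, 7.0, 19.0, 24.0])     # made-up data
z = (x - x.mean()) / x.std(ddof=1)            # subtract the mean, divide by the sd
print(z.mean(), z.std(ddof=1))                # 0 (up to rounding) and 1
```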
• Cumulative Distribution Function (often loosely called the cumulative density function) or “CDF” -- measures the probability that a random variable takes on a value at or below a specified value. Often denoted with a capital letter function and a lowercase argument, as in G(x). Note that the argument is a number, not a random variable. For random variable X, G(x) = Pr(X ≤ x).
o Probability Density Function (distribution in the case of a discrete random
variable) or “PDF.” The derivative of the CDF. Note in the case of a
continuous random variable, the PDF does not measure a probability, since
the probability that a continuous random variable takes on any particular
value is zero.
o The CDF probabilities associated with a standard -- mean zero, variance one -
- normal random variable are shown in Table G.1. on page 831 of the text.
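In software, the role of Table G.1 is played by a normal CDF function. A sketch using scipy (assumed; not part of the notes):

```python
from scipy.stats import norm

print(norm.cdf(0.0))          # 0.5: half the probability lies below the mean
print(norm.cdf(1.96))         # about 0.975
print(norm.pdf(0.0))          # density at 0, about 0.399 -- not a probability

# The PDF is (numerically) the slope of the CDF:
h = 1e-6
print((norm.cdf(1.0 + h) - norm.cdf(1.0 - h)) / (2 * h), norm.pdf(1.0))
```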
• Any linear transformation of a normally distributed random variable is also
normally distributed. This allows us to transform a variable and look up
probabilities that it takes on values in particular ranges using standard tables, such
as those in Appendix G on page 831.
o E.g., if X is normally distributed with mean 3 and variance 4 (so its standard deviation is 2), then Pr(X < −1) = Pr((X − 3)/2 < (−1 − 3)/2) = Pr(Z < −2), where Z is (often) used as a symbol for a standard normal random variable. According to the table on page 831, this probability is 0.0228. (Do you see this? How would you instead calculate Pr(X > −1)? See the sketch below for a check.)
o The sum of independent (see below for definition), normally distributed random variables is also normally distributed.
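Checking the worked example above with scipy (assumed; not part of the notes):

```python
from scipy.stats import norm

# X is normal with mean 3 and variance 4, so standard deviation 2.
print(norm.cdf(-2))                        # Pr(Z < -2), about 0.0228
print(norm.cdf(-1, loc=3, scale=2))        # Pr(X < -1) directly, same number
print(1 - norm.cdf(-2))                    # Pr(X > -1), about 0.9772
```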
• | = "given that" or "conditional on" as in Pr(purple-people eater|one-eye, one-horn)
= probability of being a purple people eater given that you have one eye and one
horn, or E[drinks last weekend|fraternity member] = expected number of drinks
consumed last weekend by a fraternity member.
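A conditional expectation in a dataset is just a group mean. A sketch using pandas (assumed; the variable names and numbers are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "fraternity_member": [1, 1, 0, 0, 1, 0],
    "drinks_last_weekend": [8, 5, 2, 0, 10, 3],
})
# E[drinks last weekend | fraternity member]: the mean of drinks within each group
print(df.groupby("fraternity_member")["drinks_last_weekend"].mean())
```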
• Two random variables are independent if the probability that one takes on any
particular value is unrelated to the value the other variable takes on.4
o Observations in a simple random sample are independent. If I survey people
at random, the answers one person gives to questions will be on average
unrelated to the other respondents’ answers.
o In linear regression, we often talk about a weaker condition, so-called “mean independence”: E[u|X] = 0. This condition says the expected value of u is the same (namely zero) no matter what value X takes on.
4 Technically, the condition is written as g_XY(x, y) = g_X(x)·g_Y(y), where g_XY(x, y) is the joint PDF – integrated over a range, it gives the joint probability that X is in the specified range at the same time as Y is in the specified range – and g_X(x) and g_Y(y) are the PDFs of X and Y, respectively (technically called the “marginal” PDFs).
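A simulation sketch of mean independence in Python (not from the notes; numpy assumed): when u is generated independently of x, the average of u is near zero within any range of x values.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100_000)
u = rng.normal(size=100_000)                   # drawn independently of x

for lo, hi in [(-3, -1), (-1, 0), (0, 1), (1, 3)]:
    in_bin = (x >= lo) & (x < hi)
    print(lo, hi, u[in_bin].mean())            # each conditional mean is close to 0
```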