Some Stats Concepts
• Unbiased - said of estimators: an estimator is unbiased when its expected value equals the population parameter. The sample mean is unbiased, E[X̄] = µ, where µ is the population average, provided the data come from a random sample.
o Upward biased: the expected value of the estimator is above the true value of the population parameter, e.g., E[X̄] > µ. Downward biased is the opposite.
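A minimal Python sketch (not part of the original notes; numpy is assumed and the population values are made up) showing the idea: the average of many sample means lands essentially on µ.

```python
# Simulate many random samples and average their sample means:
# the result is close to the population mean, consistent with E[X-bar] = mu.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 3.0, 25                      # made-up population values and sample size

sample_means = [rng.normal(mu, sigma, n).mean() for _ in range(100_000)]
print(np.mean(sample_means))                      # very close to 10
```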
• Variance: the expected squared deviation of a random variable from its mean. If the population mean of a random variable X is μ, the variance is defined as E[(X − μ)²].
o The sample analog of the variance in a dataset, also known as an estimator (see above) of the variance, replaces μ with the sample average, and adds up the data and divides by N − 1: (1/(N−1)) Σ_{i=1}^{N} (X_i − X̄)², where N is again the sample size and i indexes the observations of the dataset. This also happens to be an unbiased (see above) estimator of the variance (whereas dividing by N instead of N − 1 produces a downward biased estimator; see the sketch below).
o It is common to denote the population variance using σ², where σ is the lowercase Greek letter sigma, which represents the standard deviation.
o Standard deviation -- square root of the variance. This also measures how
much variation there is in the data, but it is more useful because it is
measured in units of the original data (as opposed to squared units with the
variance). The standard deviation of a population is often denoted 𝜎.
o Note that the common usage of “the standard deviation” refers to the sample estimator concept in a particular dataset. But in statistics and econometrics there is also a population concept defined in terms of random variables. This is why it is meaningful to talk about things like “the standard deviation of an estimator” even though in practice we typically only have one sample. 2
2 Note that when we do so, we are considering the situation before we have collected the sample; X_i represents what we might get from a random draw from the population, not the actual data.
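A quick sketch of the sample variance and standard deviation in Python (not from the notes; numpy is assumed, data are made up). Dividing by N − 1 corresponds to numpy's ddof=1 option.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])    # made-up data
n = len(x)

var_unbiased = ((x - x.mean()) ** 2).sum() / (n - 1)       # divide by N - 1
print(var_unbiased)
print(x.var(ddof=1))      # same number from numpy
print(x.var(ddof=0))      # divides by N instead: smaller (downward biased)
print(x.std(ddof=1))      # sample standard deviation, in the units of the data
```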
• Covariance: a general measure of relatedness of two random variables, analogous to the variance. If the population mean of a random variable X is μ_X and that of Y is μ_Y, then the covariance is defined as E[(X − μ_X)(Y − μ_Y)].
o The sample estimator of the covariance in a dataset replaces the μ’s with the sample averages, and adds up the data and divides by N − 1: (1/(N−1)) Σ_{i=1}^{N} (X_i − X̄)(Y_i − Ȳ).
o The covariance does not have very meaningful units, and its magnitude is
hard to interpret. But the sign tells you whether X and Y are positively or
negatively related.
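A small Python sketch of the sample covariance (not from the notes; numpy assumed, data made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # made-up data
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
print(cov_xy)                                  # positive: x and y move together
print(np.cov(x, y, ddof=1)[0, 1])              # same number from numpy's covariance matrix
```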
• You may recall that the correlation, a number between -1 and +1, standardizes the covariance by dividing by the standard deviation of each variable. It measures the strength, but not the magnitude (slope), of any linear relationship between X and Y.3 It has no units, and is typically denoted with an “r” or “R” if it is a sample estimate, and a ρ when it is a population concept.
o Only in a bivariate (one Y, one X) linear regression is R² literally the squared correlation between Y and X. In a multivariate regression this interpretation does not hold.
3 The linear distinction is important: in extreme cases two variables could even be perfectly related but have zero correlation (if that relationship was nonlinear)! (For example, if y = x², y and x would be perfectly related. However, y and x would have zero correlation: a straight line fitted between y and x would have zero slope. Draw a picture to see why.)
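A Python sketch (not from the notes; numpy assumed, data simulated) of the correlation, and of the fact that in a bivariate regression R² equals the squared correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(size=200)       # made-up linear relationship plus noise

r = np.corrcoef(x, y)[0, 1]                    # sample correlation, between -1 and +1

# R-squared from the fitted bivariate regression y = b0 + b1*x
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
r2 = 1 - resid.var() / y.var()
print(r ** 2, r2)                              # the two numbers agree
```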
• In general, for random variables X and Y and numbers a, b, and c,
Var(aX + bY + c) = a²Var(X) + b²Var(Y) + 2abCov(X,Y).
Note that:
o Since c is just a number, it does not affect the variance. If you gave everybody
in the room $5, it would raise mean wealth in the room, but not affect the
variation in wealth.
o If X and Y are independent, as is assumed for different observations in a random sample (in a simple random sample, the data are “independent and identically distributed” or “iid”), then the covariance term disappears, so Var(aX + bY + c) = a²Var(X) + b²Var(Y).
o Why are the constants a and b squared? Recall that the variance is the
expected value of the squared deviations. So if you multiply all the data by 2,
the variance goes up by a factor of four, not 2. The standard deviation goes
up by a factor of 2.
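A simulation sketch in Python (not from the notes; numpy assumed, all numbers made up) checking the variance formula for a linear combination:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(0, 1, n)
y = 0.6 * x + rng.normal(0, 1, n)              # built to be correlated with x
a, b, c = 2.0, -3.0, 5.0

lhs = np.var(a * x + b * y + c, ddof=1)
rhs = (a**2 * np.var(x, ddof=1) + b**2 * np.var(y, ddof=1)
       + 2 * a * b * np.cov(x, y, ddof=1)[0, 1])
print(lhs, rhs)                                # nearly identical; the constant c plays no role
```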
• The covariance of linear combinations of random variables:
o You probably don’t need to know this, but FYI,
Cov(aX1 + bX2, cY1 + dY2) = acCov(X1, Y1) + adCov(X1, Y2) + bcCov(X2, Y1) + bdCov(X2, Y2)
for random variables X1, X2, Y1, Y2 and constants a, b, c, d. This can be derived from the definition of covariance.
o Also, note from the definition of covariance that the covariance of a variable
with itself is the variance: Cov(X,X) = Var(X).
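A numerical check of the expansion above, in Python (not from the notes; numpy assumed, the covariance matrix is made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
# Draw four correlated variables from a made-up joint normal distribution.
x1, x2, y1, y2 = rng.multivariate_normal(
    mean=[0, 0, 0, 0],
    cov=[[1.0, 0.2, 0.3, 0.1],
         [0.2, 1.0, 0.1, 0.4],
         [0.3, 0.1, 1.0, 0.2],
         [0.1, 0.4, 0.2, 1.0]],
    size=n).T
a, b, c, d = 1.5, -2.0, 0.5, 3.0

def cov(u, v):
    return np.cov(u, v, ddof=1)[0, 1]

lhs = cov(a * x1 + b * x2, c * y1 + d * y2)
rhs = (a*c*cov(x1, y1) + a*d*cov(x1, y2)
       + b*c*cov(x2, y1) + b*d*cov(x2, y2))
print(lhs, rhs)                                # approximately equal
```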
• Standardizing a random variable means subtracting off the mean and dividing by the standard deviation; if X has a mean of μ and a standard deviation of σ, then standardized X is (X − μ)/σ.
o This results in a new random variable which has a mean of zero and a standard deviation of 1.
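A short Python sketch of standardizing a variable in a dataset (not from the notes; numpy assumed, data made up):

```python
import numpy as np

x = np.array([3.0, 7.0, 7.0, 19.0, 24.0])     # made-up data
z = (x - x.mean()) / x.std(ddof=1)            # subtract the mean, divide by the sd
print(z.mean(), z.std(ddof=1))                # 0 (up to rounding) and 1
```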
• Cumulative Distribution Function (often loosely called the cumulative density function) or “CDF” -- measures the probability that a random variable takes on a value at or below a specified value. Often denoted with a capital letter function and a lowercase argument, as in G(x). Note that the argument is a number, not a random variable. For random variable X, G(x) = Pr(X ≤ x).
o Probability Density Function (distribution in the case of a discrete random
variable) or “PDF.” The derivative of the CDF. Note in the case of a
continuous random variable, the PDF does not measure a probability, since
the probability that a continuous random variable takes on any particular
value is zero.
o The CDF probabilities associated with a standard -- mean zero, variance one -
- normal random variable are shown in Table G.1. on page 831 of the text.
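In software, the role of Table G.1 is played by a normal CDF function. A sketch using scipy (assumed; not part of the notes):

```python
from scipy.stats import norm

print(norm.cdf(0.0))          # 0.5: half the probability lies below the mean
print(norm.cdf(1.96))         # about 0.975
print(norm.pdf(0.0))          # density at 0, about 0.399 -- not a probability

# The PDF is (numerically) the slope of the CDF:
h = 1e-6
print((norm.cdf(1.0 + h) - norm.cdf(1.0 - h)) / (2 * h), norm.pdf(1.0))
```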
• Any linear transformation of a normally distributed random variable is also
normally distributed. This allows us to transform a variable and look up
probabilities that it takes on values in particular ranges using standard tables, such
as those in Appendix G on page 831.
o E.g., if X is normally distributed with mean 3 and variance 4 (so its standard deviation is 2), then Pr(X < −1) = Pr((X − 3)/2 < (−1 − 3)/2) = Pr(Z < −2), where Z is (often) used as a symbol for a standard normal random variable. According to the table on page 831, this probability is 0.0228. (Do you see this? How would you instead calculate Pr(X > −1)? See the sketch below for a check.)
o The sum of independent (see below for definition), normally distributed random variables is also normally distributed.
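Checking the worked example above with scipy (assumed; not part of the notes):

```python
from scipy.stats import norm

# X is normal with mean 3 and variance 4, so standard deviation 2.
print(norm.cdf(-2))                        # Pr(Z < -2), about 0.0228
print(norm.cdf(-1, loc=3, scale=2))        # Pr(X < -1) directly, same number
print(1 - norm.cdf(-2))                    # Pr(X > -1), about 0.9772
```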
• | = "given that" or "conditional on" as in Pr(purple-people eater|one-eye, one-horn)
= probability of being a purple people eater given that you have one eye and one
horn, or E[drinks last weekend|fraternity member] = expected number of drinks
consumed last weekend by a fraternity member.
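A conditional expectation in a dataset is just a group mean. A sketch using pandas (assumed; the variable names and numbers are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "fraternity_member": [1, 1, 0, 0, 1, 0],
    "drinks_last_weekend": [8, 5, 2, 0, 10, 3],
})
# E[drinks last weekend | fraternity member]: the mean of drinks within each group
print(df.groupby("fraternity_member")["drinks_last_weekend"].mean())
```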
• Two random variables are independent if the probability that one takes on any
particular value is unrelated to the value the other variable takes on.4
o Observations in a simple random sample are independent. If I survey people
at random, the answers one person gives to questions will be on average
unrelated to the other respondents’ answers.
o In linear regression, we often talk about a weaker condition, so-called “mean independence”: E[u|X] = 0. This condition says the expected value of u is the same (namely zero) no matter what value X takes on.
4 Technically, the condition is written as g_XY(x, y) = g_X(x)·g_Y(y), where g_XY(x, y) is the joint PDF – integrated over a range, it gives the joint probability that X is in the specified range at the same time as Y is in the specified range – and g_X(x) and g_Y(y) are the PDFs of X and Y, respectively (technically called the “marginal” PDFs).
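A simulation sketch of mean independence in Python (not from the notes; numpy assumed): when u is generated independently of x, the average of u is near zero within any range of x values.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100_000)
u = rng.normal(size=100_000)                   # drawn independently of x

for lo, hi in [(-3, -1), (-1, 0), (0, 1), (1, 3)]:
    in_bin = (x >= lo) & (x < hi)
    print(lo, hi, u[in_bin].mean())            # each conditional mean is close to 0
```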