
The probability density function (pdf) of X summarizes the information about the possible outcomes of X and the corresponding probabilities.
The pdf of a continuous random variable is a continuous function. Because the probability that a continuous random variable takes any single real value is zero, we use the pdf to compute probabilities of events that involve a range of values.
Example:
X is the time between two buses showing up at the same stop
Q: What is the probability that the next bus will arrive in more than 5
minutes but less than 10 minutes?
The cumulative density function (cdf) describes the probability that a random variable takes a value no larger than a given number:
F(a) = P(X ≤ a)
Properties:
F(−∞) = 0
F(+∞) = 1
For any number c, P(X > c) = 1 − F(c)
For any numbers a and b with a < b, P(a < X ≤ b) = F(b) − F(a)
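For the bus-stop example above, the answer is F(10) − F(5). A minimal sketch of this calculation, assuming (purely for illustration) that the waiting time follows an exponential distribution with a mean of 8 minutes:

```python
from scipy.stats import expon

# Hypothetical assumption: waiting time X is exponential with mean 8 minutes.
wait = expon(scale=8)              # for the exponential, scale = mean

# P(5 < X <= 10) = F(10) - F(5)
prob = wait.cdf(10) - wait.cdf(5)
print(f"P(5 < X <= 10) = {prob:.3f}")
```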

Expected Value
For any constant c, E(c) = c: the expected value of a constant is the constant itself.
For any constants a and b, E(aX + b) = aE(X) + b

Variance
σ² ≡ Var(X) ≡ E[(X − μ)²]
σ² = E(X² − 2μX + μ²) = E(X²) − 2μ² + μ² = E(X²) − μ²
Var(X) = 0 if and only if X is a constant, X = E(X) = c
For any constants a and b, Var(aX + b) = a²Var(X)
If X and Y are independent, Var(X + Y) = Var(X) + Var(Y) = Var(X − Y)

Standard deviation
sd(X) ≡ √Var(X)
For any constant c, sd(c) = 0
For any constants a and b, sd(aX + b) = |a| sd(X)

Standardizing a random variable
Z = (X − μ)/σ
E(Z) = (1/σ)[E(X) − μ] = 0
Var(Z) = Var[(X − μ)/σ] = (1/σ²)Var(X) = σ²/σ² = 1
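A quick numerical check of these properties on simulated data; the values of μ and σ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=100_000)   # illustrative data, mu=50, sigma=10

z = (x - x.mean()) / x.std()                     # Z = (X - mu) / sigma
print(z.mean(), z.var())                         # approximately 0 and 1
```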

Covariance
σ_XY ≡ Cov(X, Y) ≡ E[(X − μ_X)(Y − μ_Y)]
If X and Y are independent, Cov(X, Y) = 0
Var(aX + bY) = E[(aX + bY)²] − [E(aX + bY)]² = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)
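A small simulation sketch that checks the Var(aX + bY) formula numerically; the covariance matrix and the constants a and b are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative draws from a bivariate normal with Cov(X, Y) = 0.5
x, y = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 2.0]], size=500_000).T
a, b = 2.0, -3.0

lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y)[0, 1]
print(lhs, rhs)   # the two numbers agree up to sampling noise
```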
Conditional expectation
E[c(X) | X] = c(X) for any function c(X)
E[a(X)Y + b(X) | X] = a(X)E(Y | X) + b(X) for any functions a(X) and b(X)
If X and Y are independent, then E(Y | X) = E(Y)
If X and Y are independent, then Var(Y | X) = Var(Y)

Correlation coefficient
Suppose we want to know the relationship between amount of education
and annual earnings in the working population. We could let X denote

education and Y denote earnings and then compute their covariance. But
the answer we get will depend on how we choose to measure education and
earnings. A covariance property implies that the covariance between
education and earnings depends on whether earnings are measured in
dollars or thousands of dollars, or whether education is measured in months
or years. It is pretty clear that how we measure these variables has no
bearing on how strongly they are related. But the covariance between them
does depend on the units of measurement. The fact that the covariance
depends on units of measurement is a deficiency that is overcome by the
correlation coefficient.

The correlation coefficient is defined as Corr(X, Y) ≡ Cov(X, Y) / [sd(X) sd(Y)].
−1 ≤ Corr(X, Y) ≤ 1; if Corr(X, Y) = 0 there is no linear relationship between X and Y, and they are said to be uncorrelated random variables.
For constants a1, b1, a2 and b2 with a1a2 > 0, Corr(a1X + b1, a2Y + b2) = Corr(X, Y)
If a1a2 < 0, Corr(a1X + b1, a2Y + b2) = −Corr(X, Y)
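A short sketch illustrating this point with simulated education and earnings data (all numbers are made up for illustration): rescaling the variables changes the covariance but leaves the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
educ_years = rng.normal(13, 3, size=10_000)                        # education in years
earnings_usd = 2_000 * educ_years + rng.normal(0, 8_000, 10_000)   # earnings in dollars

# Changing units changes the covariance but not the correlation
earnings_k = earnings_usd / 1_000          # thousands of dollars
educ_months = educ_years * 12              # months

print(np.cov(educ_years, earnings_usd)[0, 1], np.cov(educ_months, earnings_k)[0, 1])
print(np.corrcoef(educ_years, earnings_usd)[0, 1],
      np.corrcoef(educ_months, earnings_k)[0, 1])   # identical
```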


The binomial distribution describes the probability of observing a given
number of successes in N independent Bernoulli trials with probability p.
The Poisson distribution describes the probability that a given number of events occurs in a fixed time interval. The events are assumed to occur at a known average rate and independently of each other.
The Normal distribution is a bell-shaped curve and is useful for describing many real-world situations. Many random variables can be assumed to be normally distributed.
If X is a positive random variable, such as income, and Y = log(X) has a
normal distribution, then we say that X has a lognormal distribution. It turns
out that the lognormal distribution fits income distribution pretty well in
many countries. Other variables, such as prices of goods, appear to be well
described as lognormally distributed.
One special case of the normal distribution occurs when the mean is zero
and the variance (and, therefore, the standard deviation) is unity. If a
random variable Z has a Normal(0,1) distribution, then we say it has a
standard normal distribution.
The chi-squared distribution with k degrees of freedom is the distribution of the sum of the squares of k independent standard normal random variables:
X = Σ_{i=1}^{k} Zi²

The t distribution is the main distribution used for inference in econometrics. It is obtained from a standard normal random variable, Z, and an independent chi-squared random variable, X, with n degrees of freedom:
T = Z / √(X/n)
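A minimal simulation sketch of this construction, drawing Z and a chi-squared variable X with n = 5 degrees of freedom (an arbitrary illustrative choice) and comparing the result with the t distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5                                        # degrees of freedom of the chi-squared
z = rng.standard_normal(1_000_000)           # standard normal draws
x = rng.chisquare(df=n, size=1_000_000)      # chi-squared draws

t = z / np.sqrt(x / n)                       # T = Z / sqrt(X/n)
# The constructed variable matches a t distribution with n degrees of freedom
print(np.quantile(t, 0.975), stats.t.ppf(0.975, df=n))
```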

Statistical Inference involves learning something about a population we are


interested in.
A population is a well-defined group of subjects, such as individuals, firms,
cities, and so on. It is often impractical or even impossible to observe all the
members of a population at a given point in time.
Random sampling is the simplest sampling scheme. If Y1, Y2, . . . , Yn are independent random variables with the same probability density function f(y; θ), then {Y1, Y2, . . . , Yn} is said to be a random sample from f(y; θ). Given the random nature of the sampling procedure, it is likely that we get a different set of individuals each time we draw a random sample from a population.
An estimator W of a parameter θ is a rule that assigns a value of θ to each possible outcome of a sample. There is an unlimited number of ways of combining the data to estimate a given parameter θ.

Unbiasedness
An estimator, W, of a parameter θ is unbiased if its expected value is the parameter θ: E(W) = θ
Unbiasedness does not mean that we always get the correct value for the parameter θ. It means that we get the correct value in expectation.
Example (estimating the population mean μ):
W1 = Ȳ = (1/n) Σ_{i=1}^{n} Yi
E(W1) = E[(1/n) Σ_{i=1}^{n} Yi] = (1/n) Σ_{i=1}^{n} E(Yi) = (1/n)(nμ) = μ
W2 = Y1
E(W2) = E(Y1) = μ

The bias of an estimator is defined as the difference between its expected value and the parameter θ:
Bias(W) ≡ E(W) − θ
W3 = Y1 + Y2
Bias: E(W3) = E(Y1 + Y2) = 2μ, so Bias(W3) = 2μ − μ = μ
W4 = (Y1 + Y2)/3
Bias: E(W4) = E(Y1 + Y2)/3 = 2μ/3, so Bias(W4) = 2μ/3 − μ = −μ/3

Knowing that an estimator is unbiased does not tell us how informative it is, i.e., it tells us nothing about its variance. If an estimator has a large variance, we learn little about the parameter we want to estimate; if it has a small variance, we can be more confident about our estimate of the parameter.
W1 = Ȳ = (1/n) Σ_{i=1}^{n} Yi
Var(W1) = Var[(1/n) Σ_{i=1}^{n} Yi] = (1/n²) Σ_{i=1}^{n} Var(Yi) = (1/n²)(nσ²) = σ²/n
W2 = Y1
Var(W2) = Var(Y1) = σ²
W3 = Y1 + Y2
Var(W3) = Var(Y1 + Y2) = 2σ²
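A small Monte Carlo sketch comparing the three estimators above; the values of μ, σ, the sample size, and the number of replications are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 30, 20_000

samples = rng.normal(mu, sigma, size=(reps, n))
w1 = samples.mean(axis=1)           # W1: sample mean
w2 = samples[:, 0]                  # W2: first observation only
w3 = samples[:, 0] + samples[:, 1]  # W3: Y1 + Y2 (biased)

for name, w in [("W1", w1), ("W2", w2), ("W3", w3)]:
    print(name, "mean:", round(w.mean(), 3), "variance:", round(w.var(), 3))
# W1 and W2 are both centred on mu, but Var(W1) = sigma^2/n is far smaller;
# W3 is centred on 2*mu, illustrating its bias of mu.
```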

Efficiency
If W1 and W2 are two unbiased estimators of θ, W1 is efficient relative to W2 if:
Var(W1) ≤ Var(W2), for all θ
One way to compare biased estimators is to compute the mean squared error (MSE):
MSE(W) ≡ E[(W − θ)²]
The MSE measures how far, on average, the estimator is from θ. It can be shown that
MSE(W) = Var(W) + [Bias(W)]²
For an unbiased estimator, MSE(W) = Var(W)
An estimator W is consistent if both its bias and variance tend to zero as the
sample size increases:

P(|Wn − θ| > ε) → 0 as n → ∞, for any ε > 0


The central limit theorem (CLT) states that the average from a random
sample, when standardized, approximates a standard normal distribution,
independently of the population distribution. With the CLT we have
information about the distribution of an estimator even without knowing the
distribution of the original population.
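A minimal illustration of the CLT, using an exponential population (clearly non-normal, with mean 1 and variance 1) and checking that the standardized sample mean behaves like a standard normal variable; sample size and number of replications are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000

# A clearly non-normal population: exponential with mean 1 (variance 1)
draws = rng.exponential(scale=1.0, size=(reps, n))
z = (draws.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))   # standardized sample means

print(np.mean(np.abs(z) < 1.96))   # close to 0.95, as the standard normal predicts
```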
A point estimate, such as the sample mean, provides us with information about a parameter of the population but, by itself, does not tell us how confident we can be that the estimate is close to the population parameter.
A confidence interval provides information about where the population
parameter is likely to lie.
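A sketch of a 95% confidence interval for a population mean, using the t critical value; the data are simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=10, scale=3, size=200)        # illustrative sample

mean = y.mean()
se = y.std(ddof=1) / np.sqrt(len(y))             # standard error of the mean
crit = stats.t.ppf(0.975, df=len(y) - 1)         # t critical value

print(f"95% CI: [{mean - crit * se:.2f}, {mean + crit * se:.2f}]")
```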
Hypothesis testing is a method to draw insights from the population based
on a sample. Failing to reject the null hypothesis does not mean that the null
hypothesis is true. The null hypothesis, H0, is rejected if it is unlikely that the
observed data came from a population in which H0 is true.
We can make two types of mistakes:
Reject H0 when it is true: Type I error.
Fail to reject H0 when it is false: Type II error.
In general we want to minimize the probability of a Type I error, as its
consequences are usually more serious.
The p-value of a test is the smallest significance level at which H0 would be rejected; loosely, it is the probability of committing a Type I error if we reject H0 based on the observed data.
Sometimes we want to test whether two unknown parameters, μ1 and μ2, from two (sub)populations are the same. To do so, we perform a two-sample t-test.
H0: μ1 = μ2
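A minimal sketch of a two-sample t-test with scipy; the two groups are simulated for illustration, and Welch's version (which does not assume equal variances) is used:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(10.0, 2.0, size=100)   # hypothetical subpopulation 1
group2 = rng.normal(10.5, 2.0, size=120)   # hypothetical subpopulation 2

t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(t_stat, p_value)   # reject H0: mu1 = mu2 if the p-value is below, say, 0.05
```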
We have used statistical significance as a way of testing. However, in many cases a result is statistically significant but has little practical relevance. We should always look at the practical implications of our conclusions.
Qualitative/ Categorical data

Nominal: categories have no natural order (race, gender, country)


Ordinal: there is a natural ordering of the categories (age bracket,
satisfaction level)

Quantitative data

Discrete: countable number of distinct values (age, number of kids)


Continuous: any value within an interval (wage, temperature)

Missing at Random - If data are missing at random, the remaining


observations are still a representative sample of the population. Simplest
Solution: listwise deletion, i.e., delete all observations that do not have
values for all variables in the analysis.
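A minimal sketch of listwise deletion with pandas on a small hypothetical data frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with missing values
df = pd.DataFrame({
    "wage": [10.5, np.nan, 12.0, 9.8],
    "educ": [12, 16, np.nan, 10],
    "age":  [34, 41, 29, 55],
})

# Listwise deletion: keep only rows with values for all variables in the analysis
complete = df.dropna()
print(complete)
```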
Missing Not at Random - If data are missing not at random, then the
remaining observations are not a representative sample of the population.
No Simple Solution.
Selection bias occurs when the sampling procedure is not random, and thus
the sample is not representative of the population:

Self-selection: some members of the population are more likely to


be included in the sample because of their characteristics e.g.,
participants in a voluntary insurance program;
Attrition: some observations drop out of the sample over time and are therefore less likely to be observed, e.g., the tendency to look only at firms that survive.

Measurement bias occurs when the data collected contains errors that are
non-random:

Recall bias: respondents recall some events more vividly than others
e.g., child deaths by guns vs. swimming pools;
Sensitive questions: respondents may not report data accurately
e.g., wages, health conditions;
Faulty equipment: equipment that exhibits systematic measurement errors, e.g., a thermometer that is off by 1 degree Celsius.

An outlier is an observation that lies away from, or is detached from, the overall pattern of a distribution. Outliers can significantly change the descriptive statistics, especially the sample mean (but much less so the median). How to detect outliers? 1. Look at the maximum and minimum values of a variable; 2. Boxplot; 3. Histogram. How to deal with outliers? Find out why they are outliers, exclude observations that do not belong to the sample, and take note of the criteria used to exclude them. A sketch of the first two checks follows below.
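A short sketch of the min/max check and a boxplot-style interquartile-range rule, run on simulated wage data with one artificial outlier added for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
wage = np.append(rng.normal(20, 5, size=500), [250.0])    # one implausible value

print(wage.min(), wage.max())                              # 1. extreme values

# 2. Boxplot-style rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(wage, [25, 75])
iqr = q3 - q1
outliers = wage[(wage < q1 - 1.5 * iqr) | (wage > q3 + 1.5 * iqr)]
print(outliers)

print(np.mean(wage), np.median(wage))   # the mean moves much more than the median
```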
Some variables, such as income, follow right-skewed distributions, in which the median is smaller than the mean.
If we are able to find a link between two variables while keeping all other
factors equal, we can say that there is a causal relationship between the two
variables.
The simple regression model can be used to study the relationship between
two variables. Such a relationship is assumed to have a constant
component, a linear component and an error term:

y = β0 + β1x + u

y is referred to as the dependent variable or explained variable;


x is referred to as the independent variable or explanatory variable;
u is an error term.

The parameters β1 and β0 represent the slope and intercept of the relation between x and y, respectively.
Total sum of squares (SST): SST ≡ Σ_{i=1}^{n} (yi − ȳ)²
Explained sum of squares (SSE): SSE ≡ Σ_{i=1}^{n} (ŷi − ȳ)²
Residual sum of squares (SSR): SSR ≡ Σ_{i=1}^{n} ûi² = Σ_{i=1}^{n} (yi − ŷi)²
The total sum of squares corresponds to the sum of the explained sum of
squares and the residual sum of squares: SST = SSE + SSR
Goodness of Fit: the R-squared of a regression corresponds to the fraction of
the sample variation in y that is explained by x
R² = SSE/SST = 1 − SSR/SST
R-squared is a measure of how much of the total variance of the dependent variable is explained by our model. Regressions with a low R-squared can still be useful. Many models in the social sciences have a low R-squared but are still informative about the relationship between two variables; a low R-squared just means that there are other factors affecting our dependent variable.
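A minimal sketch of estimating the simple regression model and reading off the R-squared, using statsmodels on simulated data; the true β0 and β1 are arbitrary illustrative values:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(scale=2.0, size=200)   # illustrative data: beta0=1, beta1=0.5

X = sm.add_constant(x)            # adds the intercept column
res = sm.OLS(y, X).fit()

print(res.params)                 # estimates of beta0 and beta1
print(res.rsquared)               # R^2 = SSE/SST = 1 - SSR/SST
```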
The main drawback of using simple regression analysis for empirical work is
that it is hard to draw ceteris paribus conclusions about how x affects y.
The zero conditional mean assumption, OLS.1 (E(u|x) = 0), is often unrealistic.
Multiple regression analysis is more suitable for ceteris paribus analysis because it allows us to explicitly control for other factors that also affect the dependent variable.
What happens if we include an irrelevant variable in a multiple regression analysis?
The expected values do not change -> E(β̂j) = βj for the relevant variables and E(β̂3) = 0 for the irrelevant one.
But the variance of the estimators inflates, which is a bad thing.
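A small Monte Carlo sketch of this point: in the simulated true model only x1 matters, and an irrelevant variable x2 (correlated with x1) is added to the regression. The coefficient on x1 stays unbiased, but its sampling variance increases. All numbers are illustrative choices:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
reps, n = 2_000, 100
b1_short, b1_long = [], []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # irrelevant but correlated with x1
    y = 1.0 + 0.5 * x1 + rng.normal(size=n)         # x2 plays no role in the true model

    b1_short.append(sm.OLS(y, sm.add_constant(x1)).fit().params[1])
    b1_long.append(sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[1])

b1_short, b1_long = np.array(b1_short), np.array(b1_long)
print(b1_short.mean(), b1_long.mean())   # both are centred on 0.5 (no bias)
print(b1_short.var(), b1_long.var())     # but the variance is larger with x2 included
```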
