
Sampling Distribution and SE

The standard error is nothing but the standard deviation of the sampling distribution of the estimator, and the sampling distribution of an estimator is simply a probability or frequency distribution of the estimator, that is, a distribution of the set of values of the estimator obtained from all possible samples of the same size from a given population. Sampling distributions are used to draw inferences about the values of the population parameters on the basis of the values of the estimators calculated from one or more samples. Explain this explicitly.

Let’s break this down step by step to explain sampling distributions and the role of the standard error
explicitly.

1. Population vs. Sample

A population includes all individuals or items of interest in a study. For example, all people in a
city.
A sample is a subset of the population, selected for study. For example, 100 people randomly
chosen from the city.

2. What is an Estimator?

An estimator is a statistic (a function of the sample data) used to infer a population parameter.
Examples include:

The sample mean (x̄) as an estimator of the population mean (μ).
The sample proportion (p) as an estimator of the population proportion.

3. Sampling Distribution of an Estimator

If you repeatedly take many samples of the same size from a population and compute an estimator
(e.g., sample mean or proportion) for each sample, these values will vary. The distribution of these
values is called the sampling distribution of the estimator.
For instance:

Suppose the population mean (μ) is 50, and you take samples of size 10. Each sample will have a different sample mean (x̄), but if you calculate x̄ for all possible samples, their distribution forms the sampling distribution of x̄.
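
To make this concrete, here is a minimal simulation sketch (Python with NumPy assumed; the normal population, its standard deviation, and the seed are illustrative choices, not part of the original example) that draws many samples of size 10 from a population with mean 50 and inspects the distribution of x̄:

```python
import numpy as np

rng = np.random.default_rng(0)
pop_mean, pop_sd, n, n_samples = 50, 10, 10, 100_000

# Each row is one sample of size n; the row mean is one realization of x-bar.
samples = rng.normal(pop_mean, pop_sd, size=(n_samples, n))
sample_means = samples.mean(axis=1)

print(sample_means.mean())        # close to 50, the population mean
print(sample_means.std(ddof=0))   # close to 10 / sqrt(10), the standard error
```

The spread of these sample means is exactly what the standard error, discussed next, measures.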

4. Standard Error

The standard error is the standard deviation of the sampling distribution of an estimator. It quantifies
how much an estimator (e.g., the sample mean) is expected to vary from sample to sample.
Mathematically:
Standard Error of the Mean (SE) = σ / √n

Where:

σ: Population standard deviation
n: Sample size

The formula shows that the standard error decreases as the sample size (n) increases, meaning larger
samples give more stable estimators.
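
As a quick check of this relationship, the following small sketch (the value σ = 10 is an assumed illustration) evaluates σ / √n for increasing sample sizes:

```python
import math

sigma = 10.0
for n in (10, 100, 1000, 10000):
    # Standard error of the mean shrinks as the square root of the sample size.
    print(n, sigma / math.sqrt(n))
# Output falls from about 3.16 at n = 10 to 0.10 at n = 10,000.
```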

5. Explicit Inference from One or More Samples

When we take one or more samples from a population:

One sample: Based on one sample, we compute an estimator (e.g., the sample mean) and compare it to the expected behavior of the sampling distribution. For example, if x̄ is close to μ, we infer the sample is representative.

More samples: By comparing multiple sample means (x̄₁, x̄₂, …), we can assess variability and make stronger inferences about μ using the sampling distribution.

In both cases, the standard error provides insight into the reliability of our estimators by reflecting how
much we expect them to deviate from the true population parameter.

Standard error of estimate or the standard error of the regression (se).

The Standard Error of Estimate (SE), also known as the Standard Error of the Regression (sₑ), measures the average distance that the observed data points fall from the regression line in a regression analysis. It provides a sense of how well the regression line represents the data.

1. Formula for Standard Error of Estimate

For a simple linear regression, the formula is:

sₑ = √[ Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² / (n − 2) ]

Where:

yᵢ: Observed values of the dependent variable.
ŷᵢ: Predicted values of the dependent variable from the regression equation.
n: Number of observations (data points).
n − 2: Degrees of freedom in simple linear regression (due to the estimation of two parameters:
slope and intercept).

2. Components of the Formula

The term (yᵢ − ŷᵢ)² represents the squared residuals, which are the squared differences between observed values (yᵢ) and predicted values (ŷᵢ).

Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² is the Residual Sum of Squares (RSS), which quantifies the total deviation of observed values from the predicted values.


Dividing RSS by n − 2 adjusts for the loss of degrees of freedom due to estimating parameters (the
slope and intercept).

3. Interpretation of sₑ

A smaller sₑ indicates that the data points are closer to the regression line, meaning the model has better predictive accuracy.
A larger sₑ suggests that the observed values deviate more from the regression line, indicating the model has poorer predictive accuracy.

For example:

If sₑ = 2, on average, the observed data points deviate by about 2 units from the regression line.
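
For illustration, here is a minimal Python sketch (NumPy assumed; the six data points are made up) that fits a simple linear regression and computes sₑ from the residuals exactly as in the formula above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])

# np.polyfit with degree 1 returns (slope, intercept).
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

rss = np.sum((y - y_hat) ** 2)   # residual sum of squares
n = len(y)
s_e = np.sqrt(rss / (n - 2))     # divide by n - 2 degrees of freedom

print(f"s_e = {s_e:.3f}")        # typical deviation of points from the fitted line
```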

4. Relationship to R² (Coefficient of Determination)

The R² value in regression quantifies the proportion of variation in the dependent variable explained by the independent variable(s). While R² shows how well the model explains variability, sₑ gives the absolute measure of prediction error.

If R² is high, sₑ tends to be low, indicating a good fit. Conversely, a low R² often correlates with a higher sₑ.

5. Use of Standard Error of Regression

It helps in constructing prediction intervals, indicating the range within which the dependent
variable is likely to fall for a given independent variable.
It assists in hypothesis testing, such as testing the significance of regression coefficients.

In summary, sₑ is a key metric in regression analysis for assessing how well the regression model fits the data by measuring the average prediction error.

Earlier we noted that, given Xᵢ, σ² represents the (conditional) variance of both uᵢ and Yᵢ. Therefore, the standard error of the estimate can also be called the (conditional) standard deviation of uᵢ and Yᵢ. Of course, as usual, σY² and σY represent, respectively, the unconditional variance and unconditional standard deviation of Y.

Conditional and Unconditional Variance in Regression

In the context of regression analysis:

1. σ² is the conditional variance of the error term (uᵢ) and the dependent variable (Yᵢ) given the independent variable (Xᵢ).

2. σY² is the unconditional variance of the dependent variable (Y), which is not conditional on any specific values of X.

Let’s elaborate on these points:

1. Conditional Variance and Standard Deviation (σ² and σ)

When we say σ² represents the conditional variance of uᵢ and Yᵢ, it refers to the variance of these variables after considering the effect of the independent variable (Xᵢ).

In a linear regression model:

Yᵢ = β₀ + β₁Xᵢ + uᵢ

Here:

uᵢ is the error term (random deviation of actual Y from the regression line).
Yᵢ is the dependent variable.

Conditional variance is:

Var(Yᵢ | Xᵢ) = Var(uᵢ | Xᵢ) = σ²

This means that, for a fixed Xᵢ, the variability of Yᵢ around the regression line (or equivalently the variability of uᵢ) is captured by σ², and its square root σ is the conditional standard deviation.

2. Unconditional Variance and Standard Deviation (σY² and σY)

The unconditional variance (σY²) is the variance of Y across all possible values of X, without considering any specific X.

Unconditionally:

Var(Y) = σY²

In regression, the unconditional variance of Y can be decomposed into:

Var(Y) = β₁² · Var(X) + σ²

Here:

The term β₁² · Var(X) is the portion of variance in Y explained by X.
The σ² represents the portion of variance in Y not explained by X (i.e., the residual variance).

Thus, σ² captures the unexplained variability in Y, while σY² is the total variability of Y.
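
A short simulation sketch (the values of β₀, β₁, σ and the distribution of X are assumed purely for illustration) can verify this decomposition numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 2.0, 3.0

x = rng.normal(0.0, 5.0, size=1_000_000)   # Var(X) = 25
u = rng.normal(0.0, sigma, size=x.size)    # conditional variance sigma^2 = 9
y = beta0 + beta1 * x + u

print(y.var())                         # unconditional Var(Y), about 4*25 + 9 = 109
print(beta1**2 * x.var() + sigma**2)   # explained portion plus residual portion
```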

3. Standard Error of the Estimate as Conditional Standard Deviation

The standard error of the estimate (sₑ) provides an estimate of σ, the conditional standard deviation of uᵢ and Yᵢ.

In practice, sₑ is used because the true σ is typically unknown and must be estimated from sample data.

4. Relationship Between Conditional and Unconditional Measures

The conditional variance σ² is typically smaller than the unconditional variance σY², as it reflects the variability in Y after accounting for the effect of X.

σY² includes both the variability explained by the regression model and the residual variability (i.e., σ²).

In summary:

Conditional variance (σ²) focuses on variability around the regression line (or conditional on X).
Unconditional variance (σY²) captures the overall variability of Y in the population, regardless of X.

As n increases, the precision with which β2 can be estimated also increases. (Why?)

The precision with which β2 (or any regression coefficient) can be estimated increases as the sample size
(n) increases because of the following reasons:

1. Variance of the Estimator Decreases with n

In regression analysis, the variance of an estimator (e.g., β̂₂) is inversely proportional to the sample size n. Specifically, for simple linear regression:

Var(β̂₂) = σ² / Σᵢ₌₁ⁿ (Xᵢ − X̄)²

Where:

σ²: Variance of the error term (uᵢ).
Xᵢ: Values of the independent variable.
X̄: Mean of X.

As n increases:

1. The sum Σᵢ₌₁ⁿ (Xᵢ − X̄)² grows larger because more data points provide more information about the variability of X.
2. The denominator of the variance formula becomes larger, reducing Var(β̂₂).

Lower variance means the estimator β̂₂ is more precise.

2. Law of Large Numbers

The Law of Large Numbers ensures that as n increases, sample statistics (e.g., sample mean and
sample variance) converge to their population counterparts. This reduces the variability in the estimates
of regression coefficients, leading to greater precision.

3. Central Limit Theorem

The Central Limit Theorem implies that as n increases, the sampling distribution of β̂₂ becomes more tightly concentrated around the true value β₂. This makes the estimate more reliable and reduces the standard error.

The standard error of β̂₂ is given by:

SE(β̂₂) = √Var(β̂₂)

As n increases, SE(β̂₂) decreases, indicating increased precision.

4. More Data Reduces Noise

A larger sample size provides more information about the relationship between X and Y , reducing the
influence of random noise or outliers in the data. With more observations, the regression line becomes a
better representation of the underlying true relationship.

5. Confidence Intervals Narrow


As n increases, the confidence intervals for β₂ become narrower due to the reduction in SE(β̂₂). Narrower confidence intervals indicate more precise estimates of β₂.

Summary

The precision of β2 increases with n because:


1. The variance of the estimator decreases.


2. Sample statistics converge to population parameters.
3. The sampling distribution becomes more concentrated around the true value.
4. Random noise has less influence with more data points.
5. Confidence intervals become narrower, reflecting reduced uncertainty.

In practical terms, larger samples provide more robust and reliable estimates, minimizing the impact of
variability and randomness.
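
The following simulation sketch (the model parameters, normal design, and repetition count are assumed for illustration) shows the empirical spread of β̂₂ shrinking roughly as 1/√n:

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, beta2, sigma, reps = 1.0, 0.5, 2.0, 2000

for n in (20, 200, 2000):
    estimates = []
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, size=n)
        y = beta1 + beta2 * x + rng.normal(0.0, sigma, size=n)
        slope, _ = np.polyfit(x, y, 1)   # OLS slope estimate for this sample
        estimates.append(slope)
    # Empirical standard error of beta2-hat across repeated samples.
    print(n, np.std(estimates))
```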

how are the OLS estimators BLUE?

The Ordinary Least Squares (OLS) estimators are considered BLUE, which stands for Best Linear
Unbiased Estimators, under the assumptions of the Classical Linear Regression Model (CLRM). Let's
break this down step by step to understand why OLS estimators have this property.

Key Points in BLUE

1. Best: The OLS estimators have the smallest variance among all linear and unbiased estimators
(they are efficient).
2. Linear: The OLS estimators are linear functions of the dependent variable (Yᵢ).

3. Unbiased: The expected value of the OLS estimators equals the true population parameters,
meaning they are correct on average.

OLS Assumptions Ensuring BLUE Property

The Gauss-Markov Theorem guarantees that OLS estimators are BLUE if the following assumptions
hold:

1. Linear Model: The relationship between the dependent variable (Y) and independent variables (X₁, X₂, …, Xₖ) is linear:

Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + ⋯ + βₖXₖᵢ + uᵢ

2. Exogeneity (Unbiasedness Assumption): The expected value of the error term (uᵢ) is zero, and it is uncorrelated with the independent variables:

E(uᵢ) = 0,  E(uᵢXⱼᵢ) = 0 for all j

3. Homoscedasticity: The variance of the error term is constant across all observations:

Var(uᵢ) = σ² for all i

4. No Autocorrelation: The error terms are uncorrelated across observations:

Cov(uᵢ, uⱼ) = 0 for i ≠ j

5. Full Rank (No Perfect Multicollinearity): The independent variables are not perfectly correlated,
ensuring the regression coefficients can be uniquely estimated.

6. Normality (optional for small samples): While not required for the Gauss-Markov Theorem,
normality of uᵢ ensures hypothesis testing and inference are valid.

Why OLS Estimators are BLUE

1. Linearity

The OLS estimators (β̂ⱼ) are linear combinations of the dependent variable Y, derived by minimizing the sum of squared residuals:

β̂ = (X′X)⁻¹X′Y

This is a linear function of Y, ensuring the linearity condition.

2. Unbiasedness

Under the assumption E(uᵢ) = 0 and X being exogenous:

E(β̂) = β

This means the OLS estimators correctly estimate the true population parameters on average.

3. Efficiency (Best)

Among all linear and unbiased estimators, OLS estimators have the smallest variance. The variance of β̂ is given by:

Var(β̂) = σ²(X′X)⁻¹

If the OLS assumptions hold, no other linear and unbiased estimator can achieve a smaller
variance than this.
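
As an illustration, here is a minimal Python sketch (NumPy assumed; the data-generating values are made up) that computes β̂ = (X′X)⁻¹X′Y and the estimated variance-covariance matrix σ̂²(X′X)⁻¹ directly from the matrix formulas:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x1])

# Solve the normal equations (np.linalg.solve is numerically preferable
# to forming the explicit inverse when only beta-hat is needed).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # roughly [2.0, 1.5]

# Estimate sigma^2 from the residuals with n - k degrees of freedom,
# then form the variance-covariance matrix sigma^2 (X'X)^(-1).
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - X.shape[1])
print(sigma2_hat * np.linalg.inv(X.T @ X))
```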

Gauss-Markov Theorem

The Gauss-Markov Theorem states:

Under assumptions 1–5 of the CLRM, the OLS estimators are the Best Linear Unbiased
Estimators (BLUE).

This means:

1. The OLS estimators are unbiased (E(β̂) = β).


2. Among all linear unbiased estimators, they have the minimum variance.

Limitations

OLS estimators are BLUE under the Gauss-Markov assumptions. However:

1. If the assumptions are violated (e.g., heteroscedasticity, autocorrelation, or endogeneity), the OLS
estimators may no longer be BLUE.
2. For small samples, normality of errors may be needed to ensure valid statistical inference.

Summary

OLS estimators are BLUE because they are linear, unbiased, and efficient (minimum variance) under the
Gauss-Markov assumptions. This makes OLS a robust and widely-used method for estimating regression
coefficients in linear models.
