Sampling Distribution and SE
The standard error is nothing but the standard deviation of the sampling distribution of the estimator, and the sampling distribution of an estimator is simply a probability or frequency distribution of the estimator, that is, a distribution of the set of values of the estimator obtained from all possible samples of the same size from a given population. Sampling distributions are used to draw inferences about the values of the population parameters on the basis of the values of the estimators calculated from one or more samples. Explain this explicitly.
Let’s break this down step by step to explain sampling distributions and the role of the standard error
explicitly.
1. What is a Population and a Sample?
A population includes all individuals or items of interest in a study. For example, all people in a city.
A sample is a subset of the population, selected for study. For example, 100 people randomly chosen from the city.
2. What is an Estimator?
An estimator is a statistic (a function of the sample data) used to infer a population parameter. Examples include the sample mean $\bar{x}$ (used to estimate the population mean $\mu$) and the sample proportion $\hat{p}$ (used to estimate the population proportion).
3. What is a Sampling Distribution?
If you repeatedly take many samples of the same size from a population and compute an estimator (e.g., sample mean or proportion) for each sample, these values will vary. The distribution of these values is called the sampling distribution of the estimator.
For instance:
Suppose the population mean ($\mu$) is 50, and you take samples of size 10. Each sample will have a different sample mean ($\bar{x}$), but if you calculate $\bar{x}$ for all possible samples, their distribution forms the sampling distribution of $\bar{x}$.
4. Standard Error
The standard error is the standard deviation of the sampling distribution of an estimator. It quantifies
how much an estimator (e.g., the sample mean) is expected to vary from sample to sample.
Mathematically:
$$\text{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
Where:
$\sigma$: The population standard deviation.
$n$: The sample size.
The formula shows that the standard error decreases as the sample size (n) increases, meaning larger
samples give more stable estimators.
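To make this concrete, here is a minimal simulation sketch in Python with NumPy (the population mean, standard deviation, and sample sizes are illustrative assumptions, not values from the text). It draws many samples of each size, computes the mean of each sample, and compares the standard deviation of those means (the empirical standard error) with $\sigma/\sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 50, 10        # assumed population mean and standard deviation
n_samples = 10_000        # number of repeated samples per sample size

for n in (10, 100, 1000):
    # Draw n_samples independent samples of size n and compute each sample mean.
    sample_means = rng.normal(mu, sigma, size=(n_samples, n)).mean(axis=1)

    empirical_se = sample_means.std(ddof=1)   # SD of the sampling distribution
    theoretical_se = sigma / np.sqrt(n)       # sigma / sqrt(n)

    print(f"n={n:5d}  empirical SE={empirical_se:.3f}  theoretical SE={theoretical_se:.3f}")
```

The empirical and theoretical standard errors should agree closely, and both shrink roughly by a factor of $\sqrt{10}$ each time $n$ increases tenfold.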
Many samples: If we could take many samples, the spread of the estimator's values across those samples would trace out the sampling distribution directly, showing how much the estimator varies.
One sample: Based on one sample, we compute an estimator (e.g., the sample mean) and compare it to the expected behavior of the sampling distribution. For example, if $\bar{x}$ is close to $\mu$, we infer the sample is representative.
In both cases, the standard error provides insight into the reliability of our estimators by reflecting how
much we expect them to deviate from the true population parameter.
The Standard Error of Estimate (SE), also known as the Standard Error of the Regression ($s_e$),
measures the average distance that the observed data points fall from the regression line in a
regression analysis. It provides a sense of how well the regression line represents the data.
$$s_e = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-2}}$$
Where:
$y_i$: Observed values of the dependent variable.
$\hat{y}_i$: Predicted values of the dependent variable from the regression equation.
$n$: Number of observations (data points).
$n - 2$: Degrees of freedom in simple linear regression (due to the estimation of two parameters: slope and intercept).
The term $(y_i - \hat{y}_i)^2$ represents the squared residuals, which are the squared differences between observed values ($y_i$) and predicted values ($\hat{y}_i$). The sum $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ is the Residual Sum of Squares (RSS), which quantifies the total deviation of the observed values from the regression line.
3. Interpretation of $s_e$
A smaller $s_e$ indicates that the data points are closer to the regression line, meaning the model fits the data well; a larger $s_e$ indicates greater scatter around the line.
For example:
If $s_e = 2$, on average, the observed data points deviate by about 2 units from the regression line.
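As a rough illustration (a sketch using simulated data; the intercept, slope, and noise level are assumed for the example), $s_e$ can be computed directly from the residuals of a fitted line. The snippet also reports $R^2$, which is discussed next:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from an assumed linear relationship with noise.
n = 50
x = rng.uniform(0, 10, n)
y = 3.0 + 2.0 * x + rng.normal(0, 2.0, n)

# Fit a simple linear regression by least squares.
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

# Standard error of the estimate: sqrt(RSS / (n - 2)).
rss = np.sum((y - y_hat) ** 2)
s_e = np.sqrt(rss / (n - 2))

# R^2: proportion of the variation in y explained by the regression.
tss = np.sum((y - y.mean()) ** 2)
r_squared = 1 - rss / tss

print(f"s_e = {s_e:.3f}, R^2 = {r_squared:.3f}")
```

Because the same RSS appears in both quantities, a lower $s_e$ on a given data set goes hand in hand with a higher $R^2$.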
The $R^2$ value in regression quantifies the proportion of variation in the dependent variable explained by the independent variable(s). While $R^2$ shows how well the model explains variability, $s_e$ gives the typical size of the prediction error in the units of the dependent variable.
If $R^2$ is high, $s_e$ tends to be low, indicating a good fit. Conversely, a low $R^2$ often correlates with a higher $s_e$.
It helps in constructing prediction intervals, indicating the range within which the dependent variable is likely to fall for a given value of the independent variable.
It assists in hypothesis testing, such as testing the significance of regression coefficients.
In summary, $s_e$ is a key metric in regression analysis for assessing how well the regression model fits the data.
Earlier we noted that, given $X_i$, $\sigma^2$ represents the (conditional) variance of both $u_i$ and $Y_i$. Therefore, the standard error of the estimate can also be called the (conditional) standard deviation of $u_i$ and $Y_i$. Of course, as usual, $\sigma_Y^2$ and $\sigma_Y$ represent, respectively, the unconditional variance and unconditional standard deviation of $Y$.
Conditional and Unconditional Variance in Regression
1. $\sigma^2$ is the conditional variance of the error term ($u_i$) and the dependent variable ($Y_i$), given the value of the regressor $X_i$.
2. $\sigma_Y^2$ is the unconditional variance of the dependent variable ($Y$), which is not conditional on any specific values of $X$.
When we say $\sigma^2$ represents the conditional variance of $u_i$ and $Y_i$, it refers to the variance of these quantities for a given (fixed) value of $X_i$. Consider the regression model:
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
Here:
$u_i$ is the error term (the random deviation of actual $Y$ from the regression line).
This means that, for a fixed $X_i$, the variability of $Y_i$ around the regression line (or equivalently the variability of $u_i$) is captured by $\sigma^2$, and its square root $\sigma$ is the conditional standard deviation.
The unconditional variance ($\sigma_Y^2$) is the variance of $Y$ across all possible values of $X$, without conditioning on $X$. Unconditionally:
$$\operatorname{Var}(Y) = \sigma_Y^2$$
Here, $\sigma^2$ represents only the portion of the variance in $Y$ not explained by $X$ (i.e., the residual variance).
Thus, $\sigma^2$ captures the unexplained variability in $Y$, while $\sigma_Y^2$ is the total variability of $Y$.
The standard error of the estimate ($s_e$) provides an estimate of $\sigma$, the conditional standard deviation of $u_i$ and $Y_i$.
In practice, $s_e$ is used because the true $\sigma$ is typically unknown and must be estimated from sample data.
The conditional variance $\sigma^2$ is typically smaller than the unconditional variance $\sigma_Y^2$, as it reflects only the variability remaining after $X$ has accounted for part of the variation in $Y$ (so that $\sigma_Y^2 \geq \sigma^2$).
In summary:
Conditional variance ($\sigma^2$) focuses on variability around the regression line (i.e., conditional on $X$).
Unconditional variance ($\sigma_Y^2$) captures the overall variability of $Y$ in the population, regardless of $X$.
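A small simulation sketch (again with assumed parameter values) illustrates the distinction: the residual variance from the fitted line estimates the conditional variance $\sigma^2$, while the sample variance of $Y$ estimates the unconditional variance $\sigma_Y^2$, and the former is clearly smaller:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed data-generating process: Y_i = 5 + 1.5 * X_i + u_i, with error SD = 2.
n = 10_000
x = rng.uniform(0, 10, n)
u = rng.normal(0, 2.0, n)
y = 5.0 + 1.5 * x + u

# Fit the regression and take residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

cond_var = np.sum(residuals**2) / (n - 2)   # estimates sigma^2 (conditional)
uncond_var = y.var(ddof=1)                  # estimates sigma_Y^2 (unconditional)

print(f"conditional variance   ~ {cond_var:.2f}")    # close to 4 (= 2^2)
print(f"unconditional variance ~ {uncond_var:.2f}")  # much larger, since X varies
```

The gap between the two numbers is exactly the part of the variation in $Y$ that the regressor $X$ accounts for.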
As $n$ increases, the precision with which $\beta_2$ can be estimated also increases. (Why?)
The precision with which $\beta_2$ (or any regression coefficient) can be estimated increases as the sample size ($n$) increases because of the following reasons:
The variance of the slope estimator $\hat{\beta}_2$ is:
$$\operatorname{Var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
Where:
$\sigma^2$: Variance of the error term.
$\bar{X}$: Mean of $X$.
As n increases:
1. The sum $\sum_{i=1}^{n}(X_i - \bar{X})^2$ grows larger because more data points provide more information about the variability of $X$.
2. The denominator of the variance formula becomes larger, reducing $\operatorname{Var}(\hat{\beta}_2)$.
Lower variance means the estimator $\hat{\beta}_2$ is more precise.
The Law of Large Numbers ensures that as n increases, sample statistics (e.g., sample mean and
sample variance) converge to their population counterparts. This reduces the variability in the estimates
of regression coefficients, leading to greater precision.
The Central Limit Theorem implies that as $n$ increases, the sampling distribution of $\hat{\beta}_2$ becomes more tightly concentrated around the true value $\beta_2$. This makes the estimate more reliable and reduces the standard error.
The standard error of $\hat{\beta}_2$ is given by:
$$\operatorname{SE}(\hat{\beta}_2) = \sqrt{\operatorname{Var}(\hat{\beta}_2)}$$
so anything that lowers the variance also lowers the standard error.
A larger sample size provides more information about the relationship between X and Y , reducing the
influence of random noise or outliers in the data. With more observations, the regression line becomes a
better representation of the underlying true relationship.
As $n$ increases, the confidence intervals for $\beta_2$ become narrower due to the reduction in $\operatorname{SE}(\hat{\beta}_2)$.
Summary
In practical terms, larger samples provide more robust and reliable estimates, minimizing the impact of
variability and randomness.
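The following Monte Carlo sketch (with assumed true coefficients and error variance) shows the variance and standard error of the estimated slope shrinking as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(3)

beta1, beta2, sigma = 1.0, 0.5, 2.0   # assumed true intercept, slope, error SD
n_reps = 5_000                        # number of simulated samples per n

for n in (20, 100, 500):
    slopes = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.uniform(0, 10, n)
        y = beta1 + beta2 * x + rng.normal(0, sigma, n)
        slopes[r], _ = np.polyfit(x, y, 1)   # estimated slope (beta2-hat)

    # Empirical variance and SE of beta2-hat across repeated samples.
    print(f"n={n:4d}  Var(b2)={slopes.var(ddof=1):.5f}  SE(b2)={slopes.std(ddof=1):.4f}")
```

The printed variance and standard error fall steadily as $n$ rises, which is exactly the narrowing of the confidence intervals described above.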
How are the OLS estimators BLUE?
The Ordinary Least Squares (OLS) estimators are considered BLUE, which stands for Best Linear
Unbiased Estimators, under the assumptions of the Classical Linear Regression Model (CLRM). Let's
break this down step by step to understand why OLS estimators have this property.
1. Best: The OLS estimators have the smallest variance among all linear and unbiased estimators
(they are efficient).
2. Linear: The OLS estimators are linear functions of the dependent variable (Yi ).
3. Unbiased: The expected value of the OLS estimators equals the true population parameters,
meaning they are correct on average.
The Gauss-Markov Theorem guarantees that OLS estimators are BLUE if the following assumptions
hold:
1. Linear Model: The relationship between the dependent variable ($Y$) and independent variables ($X_1, X_2, \ldots, X_k$) is linear in the parameters:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + u_i$$
2. Exogeneity (Unbiasedness Assumption): The expected value of the error term ($u_i$) is zero, and it is uncorrelated with the regressors:
$$E(u_i) = 0, \qquad E(u_i X_{ji}) = 0 \ \forall j$$
3. Homoscedasticity: The variance of the error term is constant across all observations:
$$\operatorname{Var}(u_i) = \sigma^2 \ \forall i$$
4. No Autocorrelation: The error terms are uncorrelated across observations:
$$\operatorname{Cov}(u_i, u_j) = 0 \ \text{for} \ i \neq j$$
5. Full Rank (No Perfect Multicollinearity): The independent variables are not perfectly correlated,
ensuring the regression coefficients can be uniquely estimated.
6. Normality (optional for small samples): While not required for the Gauss-Markov Theorem,
normality of ui ensures hypothesis testing and inference are valid.
1. Linearity
The OLS estimators are linear functions of the observed values of the dependent variable. In matrix form:
$$\hat{\beta} = (X'X)^{-1}X'Y$$
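As a minimal numerical sketch (the data are simulated for illustration), this matrix formula can be evaluated directly with NumPy and cross-checked against a standard least-squares routine; solving the normal equations with `np.linalg.solve` rather than forming the inverse explicitly is the usual, numerically safer way to evaluate $(X'X)^{-1}X'Y$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed small data set with an intercept column and one regressor.
n = 30
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])          # design matrix [1, x]
y = 2.0 + 0.7 * x + rng.normal(0, 1.0, n)

# OLS estimator: beta-hat = (X'X)^{-1} X'y, via the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)     # [intercept, slope]
print(beta_lstsq)   # should match beta_hat
```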
2. Unbiasedness
$$E(\hat{\beta}) = \beta$$
This means the OLS estimators correctly estimate the true population parameters on average.
3. Efficiency (Best)
Among all linear and unbiased estimators, OLS estimators have the smallest variance. The variance of $\hat{\beta}$ is given by:
$$\operatorname{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$$
If the OLS assumptions hold, no other linear and unbiased estimator can achieve a smaller
variance than this.
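To see the "best" property in action, here is a Monte Carlo sketch (with assumed coefficients and noise level): it compares the OLS slope with another linear unbiased estimator, a simple grouping (Wald-type) estimator that contrasts the mean of $Y$ in the upper and lower halves of $X$. Both should be centered on the true slope, while OLS shows the smaller variance:

```python
import numpy as np

rng = np.random.default_rng(5)

beta0, beta1, sigma = 2.0, 0.7, 1.0   # assumed true intercept, slope, error SD
n, n_reps = 50, 20_000

# Fix X across replications so both estimators are linear functions of Y alone.
x = np.sort(rng.uniform(0, 10, n))
low, high = x[: n // 2], x[n // 2 :]

ols_slopes = np.empty(n_reps)
wald_slopes = np.empty(n_reps)
for r in range(n_reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    ols_slopes[r], _ = np.polyfit(x, y, 1)
    # Grouping (Wald-type) estimator: difference of group means of Y over X.
    wald_slopes[r] = (y[n // 2 :].mean() - y[: n // 2].mean()) / (high.mean() - low.mean())

print(f"mean OLS slope  = {ols_slopes.mean():.4f}  var = {ols_slopes.var(ddof=1):.6f}")
print(f"mean Wald slope = {wald_slopes.mean():.4f}  var = {wald_slopes.var(ddof=1):.6f}")
```

Both estimators average out to the true slope (unbiasedness), but the OLS slope has the smaller sampling variance, which is what the Gauss-Markov Theorem, stated next, guarantees.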
Gauss-Markov Theorem
Under assumptions 1–5 of the CLRM, the OLS estimators are the Best Linear Unbiased
Estimators (BLUE).
This means:
1. The OLS estimators are linear in $Y$ and unbiased.
2. Among all linear unbiased estimators, they have the minimum variance.
Limitations
1. If the assumptions are violated (e.g., heteroscedasticity, autocorrelation, or endogeneity), the OLS
estimators may no longer be BLUE.
2. For small samples, normality of errors may be needed to ensure valid statistical inference.
Summary
OLS estimators are BLUE because they are linear, unbiased, and efficient (minimum variance) under the
Gauss-Markov assumptions. This makes OLS a robust and widely used method for estimating regression
coefficients in linear models.