Simple Linear Regression: From Wikipedia, The Free Encyclopedia
Figure: Okun’s law in macroeconomics is an example of simple linear regression. Here the dependent variable (GDP growth) is presumed to be in a linear relationship with the changes in the unemployment rate.
In statistics, simple linear regression is the least squares estimator of a linear regression
model with a single predictor variable. In other words, simple linear regression fits a straight line
through the set of n points in such a way that makes the sum of squared residuals of the model
(that is, vertical distances between the points of the data set and the fitted line) as small as
possible.
The adjective simple refers to the fact that this regression is one of the simplest in statistics. The fitted line has slope equal to the correlation between y and x corrected by the ratio of standard deviations of these variables. The intercept of the fitted line is such that it passes through the center of mass (x̄, ȳ) of the data points.
Other regression methods besides simple ordinary least squares (OLS) also exist (see Linear model). In particular, when one wants to do regression “by eye”, people usually tend to draw a slightly steeper line, closer to the one produced by the total least squares method. This occurs because it is more natural for one’s mind to consider the orthogonal distances from the observations to the regression line, rather than the vertical ones as the OLS method does.
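This contrast between vertical and orthogonal distances can be checked numerically. Below is a minimal sketch (the data and function names are mine, invented for illustration) that compares the OLS slope with the total-least-squares slope, the latter computed via the Deming-regression formula for equal error variances:

```python
def ols_slope(x, y):
    # Ordinary least squares: minimizes the vertical distances to the line.
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

def tls_slope(x, y):
    # Total least squares (Deming regression with equal error variances):
    # minimizes the orthogonal distances to the line.
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return (syy - sxx + ((syy - sxx) ** 2 + 4 * sxy ** 2) ** 0.5) / (2 * sxy)

# Made-up noisy points: the orthogonal fit comes out steeper than the vertical one.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 1.0, 3.0]
print(ols_slope(x, y), tls_slope(x, y))  # the TLS slope is the larger of the two
```

For perfectly collinear data the two slopes coincide; any scatter pulls the OLS slope below the orthogonal one.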
Contents
• 1 Fitting the regression line
  ○ 1.1 Properties
• 2 Confidence intervals
  ○ 2.1 Normality assumption
  ○ 2.2 Asymptotic assumption
• 3 Numerical example
• 4 See also
Fitting the regression line
Suppose there are n data points {(xi, yi), i = 1, …, n}. The goal is to find the equation of the straight line

  y = α + βx

which would provide a “best” fit for the data points. Here the “best” will be understood as in the least-squares approach: such a line that minimizes the sum of squared residuals of the linear regression model. In other words, the numbers α and β solve the following minimization problem:

  Find min(α, β) Q(α, β), where Q(α, β) = Σi=1..n (yi − α − βxi)².
Using simple calculus it can be shown that the values of α and β that minimize the objective function Q are

  β̂ = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)² = rxy · sy / sx,
  α̂ = ȳ − β̂ x̄,

where rxy is the sample correlation coefficient between x and y, sx is the standard deviation of x, and sy is correspondingly the standard deviation of y. A horizontal bar over a variable means the sample average of that variable. For example: x̄ = (1/n) Σi xi.

Substituting the expressions for α̂ and β̂ into the fitted equation ŷ = α̂ + β̂x yields

  (ŷ − ȳ) / sy = rxy · (x − x̄) / sx.

This shows the role rxy plays in the regression line of standardized data points.
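These formulas translate directly into code. The following is a minimal sketch (the function and variable names are mine, not the article’s) that computes α̂ and β̂ from the sums of squares:

```python
def fit_line(x, y):
    """Simple linear regression: return (alpha, beta) for y ≈ alpha + beta·x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n            # sample means
    sxx = sum((xi - xbar) ** 2 for xi in x)        # Σ (xi − x̄)²
    sxy = sum((xi - xbar) * (yi - ybar)            # Σ (xi − x̄)(yi − ȳ)
              for xi, yi in zip(x, y))
    beta = sxy / sxx                               # slope
    alpha = ybar - beta * xbar                     # intercept: line passes through (x̄, ȳ)
    return alpha, beta

# Points lying exactly on y = 1 + 2x are recovered exactly:
print(fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))  # (1.0, 2.0)
```

Note that β̂ is written here as Sxy/Sxx, which is algebraically identical to the correlation form rxy·sy/sx above.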
Properties
1. The line goes through the “center of mass” point (x̄, ȳ).
2. The sum of the residuals is equal to zero if the model includes a constant: Σi ε̂i = 0.
3. The linear combination of the residuals in which the coefficients are the x-values is equal to zero: Σi xi ε̂i = 0.

Confidence intervals
Normality assumption
Under the assumption that the error terms are normally distributed, the standardized estimator of the slope

  (β̂ − β) / sβ̂

has Student’s t-distribution with n − 2 degrees of freedom, where sβ̂ is the standard error of the estimator:

  sβ̂ = sqrt( (1/(n−2)) · Σi ε̂i² / Σi (xi − x̄)² ).

Using this t-statistic we can construct a confidence interval for β:

  β ∈ [ β̂ − sβ̂ t*n−2, β̂ + sβ̂ t*n−2 ],

where t*n−2 is the (1 − γ/2)-th quantile of the tn−2 distribution, giving confidence level 1 − γ. An analogous interval for α uses the standard error sα̂ = sβ̂ · sqrt( (1/n) Σi xi² ).
Asymptotic assumption
The alternative second assumption states that when the number of points in the dataset is “large enough”, the law of large numbers and the central limit theorem become applicable, and then the distribution of the estimators is approximately normal. Under this assumption all formulas derived in the previous section remain valid, with the only exception that the quantile t*n−2 of Student’s t-distribution is replaced with the quantile q* of the standard normal distribution. Occasionally the fraction 1⁄(n−2) is replaced with 1⁄n. When n is large such a change does not alter the results considerably.
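The shrinking gap between the two quantiles can be checked numerically. The 0.975 normal quantile below comes from Python’s standard library; the t-quantiles are the standard table values for a few sample sizes (hard-coded here, since the standard library has no t-distribution):

```python
from statistics import NormalDist

# 0.975 quantile of the standard normal distribution (two-sided 95% level).
q = NormalDist().inv_cdf(0.975)
print(round(q, 4))  # 1.96

# Table values of the 0.975 t-quantile: the gap to the normal quantile
# shrinks as the degrees of freedom grow.
t_quantiles = {13: 2.1604, 30: 2.0423, 100: 1.9840}
for dof, t in t_quantiles.items():
    print(dof, round(t - q, 4))
```

For the 15-point example below (13 degrees of freedom) the difference is still about 0.2, so the small-sample t-quantile matters there.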
Numerical example
As an example we shall consider the data set from the ordinary least squares article. This data set gives average weights for humans as a function of their height in the population of American women of age 30–39. Although the OLS article argues that it would be more appropriate to run a quadratic regression for this data, we will not do so and fit the simple linear regression instead.
xi, Height (m):  1.47  1.50  1.52  1.55  1.57  1.60  1.63  1.65  1.68  1.70  1.73  1.75  1.78  1.80  1.83
yi, Weight (kg): 52.21 53.12 54.48 55.84 57.20 58.57 59.93 61.29 63.11 64.47 66.28 68.10 69.92 72.19 74.46
The 0.975 quantile of Student’s t-distribution with 13 degrees of freedom is t*13 = 2.1604, and thus the confidence intervals for α and β are

  α ∈ [−45.4, −32.7],
  β ∈ [57.4, 65.1].
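The whole numerical example can be reproduced with a short script. The t-quantile 2.1604 is taken from the text above; everything else is computed from the height–weight table:

```python
# Height (m) and weight (kg) data from the numerical example.
x = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65, 1.68, 1.70,
     1.73, 1.75, 1.78, 1.80, 1.83]
y = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29, 63.11,
     64.47, 66.28, 68.10, 69.92, 72.19, 74.46]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)

beta = sxy / sxx                         # slope estimate
alpha = ybar - beta * xbar               # intercept estimate
sse = syy - beta * sxy                   # residual sum of squares Σ ε̂i²
s2 = sse / (n - 2)                       # error-variance estimate
se_beta = (s2 / sxx) ** 0.5              # standard error of the slope
se_alpha = se_beta * (sum(xi ** 2 for xi in x) / n) ** 0.5

t = 2.1604  # 0.975 quantile of Student's t with n − 2 = 13 degrees of freedom
print(f"alpha = {alpha:.3f} +/- {t * se_alpha:.3f}")  # alpha = -39.062 +/- 6.347
print(f"beta  = {beta:.3f} +/- {t * se_beta:.3f}")    # beta  = 61.272 +/- 3.837
```

The half-widths 6.35 and 3.84 reproduce the intervals α ∈ [−45.4, −32.7] and β ∈ [57.4, 65.1].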
See also
• OLS/Proofs — derivation of all formulas used in this article in the general multidimensional case
• Linear model — alternative regression methods which can be applied in this context
• Deming regression — orthogonal linear regression