Lecture 4: Simple Linear Regression Models, With Hints at Their Estimation
\[
Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \tag{1}
\]
where the noise variables εi all have the same expectation (0) and the same
variance (σ²), and Cov[εi, εj] = 0 (unless i = j, of course).
\[
\hat{\beta}_1 = \frac{c_{XY}}{s^2_X} \tag{2}
\]
We also saw, in the notes to the last lecture, that so long as the law of large
numbers holds,
\[
\hat{\beta}_1 \rightarrow \beta_1 \tag{3}
\]
as n → ∞. It follows easily that
\[
\hat{\beta}_0 = \overline{Y} - \hat{\beta}_1 \overline{X} \tag{4}
\]
will also converge on β0 .
These are often called the normal equations for least-squares estimation, or
the estimating equations: a system of two equations in two unknowns, whose
solution gives the estimate. Many people would, at this point, remove the factor
of 1/n, but I think keeping it makes the next steps easier to follow:
That is, the least-squares estimate of the slope is our old friend the plug-in
estimate of the slope, and thus the least-squares intercept is also the plug-in
intercept.
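As a quick sanity check, here is a minimal R sketch (with made-up simulated data and parameter values of my own, not anything from the notes) that computes the plug-in estimates (2) and (4) directly and compares them to R's built-in least-squares fit from lm():

# Simulate a small data set from the model (true beta0 = 1, beta1 = 2)
n <- 100
x <- runif(n, -3, 3)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# Plug-in estimates: sample covariance over sample variance of X
# (cov() and var() use an n-1 denominator, but the factor cancels in the ratio)
beta1.hat <- cov(x, y) / var(x)
beta0.hat <- mean(y) - beta1.hat * mean(x)

# Compare with R's least-squares fit
c(beta0.hat, beta1.hat)
coef(lm(y ~ x))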
Going forward The equivalence between the plug-in estimator and the least-
squares estimator is a bit of a special case for linear models. In some non-linear
models, least squares is quite feasible (though the optimum can only be found
numerically, not in closed form); in others, plug-in estimates are more useful
than optimization.
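For instance, here is a hedged sketch of numerical least squares for a non-linear model, using R's nls(); the exponential mean function, the starting values, and the simulated data are all assumptions made just for this illustration, not part of the notes:

# Simulate data from a non-linear model y = a * exp(b*x) + noise
n <- 100
x <- runif(n, 0, 2)
y <- 3 * exp(0.7 * x) + rnorm(n, sd = 0.3)

# Least squares has no closed form here; nls() minimizes the sum of
# squared residuals numerically, starting from a rough initial guess
fit <- nls(y ~ a * exp(b * x), start = list(a = 1, b = 0.5))
coef(fit)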
To see how the estimator behaves, substitute the model equation (1) into the plug-in formula (2):
\[
\hat{\beta}_1 = \frac{c_{XY}}{s^2_X} \tag{16}
\]
\[
= \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\bar{y}}{s^2_X} \tag{17}
\]
\[
= \frac{\frac{1}{n}\sum_{i=1}^{n} x_i(\beta_0 + \beta_1 x_i + \epsilon_i) - \bar{x}(\beta_0 + \beta_1 \bar{x} + \bar{\epsilon})}{s^2_X} \tag{18}
\]
\[
= \frac{\beta_0 \bar{x} + \beta_1 \overline{x^2} + \frac{1}{n}\sum_{i=1}^{n} x_i \epsilon_i - \bar{x}\beta_0 - \beta_1 \bar{x}^2 - \bar{x}\bar{\epsilon}}{s^2_X} \tag{19}
\]
\[
= \frac{\beta_1 s^2_X + \frac{1}{n}\sum_{i=1}^{n} x_i \epsilon_i - \bar{x}\bar{\epsilon}}{s^2_X} \tag{20}
\]
\[
= \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} x_i \epsilon_i - \bar{x}\bar{\epsilon}}{s^2_X} \tag{21}
\]
Since $\bar{x}\bar{\epsilon} = n^{-1}\sum_i \bar{x}\epsilon_i$,
\[
\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})\epsilon_i}{s^2_X} \tag{22}
\]
This representation of the slope estimate shows that it is equal to the true
slope (β1) plus something which depends on the noise terms (the εi, and their
sample average ε̄). We'll use this to find the expected value and the variance of
the estimator β̂1.
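Since a simulation lets us see both the true coefficients and the noise, we can check the representation in Eq. 22 numerically. A minimal R sketch (the parameter values are my own, chosen for illustration):

# One simulated data set with known coefficients and known noise
n <- 50
beta0 <- 1; beta1 <- 2; sigma <- 0.5
x <- runif(n, -3, 3)
eps <- rnorm(n, sd = sigma)
y <- beta0 + beta1 * x + eps

s2x <- mean(x^2) - mean(x)^2                           # s^2_X with a 1/n denominator
beta1.hat <- (mean(x * y) - mean(x) * mean(y)) / s2x   # Eqs. 16-17
# Eq. 22: true slope plus a weighted average of the noise terms
beta1.alt <- beta1 + mean((x - mean(x)) * eps) / s2x
c(beta1.hat, beta1.alt)                                # should agree up to rounding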
In the next couple of paragraphs, I am going to treat the xi as non-random
variables. This is appropriate in “designed” or “controlled” experiments, where
we get to choose their values. In randomized experiments or in observational stud-
ies, obviously the xi aren’t necessarily fixed; however, these expressions will be
correct for the conditional expectation E[β̂1 | x1, . . . , xn] and conditional vari-
ance Var[β̂1 | x1, . . . , xn], and I will come back to how we get the unconditional
expectation and variance.
Since E[εi] = 0 for every i, and we are treating the xi as fixed, the second term
in Eq. 22 has expectation zero. Thus,
\[
E[\hat{\beta}_1] = \beta_1 \tag{24}
\]
Since the bias of an estimator is the difference between its expected value
and the truth, β̂1 is an unbiased estimator of the optimal slope.
(To repeat what I’m sure you remember from mathematical statistics: “bias”
here is a technical term, meaning no more and no less than E[β̂1] − β1. An unbi-
ased estimator could still make systematic mistakes — for instance, it could un-
derestimate 99% of the time, provided that the 1% of the time it over-estimates,
it does so by much more than it under-estimates. Moreover, unbiased estimators
are not necessarily superior to biased ones: the total error depends on both the
bias of the estimator and its variance, and there are many situations where you
can remove lots of bias at the cost of adding a little variance. Least squares
for simple linear regression happens not to be one of them, but you shouldn’t
expect that as a general rule.)
Turning to the intercept,
\[
E[\hat{\beta}_0] = E[\overline{Y} - \hat{\beta}_1 \overline{X}] \tag{25}
\]
\[
= \beta_0 + \beta_1 \overline{X} - E[\hat{\beta}_1]\,\overline{X} \tag{26}
\]
\[
= \beta_0 + \beta_1 \overline{X} - \beta_1 \overline{X} \tag{27}
\]
\[
= \beta_0 \tag{28}
\]
so it, too, is unbiased.
Variance and Standard Error Using the formula for the variance of a sum
from lecture 1, and the model assumption that all the εi are uncorrelated with
each other,
\[
\mathrm{Var}\left[\hat{\beta}_1\right] = \mathrm{Var}\left[\beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})\epsilon_i}{s^2_X}\right] \tag{29}
\]
\[
= \mathrm{Var}\left[\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})\epsilon_i}{s^2_X}\right] \tag{30}
\]
\[
= \frac{\frac{1}{n^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\,\mathrm{Var}[\epsilon_i]}{(s^2_X)^2} \tag{31}
\]
\[
= \frac{\frac{\sigma^2}{n} s^2_X}{(s^2_X)^2} \tag{32}
\]
\[
= \frac{\sigma^2}{n s^2_X} \tag{33}
\]
In words, this says that the variance of the slope estimate goes up as the
noise around the regression line (σ²) gets bigger, and goes down as we have
more observations (n), which are further spread out along the horizontal axis
(s²X); it should not be surprising that it’s easier to work out the slope of a line
from many, well-separated points on the line than from a few points smushed
together.
The standard error of an estimator is just its standard deviation, or the
square root of its variance:
\[
\mathrm{se}(\hat{\beta}_1) = \frac{\sigma}{\sqrt{n s^2_X}} \tag{34}
\]
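Both the unbiasedness claims and the variance formula are easy to check by simulation. Here is a minimal R sketch (parameter values invented for illustration) that holds the xi fixed, re-draws the noise many times, and compares the Monte Carlo results to β0, β1, and Eq. 34:

# Fixed design: treat the x's as non-random across replications
n <- 50
beta0 <- 1; beta1 <- 2; sigma <- 0.5
x <- runif(n, -3, 3)
s2x <- mean(x^2) - mean(x)^2

# Re-run the "experiment" many times, re-drawing only the noise
sims <- replicate(10000, {
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  b1 <- cov(x, y) / var(x)
  b0 <- mean(y) - b1 * mean(x)
  c(b0, b1)
})

rowMeans(sims)          # should be close to (beta0, beta1): unbiasedness
sd(sims[2, ])           # Monte Carlo standard deviation of the slope estimate
sigma / sqrt(n * s2x)   # theoretical standard error, Eq. 34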
I will leave working out the variance of β̂0 as an exercise.
\[
\beta_1 = E[Y \mid X = x] - E[Y \mid X = x - 1] \tag{40}
\]
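As a quick illustration (a sketch of my own with simulated data, not from the notes), the fitted version of this relationship shows up as the difference between predictions at x and x − 1:

# The difference in fitted conditional means at x and x-1 equals the slope estimate
n <- 100
x <- runif(n, -3, 3)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)
predict(fit, newdata = data.frame(x = 2)) -
  predict(fit, newdata = data.frame(x = 1))   # equals coef(fit)["x"]
coef(fit)["x"]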
The tricky part is that we have a very strong, natural tendency to interpret
this as telling us something about causation — “If we change X by 1, then
on average Y will change by β1 ”. This interpretation is usually completely
unsupported by the analysis. If I use an old-fashioned mercury thermometer,
the height of mercury in the tube usually has a nice linear relationship with the
temperature of the room the thermometer is in. This linear relationship goes
both ways, so we could regress temperature (Y ) on mercury height (X). But if
I manipulate the height of the mercury (say, by changing the ambient pressure,
or shining a laser into the tube, etc.), changing the height X will not, in fact,
change the temperature outside.
The right way to interpret β1 is not as the result of a change, but as an
expected difference. The correct catch-phrase would be something like “If we
select two sets of cases from the un-manipulated distribution where X differs
by 1, we expect Y to differ by β1.” This covers the thermometer example, and
every other one I can think of. It is, I admit, much less elegant than “If X
changes by 1, Y changes by β1 on average”, but it has the advantage of being
true, which the other does not.
There are circumstances where regression can be a useful part of causal
inference, but we will need a lot more tools to grasp them; that will come
towards the end of 402.
“Gaussian noise”. I dislike this: “normal” is an over-loaded word in math, while “Gaussian”
is (comparatively) specific; “error” made sense in Gauss’s original context of modeling, specif-
ically, errors of observation, but is misleading generally; and calling Gaussian distributions
“normal” suggests they are much more common than they really are.
[Figure 1: six panels of noise density curves over ε ∈ [−3, 3]; see the caption and the code below.]
par(mfrow=c(2,3))
# Gaussian (standard normal)
curve(dnorm(x), from=-3, to=3, xlab=expression(epsilon), ylab="", ylim=c(0,1))
# Double-exponential ("Laplacian")
curve(exp(-abs(x))/2, from=-3, to=3, xlab=expression(epsilon), ylab="",
      ylim=c(0,1))
# "Circular" (semicircle) distribution on [-1,1]
curve(sqrt(pmax(0,1-x^2))/(pi/2), from=-3, to=3, xlab=expression(epsilon),
      ylab="", ylim=c(0,1))
# t distribution with 3 degrees of freedom
curve(dt(x,3), from=-3, to=3, xlab=expression(epsilon), ylab="", ylim=c(0,1))
# Gamma (shape 1.5, scale 1), shifted to have mean 0
curve(dgamma(x+1.5, shape=1.5, scale=1), from=-3, to=3,
      xlab=expression(epsilon), ylab="", ylim=c(0,1))
# 50/50 mixture of two gammas (shapes 1.5 and 0.5), each offset to have mean 0
curve(0.5*dgamma(x+1.5, shape=1.5, scale=1) +
      0.5*dgamma(0.5-x, shape=0.5, scale=1), from=-3,
      to=3, xlab=expression(epsilon), ylab="", ylim=c(0,1))
par(mfrow=c(1,1))
Figure 1: Some possible noise distributions for the simple linear regression model;
all have E[ε] = 0, and could get any variance by scaling. (The model is even
compatible with each observation’s εi being drawn from a different distribution.) From top left
to bottom right: Gaussian; double-exponential (“Laplacian”); “circular” distribution;
t with 3 degrees of freedom; a gamma distribution (shape 1.5, scale 1) shifted to have
mean 0; mixture of two gammas with shape 1.5 and shape 0.5, each off-set to have
expectation 0. The first three were all used as error models in the 18th and 19th
centuries. (See the source file for how to get the code below the figure.)
4. ε is independent across observations.
You will notice that these assumptions are strictly stronger than those of the
simple linear regression model. More exactly, the first two assumptions are the
same, while the third and fourth assumptions of the Gaussian-noise model imply
the corresponding assumptions of the other model. This means that everything
we have done so far directly applies to the Gaussian-noise model. On the other
hand, the stronger assumptions let us say more. They tell us, exactly, the
probability distribution for Y given X, and so will let us get exact distributions
for predictions and for other inferential statistics.
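For concreteness, here is a short R sketch (with parameter values invented for illustration) of drawing from the Gaussian-noise model, which is what these exact-distribution claims refer to: given xi, each Yi is Normal with mean β0 + β1 xi and variance σ²:

# Simulate from the Gaussian-noise simple linear regression model
n <- 200
beta0 <- 1; beta1 <- 2; sigma <- 0.5
x <- runif(n, -1, 3)
y <- rnorm(n, mean = beta0 + beta1 * x, sd = sigma)  # Y | X=x ~ N(beta0 + beta1*x, sigma^2)
plot(x, y)
abline(beta0, beta1, col = "blue")                   # the true regression line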
Why the Gaussian noise model? Why should we think that the noise
around the regression line would follow a Gaussian distribution, independent of
X? There are two big reasons.
1. Central limit theorem The noise might be due to adding up the effects
of lots of little random causes, all nearly independent of each other and
of X, where each of the effects are of roughly similar magnitude. Then
the central limit theorem will take over, and the distribution of the sum
of effects will indeed be pretty Gaussian. For Gauss’s original context,
X was (simplifying) “Where is such-and-such-a-planet in space?”, Y was
“Where does an astronomer record the planet as appearing in the sky?”,
and noise came from defects in the telescope, eye-twitches, atmospheric
distortions, etc., etc., so this was pretty reasonable. It is clearly not a
universal truth of nature, however, or even something we should expect
to hold true as a general rule, as the name “normal” suggests.
2. Mathematical convenience Assuming Gaussian noise lets us work out a
very complete theory of inference and prediction for the model, with lots
of closed-form answers to questions like “What is the optimal estimate of
the variance?” or “What is the probability that we’d see a fit this good
from a line with a non-zero intercept if the true line goes through the
origin?”, etc., etc. Answering such questions without the Gaussian-noise
assumption needs somewhat more advanced techniques, and much more
advanced computing; we’ll get to it towards the end of the class.
[Figure: the conditional density p(y | x) of the Gaussian-noise model, plotted against x and y.]
those parameters. We could not work with the likelihood under the simple linear
regression model, because it didn’t specify enough about the distribution to let
us calculate a density. With the Gaussian-noise model, however, we can write
down a likelihood.² By the model’s assumptions, if we think the parameters are
b0, b1, s² (reserving the Greek letters for their true values), then
Y | X = x ∼ N(b0 + b1 x, s²), and Yi and Yj are independent given Xi and Xj,
so the over-all likelihood is
\[
\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi s^2}} e^{-\frac{(y_i - (b_0 + b_1 x_i))^2}{2s^2}} \tag{42}
\]
As usual, we work with the log-likelihood, which gives us the same information³
but replaces products with sums:
\[
L(b_0, b_1, s^2) = -\frac{n}{2}\log 2\pi - n\log s - \frac{1}{2s^2}\sum_{i=1}^{n}(y_i - (b_0 + b_1 x_i))^2 \tag{43}
\]
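As an illustration (my own sketch, not part of the original notes), Eq. 43 translates directly into an R function, which we will use again below to check that the maximum-likelihood estimates match least squares:

# Log-likelihood of the Gaussian-noise simple linear regression model (Eq. 43)
# theta = c(b0, b1, s), with s the noise standard deviation
slr.loglike <- function(theta, x, y) {
  b0 <- theta[1]; b1 <- theta[2]; s <- theta[3]
  n <- length(y)
  -(n/2) * log(2*pi) - n * log(s) - sum((y - (b0 + b1*x))^2) / (2*s^2)
}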
We recall from mathematical statistics that when we’ve got a likelihood func-
tion, we generally want to maximize it. That is, we want to find the parameter
values which make the data we observed as likely, as probable, as the model
will allow. (This may not be very likely; that’s another issue.) We recall from
calculus that one way to maximize is to take derivatives and set them to zero.
\[
\frac{\partial L}{\partial b_0} = -\frac{1}{2s^2}\sum_{i=1}^{n} 2(y_i - (b_0 + b_1 x_i))(-1) \tag{44}
\]
\[
\frac{\partial L}{\partial b_1} = -\frac{1}{2s^2}\sum_{i=1}^{n} 2(y_i - (b_0 + b_1 x_i))(-x_i) \tag{45}
\]
Notice that when we set these derivatives to zero, all the multiplicative
constants — in particular, the prefactor of 1/(2s²) — go away. We are left with
\[
\sum_{i=1}^{n} \left(y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right) = 0 \tag{46}
\]
\[
\sum_{i=1}^{n} \left(y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right) x_i = 0 \tag{47}
\]
These are, up to a factor of 1/n, exactly the equations we got from the method
of least squares (Eq. 9). That means that the least squares solution is the
maximum likelihood estimate under the Gaussian noise model; this is no coincidence⁴.
2 Strictly speaking, this is a “conditional” (on X) likelihood; but only pedants use the
adjective in this context.
3 Why is this?
4 It’s no coincidence because, to put it somewhat anachronistically, what Gauss did was
ask himself “for what distribution of the noise would least squares maximize the likelihood?”.
Now let’s take the derivative with respect to s:
\[
\frac{\partial L}{\partial s} = -\frac{n}{s} + \frac{1}{s^3}\sum_{i=1}^{n}(y_i - (b_0 + b_1 x_i))^2 \tag{48}
\]
Setting this to 0 at the optimum, including multiplying through by σ̂³, we get
\[
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right)^2 \tag{49}
\]
Notice that the right-hand side is just the in-sample mean squared error.
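Putting the pieces together, here is a hedged sketch (using the slr.loglike function defined above and simulated data; none of this is from the original notes) that maximizes the log-likelihood numerically and checks that the results agree with lm()'s least-squares coefficients and with the in-sample MSE:

# Simulated data with known parameters (invented for illustration)
n <- 100
x <- runif(n, -3, 3)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# Maximize the log-likelihood numerically; work with log(s) so the noise
# scale stays positive, and negate because optim() minimizes by default
neg.ll <- function(theta) {
  -slr.loglike(c(theta[1], theta[2], exp(theta[3])), x, y)
}
mle <- optim(par = c(0, 0, 0), fn = neg.ll)
c(b0 = mle$par[1], b1 = mle$par[2], s2 = exp(mle$par[3])^2)

fit <- lm(y ~ x)
coef(fit)                # should match the MLE intercept and slope
mean(residuals(fit)^2)   # in-sample MSE, which is sigma.hat^2 by Eq. 49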
Exercises
To think through, not to hand in.
1. Show that if E[ε|X = x] = 0 for all x, then Cov[X, ε] = 0. Would this
still be true if E[ε|X = x] = a for some other constant a?
2. Find the variance of β̂0. Hint: Do you need to worry about covariance
between Ȳ and β̂1?