
STAT 135: Linear Regression

Joan Bruna

Department of Statistics
UC Berkeley

May 1, 2015

Introduction: Example 1

We measure the brain weight W (grams) and head size S (cubic cm) for 237 adults.
(Source: R. J. Gladstone (1905). "A Study of the Relations of the Brain to the Size of the Head", Biometrika, Vol. 4, pp. 105–123.)

A reasonable model is

W ≈ β0 + β1 S .

Is the data consistent with our model? I.e., is our model correct?
How can we estimate the parameters of our model, and with what precision?


Introduction: Example 2

Say we measure the flight time T (in seconds) of an egg of weight W grams thrown from H meters.
What is a reasonable model for T ≈ F(W, H)?

F(W, H) ≈ β √H , with β = 1/√(9.8/2) .

How can we combine/test different features of our measurements (e.g. H, √H, H², etc.)?

Introduction: Example 3

World population P as a function of time Y:

A linear model of the form

P = β0 + β1 Y

does not look very good. Alternative?

A more reasonable model is that of exponential (constant) growth:

P(Y) = γ0 e^{γ1 (Y − Y0)} ,

where γ0 is the population at year Y0.

How to estimate the growth rate from the data? Simple idea: transform the data to reveal the linear dependency:

log P = log γ0 + γ1 Y − γ1 Y0 = γ̃0 + β1 Y ,

with γ̃0 = log γ0 − γ1 Y0.

The current growth rate is estimated at 1.1%.
More sophisticated models have variable growth.
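
To make the transformation concrete, here is a minimal numerical sketch in Python (all numbers are synthetic, for illustration only): fitting a line to log P recovers the growth rate γ1.

import numpy as np

# Hypothetical yearly data: population following exponential growth
# with a little multiplicative noise (illustration only).
rng = np.random.default_rng(0)
years = np.arange(1960, 2016)
pop = 3.0 * np.exp(0.014 * (years - 1960)) * np.exp(0.01 * rng.normal(size=years.size))

# Transform the data to reveal the linear dependency: log P = gamma0~ + gamma1 * Y.
X = np.column_stack([np.ones(years.size), years])
coef, *_ = np.linalg.lstsq(X, np.log(pop), rcond=None)
print("estimated growth rate:", coef[1])   # close to 0.014 here
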


Introduction: Example 4

Often we need multiple factors to explain a given set of measurements.

Average salary y from X = {SAT, GPA, University, Major, Zip}.
In digital images, pixel intensity y = I(u, v) from the neighboring intensities
X = {I(u′, v′) : u′ ≠ u, v′ ≠ v, |u − u′|, |v − v′| ≤ k}.


Introduction

In these examples,

y ≈ f(x1, . . . , x_{p−1}, β0, . . . , β_{p−1}) = β0 + Σ_{k=1}^{p−1} βk xk .

We will attempt to fit a function f with p parameters on n observations.
Many models can be cast as linear via appropriate data transformations (see previous examples).
Different statistical regimes arise as a function of p and the number of observations n:

When p/n → 0, "easy" statistics.
When p/n → C > 0, much harder (but more interesting!).


Regression vs ANOVA

In the previous chapter, we were interested in the question

"Is factor A (or B) influencing measurement Y?"

Now we are much more ambitious:

"How are factors Xj influencing measurement Y?"


Linear Regression 101

The regression problem has three ingredients:

Observations (yi, xk,i), i = 1 … n, k = 1 … p.
A parametrized fitting model F(x, β), with β ∈ R^p.
A criterion to assess how good the approximation yi ≈ F(xi, β) is (also known as the loss/cost function).

We will restrict ourselves to linear models: F(x, β) is linear in x and β.
How to choose the cost function?


Least Squares

Given a cost function ℓ(x, y) satisfying ℓ(x, y) ≥ 0 and ℓ(x, x) = 0, we are interested in

min_{β ∈ R^p} E(β) = Σ_{i=1}^n ℓ(yi, F(xi, β)) .

Different choices of ℓ yield different statistical properties.

There exists one cost that greatly simplifies things: ℓ(x, y) = |x − y|².
In that case, the system of equations

∂E(β)/∂βk = 0 , k = 1 … p

is linear.


Linear Regression 101

Let us review the simple affine case (p = 2):

E(β0, β1) = Σ_{i=1}^n (yi − β0 − β1 xi)² .

Then

∂E/∂β0 = −2 Σ_{i=1}^n (yi − β0 − β1 xi) ,
∂E/∂β1 = −2 Σ_{i=1}^n xi (yi − β0 − β1 xi) .


Linear Regression 101

Setting the derivatives to zero and solving the 2 × 2 system, we obtain

β̂0 = [ (Σi xi²)(Σi yi) − (Σi xi)(Σi xi yi) ] / [ n Σi xi² − (Σi xi)² ] ,
β̂1 = [ n (Σi xi yi) − (Σi xi)(Σi yi) ] / [ n Σi xi² − (Σi xi)² ] .

Example of brain weight: we obtain

W ≈ 325 + 0.26 S .

We will see how to build confidence intervals for these parameters.
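
As a sanity check, the closed-form estimates are easy to evaluate numerically. A minimal sketch with synthetic data (the original brain-weight measurements are not reproduced here):

import numpy as np

# Synthetic data for illustration; the true line is y = 325 + 0.26 x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(2500, 4500, size=237)
y = 325 + 0.26 * x + rng.normal(0, 50, size=237)

n = x.size
den = n * np.sum(x**2) - np.sum(x)**2
b0 = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / den
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / den
print(b0, b1)   # should come out close to 325 and 0.26
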


Simple Linear Regression

We introduce a statistical model for our observations:

yi = β0 + β1 xi + εi , i = 1 … n ,

with εi ∼ N(0, σ²) iid.

How well can we recover the parameters β0 and β1 of the model from data?


Bias and Variance of Simple Linear Regression

Theorem
Under the previous assumptions, we have

E(β̂j) = βj , (j = 0, 1)

and

var(β̂0) = σ² Σi xi² / [ n Σi xi² − (Σi xi)² ] , var(β̂1) = n σ² / [ n Σi xi² − (Σi xi)² ] .

Unbiasedness only requires E(ε) = 0.
Variance decreases with n, and depends upon the signal (x) to noise (σ²) ratio.
Important: this theorem assumes iid errors.


Regression and Correlation

Recall the previous regression coefficients in the affine case:

β̂0 = [ (Σi xi²)(Σi yi) − (Σi xi)(Σi xi yi) ] / [ n Σi xi² − (Σi xi)² ] ,
β̂1 = [ n (Σi xi yi) − (Σi xi)(Σi yi) ] / [ n Σi xi² − (Σi xi)² ] .

We can re-write these solutions as

β̂0 = ȳ − β̂1 x̄ , β̂1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)² .

If we denote

sx = (1/n) Σi (xi − x̄)² , sy = (1/n) Σi (yi − ȳ)² , sxy = (1/n) Σi (xi − x̄)(yi − ȳ) ,

the correlation coefficient between x and y is

ρ = sxy / √(sx sy) .

It results that ρ = β̂1 √(sx/sy), or

β̂1 = ρ √(sy/sx) .

In particular, the slope is 0 if and only if x and y are uncorrelated.


Regression and Correlation

Let us rewrite the regression equation:

ŷ = β̂0 + β̂1 x

ŷ = ( ȳ − ρ √(sy/sx) x̄ ) + ρ √(sy/sx) x

(ŷ − ȳ)/√sy = ρ (x − x̄)/√sx .

Very Important Fact: |ρ| ≤ 1.


Galton Experiment (1885)

Study of the heights of fathers and their sons. He found that

the children of taller than average parents tend to be shorter than their parents,
the children of shorter than average parents tend to be taller than their parents.

Why?

Linear Regression in Matrix Form

We denoted the data as

yi, xi,1, xi,2, . . . , xi,p , i = 1 … n .

Our regression model was

ŷi = Σ_{k≤p} βk xi,k , i = 1 … n .

Let us write

β = (β1, . . . , βp) ∈ R^p , ŷ = (ŷ1, . . . , ŷn) ∈ R^n , (X)i,k = xi,k , X ∈ R^{n×p} .

Then

ŷ = X β .


Linear Regression in Matrix Form

Our loss function was

E(β) = (1/2) Σ_{i=1}^n (yi − Σk βk xi,k)² .

This is the squared Euclidean norm of the vector y − Xβ, therefore

E(β) = (1/2) ‖y − Xβ‖² .


Computing derivatives in R^n

If

E(x) : R^n → R ,

the derivative or gradient of E with respect to x is written

∇x E ∈ R^n , with (∇x E)i = ∂E/∂xi .

Example: Say

E(x) = (1/2) ‖x‖² = (1/2) Σi xi² .

Then

∇x E = x .


Vector Chain Rule

Say you have a function

E = F(y) , y = G(x) ,

with

E ∈ R , y ∈ R^m , x ∈ R^n .

Question: How to compute the gradient of E with respect to x?
A: The chain rule is defined as usual:

∇x E = (∂G/∂x)^T ∇y F , with ∇y F evaluated at y = G(x) .


Back to Regression

Recall that our objective function is

E(β) = (1/2) ‖y − Xβ‖² .

Let's compute the least squares solution painlessly:

E(β) = (1/2) ‖z‖² , with z = y − Xβ .

Now we apply the chain rule:

∇z E = z and ∂β z = −X .

Therefore

∇β E = −X^T z = −X^T (y − Xβ) .


The Normal Equations

It results that

∇β E = 0 ⇔ X^T X β̂ = X^T y ⇔ β̂ = (X^T X)^{-1} X^T y .

The matrix X† = (X^T X)^{-1} X^T is called the pseudoinverse of X.

Lemma
The normal equations have a unique solution if and only if rank(X) = p.
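
In code, the normal equations are one line. A minimal numpy sketch (synthetic data), which also checks the answer against the built-in least squares solver; in practice one solves the linear system rather than forming the inverse explicitly:

import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: solve X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same least squares problem more stably.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
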


Revisit Affine Case p = 2

When fitting a straight line, we had

X = ( 1  x1 )
    ( 1  x2 )
    ( …  …  )
    ( 1  xn )

Thus

X^T X = (   n       Σi xi  ) ,  X^T y = (  Σi yi   ) , and
        ( Σi xi    Σi xi²  )            ( Σi xi yi )

(X^T X)^{-1} = 1/[ n Σi xi² − (Σi xi)² ] · (  Σi xi²   −Σi xi )
                                           ( −Σi xi       n   ) .


Multivariate Random Variables

In many situations, it will be easier to study random variables jointly; in particular, in linear regression.

We define a random vector Y as Y = (Y1, . . . , Yn), where each Yi is a random variable, with

E(Yi) = μi , cov(Yi, Yj) = σi,j .

More compactly, we write

E(Y) = μ = (μ1, . . . , μn) ∈ R^n , Σ_{Y,Y} ∈ R^{n×n} , (Σ_{Y,Y})i,j = σi,j .

μ is the mean of Y and Σ is the covariance matrix of Y.

Q: What happens with E(Y) and Σ_Y under linear transformations?


Linearity of Mean and Covariance

Lemma
Let Y be a random vector with mean μ and covariance Σ. Then, if Z = b + AY is a linear transformation, we have

E(Z) = b + Aμ , and Σ_{Z,Z} = A Σ A^T .
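
A quick Monte Carlo check of the lemma (all quantities synthetic, chosen only for illustration):

import numpy as np

rng = np.random.default_rng(2)
n, m, N = 3, 2, 200_000

mu = np.array([1.0, 2.0, 3.0])
L = rng.normal(size=(n, n))
Sigma = L @ L.T                     # a valid covariance matrix

A = rng.normal(size=(m, n))
b = np.array([0.5, -1.0])

Y = rng.multivariate_normal(mu, Sigma, size=N)   # samples in rows
Z = b + Y @ A.T

print(Z.mean(axis=0), b + A @ mu)   # empirical vs. theoretical mean
print(np.cov(Z.T))                  # empirical covariance ...
print(A @ Sigma @ A.T)              # ... vs. A Sigma A^T
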


Example

Suppose we have Y1, . . . , Yn modeling the temperature (in Celsius) at every second, with Yi = μi + εi and E(ε) = 0, Σ_ε = σ² Id.

Say we want the average temperature Z every minute, and in Fahrenheit!

First, every coordinate is transformed according to

Fahrenheit = 50 + (18/10)(Celsius − 10) .

Then, the minute averages can be computed with a matrix of the form

A = (1/60) ( 1 … 1  0 … 0  …         )
           ( 0 … 0  1 … 1  …         )
           ( …  …   …  …   …         )
           ( 0 … 0  …      1  …  1   ) .


Example

Thus

Z = b + AY ,

and

E(Z) = b + A E(Y) = b + A(μ + E(ε)) = b + Aμ .

Also,

Σ_Z = σ² A A^T .


Random Quadratic Forms

Another operation we have encountered often is sums of squares of random variables.
These are examples of what are called quadratic forms:

x^T A x = Σ_{i,i′} a_{i,i′} xi xi′ .

Lemma
Under the same assumptions as the previous theorem, let X = Y^T A Y ∈ R. Then

E(X) = Tr(AΣ) + μ^T A μ .


Important Example

We consider

s = Σi (Xi − X̄)² ,

where the Xi are uncorrelated with common mean μ.

Can we see s as the squared norm of a vector of the form AX?

Observe that X̄ = (1/n) 1^T X, with 1 = (1, . . . , 1). Thus

(X̄, . . . , X̄) = (1/n) 1 1^T X , and

A = Id − (1/n) 1 1^T .

We then have

Σi (Xi − X̄)² = ‖AX‖² = X^T A^T A X .


Example (continued)

The matrix A is called a projection matrix. It satisfies

A^T = A ,
A² = A .

Thus

X^T A^T A X = X^T A X

and

E(X^T A X) = σ² Tr(A) + μ^T A μ .

Since μ is a constant vector, Aμ = 0 and

E(s) = σ² Tr(A) = σ² (n − 1) .
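
A small numerical check of the claims above (synthetic data):

import numpy as np

n = 10
A = np.eye(n) - np.ones((n, n)) / n

# A is a symmetric, idempotent projection with trace n - 1.
assert np.allclose(A, A.T) and np.allclose(A @ A, A)
assert np.isclose(np.trace(A), n - 1)

# Monte Carlo check that E[s] = sigma^2 (n - 1) for uncorrelated Xi with common mean.
rng = np.random.default_rng(3)
sigma = 2.0
X = 5.0 + sigma * rng.normal(size=(100_000, n))
s = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
print(s.mean(), sigma**2 * (n - 1))   # the two numbers should be close
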


Cross-Covariance of Random Vectors

Given random vectors Y ∈ R^n and Z ∈ R^m, the cross-covariance Σ_{Y,Z} ∈ R^{n×m} is the matrix

(Σ_{Y,Z})i,j = cov(Yi, Zj) .

Lemma
Let X ∈ R^n be a random vector with covariance Σ. If Y = AX ∈ R^p and Z = BX ∈ R^m are linear transformations of X, where A and B are constant matrices, then

Σ_{Y,Z} = A Σ B^T .


Mean and Covariance of Least Squares Estimates

Recall our statistical model:

Y = Xβ + ε ,

with
X ∈ R^{n×p} known and fixed,
β ∈ R^p unknown parameters,
ε ∈ R^n with E(ε) = 0 and Σ_ε = σ² Id.

Estimation of β: Given observations y = (y1, . . . , yn), the least squares estimate is

β̂ = X† y = (X^T X)^{-1} X^T y .


Mean and Covariance of Least Squares Estimates

Theorem
If E(ε) = 0, then

E(β̂) = β .

Theorem
If E(ε) = 0 and Σ_ε = σ² Id, then

Σ_β̂ = σ² (X^T X)^{-1} .


Example: affine case

Recall that

(X^T X)^{-1} = 1/[ n Σi xi² − (Σi xi)² ] · (  Σi xi²   −Σi xi )
                                           ( −Σi xi       n   ) .

It results that

Σ_β̂ = (    var(β̂0)       cov(β̂0, β̂1) )
      ( cov(β̂0, β̂1)        var(β̂1)   )

    = 1/[ n Σi xi² − (Σi xi)² ] · (  σ² Σi xi²   −σ² Σi xi )
                                  ( −σ² Σi xi       σ² n   ) .


Linear Regression and Maximum-Likelihood

Theorem
Under the iid Gaussian statistical model Y ∼ N(Xβ, σ² Id),

β̂ = X† y

is the maximum likelihood estimator for β.


Estimation of σ²

In most situations, we do not know σ² in advance.

The MLE of σ² under the Gaussian model is

σ̂²_MLE = (1/n) ‖Y − Xβ̂‖² .

However, this is biased and only applies to Gaussian errors.
How to generalize/improve?


Estimation of σ²

Let us introduce the vector of residuals.

Definition
The residuals of the linear regression are

ê = Y − Ŷ = Y − Xβ̂ .

Theorem
Under the assumption that the errors are uncorrelated with constant variance σ²,

s² = ‖ê‖² / (n − p)

is an unbiased estimator of σ²: E(s²) = σ².
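
A quick simulation check that dividing by n − p (rather than n) makes s² unbiased (synthetic data):

import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 200, 4, 1.5
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

# Average s^2 over many replications to check E[s^2] = sigma^2.
s2_values = []
for _ in range(2000):
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    s2_values.append(resid @ resid / (n - p))

print(np.mean(s2_values), sigma**2)   # the two numbers should be close
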


Estimation of σ²

We first write the residuals as

Y − Ŷ = Y − Xβ̂ = (Id − X(X^T X)^{-1} X^T) Y .

The matrix P_{X⊥} = Id − X(X^T X)^{-1} X^T is a projection matrix onto the orthogonal complement of the column space of X.
In particular, we have P_{X⊥}^T = P_{X⊥} and P_{X⊥}² = P_{X⊥}.

It results that

‖ê‖² = Σi (Yi − Ŷi)² = ‖Y − Ŷ‖² = Y^T P_{X⊥} Y , so

E(‖ê‖²) = E(Y)^T P_{X⊥} E(Y) + σ² tr(P_{X⊥}) .

We have

P_{X⊥} E(Y) = P_{X⊥} Xβ = 0 .
tr(P_{X⊥}) = tr(Id) − tr(X(X^T X)^{-1} X^T) = n − tr((X^T X)^{-1} X^T X) = n − p .

So E(‖ê‖²) = σ² (n − p).


Assessing Fit with Residuals

How can we determine whether a regression model is "good"?

Using that ê = P_{X⊥} Y, we can compute how the residuals are correlated:

Σ_{ê,ê} = P_{X⊥} Σ_{Y,Y} P_{X⊥}^T = σ² P_{X⊥} .

The residuals are correlated and their variances are not uniform.

We can standardize the residuals with

(Yi − Ŷi) / ( s √(P_{X⊥}(i, i)) ) = (Yi − Ŷi) / ( s √(1 − P_X(i, i)) ) ,

where P_X = X(X^T X)^{-1} X^T.
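
A minimal sketch of this standardization in numpy (synthetic data); the diagonal of the hat matrix P_X is often called the leverage:

import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix P_X
resid = y - H @ y                        # residuals = P_{X perp} y
s = np.sqrt(resid @ resid / (n - p))

# Standardized residuals: divide by s * sqrt(1 - leverage).
leverage = np.diag(H)
std_resid = resid / (s * np.sqrt(1 - leverage))
print(std_resid[:5])
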


Assessing Fit with Residuals

We also have:

Lemma
If the errors have covariance matrix σ² Id, then the residuals ê are uncorrelated with the fitted values Ŷ.

Remarks:
we can check visually that the residuals and fitted values are not linearly related.
we can also check visually the assumption of constant variance.


Example: Brain Weight vs Head Size

Recall the data: brain weight vs head size. We perform affine regression to obtain

Ŵ = 325.6 + 0.26 S .

We plot the error residuals vs the predicted values:

No apparent remaining correlation.
Variance seems uniform across the predictor.
So the simple regression model appears valid for this dataset.

Inference about regression coefficients

We saw that if Y ∼ N(Xβ, σ² Id), then

β̂ = X† Y ∼ N(β, σ² (X^T X)^{-1}) .

Q: How to test hypotheses about β and construct confidence intervals?

Denote by C = (X^T X)^{-1} the inverse empirical covariance of X.

Under the normality assumption, we have that

for all k , (β̂k − βk) / (s √(Ck,k)) ∼ t_{n−p} .

If n is large enough, the CLT gives approximate normality for β̂ even if the errors are not Gaussian.
A 100(1 − α)% confidence interval for βi is β̂i ± s √(Ci,i) t_{n−p}(1 − α/2).
A test for the null hypothesis H0 : βi = β0 is performed using t = (β̂i − β0) / (s √(Ci,i)), whose null distribution is t_{n−p}.
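
A minimal sketch of these confidence intervals in Python, assuming scipy is available for the t quantiles (synthetic data):

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)
C = np.linalg.inv(X.T @ X)

alpha = 0.05
tq = stats.t.ppf(1 - alpha / 2, df=n - p)
se = np.sqrt(s2 * np.diag(C))       # standard error s * sqrt(C_ii)
for k in range(p):
    print(f"beta_{k}: {beta_hat[k]:.3f} +/- {tq * se[k]:.3f}")
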


Important Remark: Signal vs Noise

The covariance of our estimators β̂ is

Σ_β̂ = σ² (X^T X)^{-1} .

σ² is the amount of noise; so the stronger the noise, the worse.
(X^T X)^{-1} is the inverse covariance of the signal X. So the stronger the signal, the better.


Example: Egg flight time

We can nevertheless attempt a linear regression...

We plot the error residuals vs the predicted values:

There is a (nonlinear) dependency between residuals and predicted values.
The regression model seems poorly adapted in this case.

Important Example

Q: When there is a relationship of the form Y = aX^b, but we do not know b, how to estimate it?
A: log-log data transformation: log Y = log a + b log X.

We estimate b̂ = 0.48 ≈ 1/2.
However, we observe that the variance of the error residuals is not constant.
What can we do to improve the fit?
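
A minimal sketch of the log-log fit (synthetic power-law data, with parameters chosen only for illustration):

import numpy as np

# Synthetic power-law data: Y = a * X^b with multiplicative noise.
rng = np.random.default_rng(7)
a, b = 0.45, 0.5
X = rng.uniform(1.0, 30.0, size=100)
Y = a * X**b * np.exp(0.05 * rng.normal(size=100))

# The log-log transformation turns the power law into a line: log Y = log a + b log X.
M = np.column_stack([np.ones_like(X), np.log(X)])
coef, *_ = np.linalg.lstsq(M, np.log(Y), rcond=None)
print("log a =", coef[0], "b =", coef[1])   # b should come out near 0.5
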
Transforming Data to Improve the Fit

Now that we have figured out that Y ∝ X^0.48 ≈ √X, let's try to figure out the gravitational constant.
First Idea: Regress Y² against X:

Ŷ² = β0 + β1 X , with β0 ∈ (−0.3 ± 0.55) , β1 ∈ (0.21 ± 0.01) .

Remember that Y = α √X, with α = √(2/9.8) = 0.452. We thus obtain a 100(1 − α)% CI for α via √β1 : (0.45, 0.4728). Can we do better?

Transforming Data to Improve the Fit

The previous transformation amplifies the noise for large X.
We can rather try to regress Y against √X:

Ŷ = β0 + β1 √X , with β0 ∈ (−0.03 ± 0.13) , β1 ∈ (0.459 ± 0.018) .

Now the variance of the residuals is more uniform across samples.
The corresponding CI for α now is (0.44, 0.47). Better than before?
Whereas the noise now behaves better, the signal does not.

Role of Outliers

GDP per capita and Internet usage (source: worldbank.org). Looks OK?

Outliers are extreme values that greatly influence the rest of the fitted model parameters. The quadratic loss is very sensitive to such extreme values:

Outliers in X (Monaco). How would β̂ change if we removed an "extreme" observation (x*, y*)?

(X^T X)_{k,l} = Σ_{i=1}^n xi,k xi,l = x*_k x*_l + Σ_{xi ≠ x*} xi,k xi,l ,
(X^T Y)_k = Σi xi,k yi = x*_k y* + Σ_{xi ≠ x*} xi,k yi .

Outliers are over-emphasized via the least squares criterion.


Role of Outliers

Model misfits. Suppose that one observation (e.g. Iceland) does not follow the specified model. Q: How much is it going to degrade the overall fit?
Suppose y* = x* β + ε + γ with |γ| ≫ |ε|. Then

β̂ = X† Y = X† Ỹ + γ X†_* ,

where Ỹ is the response without the misfit and X†_* is the column of X† corresponding to the observation *.

The influence of a model misfit depends upon where it happens.


Regression coefficients and Conditioning

We have seen that if the noise is uncorrelated, then

Σ_β̂ = σ² (X^T X)^{-1} .

Q: What happens when the features xk are correlated with each other?
A: The total uncertainty can be measured with the trace

σ² Σi Ci,i , C = (X^T X)^{-1} .

If the features xk are very correlated with each other, the matrix C is ill-conditioned, meaning that

Σi Ci,i

is large relative to the norm of X.

Interpretation of the regression coefficients is not always a good idea.
This does not mean regression is unreliable!


Prediction

So far, we have concentrated on modeling our observations, i.e., explaining the observed variability in Y through linear combinations of the X's.
Suppose now we have estimated a regression model

Ŷ = X β̂ ,

and we observe a new value of the features X = x* for a new sample.

Q: How to predict the outcome Y*?

A reasonable estimate for Y* is

Ŷ* = x* β̂ .

Its variance is

var(Ŷ*) = x* Σ_β̂ x*^T = σ² x* (X^T X)^{-1} x*^T ,

and we can estimate it by replacing σ² with s².


Prediction

It is instructive to look at the variance of the predicted outcome as a function of the number of features p.
The average variance across the data-points (x1, . . . , xn) is

(1/n) Σi var(Ŷ(xi)) = (σ²/n) Σ_{i=1}^n xi (X^T X)^{-1} xi^T
                    = (σ²/n) Tr(X (X^T X)^{-1} X^T) = (σ²/n) Tr(Id_{p×p})
                    = σ² p/n .

So the variance increases with the number of covariates of the model.
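
The trace identity above is easy to verify numerically; a minimal sketch:

import numpy as np

rng = np.random.default_rng(8)
n, p, sigma2 = 60, 5, 1.0
X = rng.normal(size=(n, p))

C = np.linalg.inv(X.T @ X)
avg_var = sigma2 * np.mean([x @ C @ x for x in X])   # average x_i (X^T X)^{-1} x_i^T

print(avg_var, sigma2 * p / n)   # equal up to floating point error
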


Prediction Confidence Intervals

Q: Can we construct a confidence interval for Y*?

Recall that the model is

Y = Xβ + ε = θ + ε .

We can construct a CI for θ with

Ŷ* ± s √( x* (X^T X)^{-1} x*^T ) t_{n−p}(1 − α/2) .

A CI for Y* needs to account for the extra uncertainty from ε ∼ N(0, σ²):

An approximate 100(1 − α)% prediction interval for Y* is

x* β̂ ± s √( x* (X^T X)^{-1} x*^T + 1 ) t_{n−p}(1 − α/2) .
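
A minimal sketch of the prediction interval (synthetic data; scipy used for the t quantile):

import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, p = 60, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = X @ np.array([2.0, 0.7]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
s = np.sqrt((y - X @ beta_hat) @ (y - X @ beta_hat) / (n - p))
C = np.linalg.inv(X.T @ X)

x_star = np.array([1.0, 5.0])        # new sample, in the same feature encoding
tq = stats.t.ppf(0.975, df=n - p)

y_hat = x_star @ beta_hat
half = tq * s * np.sqrt(x_star @ C @ x_star + 1)   # "+1" accounts for the new noise
print(f"95% prediction interval: {y_hat:.2f} +/- {half:.2f}")
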


Example

Let us try to model CO2 emissions in France over the last 50 years from some economic indicators, such as GDP growth, GDP per capita and exports of goods and services (source: worldbank.org).

We attempt to predict CO2 at year t from all the indicators at year t − 1 and the economic indicators at year t:

CO2(t) = β1 CO2(t − 1) + β2 GDP(t − 1) + · · · + β7 Exports(t) .

We form the feature matrix X of size 50 × 7 and the response vector Y of size 50 × 1. Then

β̂ = (X^T X)^{-1} X^T Y , and

ĈO2(t + 1) = β̂1 CO2(t) + β̂2 GDP(t) + · · · + β̂7 Exports(t + 1) .

Also

s² = ‖CO2 − ĈO2‖² / (50 − 7) .

We construct the confidence interval of the predicted CO2 as

x* β̂ ± s √( x* (X^T X)^{-1} x*^T + 1 ) t_{n−p}(1 − α/2) .

According to our model, this year's CO2 emissions for France will be

5.57 ± 0.78 (metric tons per capita) .

Model Selection

Q: Given a dataset with many covariates, how many should we use to predict/model a given response with small risk?


Image Regression Example

Let us try to predict a pixel value from its neighbors:

Model images as locally smooth functions:

Yi = Σ_{j∈N(i,δ)} βj xj,i + εi ,

where N(i, δ) is the neighborhood of size δ centered at i.

We estimate the coefficients β using least squares on a given image x_tr and then we test it on a different image x_te:

Figure: Left: Estimated Ŷ using δ = 5. Right: Estimated regression coefficients β̂.

We evaluate the model using the prediction error:

R(δ) = Σi (Yi − Ŷi)² = Σi (Yi − Σ_{j∈N(i,δ)} β̂j xj,i)² .


Example

We see what happens as we vary δ:

So, as we make the model bigger (increase δ), the training error always decreases.
But the prediction error does NOT. Why?

Overfitting

Suppose a model

Y = Xβ + ε , β ∈ R^p , X ∈ R^{1×p} ,

with E(ε) = 0, Σ_ε = σ² Id, and observations (xi, yi), i = 1 … n.

Say we pick a subset S ⊂ {1 … p} of features:

Y = X_S β^S + X_{S^c} β^{S^c} + ε ,

and we perform linear regression only using features from S:

Ŷ_S(x) = x_S β̂_S , with β̂_S = X_S† Y .

Q: What is the bias and the variance of Ŷ_S(x)?


Bias-Variance Trade-off

The bias of Ŷ_S is

E(Ŷ_S(x)) − xβ = x_S E(β̂_S) − xβ
              = x_S X_S† (X_S β^S + X_{S^c} β^{S^c}) − xβ
              = x_S β^S + x_S (X_S† X_{S^c}) β^{S^c} − x_S β^S − x_{S^c} β^{S^c}
              = ( x_S (X_S† X_{S^c}) − x_{S^c} ) β^{S^c} .

The variance of Ŷ_S(x) is

var(Ŷ_S(x)) = σ² x_S (X_S^T X_S)^{-1} x_S^T .

So, as |S| increases,

the bias of Ŷ_S(x) decreases,
but its variance increases.

Bias-Variance Trade-off

So, how to pick a good trade-off?

Remember, we want to optimize the prediction error, or test error, of the model, evaluated at the observed data-points:

R(S) = Σ_{i=1}^n E[(Ŷ_S(xi) − Yi*)²] = E[‖Ŷ_S − Y*‖²] ,

where Yi* is a future observation at data-point xi.

We could think of looking at the expected residual error (i.e., training error) as a guide:

E[R̂_tr(S)] = Σ_{i=1}^n E[(Ŷ_S(xi) − Yi)²] = E[‖Ŷ_S − Y‖²] .


Bias-Variance Trade-off

It turns out that the training error is a biased estimator of the test error:

Theorem

E[R̂_tr(S)] = R(S) − 2 Tr(Σ_{Ŷ,Y}) .

Remarks:
The data is being used twice: to fit the model and then to estimate the risk.
The cross-covariance between Ŷ and Y increases as the model becomes more complex.
How to estimate the risk more reliably, i.e., how to choose the best model size?


Cross-Validation

Rather than using the data twice, we can organize it differently: hold out part of the data to estimate the risk (a validation set), and fit the model on the rest (the training set).

k-fold Cross-Validation

Why not repeat with different splittings to improve the risk estimate?

R̂ = (1/K) Σ_{k≤K} R̂_k .
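
A minimal sketch of K-fold cross-validation for least squares (the function name kfold_risk and the random splitting scheme are ours, for illustration):

import numpy as np

def kfold_risk(X, y, K=5, seed=0):
    """Estimate the prediction risk of least squares by K-fold cross-validation."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, K)
    risks = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        risks.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return np.mean(risks)           # R-hat = (1/K) sum_k R-hat_k

# Tiny usage example with synthetic data.
rng = np.random.default_rng(10)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.normal(size=100)
print(kfold_risk(X, y))
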


Logistic Regression

In many situations, we are naturally interested in predicting a binary (or categorical) outcome.

Patient has diabetes or not.
Handwritten digit is in {0, . . . , 9}.
Message is spam or not.
...

In the simple binary setting, observations (Xi, Yi) are modeled as Bernoulli trials:

Yi | Xi ∼ Bern(p(Xi)) .

Q: How can we model the dependency between Xi and p(Xi)?


Logistic Regression

We can use the logistic model:

p(x, β) = P(Y = 1 | X = x) = e^{β^T x} / (1 + e^{β^T x}) .

The function f(t) = e^t / (1 + e^t) is the logistic function.


MLE of Logistic Regression

Given observations (xi, yi), i = 1, . . . , n, the likelihood of the model is

lik(β) = Π_{i=1}^n p(xi, β)^{yi} (1 − p(xi, β))^{1−yi}
       = Π_{i=1}^n ( e^{β^T xi} / (1 + e^{β^T xi}) )^{yi} ( 1 / (1 + e^{β^T xi}) )^{1−yi}
       = Π_{i=1}^n e^{yi β^T xi} / (1 + e^{β^T xi}) ,

so the log-likelihood becomes

ℓ(β) = Σ_{i=1}^n yi β^T xi − log(1 + e^{β^T xi}) .


Solving Logistic Regression

Q: How to obtain the MLE β̂?

Let's start by computing the gradient of ℓ(β):

∇ℓ(β) = Σ_{i=1}^n [ xi yi − xi e^{β^T xi} / (1 + e^{β^T xi}) ]
      = Σ_{i=1}^n xi (yi − p(xi, β)) .

Setting ∇ℓ(β) = 0 results in a system of p non-linear equations.
There is no closed form solution for β̂.
We need to rely on iterative methods!


The Newton Algorithm

An iterative scheme from the 17th century. If f is a differentiable real function, we can find a solution of f(t) = 0 iteratively via

t_{n+1} = t_n − f(t_n) / f′(t_n) .

In our setting, we obtain

β^{n+1} = β^n − ( ∂²ℓ(β)/∂β∂β^T )^{-1} ∇ℓ(β) , evaluated at β = β^n .

The matrix ∂²ℓ(β)/∂β∂β^T is called the Hessian of ℓ.


Iterative Reweighted Least Squares

If we define the vector

p = (p(x1 , β), p(x2 , β), . . . , p(xn , β)) ,

we have
∇`(β) = X T (y − p) ,
and
∂ 2 `(β)
= −X T WX ,
∂β∂β T
with W a diagonal matrix Wi,i = p(xi , β)(1 − p(xi , β).

Joan Bruna STAT 135: Linear Regression


Iterative Reweighted Least Squares

The Newton step thus becomes

β n+1 = β n + (X T WX )−1 X T (y − p)
= (X T WX )−1 X T W (X β n + W −1 (y − p))
= (X T WX )−1 X T Wz ,

with z = X β n + W −1 (y − p).

Joan Bruna STAT 135: Linear Regression


Iterative Reweighted Least Squares

The Newton step thus becomes

    β^{n+1} = β^n + (X^T W X)^{−1} X^T (y − p)
            = (X^T W X)^{−1} X^T W (X β^n + W^{−1} (y − p))
            = (X^T W X)^{−1} X^T W z ,

with z = X β^n + W^{−1} (y − p).

Each Newton step is a reweighted least squares step.
At each iteration p changes, since it depends upon β.
We can initialize the algorithm with β = 0.
In R, glm with family = binomial fits logistic regression via this
algorithm (the glmnet package fits a penalized variant).

Joan Bruna STAT 135: Linear Regression
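Putting the pieces together, a self-contained R sketch of IRLS for logistic regression (an illustration under the stated assumptions, not the course's reference code); X is an n × p design matrix including an intercept column, y a 0/1 vector:

    irls_logistic <- function(X, y, n_iter = 25, tol = 1e-8) {
      beta <- rep(0, ncol(X))                      # initialize with β = 0
      for (iter in 1:n_iter) {
        p <- as.vector(plogis(X %*% beta))         # current probabilities
        w <- p * (1 - p)                           # diagonal of W
        z <- as.vector(X %*% beta) + (y - p) / w   # working response z
        # Weighted least squares step: β ← (X^T W X)^{-1} X^T W z
        beta_new <- as.vector(solve(t(X) %*% (w * X), t(X) %*% (w * z)))
        if (max(abs(beta_new - beta)) < tol) return(beta_new)
        beta <- beta_new
      }
      beta
    }

For comparison, coef(glm(y ~ X - 1, family = binomial)) should agree with the converged β̂.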


Asymptotic Properties of regression coefficients

Since β̂ obtained by IRLS approximates the MLE, asymptotic MLE
theory tells us that if the model is correct, then β̂ is consistent:

    β̂ → β    (n → ∞).

Moreover, the distribution of β̂ converges to

    N(β, (X^T W X)^{−1}) .

We will use this approximation to do inference on β.

Joan Bruna STAT 135: Linear Regression
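A hedged sketch of the corresponding standard errors, assuming beta_hat is the converged estimate from the IRLS sketch above:

    p  <- as.vector(plogis(X %*% beta_hat))
    w  <- p * (1 - p)                        # diagonal of W at β̂
    cov_beta <- solve(t(X) %*% (w * X))      # (X^T W X)^{-1}
    se <- sqrt(diag(cov_beta))               # std(β̂_k) for each coefficient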


Example: South African Heart Disease

We consider an example from (Hastie & Tibshirani).

Aim of the study: establish the intensity of heart disease risk factors
in rural Western Cape, South Africa. The response variable is the
presence or absence of myocardial infarction.

Joan Bruna STAT 135: Linear Regression


Example: South African Heart Disease

[Figure: a scatterplot matrix of the South African heart disease data.
Each panel plots a pair of risk factors (sbp, tobacco, ldl, famhist,
obesity, alcohol, age); the cases and controls are color coded.]

Joan Bruna STAT 135: Linear Regression
Example: South African Heart Disease

We fit a logistic regression model using IRLS:

                  β̂       std(β̂)
    (intercept)  -4.13     0.964
    sbp           0.006    0.006
    tobacco       0.080    0.026
    ldl           0.185    0.057
    famhist       0.939    0.225
    obesity      -0.035    0.029
    alcohol       0.001    0.004
    age           0.043    0.010

(results from Hastie & Tibshirani)

Q: Are these numbers statistically significant?

Joan Bruna STAT 135: Linear Regression
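A table like this can be reproduced with R's built-in glm; a sketch assuming a data frame heart with these risk factors and a 0/1 response chd (the variable names are an assumption, not from the slides):

    fit <- glm(chd ~ sbp + tobacco + ldl + famhist + obesity + alcohol + age,
               family = binomial, data = heart)
    summary(fit)    # reports β̂, std. errors, and z-values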


Inference About Regression Coefficients

The Z-score is simply the ratio

    β̂_k / std(β̂_k) .

Asymptotic normality means that if n is large, then

    (β̂_k − β_k) / std(β̂_k) ∼ N(0, 1) .

The Wald Test tests the null hypothesis β_k = 0. Reject the null
hypothesis if

    Z_k = |β̂_k| / std(β̂_k) ≥ z(1 − α/2) .

Joan Bruna STAT 135: Linear Regression
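A short R sketch of the Wald test, assuming beta_hat and se from the earlier sketches:

    z_scores <- beta_hat / se
    alpha    <- 0.05
    reject   <- abs(z_scores) >= qnorm(1 - alpha / 2)   # z(1 − α/2) ≈ 1.96
    p_values <- 2 * pnorm(-abs(z_scores))               # two-sided p-values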


Back to Example

We compute the Z-scores:

                  β̂       std(β̂)   Z-score
    (intercept)  -4.13     0.964    -4.28
    sbp           0.006    0.006     1.023
    tobacco       0.080    0.026     3.034
    ldl           0.185    0.057     3.22
    famhist       0.939    0.225     4.178
    obesity      -0.035    0.029    -1.18
    alcohol       0.001    0.004     0.136
    age           0.043    0.010     4.184

(results from Hastie & Tibshirani)

Blood pressure and obesity are not significant. Why?

Moreover, does obesity really correlate negatively with heart disease?

Joan Bruna STAT 135: Linear Regression


Extension to Multi-Class Regression

What if we need to perform a multi-class categorization?

Examples:
    Different mutations.
    Image classification.

Replace the Bernoulli model with a Multinomial model with K
classes:

    P(Y_i = k | X_i) = θ_k(X_i) ,    (k = 1, . . . , K) ,

with θ_k ∈ [0, 1] and Σ_k θ_k = 1.

The Softmax function is the generalization of the logistic function:

    θ_k(x) = e^{β_k^T x} / Σ_{j=1}^K e^{β_j^T x} .

...but that is another story!

Joan Bruna STAT 135: Linear Regression
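A minimal R sketch of the softmax probabilities, assuming B is a p × K matrix whose k-th column is β_k and x is a length-p covariate vector:

    softmax_probs <- function(B, x) {
      s <- as.vector(t(B) %*% x)    # scores β_k^T x, k = 1, ..., K
      e <- exp(s - max(s))          # subtract max(s) for numerical stability
      e / sum(e)                    # θ_k(x), summing to 1
    }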


Joan Bruna STAT 135: Linear Regression
