Linear Review 1
Presidency University
February, 2025
Bias-variance Trade-off
▶ Although σₓ² is beyond control, Bias²(f̂ₙ(x)) and Var(f̂ₙ(x)) depend on the choice of f̂ₙ(x), which makes the situation interesting.
▶ If we choose f̂ₙ(x) to be a complicated function, then Bias²(f̂ₙ(x)) is small but Var(f̂ₙ(x)) increases considerably.
▶ We note that even an unbiased estimator f̂ₙ(x) may not be admissible because of its large variance.
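This tradeoff is easy to see numerically. Below is a minimal Monte Carlo sketch (the true function sin(2πx), the noise level, the evaluation point and the sample size are all illustrative choices of ours, not from the slides): a rigid degree-1 fit versus a flexible degree-9 fit, compared through Bias² and Var at a fixed point x₀.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # hypothetical true regression function for the illustration
    return np.sin(2 * np.pi * x)

def bias2_and_var(degree, x0=0.25, n=30, sigma=0.3, reps=2000):
    """Monte Carlo estimates of Bias^2 and Var of a degree-`degree`
    polynomial fit, evaluated at the single point x0."""
    preds = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0, 1, n)
        y = true_f(x) + rng.normal(0, sigma, n)
        coef = np.polyfit(x, y, degree)      # least-squares polynomial fit
        preds[r] = np.polyval(coef, x0)
    bias2 = (preds.mean() - true_f(x0)) ** 2
    return bias2, preds.var()

b_lo, v_lo = bias2_and_var(degree=1)    # simple (rigid) model
b_hi, v_hi = bias2_and_var(degree=9)    # complicated (flexible) model
print(f"degree 1: bias^2={b_lo:.4f}  var={v_lo:.4f}")
print(f"degree 9: bias^2={b_hi:.4f}  var={v_hi:.4f}")
```

The rigid fit shows a large squared bias and a small variance; the flexible fit reverses the two, which is exactly the tradeoff described above.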
Example: Bias-variance tradeoff in action
▶ Suppose the true regression function is
  f(x) = α + β sin(γx)
▶ In fact here the MSE of the model f̂₁(x) = f₀ is less than the MSE of the unbiased model f̂₂(x) = α̂ + β̂ sin(γx) (assuming γ to be known).
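The claim above can be checked by simulation. A sketch under illustrative assumptions (α = 0, β = 0.2, γ = 2π, σ = 1, f₀ = 0 and n = 20 are our own choices, not from the slides): with a weak signal and noisy data, the biased constant model can beat the unbiased fitted model in MSE at a point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values (our own choice, not from the slides):
alpha, beta, gamma, sigma = 0.0, 0.2, 2 * np.pi, 1.0
f = lambda x: alpha + beta * np.sin(gamma * x)

x0, f0, n, reps = 0.25, 0.0, 20, 3000
pred2 = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    # OLS for y = alpha + beta*sin(gamma*x), with gamma known the model
    # is linear in (alpha, beta)
    S = np.column_stack([np.ones(n), np.sin(gamma * x)])
    ab_hat, *_ = np.linalg.lstsq(S, y, rcond=None)
    pred2[r] = ab_hat[0] + ab_hat[1] * np.sin(gamma * x0)

mse1 = (f0 - f(x0)) ** 2                 # constant model: pure bias^2, zero variance
mse2 = ((pred2 - f(x0)) ** 2).mean()     # unbiased model: essentially pure variance
print(f"MSE of constant model f0: {mse1:.4f}")
print(f"MSE of unbiased model:    {mse2:.4f}")
```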
Example (contd.)
[Figure: plot of y against x for the example data; y-axis ticks from 0.4 to 1.6]
Example (Contd.)
y = Xβ + ε
where y = (y₁, y₂, ..., yₙ)′ is the response vector, β = (β₁, ..., βₚ)′ is the vector of parameters, ε is the vector of random errors, and
E(y|X) = Xβ and Var(y|X) = σ²Iₙ.
Example: Simple Linear Regression
y = Xθ + ε
where X is the n × (p + 1) matrix whose i-th row is (1, xᵢ, xᵢ², ..., xᵢᵖ) and θ = (β₀, β₁, ..., βₚ)′, so that a polynomial in x is still a model that is linear in the parameters.
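For concreteness, the design matrix X described above can be built with `numpy.vander` (the x values and p = 3 below are hypothetical):

```python
import numpy as np

x = np.array([0.1, 0.4, 0.7, 0.9, 1.3])   # hypothetical predictor values
p = 3

# i-th row of X is (1, x_i, x_i^2, ..., x_i^p)
X = np.vander(x, N=p + 1, increasing=True)
print(X.shape)    # (5, 4): n rows, p + 1 columns
print(X[0])       # first row: 1, x_1, x_1^2, x_1^3

# With theta = (beta_0, ..., beta_p)', the model y = X theta + eps is
# linear in the parameters even though it is polynomial in x.
```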
Example: Multiple Regression
F = {f : f(x) = β₀ + β₁x₁ + ⋯ + βₚxₚ}
Using dummy for all levels
▶ Now suppose in the same situation we use k dummy variables x₁, x₂, ..., xₖ instead of the k − 1 variables x₁, x₂, ..., xₖ₋₁ and fit the model
  y = α + β₁x₁ + β₂x₂ + ⋯ + βₖxₖ + ε
▶ Then we note that the variables x₁, x₂, ..., xₖ are not independent: they satisfy the constraint Σᵢ xᵢ = 1, that is, any observation must receive exactly one of the levels Aᵢ.
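The consequence of the constraint Σᵢ xᵢ = 1 shows up in the design matrix: with an intercept column and dummies for all k levels, the columns are linearly dependent, which is why one level is usually dropped. A small numerical sketch (k = 3 and the group labels are hypothetical):

```python
import numpy as np

# Hypothetical factor with k = 3 levels, observed on 6 units
levels = np.array([0, 0, 1, 1, 2, 2])
k = 3
D = np.eye(k)[levels]                                 # one dummy column per level
X_all = np.column_stack([np.ones(6), D])              # intercept + all k dummies
X_drop = np.column_stack([np.ones(6), D[:, :k - 1]])  # intercept + k - 1 dummies

# The k dummy columns sum to the intercept column, so X_all is rank deficient
print(np.linalg.matrix_rank(X_all))   # 3, not 4
print(np.linalg.matrix_rank(X_drop))  # 3: full column rank
```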
▶ Statistical lesson: there can be alternative parametrizations of the same model.
Example: ANOVA model (One way layout)
▶ Suppose we have a factor A and let A₁, A₂, ..., Aₖ be the levels of A which constitute the population of interest.
▶ Further assume there are nᵢ observations receiving the level Aᵢ and let yᵢⱼ be the j-th observation receiving the i-th level Aᵢ.
▶ The model is
  yᵢⱼ = µᵢ + eᵢⱼ, i = 1, ..., k; j = 1, ..., nᵢ,
  where µᵢ = fixed effect due to Aᵢ and eᵢⱼ = random error.
▶ We assume that
  eᵢⱼ ∼ N(0, σ²)
  and the eᵢⱼ's are independent.
▶ This implies that E(yᵢⱼ) = µᵢ and Var(yᵢⱼ) = σ² for all j = 1, ..., nᵢ, which means the µᵢ's are the factor level means for each i and σ² is the common variability among observations belonging to each group.
One way ANOVA as linear model
▶ The one way ANOVA model can be written as
  y = µ₁x₁ + µ₂x₂ + ⋯ + µₖxₖ + ε
▶ Suppose we denote by y the n × 1 vector of responses stacked group by group, (y₁₁, ..., y₁ₙ₁, ..., yₖ₁, ..., yₖₙₖ)′, by ε the corresponding vector of errors (e₁₁, ..., eₖₙₖ)′, and let
  β = (µ₁, µ₂, ..., µₖ)′ and Xₙ×ₖ = diag(1ₙ₁, 1ₙ₂, ..., 1ₙₖ),
  the block diagonal matrix whose j-th column is the indicator of group Aⱼ.
▶ Then the model takes the form
  y = Xβ + ε
  where ε ∼ Nₙ(0, σ²Iₙ).
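The block structure of X can be constructed and checked directly; in this sketch (group sizes and means are illustrative choices of ours) ordinary least squares on the indicator matrix recovers exactly the group means, as the model suggests:

```python
import numpy as np

# Hypothetical group sizes and data for k = 3 groups
n_i = [3, 2, 4]
rng = np.random.default_rng(2)
mu = np.array([10.0, 12.0, 9.0])     # true group means (illustrative)
y = np.concatenate([m + rng.normal(0, 1, n) for m, n in zip(mu, n_i)])

# Column j of X indicates membership in group j (block diagonal in 1-vectors)
labels = np.repeat(np.arange(3), n_i)
X = np.eye(3)[labels]

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
group_means = [y[labels == j].mean() for j in range(3)]
print(beta_hat)       # OLS estimate of (mu_1, mu_2, mu_3)'
print(group_means)    # identical: least squares recovers the group means
```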
Reparametrization
▶ At times, an alternative but completely equivalent formulation of the single-factor ANOVA model is used. This alternative formulation is called the factor effects model.
▶ Let us write
  µᵢ = µ̄ + (µᵢ − µ̄) = µ + αᵢ
  where µ = µ̄ = (Σᵢ nᵢµᵢ)/n and αᵢ = µᵢ − µ̄.
▶ Then we note that Σᵢ nᵢαᵢ = 0.
▶ The model becomes
  yᵢⱼ = µ + αᵢ + eᵢⱼ
  where µ denotes the general effect (or the average effect), αᵢ denotes the additional effect (fixed) due to Aᵢ subject to the restriction Σᵢ nᵢαᵢ = 0, and eᵢⱼ denotes the random error.
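The reparametrization can be verified on any set of level means; the µᵢ and nᵢ below are hypothetical numbers:

```python
import numpy as np

mu_i = np.array([10.0, 12.0, 9.0])   # hypothetical factor level means
n_i = np.array([3, 2, 4])            # hypothetical group sizes

n = n_i.sum()
mu_bar = (n_i * mu_i).sum() / n      # weighted grand mean: the general effect mu
alpha = mu_i - mu_bar                # additional effects alpha_i

print(mu_bar)                        # grand mean
print(alpha)                         # the alpha_i
print((n_i * alpha).sum())           # 0 up to rounding: sum of n_i alpha_i vanishes
```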
Example: More use of dummy variables
▶ Consider a setup where we need to judge the effectiveness of a treatment (or perhaps compare the effectiveness of two treatments, in which case the control group may be thought of as receiving some treatment). This may be a controlled experiment or an observational study.
▶ Note that we allow the number of observations in the two groups to be different, and the y's represent the values of the response.
▶ This situation can also be tackled with a linear model through the use of dummy variables.
More use of dummy (contd.)
▶ Let us define a dummy variable x taking the value 1 for observations in the treatment group and 0 for those in the control group, and fit the model
  z = α + βx + ε
  or more precisely
  zᵢ = α + βxᵢ + εᵢ, i = 1, 2, ..., n₁ + n₂.
▶ Here
  zᵢ = y₁ᵢ for i = 1, 2, ..., n₁ and zᵢ = y₂,(i−n₁) for i = n₁ + 1, ..., n₁ + n₂,
  and ε₁, ε₂, ..., εₙ are the random errors.
More use (Contd.)
▶ Suppose we write the above linear model as
  z = Xθ + ε
  where θ = (α, β)′ and the upper submatrix of X consists of n₁ rows equal to (1, 1) and the lower one contains n₂ rows equal to (1, 0).
▶ Note that in the above formulation the effect of the treatment is α + β and the effect of the control is α, so the change in effect due to the treatment is β.
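Fitting this model to two made-up groups shows the interpretation concretely: α̂ equals the control-group mean and α̂ + β̂ the treatment-group mean (the data below are hypothetical):

```python
import numpy as np

# Hypothetical responses: n1 = 4 treatment, n2 = 3 control observations
y1 = np.array([5.1, 6.0, 5.5, 5.8])   # treatment group
y2 = np.array([4.0, 4.4, 4.2])        # control group

z = np.concatenate([y1, y2])
x = np.concatenate([np.ones(4), np.zeros(3)])   # dummy: 1 = treatment, 0 = control

X = np.column_stack([np.ones(7), x])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, z, rcond=None)

print(alpha_hat)                # control mean: the effect of the control is alpha
print(alpha_hat + beta_hat)     # treatment mean: the effect of the treatment is alpha + beta
print(beta_hat)                 # change in effect due to the treatment
```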
More than one categorical predictor
▶ Here the responses are the yields on different plots and there are two factors: fertilizer brand and soil type.
Interaction
▶ To allow the effect of one predictor to depend on the other, we include the product term and fit
  y = α + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε
Interaction (Contd.)
▶ In the model y = α + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε, consider the change in the mean response when x₁ increases by one unit with x₂ held fixed:
  E[y | X₁ = x₁ + 1, X₂ = x₂] − E[y | X₁ = x₁, X₂ = x₂].
▶ That difference is no longer the constant β₁; it is, rather, β₁ + β₃x₂.
▶ The fact that we can't give one answer to "how much does the response change when we change this variable?", that the correct answer to that question always involves the other variable, is what interaction means.
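The identity behind this follows from plugging into the interaction model; a few lines of arithmetic (the coefficient values are illustrative) confirm that a one-unit change in x₁ shifts the mean response by β₁ + β₃x₂:

```python
# Illustrative coefficients (our own choice)
alpha, b1, b2, b3 = 1.0, 2.0, -1.0, 0.5

def mean_y(x1, x2):
    # E[y | x1, x2] under the interaction model
    return alpha + b1 * x1 + b2 * x2 + b3 * x1 * x2

for x2 in [0.0, 1.0, 4.0]:
    change = mean_y(3.0, x2) - mean_y(2.0, x2)   # raise x1 by one unit
    print(x2, change, b1 + b3 * x2)              # change equals b1 + b3*x2
```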
Reparametrization
▶ Here µ̄₀₀ = (1/pq) Σᵢ Σⱼ µᵢⱼ is the general effect (say µ) as it is obtained by averaging over the effects of all possible level combinations.
▶ Further
  µ̄ᵢ₀ = (1/q) Σⱼ µᵢⱼ = the fixed effect due to Aᵢ
  ⇒ αᵢ = µ̄ᵢ₀ − µ̄₀₀ = fixed additional effect (main) due to Aᵢ with Σᵢ αᵢ = 0.
▶ And
  µ̄₀ⱼ = (1/p) Σᵢ µᵢⱼ = the fixed effect due to Bⱼ
  ⇒ βⱼ = µ̄₀ⱼ − µ̄₀₀ = fixed additional effect (main) due to Bⱼ with Σⱼ βⱼ = 0.
▶ Also µᵢⱼ − µ̄ᵢ₀ is the additional effect due to Bⱼ when A is held constant at the i-th level Aᵢ.
▶ Averaging out over those effects for varying i, we get µ̄₀ⱼ − µ̄₀₀.
▶ Thus
  γᵢⱼ = (µᵢⱼ − µ̄ᵢ₀) − (µ̄₀ⱼ − µ̄₀₀) = the fixed interaction effect due to (Aᵢ, Bⱼ)
  with Σᵢ γᵢⱼ = 0 for all j and Σⱼ γᵢⱼ = 0 for all i.
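The whole decomposition µᵢⱼ = µ + αᵢ + βⱼ + γᵢⱼ, together with its zero-sum restrictions, can be checked on any table of cell means; the 2×3 table below is hypothetical:

```python
import numpy as np

# Hypothetical p x q table of cell means mu_ij (p = 2 levels of A, q = 3 of B)
M = np.array([[4.0, 6.0, 5.0],
              [8.0, 7.0, 9.0]])

mu = M.mean()                      # mu_bar_00: the general effect
alpha = M.mean(axis=1) - mu        # main effects of A (row means minus mu)
beta = M.mean(axis=0) - mu         # main effects of B (column means minus mu)
gamma = M - mu - alpha[:, None] - beta[None, :]   # interaction effects

print(alpha.sum(), beta.sum())                 # both 0: zero-sum restrictions
print(gamma.sum(axis=0), gamma.sum(axis=1))    # rows and columns of gamma sum to 0
print(np.allclose(M, mu + alpha[:, None] + beta[None, :] + gamma))  # True
```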
Interaction or no interaction?
▶ One potential question of interest is: when should we include interaction in our model?
▶ The fact is that there cannot be any objective answer to this question.
▶ Rather, let us understand the difference between including and not including the interaction term in the model.
▶ For illustration let us consider an example of a simple two-factor study in which the effects of gender (male and female) and age (young, middle and old) on the learning of a task are of interest.
▶ When we assume no interaction effects we say the factor effects are additive, that is,
  µᵢⱼ = µ + αᵢ + βⱼ
▶ The figure shows that age has some effect (due to the difference in height) whereas gender has no effect (since the lines have zero slope) on the mean response.
▶ Also the lines do not intersect, meaning that there is no interaction effect.
No interaction
▶ Here both age and gender have effects on the mean response, but still there is no interaction effect because the lines do not intersect.
▶ Thus it is entirely possible that factors are additive (that is, factors have main effects but they do not interact).
Interaction
▶ There are main effects of both the factors along with the interaction effect.
▶ Is it possible that factors have interaction effects but no main effects? (Can some parallel lines intersect?)
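The answer to the last question is yes. A hypothetical 2×2 table of cell means in which both main effects vanish while the interaction does not:

```python
import numpy as np

# Cell means with zero main effects but nonzero interaction
M = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

mu = M.mean()
alpha = M.mean(axis=1) - mu
beta = M.mean(axis=0) - mu
gamma = M - mu - alpha[:, None] - beta[None, :]

print(alpha, beta)    # both zero: no main effects
print(gamma)          # equals M itself: pure interaction
# In a means plot the two profile lines cross: they are not parallel,
# yet averaging across levels wipes out every main effect.
```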
Notes on interactions
▶ In case of multifactor studies some interactions may be zero even though the factors are interacting. All interactions must equal zero in order for the two factors to be additive.
Two way layout with one observation per cell
▶ In many studies we have constraints on cost, time, and materials that limit the number of observations that can be obtained.
▶ For example, if a production line is available for one day and only eight batches of product can be produced in a day, the experiment may have to be limited to eight observations.
Model
▶ Here we consider
  yᵢⱼ = µᵢⱼ + eᵢⱼ, i = 1, ..., p; j = 1, ..., q,
  where µᵢⱼ is the fixed effect due to (Aᵢ, Bⱼ) and eᵢⱼ is the random error.
▶ Assuming the factor effects are additive (no interaction), this becomes
  yᵢⱼ = µ + αᵢ + βⱼ + eᵢⱼ.
Example: Two way layout with more than one observation per cell
▶ Let there be two factors A and B such that A has p levels A₁, A₂, ..., Aₚ and B has q levels B₁, B₂, ..., B_q. These pq level combinations (Aᵢ, Bⱼ) constitute the entire population of interest.
▶ Further we assume that we have m observations corresponding to each level combination.
▶ Suppose yᵢⱼₖ is the k-th observation receiving the treatment combination (Aᵢ, Bⱼ).
▶ Then the model we consider here is
  yᵢⱼₖ = µᵢⱼ + eᵢⱼₖ, i = 1, ..., p; j = 1, ..., q; k = 1, ..., m,
  where µᵢⱼ is the fixed effect due to (Aᵢ, Bⱼ) and eᵢⱼₖ is the random error.
Interaction of Categorical and Numerical Variables
▶ Suppose x₁ is a numerical predictor and x_b is a dummy variable for a categorical predictor. Allowing the slope of x₁ to differ between the two groups leads to the model
  y = α + β₁x₁ + β_1b x_b x₁ + ε
Interaction (Contd.)
▶ More generally we may allow both the intercept and the slope to change with the group and fit
  y = α + β_b x_b + β₁x₁ + β_1b x_b x₁ + ε
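Both models are ordinary linear models once the product column is added to the design matrix. A sketch for the fuller model (the simulated intercepts 1 and 3 and slopes 2 and 2.5 are our own choices): the slope in the baseline group is β₁, and in the other group it is β₁ + β_1b.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 200
x1 = rng.uniform(0, 10, n)                # numerical predictor
xb = rng.integers(0, 2, n).astype(float)  # binary dummy (group membership)

# Simulated truth: intercepts 1 and 3, slopes 2 and 2.5 in the two groups
y = 1.0 + 2.0 * xb + 2.0 * x1 + 0.5 * xb * x1 + rng.normal(0, 0.5, n)

# Design matrix: intercept, dummy, numerical predictor, product column
X = np.column_stack([np.ones(n), xb, x1, xb * x1])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, bb, b1, b1b = coef

print(f"baseline slope (x_b = 0): {b1:.2f}")        # about 2.0
print(f"group slope    (x_b = 1): {b1 + b1b:.2f}")  # about 2.5, i.e. b1 + b1b
```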