29 Regression Ext

1. The document defines key terminology used in linear regression models, including population regression function (PRF), sample regression function (SRF), estimands, estimators, estimates, and other parameters.
2. It describes how the PRF represents the hypothetical linear relationship between variables X and Y in the overall population. The SRF estimates this relationship based on a sample of data using techniques like ordinary least squares (OLS) regression.
3. Several estimators are defined, including the slope (β1) and y-intercept (β0) of the regression line, as well as the variance (σ²) of the error term. Formulas are given for calculating point estimates of these quantities from a sample dataset.

1 Clarity on Terminology

1.1 Setup
Let the observed sample set be {(x1, y1), (x2, y2), · · · , (xN, yN)}.

• X, Y are random variables that can take on any value (xi, yi) within the range of the sample set.

• If θ̂ is an estimator of the estimand θ, then θ̂(x) is an estimate of the estimand θ at x.[1]

• One variable is always the independent (regressor, predictor) variable, typically X; the other is the dependent (regressand, predicted) variable, typically Y.

• We predict Y, we do not estimate it. Prediction is different from estimation.[2]

Due to frequent usage, for simplicity, let us define,

Sxy = Syx = Σ_{i=1}^{N} (xi − x̄)(yi − ȳ)    constant    (1)

Sxx = Σ_{i=1}^{N} (xi − x̄)²    constant    (2)

Syy = Σ_{i=1}^{N} (yi − ȳ)²    constant    (3)

Do not confuse these with the sample standard deviation estimator S.
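As a quick sketch, these three sums can be computed directly in Python; the five-point sample below is made up purely for illustration:

```python
# Made-up sample set, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
N = len(xs)

x_bar = sum(xs) / N  # sample mean of x
y_bar = sum(ys) / N  # sample mean of y

# Equations (1)-(3)
S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
S_xx = sum((x - x_bar) ** 2 for x in xs)
S_yy = sum((y - y_bar) ** 2 for y in ys)

print(S_xy, S_xx, S_yy)  # approximately 19.6, 10.0, 38.508
```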

1.2 Population Regression Function, PRF


Given a population (X, Y), we hypothesize that the underlying population has a regression line as
follows. The conditional expectation is

E(Y |x) = β0 + β1 x PRF (4)

The above equation is called the Population Regression Function, PRF. Including the error ε, the
prediction of the dependent variable is

Y = E(Y |x) + ε Prediction (5)

which is called the simple linear regression model for the population. E(Y|x) is often hypothetical because we would not know β0, β1 unless we knew the whole population. We do not care about the distribution of Y (µ_Y, σ²_Y) here, as regression is always one-sided.[3] Regressing the other way round (to predict X from Y) is a similar but separate story.

• RV (parameters): ε(0, σ²), X(µ_X, σ²_X), Y|x(µ_{Y|x}, σ²_{Y|x})

[1] https://en.wikipedia.org/wiki/Estimator
[2] https://stats.stackexchange.com/a/17789/202481
[3] Unless we standardize the dataset, which leads to symmetry and the correlation coefficient.

• Other main parameters: β0, β1

• All parameters are constants (and typically unknown for the population)

• Distribution: ε is assumed to have the normal distribution N(0, σ²)
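Since the PRF is hypothetical, the only way to "see" it is to simulate a population whose parameters we choose ourselves. A minimal sketch, where the values of β0, β1, σ are assumed for illustration:

```python
import random

random.seed(42)

beta0, beta1, sigma = 1.0, 2.0, 0.5  # assumed population parameters

def draw_Y(x):
    """One realization of Y = E(Y|x) + eps, with eps ~ N(0, sigma^2)."""
    return beta0 + beta1 * x + random.gauss(0.0, sigma)

x = 3.0
prf = beta0 + beta1 * x  # E(Y|x): the PRF value at x

# Averaging many draws of Y recovers E(Y|x), since E(eps) = 0
mean_Y = sum(draw_Y(x) for _ in range(100_000)) / 100_000

print(prf)     # 7.0
print(mean_Y)  # close to 7.0
```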

1.3 Sample Regression Function, SRF


1.3.1 Point Estimates from a single SRF
Given a sample set (X, Y), we estimate that the underlying population has a regression line as follows.

Ŷ = β̂0 + β̂1 x    SRF, Estimator of RV E(Y|x), not Y    (6)

ε̂ = Y − Ŷ    Estimator of RV ε    (7)

For a given sample (xi, yi) from the sample set (X, Y), the fitted value and residual are

ŷi = Ŷ(xi) = b0 + b1 xi    Fitted value, Estimate of RV E(Y|x) at xi    (8)

ε̂i = yi − ŷi    Residual, Estimate of RV ε at (xi, yi)    (9)

Using OLS,

β̂1 = [Σ_{(x,y)} (x − X̄)(y − Ȳ)] / [Σ_x (x − X̄)²]    Slope RV, Estimator of β1    (10)

β̂0 = Ȳ − β̂1 X̄    y-intercept RV, Estimator of β0    (11)

b1 = [Σ_{i} (xi − x̄)(yi − ȳ)] / [Σ_{i} (xi − x̄)²] = Sxy / Sxx    Slope constant, Estimate of β1    (12)

b0 = ȳ − b1 x̄    y-intercept constant, Estimate of β0    (13)

• β̂0, β̂1 are estimators of β0, β1 for any sample set; b0, b1 are estimates of β0, β1 for a given sample set

• Estimator (estimate): ε̂(0, s²), X̂(x̄, s²_X), Ŷ(ŷ = ȳ, s²_{Y|x} = s²), β̂1(b1), β̂0(b0)

• All estimators are random variables; all estimates are constants

• Distribution: ε̂ is assumed to have the normal distribution N(0, s²)
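The point estimates of equations (12) and (13) can be sketched directly in Python; the sample values are made up for illustration:

```python
# Made-up sample set, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
N = len(xs)
x_bar, y_bar = sum(xs) / N, sum(ys) / N

S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
S_xx = sum((x - x_bar) ** 2 for x in xs)

b1 = S_xy / S_xx         # slope estimate, equation (12)
b0 = y_bar - b1 * x_bar  # y-intercept estimate, equation (13)

print(b1, b0)  # 1.96 and 0.14, up to float rounding
```

Note that b1, b0 are plain numbers (estimates); rerunning the same formulas on a fresh sample would give different numbers, which is exactly the estimator/estimate distinction above.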

Error sum of squares (residual sum of squares), SSE:

SSE = Σ_{i=1}^{N} (yi − ŷi)² = Σ_{i=1}^{N} [yi − (b0 + b1 xi)]²    constant    (14)

Variance estimation for σ²:

S² = σ̂² = [Σ_y (y − Ŷ)²] / (N − 2)    RV, Variance Estimator of RV ε    (15)

where N − 2 is the degrees of freedom, because β̂0, β̂1 must be calculated (in other words, β0, β1 must be estimated) before the summation.

s² = [Σ_{i=1}^{N} (yi − ŷi)²] / (N − 2) = SSE / (N − 2)    constant, Variance Estimate of RV ε    (16)
• S² is the estimator of σ² for any sample set; s² is the estimate of σ² for a given sample set

• S² is an unbiased estimator of σ² (while S is not an unbiased estimator of σ)
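Continuing the same made-up sample, equations (14) and (16) in Python:

```python
# Made-up sample set, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
N = len(xs)
x_bar, y_bar = sum(xs) / N, sum(ys) / N

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# Residuals, SSE (eq. 14), and the variance estimate s^2 (eq. 16)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
SSE = sum(e ** 2 for e in residuals)
s2 = SSE / (N - 2)  # N - 2 degrees of freedom, since b0, b1 were estimated

print(SSE, s2)  # roughly 0.092 and 0.0307
```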

Total sum of squares, SST:

SST = Syy = Σ_{i=1}^{N} (yi − ȳ)²    constant    (17)

Coefficient of determination, r²_d (subscript d to differentiate from Pearson's correlation coefficient r):

r²_d = 1 − SSE / SST    constant    (18)

0 ≤ r²_d ≤ 1    (19)

r² = r²_d, where r is the sample correlation coefficient    (20)
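Equation (18) on the same made-up sample:

```python
# Made-up sample set, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
N = len(xs)
x_bar, y_bar = sum(xs) / N, sum(ys) / N

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

SSE = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
SST = sum((y - y_bar) ** 2 for y in ys)  # = Syy, eq. (17)

rd2 = 1 - SSE / SST  # eq. (18)
print(rd2)  # about 0.9976: the line explains ~99.8% of the variation in y
```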

1.3.2 Estimates from Multiple SRFs


Here we note that β̂1 is a random variable, and ask: when we have multiple estimates, what are the point and interval estimates of the resulting sampling distribution?

Estimand     β1      E(β̂1) = µ_β̂1      V(β̂1) = σ²_β̂1
Estimator    β̂1     µ̂_β̂1              σ̂²_β̂1
Estimate     b1                          s²_β̂1

Note that in the above table, for columns 2 and 3, the estimand is a parameter of the estimator β̂1 itself, not of β1. That is, we are interested in the mean and variance of the estimator β̂1.

Assumption: X is fixed across all sample sets, so only the corresponding Y is an RV.

β̂1 = [Σ_{(x,y)} (x − X̄)(y − Ȳ)] / [Σ_x (x − X̄)²] = Σ_y cY    Slope RV, Estimator of β1    (21)

c = (x − X̄) / Σ_x (x − X̄)²    constant    (22)

b1 = [Σ_{i=1}^{N} (xi − x̄)(yi − ȳ)] / [Σ_{i=1}^{N} (xi − x̄)²] = Σ_{i=1}^{N} ci yi    Slope constant, Estimate of β1    (23)

ci = (xi − x̄) / Σ_{i=1}^{N} (xi − x̄)²    constant    (24)

Because each Yi is normal (as the underlying ε is normal) and β̂1 is a linear combination of the Yi, β̂1 is also normal.

Mean of β̂1:

µ_β̂1 = E(β̂1) = β1    constant    (25)

Variance of β̂1:

σ²_β̂1 = V(β̂1) = σ² / Σ_x (x − x̄)²    constant    (26)

S²_β̂1 = σ̂²_β̂1 = S² / Σ_x (x − x̄)²    RV, Variance Estimator of σ²_β̂1    (27)

s²_β̂1 = s² / Σ_{i=1}^{N} (xi − x̄)² = s² / Sxx    constant, Variance Estimate of σ²_β̂1    (28)

• S²_β̂1 is the estimator of σ²_β̂1 for the resulting sampling distribution of β̂1 over any set of multiple SRFs

• s²_β̂1 is the estimate of σ²_β̂1 for the resulting sampling distribution of β̂1 over the given multiple SRFs

From here, confidence intervals and hypothesis-testing procedures for β1 can be built (the immediate next step would be showing that the standardized β̂1 has a t distribution with N − 2 degrees of freedom).
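A minimal sketch of such an interval estimate on the same made-up sample; the critical value t_{0.025, df=3} ≈ 3.182 is taken from a standard t table rather than computed:

```python
import math

# Made-up sample set, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
N = len(xs)
x_bar, y_bar = sum(xs) / N, sum(ys) / N

S_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / S_xx
b0 = y_bar - b1 * x_bar

SSE = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s2 = SSE / (N - 2)
se_b1 = math.sqrt(s2 / S_xx)  # standard error of b1, from eq. (28)

t_crit = 3.182  # t_{0.025, df = N-2 = 3}, from a t table
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(lo, hi)  # 95% CI for beta1, roughly (1.78, 2.14)
```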

1.4 Correlation Coefficient


Given a sample set (X, Y), the sample correlation coefficient is

r = Sxy / √(Sxx Syy)
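On the same made-up sample, r can be computed from the three sums, and equation (20) checked against the coefficient of determination:

```python
import math

# Made-up sample set, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
N = len(xs)
x_bar, y_bar = sum(xs) / N, sum(ys) / N

S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
S_xx = sum((x - x_bar) ** 2 for x in xs)
S_yy = sum((y - y_bar) ** 2 for y in ys)

r = S_xy / math.sqrt(S_xx * S_yy)
print(r)  # about 0.9988

# Sanity check of equation (20): r^2 equals the coefficient of determination
b1 = S_xy / S_xx
b0 = y_bar - b1 * x_bar
SSE = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
rd2 = 1 - SSE / S_yy
print(abs(r ** 2 - rd2))  # essentially zero
```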
