Linear Regression

This document discusses simple linear regression. Simple linear regression finds the best fit linear relationship between a dependent variable (Y) and independent variable (X) such that Y = a + bX, where a is the intercept and b is the slope. The fitted regression line is estimated using the least squares method to minimize the sum of the squared residuals. The coefficient of determination (R2) measures how well the regression line approximates the real data points, with values closer to 1 indicating a better fit. Confidence intervals can be constructed for the slope and intercept parameters based on the t-distribution. An example calculates 95% confidence intervals for the slope and intercept of sample data.

SIMPLE LINEAR REGRESSION
Ade Y.P., Ph.D.
DEFINITION
Linear regression: the best-fitting linear relationship between
Y (dependent variable, or response) and
X (independent variable, or regressor):
Y = a + bX
Where:
a = intercept
b = slope
ESTIMATION OF REGRESSION LINE/THE
FITTED REGRESSION LINE
Y=a+bX
The best regression line is the one built from the best estimates of
the coefficients "a" and "b"; this is the fitted regression line.
The fitted regression line gets close to the true regression line
when a large amount of data is involved. The fitted line is fully
determined by the estimated coefficients "a" and "b".
[Figure: the fitted regression line plotted together with the true regression line]
FITTED REGRESSION LINE

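Residual: the error of the fitted model.

If the real data are $(x_i, y_i)$ with $i = 1, 2, \dots, n$,
then the fitted model $Y = a + bX$ gives the fitted points
$(x_i, Y_i)$, where $Y_i = a + b x_i$, for $i = 1, 2, \dots, n$.
The error of fit (residual) at each point is $E_i = y_i - Y_i$.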
ESTIMATE COEFFICIENT WITH LEAST
SQUARE METHOD
The sum of squares of the errors about the regression line is
denoted by SSE.
$$SSE = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (y_i - Y_i)^2$$
$$SSE = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
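As a rough illustration of the SSE definition, the sketch below evaluates the sum of squared residuals for two candidate lines. The function name `sse` and the toy data are hypothetical and are not taken from the slides.

```python
def sse(xs, ys, a, b):
    """Sum of squared residuals E_i = y_i - (a + b * x_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

sse_close = sse(xs, ys, a=0.0, b=2.0)  # candidate line Y = 2X
sse_far = sse(xs, ys, a=1.0, b=1.5)    # candidate line Y = 1 + 1.5X

print(sse_close, sse_far)
```

The least squares method picks the pair (a, b) that makes this quantity as small as possible, so a line that tracks the data more closely yields the smaller value.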
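SSE is minimized by setting its partial derivatives with respect to a and b to zero:
$$\frac{\partial SSE}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0$$
$$\frac{\partial SSE}{\partial b} = -2 \sum_{i=1}^{n} (y_i - a - b x_i)\, x_i = 0$$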
Rearranging the two conditions gives the normal equations:
$$n a + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
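Solving the normal equations for b and a:
$$b = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
$$a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n}$$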
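The closed-form estimates for b and a translate directly into code. This is a minimal sketch using only the Python standard library; the function name `fit_line` and the toy data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        # All x values are identical, so the slope is undefined.
        raise ValueError("x values must not all be equal")

    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
a, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(a, b)
```

For data that lie exactly on a line, the estimates recover that line's intercept and slope, which is a quick sanity check on the formulas.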
THE ERROR SUM OF SQUARES (SSE)
A MEASURE OF QUALITY OF FIT:
COEFFICIENT OF DETERMINATION

$$R^2 = 1 - \frac{SSE}{SST}$$
Where:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
SST is the total corrected sum of squares, and $\bar{y}$ is the mean of the observed $y_i$.
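A short sketch of the R^2 calculation follows, assuming the fitted values have already been computed. The function name `r_squared` and the toy numbers are hypothetical.

```python
def r_squared(ys, fitted):
    """Coefficient of determination R^2 = 1 - SSE / SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical observations and the values a line Y = 2X gives at x = 1..4.
ys = [2.1, 3.9, 6.2, 7.8]
fitted = [2.0, 4.0, 6.0, 8.0]

r2 = r_squared(ys, fitted)
print(r2)
```

Because SSE is small relative to SST here, the result is close to 1; a perfect fit (SSE = 0) gives exactly 1.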
ILLUSTRATION OF $R^2$
CORRELATION COEFFICIENT

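$$r = b \sqrt{\frac{S_{xx}}{S_{yy}}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

$r \approx 1$ indicates a good linear relationship between x and y (more generally, $|r|$ close to 1, since r takes the sign of the slope b).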
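The two expressions for r can be checked against each other numerically. The sketch below uses the summary values quoted in this document's example (S_xx, S_yy, S_xy and the slope); the variable names are assumptions.

```python
import math

# Summary values as quoted in the example from Table 11.1.
S_xx = 4152.18
S_yy = 3713.88
S_xy = 3752.09
b = 0.903643  # fitted slope

r_from_slope = b * math.sqrt(S_xx / S_yy)
r_from_sums = S_xy / math.sqrt(S_xx * S_yy)

print(round(r_from_slope, 4), round(r_from_sums, 4))
```

The two routes agree up to the rounding of the quoted slope, and the value is fairly close to 1, which suggests a strong positive linear relationship in that data set.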


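VARIANCE ($\sigma^2$) ESTIMATOR

$$s^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^{n} (y_i - Y_i)^2}{n-2}$$
where $Y_i = a + bX_i$ is the fitted model and $y_i$ is the real data.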
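A minimal sketch of the variance estimate follows, assuming the usual unbiased estimator s^2 = SSE / (n - 2), where the residuals compare the real data y_i with the fitted model Y_i = a + b x_i. The function name `variance_estimate` and the toy data are hypothetical.

```python
def variance_estimate(xs, ys):
    """Estimate sigma^2 by s^2 = SSE / (n - 2) for a least squares line."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than two points for n - 2 degrees of freedom")

    # Least squares fit of Y = a + bX.
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n

    # Residuals: real data y_i minus fitted model Y_i = a + b * x_i.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sse / (n - 2)


# Hypothetical toy data, for illustration only.
s2 = variance_estimate([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
print(s2)
```

Dividing by n - 2 rather than n reflects the two coefficients, a and b, that were estimated from the same data.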
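CONFIDENCE INTERVAL FOR $\beta_1$
A $100(1-\alpha)\%$ confidence interval for $\beta_1$ in the regression
line $\mu_{Y|x} = \beta_0 + \beta_1 x$ is
$$b_1 - t_{\alpha/2} \frac{s}{\sqrt{S_{xx}}} < \beta_1 < b_1 + t_{\alpha/2} \frac{s}{\sqrt{S_{xx}}}$$
where $t_{\alpha/2}$ is the value of the t-distribution with $n-2$
degrees of freedom.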
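CONFIDENCE INTERVAL FOR $\beta_0$
A $100(1-\alpha)\%$ confidence interval for $\beta_0$ in the regression
line $\mu_{Y|x} = \beta_0 + \beta_1 x$ is
$$b_0 - t_{\alpha/2} \frac{s}{\sqrt{n S_{xx}}} \sqrt{\sum_{i=1}^{n} x_i^2} < \beta_0 < b_0 + t_{\alpha/2} \frac{s}{\sqrt{n S_{xx}}} \sqrt{\sum_{i=1}^{n} x_i^2}$$
where $t_{\alpha/2}$ is the value of the t-distribution with $n-2$
degrees of freedom.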
EXAMPLE
From Table 11.1:
$S_{xx} = 4152.18$
$S_{yy} = 3713.88$
$S_{xy} = 3752.09$
$b_1 = 0.903643$
$b_0 = 3.829633$
$n = 33$
$\sum_{i=1}^{n} x_i^2 = 41086$

Find 95% confidence intervals for $\beta_1$ and $\beta_0$ in the
regression line $\mu_{Y|x} = \beta_0 + \beta_1 x$.
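Solve for $s^2$ and $s$
Solve for $\alpha$ and $t_{\alpha/2}$
Solve for the interval

$\alpha = 100\% - 95\% = 1 - 0.95 = 0.05$
and $t_{\alpha/2} = t_{0.05/2} = t_{0.025}$

$\nu = n - 2 = 33 - 2 = 31$

Check the t-distribution table.

Determine the interval by selecting the table entry for a tail area of 0.025 (that is, $\alpha/2$) and $\nu = 31$.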
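The steps of the example can be sketched in code. This version makes several assumptions that should be checked against the source table: the tabulated 41086 is treated as the sum of the squared x values, the variance estimate uses the shortcut s^2 = (S_yy - b1 * S_xy) / (n - 2), the intervals use the standard t-based formulas for the slope and intercept, and the critical value t_0.025 with 31 degrees of freedom is hard-coded as roughly 2.0395 from a t-table.

```python
import math

# Summary values quoted in the example (Table 11.1).
S_xx = 4152.18
S_yy = 3713.88
S_xy = 3752.09
b1 = 0.903643
b0 = 3.829633
n = 33
sum_x_sq = 41086.0  # assumption: this is the sum of x_i squared

# Assumption: t_{0.025} with nu = n - 2 = 31 degrees of freedom, from a t-table.
t_crit = 2.0395

# Step 1: variance estimate and standard error s.
s2 = (S_yy - b1 * S_xy) / (n - 2)
s = math.sqrt(s2)

# Step 2: half-widths of the two intervals.
half_b1 = t_crit * s / math.sqrt(S_xx)
half_b0 = t_crit * s * math.sqrt(sum_x_sq) / math.sqrt(n * S_xx)

# Step 3: the 95% confidence intervals.
ci_b1 = (b1 - half_b1, b1 + half_b1)
ci_b0 = (b0 - half_b0, b0 + half_b0)

print("s^2  =", round(s2, 4))
print("beta1:", ci_b1)
print("beta0:", ci_b0)
```

Under these assumptions the slope interval comes out at roughly 0.80 to 1.01 and the intercept interval at roughly 0.22 to 7.44; a table that rounds the critical value differently will shift the endpoints slightly.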
