Chapter11 - Simple Regression
3. Correlation
SIMPLE LINEAR REGRESSION
Introduction
▪ A scatter plot can be used to show the relationship between two variables.
▪ Regression analysis is used to:
  • predict the value of a dependent variable based on the value of at least one independent variable;
  • explain the impact of changes in an independent variable on the dependent variable.
▪ Dependent variable Y: the variable we wish to predict or explain.
▪ Independent variable X: the variable used to predict or explain the dependent variable.
Regression Models; Types of Relationships
[Figures not reproduced.]
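Simple Linear Regression Equation
Simple Linear Regression Model (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$$
Least squares (LS) chooses $\hat\beta_0$ and $\hat\beta_1$ to minimize the sum of the squared residuals $\hat\varepsilon_i = y_i - \hat{y}_i$:
$$\sum_{i=1}^{n} \hat\varepsilon_i^2 = \hat\varepsilon_1^2 + \hat\varepsilon_2^2 + \hat\varepsilon_3^2 + \hat\varepsilon_4^2 \quad (n = 4 \text{ in the figure})$$
[Figure: scatter plot of y versus x with the fitted line and the residuals of the four points.]
The LS estimates are
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x},$$
where
$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n},$$
$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}.$$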
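The LS formulas translate directly into a few lines of code. The sketch below is plain Python (standard library only) and uses the computational (shortcut) forms of $S_{xx}$ and $S_{xy}$; the function name `least_squares` and the small data set are hypothetical, chosen so that the exact answer (intercept 1, slope 2) is easy to check by hand.

```python
def least_squares(x, y):
    """Return (b0, b1) for the prediction line y_hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    # Shortcut forms: S_xx = sum(x^2) - (sum x)^2 / n, S_xy = sum(xy) - (sum x)(sum y) / n
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = s_xy / s_xx                      # slope estimate
    b0 = sum_y / n - b1 * sum_x / n       # intercept: y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x
b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

Because the points lie exactly on a line, every residual is zero and the LS fit recovers that line.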
Interpretation of the Slope and the Intercept
▪ $\hat\beta_0$ is the estimated mean value of Y when the value of X is zero.
▪ $\hat\beta_1$ is the estimated change in the mean value of Y as a result of a one-unit increase in X.

Example
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected.
▪ Dependent variable (Y) = house price in $1000s
▪ Independent variable (X) = square feet

House Price in $1000s (Y)   Square Feet (X)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700
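The Excel output for this example is not reproduced here, so the following minimal Python sketch (standard library only) recomputes the least squares line from the table using the deviation forms of $S_{xx}$ and $S_{xy}$. The helper `predict` and the 2000 square-foot query are hypothetical illustrations, not part of the original example.

```python
# House price data from the example: Y = price in $1000s, X = size in square feet
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
# Deviation forms of S_xx and S_xy
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))

b1 = s_xy / s_xx          # slope: change in mean price ($1000s) per extra square foot
b0 = y_bar - b1 * x_bar   # intercept: estimated mean price at X = 0 (far outside the data)


def predict(square_feet):
    """Hypothetical helper: predicted mean price (in $1000s) for a given size."""
    return b0 + b1 * square_feet


print(f"price_hat = {b0:.5f} + {b1:.5f} * sqft")
print(f"predicted price for 2000 sq ft: {predict(2000):.2f} (in $1000s)")
```

With these data the slope is about 0.10977, so each additional square foot adds roughly 0.10977 × $1000 ≈ $109.77 to the estimated mean price. The intercept (about 98.25) has no practical meaning here, since no house in the sample is anywhere near 0 square feet.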
Example (using Excel)
Add-Ins > PHStat > Regression > Simple Linear Regression
Excel output: [screenshot not reproduced]
Example
A mail-order firm is interested in estimating the number of orders that need to be processed on a given day from the weight of the mail received. Close monitoring of the mail on 4 randomly selected business days produced the results below.
[Data table not reproduced.]
a) Find the equation of the least squares regression line relating the number of orders to the weight of the mail, and use this equation to predict the number of orders when x = 15.
b) Find the error sum of squares and the estimate of the variance of the random error.

Sums of squares and the estimate of the error variance:
$$SS_T = \sum (y_i - \bar{y})^2, \qquad SS_E = \sum (y_i - \hat{y}_i)^2, \qquad SS_R = SS_T - SS_E,$$
$$\hat\sigma^2 = \frac{SS_E}{n-2}.$$
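The mail-order data table did not survive, so the sketch below applies the steps of part (b) to the house-price data instead. It is a minimal Python sketch; `sums_of_squares` is a hypothetical helper name. For part (a), one would fit the four (weight, orders) pairs the same way and evaluate the fitted line at x = 15.

```python
def sums_of_squares(x, y):
    """Return (SS_T, SS_R, SS_E, sigma2_hat) for a simple least squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ss_t = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ss_r = sum((fi - y_bar) ** 2 for fi in fitted)            # variation explained by the line
    ss_e = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual (error) variation
    sigma2_hat = ss_e / (n - 2)                               # estimate of the error variance
    return ss_t, ss_r, ss_e, sigma2_hat


# House-price example: X = square feet, Y = price in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

ss_t, ss_r, ss_e, sigma2_hat = sums_of_squares(sqft, price)
print(f"SS_T = {ss_t:.2f}, SS_R = {ss_r:.2f}, SS_E = {ss_e:.2f}, sigma^2_hat = {sigma2_hat:.2f}")
```

Computing $SS_R$ directly from the fitted values (rather than as a difference) lets the output confirm the decomposition $SS_T = SS_R + SS_E$.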
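Standard errors of the estimates:
$$se(\hat\beta_0) = \sqrt{\hat\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}, \qquad se(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{S_{xx}}}$$
• We use a t-test with df = n − 2 degrees of freedom to test
$$H_0: \beta_i = \beta_{i,0}, \quad i = 0, 1,$$
using the test statistic
$$t_0 = \frac{\hat\beta_i - \beta_{i,0}}{se(\hat\beta_i)}.$$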
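The standard-error formulas and the t statistic can be checked numerically. This minimal Python sketch (standard library only) uses the house-price data; the function name `slope_intercept_tests` is hypothetical, the null values default to 0, and the critical value 2.306 is the $t_{0.025,8}$ value quoted in the house-price example of these notes rather than something the code computes.

```python
import math


def slope_intercept_tests(x, y, beta0_null=0.0, beta1_null=0.0):
    """Standard errors and t statistics for H0: beta_i = beta_i0 (df = n - 2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_e = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ss_e / (n - 2)
    se_b0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))
    se_b1 = math.sqrt(sigma2_hat / s_xx)
    return {
        "b0": b0, "se_b0": se_b0, "t0_b0": (b0 - beta0_null) / se_b0,
        "b1": b1, "se_b1": se_b1, "t0_b1": (b1 - beta1_null) / se_b1,
        "df": n - 2,
    }


sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

res = slope_intercept_tests(sqft, price)
t_crit = 2.306  # t_{0.025, 8}: two-sided test at alpha = 0.05, value taken from the notes
reject = abs(res["t0_b1"]) > t_crit
print(f"se(b1) = {res['se_b1']:.5f}, t0 = {res['t0_b1']:.5f}, reject H0: {reject}")
```

For the slope this reproduces $se(\hat\beta_1) \approx 0.03297$ and $t_0 \approx 3.329$ with 8 degrees of freedom.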
Test Hypotheses about the Slope and Intercept
Test for Significance of Regression
Remark
If $\beta_1 = 0$, then X is NOT significant in explaining the values of Y, and we say that the (linear) regression is not significant.
▪ Null and alternative hypotheses (these hypotheses relate to the significance of regression):
  • $H_0: \beta_1 = 0$ (no linear relationship)
  • $H_1: \beta_1 \neq 0$ (a linear relationship does exist)
▪ Test statistic: $t_0 = \dfrac{\hat\beta_1}{se(\hat\beta_1)}$, d.f. = n − 2

For the house-price data in the table above, the slope of the fitted model is 0.1098. Is there a relationship between the square footage of a house and its sales price?
Example
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
$\hat\beta_1 = 0.10977$, $se(\hat\beta_1) = 0.03297$, df = 10 − 2 = 8
$t_0 = \hat\beta_1 / se(\hat\beta_1) = 3.32938$, $t_{0.025,8} = 2.306$
Decision: since $|t_0| = 3.329 > 2.306$, reject $H_0$.
Conclusion: there is sufficient evidence that square footage affects house price.

Correlation Coefficient
To measure the strength of the linear relationship between X and Y, we can use the correlation coefficient ρ.
Properties of ρ:
1. $-1 \le \rho \le 1$.
2. If $\rho \approx 1$, there is a strong positive linear relationship.
3. If $\rho \approx -1$, there is a strong negative linear relationship.
4. If $\rho \approx 0$, the linear relationship between X and Y is weak.
Remark (R denotes the sample correlation coefficient, which estimates ρ):
1. $-1 \le R \le 1$.
2. $R^2 = \dfrac{SS_R}{SS_T} = 1 - \dfrac{SS_E}{SS_T}$ is called the coefficient of determination; it is often used to judge the adequacy of a regression model.
3. R and $\hat\beta_1$ have the same sign.
4. R and $R^2$ both measure the strength of a linear relationship.
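The remark's claims about R can be checked on the house-price data. The minimal Python sketch below assumes the standard sample formula $R = S_{xy} / \sqrt{S_{xx} S_{yy}}$ (with $S_{yy} = SS_T$), which the notes do not write out, and uses $SS_R = \hat\beta_1 S_{xy}$ for the regression sum of squares.

```python
import math

# House-price example: X = square feet, Y = price in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_yy = sum((y - y_bar) ** 2 for y in price)   # equals SS_T
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))

r = s_xy / math.sqrt(s_xx * s_yy)   # sample correlation coefficient R
b1 = s_xy / s_xx                    # LS slope, same sign as R
ss_r = b1 * s_xy                    # regression sum of squares SS_R
r_squared = r ** 2

print(f"R = {r:.4f}, R^2 = {r_squared:.4f}, SS_R/SS_T = {ss_r / s_yy:.4f}")
```

Here R ≈ 0.762 and $R^2 \approx 0.581$: R is positive like the slope, and $R^2$ equals $SS_R/SS_T$, so about 58% of the variation in price is explained by the linear relationship with size.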
Examples
Example 1: In a regression problem, the following (x, y) pairs are given: (−4, 8), (−1, 3), (0, 0), (1, −3).
What does this indicate about the values of the coefficient of correlation and the coefficient of determination?
Example 2:
A least squares regression line is $\hat{y} = -2.87 - 1.6x$, with a coefficient of determination of 0.36.
a. Find R.
b. Describe the correlation between weight and maximum speed.
Answer.
a. $R = -\sqrt{0.36} = -0.6$ (negative because R has the same sign as the slope, −1.6).
b. Negative correlation: as weight increases, maximum speed tends to decrease.

Test for Zero Correlation (t-test)
▪ Hypotheses: $H_0: \rho = 0$ vs $H_1: \rho \neq 0$
▪ Test statistic: $t_0 = \dfrac{R\sqrt{n-2}}{\sqrt{1-R^2}}$, df = n − 2
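Both examples and the zero-correlation statistic can be worked numerically. This is a minimal Python sketch; the helper names `correlation` and `zero_correlation_t` are hypothetical, and R is computed with the standard sample formula $R = S_{xy}/\sqrt{S_{xx}S_{yy}}$. For Example 2, the sign of R is taken from the slope, following the remark that R and $\hat\beta_1$ have the same sign.

```python
import math


def correlation(x, y):
    """Sample correlation coefficient R = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


def zero_correlation_t(r, n):
    """t statistic for H0: rho = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Example 1: the four (x, y) pairs
x = [-4, -1, 0, 1]
y = [8, 3, 0, -3]
r1 = correlation(x, y)
t1 = zero_correlation_t(r1, len(x))
print(f"Example 1: R = {r1:.4f}, R^2 = {r1 ** 2:.4f}, t0 = {t1:.3f}")

# Example 2: |R| = sqrt(R^2), with the sign of the slope (-1.6)
slope, r_squared = -1.6, 0.36
r2 = math.copysign(math.sqrt(r_squared), slope)
print(f"Example 2: R = {r2:.2f}")
```

For Example 1 this gives R ≈ −0.987 and $R^2 \approx 0.974$, indicating a strong negative linear relationship; for Example 2 it gives R = −0.6.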
Exercises
A paired data set has n = 5, $\sum x = 15$, $\sum y = 27$, $\sum xy = 100$, and $\sum x^2 = 55$.
1. Find $S_{xy}$.
   A. 1.9   B. 10   C. 19
2. Find $S_{xx}$.
   A. 1.9   B. 10   C. 19
3. The slope of the regression line is
   A. 1.9   B. 10   C. 19
4. The intercept of the regression line is
   A. 1.9   B. 5.4   C. −0.3
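These exercises only need the shortcut formulas for $S_{xy}$ and $S_{xx}$ applied to the given sums. A minimal Python sketch of the arithmetic, useful for checking answers:

```python
n = 5
sum_x, sum_y, sum_xy, sum_x2 = 15, 27, 100, 55

s_xy = sum_xy - sum_x * sum_y / n            # 100 - 405/5
s_xx = sum_x2 - sum_x ** 2 / n               # 55 - 225/5
slope = s_xy / s_xx                          # b1 = S_xy / S_xx
intercept = sum_y / n - slope * sum_x / n    # b0 = y_bar - b1 * x_bar = 5.4 - b1 * 3
print(s_xy, s_xx, slope, round(intercept, 10))
```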
Exercises
A paired data set has $\bar{x} = 10$, $\bar{y} = 8$, and the slope of the regression line is 1.5. The intercept of the regression line is
A. 8   B. −7   C. 23

The equation of the regression line is $\hat{y} = 1.2x - 3.4$. Compute the residual for the point (7, 6).
A. 1   B. −1   C. 5
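Both answers follow from definitions used earlier: the intercept is $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$, and a residual is the observed y minus the predicted $\hat{y}$. A minimal Python check (the helper `y_hat` is a hypothetical name for the given line):

```python
# Intercept from the means and the slope: b0 = y_bar - b1 * x_bar
x_bar, y_bar, b1 = 10, 8, 1.5
b0 = y_bar - b1 * x_bar          # 8 - 15


def y_hat(x):
    """Prediction line from the exercise: y_hat = 1.2x - 3.4."""
    return 1.2 * x - 3.4


# Residual = observed y - predicted y at x = 7
x_obs, y_obs = 7, 6
residual = y_obs - y_hat(x_obs)  # 6 - 5
print(b0, round(residual, 10))
```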
Exercises
A linear regression analysis of Birth Weight (grams) and Gestational Age (weeks) gave the following output.
[Regression output not reproduced.]
Calculate the predicted birth weight of a baby born at 40 weeks' gestational age.
A. 3632   B. 3747   C. 3977   D. 3862
What is the estimated correlation between pressure and flux? Also give an interpretation of the correlation.
Answer. R = 0.964
Interpretation. There is a strong positive linear relationship: flux tends to increase with increasing pressure.