
Chapter11 - Simple Regression

1. This document discusses simple linear regression and correlation. It introduces simple linear regression models and how to draw the line of best fit to predict dependent variables from independent variables.
2. Equations for the simple linear regression line are provided to estimate the population regression line using the least squares method. The slope and intercept are interpreted.
3. An example of using simple linear regression to predict house prices from square footage is shown and the prediction for a 2000 square foot house is calculated. Metrics like error sum of squares and variance estimates are also explained.

Uploaded by

hallulel
Copyright
© All Rights Reserved

CHAPTER 11. SIMPLE LINEAR REGRESSION AND CORRELATION

Contents
1. Simple Linear Regression
2. Hypothesis Tests in Simple Linear Regression
3. Correlation

SIMPLE LINEAR REGRESSION

Introduction
▪ A scatter plot can be used to show the relationship between two variables.
▪ Regression analysis is used to:
  • Predict the value of a dependent variable based on the value of at least one independent variable
  • Explain the impact of changes in an independent variable on the dependent variable
▪ Dependent variable Y: the variable we wish to predict or explain.
▪ Independent variable X: the variable used to predict or explain the dependent variable.

[Figures: Regression Models; Types of Relationships; Simple Linear Regression Model]

Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line.

The Least Squares Method
How would you draw a line through the points? How do you determine which line 'fits best'?
$\hat{\beta}_0$ and $\hat{\beta}_1$ are obtained by finding the values that minimize the sum of the squared differences between $Y$ and $\hat{Y}$.

LS minimizes
$$\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$$
(written out for the four points of the accompanying scatter plot of y against x).

The least squares estimates are
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},$$
where
$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n},$$
$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}.$$

Interpretation of the Slope and the Intercept
▪ $\hat{\beta}_0$ is the estimated mean value of Y when the value of X is zero.
▪ $\hat{\beta}_1$ is the estimated change in the mean value of Y as a result of a one-unit increase in X.

Example
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected.
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

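As a rough check of the least squares formulas on this sample, the sketch below computes $S_{xx}$, $S_{xy}$, $\hat{\beta}_1$ and $\hat{\beta}_0$ in plain Python. The variable names (`sqft`, `price`, `s_xx`, and so on) are illustrative choices and do not come from the slides.

```python
# House data from the example: size in square feet (x), price in $1000s (y).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# S_xx = sum (x_i - x_bar)^2 and S_xy = sum (x_i - x_bar)(y_i - y_bar)
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))

# Least squares estimates of the slope and the intercept
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

print(f"S_xx = {s_xx:.1f}, S_xy = {s_xy:.1f}")
print(f"slope = {beta1_hat:.5f}, intercept = {beta0_hat:.2f}")
```

After rounding, the slope and intercept should agree with the fitted equation house price = 98.25 + 0.1098 (sq. ft.) used in the chapter.
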
Example (using Excel)
Enter Y range and X range and desired options.

Example (using Excel)
Add-Ins: PHStat: Regression: Simple Linear Regression
Excel output: [screenshot not recovered]

Example
Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098(2000)
            = 317.85

The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850.

Standard error of estimate
• Total sum of squares:
$$SST = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$
• Regression sum of squares:
$$SSR = \sum (\hat{y}_i - \bar{y})^2 = \hat{\beta}_1 S_{xy}$$
• Error sum of squares:
$$SSE = \sum (y_i - \hat{y}_i)^2 = SST - SSR$$
• An unbiased estimator of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{SSE}{n - 2}$$

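To see how the three sums of squares fit together, here is a small sketch that evaluates SST, SSR, SSE and $\hat{\sigma}^2$ for the house-price sample from their definitions. It refits the line first so that the block runs on its own; all variable names are illustrative.

```python
# House data from the chapter's example: square feet (x), price in $1000s (y).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values on the prediction line
fitted = [beta0_hat + beta1_hat * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in price)                # total
ssr = sum((f - y_bar) ** 2 for f in fitted)               # regression
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))    # error

# Unbiased estimate of the error variance
sigma2_hat = sse / (n - 2)

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"sigma^2 estimate = {sigma2_hat:.2f}")
```

Up to floating-point rounding, `ssr` equals `beta1_hat * s_xy` and `sst - ssr` equals `sse`, which are the two shortcut identities listed above.
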
Example
A mail-order firm is interested in estimating the number of orders that need to be processed on a given day from the weight of the mail received. A close monitoring of mail on 4 randomly selected business days produced the results below.
[Data table not recovered.]

a) Find the equation of the least squares regression line relating the number of orders to the weight of the mail, and use this equation to predict the number of orders when x = 15.
b) Find the error sum of squares and the estimate of the variance of the random error.

[Figure labels: $\hat{\sigma}^2$, SSR, SSE, SST]

HYPOTHESIS TEST

Test hypothesis about the slope and intercept
Remark
• The estimate of the regression slope $\beta_1$ is $\hat{\beta}_1$.
• The estimate of the regression intercept $\beta_0$ is $\hat{\beta}_0$.
• The estimated standard error of the slope is
$$se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$
• The estimated standard error of the intercept is
$$se(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}$$
• We use a t-test with degrees of freedom df = n − 2 to test $H_0: \beta_i = \beta_{i,0}$, $i = 0, 1$.

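The standard-error formulas can be tried out on the house-price sample from the earlier example. The sketch below is one way to do it, assuming the usual t statistic $t_0 = (\hat{\beta}_1 - 0)/se(\hat{\beta}_1)$ for testing a zero slope; the critical value 2.306 is the tabulated $t_{0.025,8}$ quoted in the chapter, hard-coded here rather than computed.

```python
import math

# House data from the chapter's example: square feet (x), price in $1000s (y).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Error sum of squares and variance estimate sigma_hat^2 = SSE / (n - 2)
sse = sum((y - (beta0_hat + beta1_hat * x)) ** 2 for x, y in zip(sqft, price))
sigma2_hat = sse / (n - 2)

# Estimated standard errors of the slope and the intercept
se_beta1 = math.sqrt(sigma2_hat / s_xx)
se_beta0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))

# t statistic for H0: beta_1 = 0 (assumed form: estimate / standard error)
t0 = (beta1_hat - 0) / se_beta1
t_crit = 2.306  # t_{0.025, 8}, taken from a t table
reject_h0 = abs(t0) > t_crit

print(f"se(slope) = {se_beta1:.5f}, se(intercept) = {se_beta0:.3f}")
print(f"t0 = {t0:.5f}, reject H0: {reject_h0}")
```
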
Test for significance of regression
Remark
If $\beta_1 = 0$ then X is NOT significant in explaining the values of Y. We say that the (linear) regression is not significant.

So, to test for significance of regression, we can use the t-test for

$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$.

If we reject $H_0: \beta_1 = 0$, we support $H_1: \beta_1 \neq 0$, and then the regression is significant.
If we fail to reject $H_0: \beta_1 = 0$, the regression is not significant.

Inferences About the Slope: t Test
▪ t test for a population slope: is there a linear relationship between X and Y?
▪ Null and alternative hypotheses (these hypotheses relate to the significance of regression):
  • H0: β1 = 0 (no linear relationship)
  • H1: β1 ≠ 0 (linear relationship does exist)
▪ Test statistic:
$$t_0 = \frac{\hat{\beta}_1 - 0}{se(\hat{\beta}_1)}, \qquad \text{d.f.} = n - 2$$

Example
Estimated regression equation: house price = 98.25 + 0.1098 (sq. ft.)
The slope of this model is 0.1098.
Is there a relationship between the square footage of the house and its sales price?

House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Example
H0: β1 = 0
H1: β1 ≠ 0
$\hat{\beta}_1 = 0.10977$, $se(\hat{\beta}_1) = 0.03297$
$t_0 = 3.32938$, $t_{0.025,8} = 2.306$
df = 10 − 2 = 8
Decision: Reject $H_0$, since $|t_0| > t_{0.025,8}$.
There is sufficient evidence that square footage affects house price.

Correlation coefficient
To measure the strength of the linear relationship between X and Y we can use the correlation coefficient ρ.
Properties of ρ:
1. −1 ≤ ρ ≤ 1
2. If ρ is close to 1, then there is a strong positive linear relationship.
3. If ρ is close to −1, then there is a strong negative linear relationship.
4. If ρ is close to 0, then the linear relationship between X and Y is weak.

Correlation coefficient
Sample correlation coefficient R:
$$R = \frac{S_{xy}}{\sqrt{S_{xx} \, SST}}$$

Remark:
1. $-1 \le R \le 1$
2. $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$ is called the coefficient of determination and it is often used to judge the adequacy of a regression model.
3. R and $\hat{\beta}_1$ have the same sign.
4. R and $R^2$ both measure the strength of a linear relationship.

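As an illustration of the sample correlation coefficient and the coefficient of determination, the following sketch evaluates both for the house-price sample used earlier in the chapter. Variable names are illustrative; $R^2$ is computed from $1 - SSE/SST$ so that it can be compared with the square of R.

```python
import math

# House data from the chapter's example: square feet (x), price in $1000s (y).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
sst = sum((y - y_bar) ** 2 for y in price)

# Sample correlation coefficient R = S_xy / sqrt(S_xx * SST)
r = s_xy / math.sqrt(s_xx * sst)

# Coefficient of determination via the error sum of squares
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar
sse = sum((y - (beta0_hat + beta1_hat * x)) ** 2 for x, y in zip(sqft, price))
r_squared = 1 - sse / sst

print(f"R = {r:.5f}, R^2 = {r_squared:.5f}")
```

Here R is positive, matching the sign of the fitted slope, and `r ** 2` agrees with `r_squared` up to rounding.
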
Examples
Example 1: In a regression problem the following pairs (x, y) are given: (-4; 8); (-1; 3); (0; 0); (1; -3).
What does this indicate about the value of the coefficient of correlation and the coefficient of determination?

Example 2: The least squares regression line is $\hat{y} = -2.87 - 1.6x$ and the coefficient of determination is 0.36.
What is the coefficient of correlation?

Test for Zero Correlation (t-test)
▪ Hypotheses: $H_0: \rho = 0$ vs $H_1: \rho \neq 0$
▪ Test statistic:
$$t_0 = \frac{R\sqrt{n - 2}}{\sqrt{1 - R^2}}, \qquad \text{df} = n - 2$$

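A minimal sketch of the zero-correlation t-test, assuming the standard statistic $t_0 = R\sqrt{n-2}/\sqrt{1-R^2}$ with n − 2 degrees of freedom. The helper name `zero_correlation_t` is hypothetical. The inputs n = 25 and R = 0.45 are the ones from the exam-grades example in this chapter, and the critical value 2.069 is the tabulated two-sided 5% point for 23 degrees of freedom, hard-coded rather than computed.

```python
import math


def zero_correlation_t(r, n):
    """Return the t statistic for H0: rho = 0 given sample correlation r and size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


n, r = 25, 0.45
t0 = zero_correlation_t(r, n)

t_crit = 2.069  # t_{0.025, 23}, taken from a t table
reject_h0 = abs(t0) > t_crit

print(f"t0 = {t0:.4f}, df = {n - 2}, reject H0: {reject_h0}")
```

With these numbers the statistic exceeds the critical value, so the sketch rejects $H_0: \rho = 0$ at α = 0.05.
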
Test for Zero Correlation (t-test)
Example: You want to explore the relationship between the grades students receive on their first two exams. For a sample of 25 students, you find a correlation of 0.45. What is your conclusion in testing $H_0: \rho = 0$ versus $H_1: \rho \neq 0$ at significance level α = 0.05?

Exercises
1. A group of adults is weighed, and their maximum speed when sprinting is measured:
[Data table not recovered.]

a. Find R
b. Describe the correlation between weight and maximum speed.
Answer.
a. R = −0.813
b. Negative correlation

Exercises
A paired data set has n = 5, $\sum x = 15$, $\sum y = 27$, $\sum xy = 100$, and $\sum x^2 = 55$.

Find $S_{xy}$.
A. 1.9   B. 10   C. 19

Find $S_{xx}$.
A. 1.9   B. 10   C. 19

The slope of the regression line is
A. 1.9   B. 10   C. 19

The intercept of the regression line is
A. 1.9   B. 5.4   C. −0.3

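For the paired data set given through its sums (n = 5, Σx = 15, Σy = 27, Σxy = 100, Σx² = 55), the shortcut formulas for $S_{xy}$ and $S_{xx}$ can be applied directly. The sketch below is one way to check an answer; the variable names are illustrative.

```python
# Summary statistics from the exercise
n = 5
sum_x, sum_y, sum_xy, sum_x2 = 15, 27, 100, 55

# Shortcut forms: S_xy = sum(xy) - sum(x)sum(y)/n, S_xx = sum(x^2) - (sum x)^2/n
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n

# Least squares slope and intercept
slope = s_xy / s_xx
intercept = sum_y / n - slope * (sum_x / n)

print(f"S_xy = {s_xy:.1f}, S_xx = {s_xx:.1f}")
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}")
```
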
Exercises
A paired data set has $\bar{x} = 10$, $\bar{y} = 8$, and slope of the regression line 1.5.
The intercept of the regression line is
A. 8   B. −7   C. 23

Exercises
The equation of the regression line is $\hat{y} = 1.2x - 3.4$. Compute the residual for the point (7, 6).
A. 1   B. −1   C. 5

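The two exercises with $\bar{x} = 10$, $\bar{y} = 8$, slope 1.5 and with the line $\hat{y} = 1.2x - 3.4$ rest on $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ and on the residual being the observed value minus the fitted value. A short sketch, with the hypothetical helper name `predict`:

```python
# Intercept from the sample means and the slope: beta0 = y_bar - beta1 * x_bar
x_bar, y_bar, slope = 10, 8, 1.5
intercept = y_bar - slope * x_bar


def predict(x):
    """Fitted value on the line y_hat = 1.2 x - 3.4 from the second exercise."""
    return 1.2 * x - 3.4


# Residual for the observed point (7, 6): observed y minus fitted y
x_obs, y_obs = 7, 6
residual = y_obs - predict(x_obs)

print(f"intercept = {intercept:.1f}, residual = {residual:.1f}")
```
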
Exercises
A linear regression analysis of Birth Weight (grams) and Gestational Age (weeks)
gave the following output.

Calculate the predicted birth weight of a baby born at 40 weeks gestational age
A. 3632 B. 3747 C. 3977 D. 3862

What is the estimated correlation between pressure and flux? Also give an interpretation of the correlation.

Answer. R = 0.964
Interpretation. Flux is found to increase with increasing pressure.
