Regression Models - Follow

Chapter 4 discusses regression models, focusing on the relationship between a dependent variable and one or more independent variables. It outlines learning objectives, including identifying variables, developing regression equations, and testing model significance using Excel. The chapter also provides a practical example of predicting lunch spending based on breakfast spending, illustrating key concepts such as the coefficient of determination and correlation.

Uploaded by

happyfelix57

Chapter 4 - Regression Models

Regression is an approach for modeling the relationship between a quantitative dependent variable Y and one or more
explanatory variables (or independent variables), represented by X. The case of one explanatory variable is called
simple linear regression.

The main purposes of regression analysis are to understand the relationship between or among variables and to predict one
variable based on the other(s).

Learning Objectives for this Chapter

At the completion of the Spring 2023 semester, students will be able to:

4.1 – Identify variables, visualize them using a scatter diagram, and use them in a regression model

4.2 – Develop simple linear regression equations from collected data and interpret the slope and intercept

4.3 – Compute the coefficient of determination and the coefficient of correlation and interpret their meanings

4.4 – List the assumptions used in regression and use a residual plot to identify problems

4.5 – Test the model for significance

4.6 – Use Excel to do a regression analysis

4.7 – Develop a multiple regression model using Excel and use it for prediction

We are going to use the example below to go through learning objectives 4.1 to 4.6.

A cafeteria at a local college would like to develop a regression model that predicts what a student will
spend on lunch based on what they spent on breakfast. Data were collected from a random sample of students, and the
results are shown below:

X (money spent on breakfast)   5    6    7    7    9    10   12
Y (money spent on lunch)       12   11   9    8    4    3    2

4.1 Scatter diagram

[Scatter diagram: Y (money spent on lunch), from 0 to 14, plotted against X (money spent on breakfast), from 4 to 13]

Speculation: There seems to be a negative linear relationship between what a given student spends on breakfast and
on lunch. For this scatter plot, $ breakfast is our input, independent, or explanatory variable, whereas $ lunch is our output,
dependent, or response variable.

4.2 – Developing a simple linear model

The simple linear regression model is Y = β0 + β1X + ε (this is the general model).

Where

Y is the dependent variable

X is the independent variable

β0 is the intercept (Y value when X = 0)

β1 is the slope of the regression line

ε is some random error

The simple regression model estimated from a data sample is

Ŷ = b0 + b1X, where b0 and b1 are the estimates of the intercept and slope that make the sum of squared errors as small as possible.
Errors still exist; each one is computed as e = (actual value Y) – (predicted value Ŷ).

Here,

Ŷ is the predicted value of Y

b0 is the estimate of β0, based on a sample

b1 is the estimate of β1, based on a sample

How to compute these values?

We first need to compute the following:

b1 = ∑(X - X̄)(Y - Ȳ) / ∑(X - X̄)^2

b0 = Ȳ - b1X̄

The best way to do this is to develop a table (you can use Excel).
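
X     Y     (X-X̄)^2 explanation   (X-X̄)^2   (X-X̄)(Y-Ȳ) explanation   (X-X̄)(Y-Ȳ)
5     12    (5-8)^2               9          (5-8)(12-7)              -15
6     11    (6-8)^2               4          (6-8)(11-7)              -8
7     9     (7-8)^2               1          (7-8)(9-7)               -2
7     8     (7-8)^2               1          (7-8)(8-7)               -1
9     4     (9-8)^2               1          (9-8)(4-7)               -3
10    3     (10-8)^2              4          (10-8)(3-7)              -8
12    2     (12-8)^2              16         (12-8)(2-7)              -20
Sum   56    49                    36                                  -57

(In the Sum row, 56 is the sum of X and 49 is the sum of Y.)

X̄ = 56/7 = 8

Ȳ = 49/7 = 7

b1 = ∑(X-X̄)(Y-Ȳ) / ∑(X-X̄)^2 = -57/36 ≈ -1.58

b0 = Ȳ - b1X̄ = 7 - (-57/36)(8) ≈ 19.67

So, the simple regression equation is ŷ = 19.67 – 1.58X.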
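
The table-based calculation above can also be checked in a few lines of code. The following is a minimal Python sketch; the handout itself works in Excel, so the choice of Python and every variable and function name here (including `predict`) are assumptions made for illustration. It computes the two means, the two sums from the table, and then b1 and b0.

```python
# Data from the cafeteria example
x = [5, 6, 7, 7, 9, 10, 12]   # money spent on breakfast
y = [12, 11, 9, 8, 4, 3, 2]   # money spent on lunch

n = len(x)
x_bar = sum(x) / n            # mean of X
y_bar = sum(y) / n            # mean of Y

# The two column sums from the working table
sxx = sum((xi - x_bar) ** 2 for xi in x)                         # sum of (X - X-bar)^2
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # sum of (X - X-bar)(Y - Y-bar)

b1 = sxy / sxx                # slope
b0 = y_bar - b1 * x_bar       # intercept


def predict(x_new):
    """Predicted lunch spending for a given breakfast spending (hypothetical helper)."""
    return b0 + b1 * x_new


print(f"X-bar = {x_bar}, Y-bar = {y_bar}")
print(f"b1 = {b1:.2f}, b0 = {b0:.2f}")
```

Because the code keeps the unrounded slope (-57/36), the intercept rounds to 19.67, matching the hand calculation. A call such as `predict(8)` returns a value of about 7, since the least squares line always passes through the point (X̄, Ȳ).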

4.3 – Measuring the Fit of the Regression Model

To judge whether the model is good enough to be used for prediction, we start by computing the
coefficient of determination (R²) and the coefficient of correlation (r).

To do that, we must first compute the following:

• Sum of squares total (SST): measures the total variability of Y about the mean.
  SST = ∑(Y - Ȳ)²
• Sum of squares error (SSE): measures the variability of Y about the regression line.
  SSE = ∑e² = ∑(Y - Ŷ)²
• Sum of squares regression (SSR): indicates how much of the total variability of Y is explained by
  the regression model.
  SSR = ∑(Ŷ - Ȳ)²

Important relationship: since SST = SSR + SSE, it follows that SSR = SST - SSE.

Here Ŷ = b0 + b1X = 19.67 - 1.58X.

X      Y      (Y-Ȳ)^2   Ŷ       (Y-Ŷ)^2   (Ŷ-Ȳ)^2
5      12     25        11.77   0.0529    22.7529
6      11     16        10.19   0.6561    10.1761
7      9      4         8.61    0.1521    2.5921
7      8      1         8.61    0.3721    2.5921
9      4      9         5.45    2.1025    2.4025
10     3      16        3.87    0.7569    9.7969
12     2      25        0.71    1.6641    39.5641
Sums   56     49  96            5.7567    89.8767
                  (SST)         (SSE)     (SSR)

X̄ = 56/7 = 8
Ȳ = 49/7 = 7

Coefficient of determination

The coefficient of determination (represented by R²) gives the proportion of the variation in the dependent variable (Y) that
is predictable from the regression on the independent variable (X).

R² = SSR/SST, which is the same as 1 – SSE/SST

For our example, R² = 89.8767/96 ≈ 0.936, or about 94%

Interpretation: About 94% of the variation in Y (money spent on lunch) can be explained by the regression on X
(money spent on breakfast). The remaining 6% is due to other factors (that is, to error).
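
The sums of squares and R² above can be reproduced with a short script. This is a minimal Python sketch, not part of the original Excel workflow; the variable names are assumptions, and the coefficients are deliberately the rounded values 19.67 and -1.58 so that the output matches the table.

```python
# Data from the cafeteria example
x = [5, 6, 7, 7, 9, 10, 12]   # money spent on breakfast
y = [12, 11, 9, 8, 4, 3, 2]   # money spent on lunch

# Rounded coefficients, as used in the handout's table (an intentional choice)
b0, b1 = 19.67, -1.58

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]                      # predicted values

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variability of Y
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # variability about the line
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # variability explained by the line

r2 = ssr / sst

print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}")
print(f"R^2 = {r2:.3f}")
```

With the rounded coefficients, SSR + SSE comes to 95.6334 rather than exactly 96; the small gap is rounding error. Using the unrounded slope and intercept gives SSR = 90.25 and SSE = 5.75, so R² ≈ 0.940, which is still about 94%.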

Coefficient of correlation

The quantity r, called the linear correlation coefficient, measures the strength and
the direction of a linear relationship between two variables.

r = ±√R². Important: r has the same sign as the slope (b1) of the regression line.

Speculating on r based on the scatter diagram

The value of r is such that -1 ≤ r ≤ +1. The + and – signs are used for positive
linear correlations and negative linear correlations, respectively.

Positive correlation: If x and y have a strong positive linear correlation, r is close
to +1. An r value of exactly +1 indicates a perfect positive fit. Positive values indicate a relationship between x and y
such that as values for x increase, values for y also increase.
Negative Correlation: If x and y have a strong negative linear correlation, r is close
to -1. An r value of exactly -1 indicates a perfect negative fit. Negative values
indicate a relationship between x and y such that as values for x increase, values
for y decrease.
No Correlation: If there is no linear correlation or a weak linear correlation, r is
close to 0. A value near zero means that there is no linear relationship between the two variables; they may be unrelated or related in a nonlinear way.
Note that r is a dimensionless quantity; that is, it does not depend on the units
employed.
Perfect Correlation: A perfect correlation of r = ± 1 happens only when the data points all lie exactly on a straight line. If
r = +1, the slope of this line is positive. If r = -1, the slope of this line is negative.

Strength of the correlation by value of r:
Negative: -1 to -0.7 strong, -0.7 to -0.5 moderate, -0.5 to 0 weak
Positive: 0 to 0.5 weak, 0.5 to 0.7 moderate, 0.7 to +1 strong

In this case, r = -√0.936 ≈ -0.967

This is negative because the slope of the line of regression is also negative.

Interpretation: there is a strong negative linear relationship between the amount of money spent on breakfast and
the amount of money spent on lunch.
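
As a cross-check on the value of r, the sketch below computes it two ways: from R² with the sign of the slope, as in the text, and directly from the data with the usual product-moment formula r = ∑(X-X̄)(Y-Ȳ) / √(∑(X-X̄)² ∑(Y-Ȳ)²). The direct formula is not shown in the handout, so treat it, along with the variable names, as an assumption added for illustration.

```python
import math

# Data from the cafeteria example
x = [5, 6, 7, 7, 9, 10, 12]   # money spent on breakfast
y = [12, 11, 9, 8, 4, 3, 2]   # money spent on lunch

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Method 1 (as in the text): square root of R^2, carrying the sign of the slope
b1 = sxy / sxx
r_from_r2 = math.copysign(math.sqrt(0.936), b1)

# Method 2 (assumed cross-check): direct product-moment formula
r_direct = sxy / math.sqrt(sxx * syy)

print(f"r from R^2 = {r_from_r2:.3f}")
print(f"r direct   = {r_direct:.3f}")
```

The first method gives about -0.967, matching the text. The direct formula gives about -0.970; the small difference arises because R² = 0.936 was computed from rounded coefficients. Both values fall in the strong negative range.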

4.4 – Assumptions of the Regression Model

We stated earlier that the linear regression model contains errors because we are not dealing with a
perfectly aligned set of points. In other words, SSE is not always equal to 0, and R² is not always 100%. Therefore, we
have to make some assumptions about the errors in the regression model so that we can test it for significance. We
must make the following assumptions about the errors:

• The errors are independent.
• The errors are normally distributed.
• The errors have a mean of zero.
• The errors have constant variance (regardless of the value of X).

When the assumptions are met, a plot of the errors against the independent variable should appear to be random.

In our example, we are going to plot the residuals (Y - Ŷ) against X and check for randomness.

Here Ŷ = b0 + b1X = 19.67 - 1.58X.

X     Y     Ŷ       Residual (Y - Ŷ)
5     12    11.77   0.23
6     11    10.19   0.81
7     9     8.61    0.39
7     8     8.61    -0.61
9     4     5.45    -1.45
10    3     3.87    -0.87
12    2     0.71    1.29

Using Excel

[Residual plot: Residual (Y - Ŷ), from -2 to 1.5, plotted against X, from 4 to 13]

We can see that the scatter plot appears to be random. You can use Figures 4.4A, 4.4B, and 4.4C on page 118 to check
for likelihood of randomness. We want the residual plot to look like Figure 4.4A.
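
The residual column can be recomputed as follows. This is a minimal Python sketch with assumed variable names; it uses the rounded coefficients 19.67 and -1.58 so that the values agree with the residual table, and it only lists the residuals rather than drawing the plot.

```python
# Data from the cafeteria example
x = [5, 6, 7, 7, 9, 10, 12]   # money spent on breakfast
y = [12, 11, 9, 8, 4, 3, 2]   # money spent on lunch

# Rounded coefficients, as used in the handout's residual table
b0, b1 = 19.67, -1.58

# Residual = actual Y minus predicted Y
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

for xi, yi, e in zip(x, y, residuals):
    print(f"X = {xi:2d}  Y = {yi:2d}  residual = {e:5.2f}")

# Count how many points lie above and below the fitted line
n_positive = sum(1 for e in residuals if e > 0)
n_negative = sum(1 for e in residuals if e < 0)
print(f"{n_positive} positive residuals, {n_negative} negative residuals")
```

These residuals add up to about -0.21 rather than 0. That is a side effect of rounding b0 and b1; with the unrounded least squares coefficients, the residuals sum to zero.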
The next step is to estimate the variance.

While the errors are assumed to have constant variance (σ²), this variance can only be estimated once a sample is collected. The
mean squared error (MSE or s²) is a good estimate of the population variance σ².

s² = MSE = SSE/(n-k-1), where n is the number of observations (pairs of points) and k is the number of independent
variables.

For our example, s² = 5.7567/(7-1-1) ≈ 1.15

From the sample variance, we can estimate the standard deviation by taking the square root of s².

Here, s = √1.15 ≈ 1.07. This is also called the standard error of the estimate or the standard deviation of the regression.
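
The variance estimate above is a two-line calculation. A minimal Python sketch, with assumed variable names and the SSE value taken from the earlier table:

```python
import math

sse = 5.7567   # sum of squares error, from the table in section 4.3
n = 7          # number of observations (pairs of points)
k = 1          # number of independent variables

mse = sse / (n - k - 1)   # s^2, the estimate of the error variance
s = math.sqrt(mse)        # standard error of the estimate

print(f"MSE = {mse:.2f}")
print(f"s   = {s:.2f}")
```

The script takes the square root of the unrounded MSE, while the text takes the square root of 1.15; both give 1.07 to two decimal places.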

4.5 – Testing the Model for significance

Steps for Hypothesis Testing

1. Determine the null hypothesis (H0) and the alternative hypothesis (H1).
For this test, these are always

H0: β1 = 0 (the correlation is 0; the relationship is not significant)

H1: β1 ≠ 0 (the correlation is not 0; the relationship is significant)

2. Select the level of significance α (the probability of rejecting H0 when it is true). This is either 0.05 or 0.01.

3. Compute the calculated value of F. For our course, we will read that value from the regression
summary output.

4. Reject H0 if the calculated F is greater than the critical F (from the F table), and interpret the finding.

How to read the F table

• Select a level of significance, either 0.05 or 0.01.

• Locate df1, the degrees of freedom of the numerator (entry column on the F table). df1 is the number of
independent variables, k.

• Locate df2, the degrees of freedom of the denominator (entry row on the F table). The value of df2 is given by
n – k – 1 (sample size – number of independent variables – 1).

• The critical value of F, or F-critical (df1, df2), is the number located at the intersection of the
identified entry column (df1) and entry row (df2).

For our example

Step 1

H0: β1 = 0 (the correlation is 0; the relationship is not significant)

H1: β1 ≠ 0 (the correlation is not 0; the relationship is significant)

Step 2: We are going to use α = 0.05 to test our hypothesis.

Step 3: Calculate the value of the F statistic. Fcalculated = MSR/MSE

MSR = SSR/k = 89.8767/1 = 89.8767

MSE = 1.15

Fcalculated = 89.8767/1.15 = 78.1536

Step 4: Decision: Reject H0 if the test statistic is greater than F critical (from the F table)

df1 = k = 1, so use the first column

df2 = n-k-1 = 7-1-1 = 5, so use the fifth row

We go to the F table in Appendix D, look for α = 0.05 (the first F distribution table), and go to the first column
and fifth row: F0.05, 1, 5 = 6.61.

Here, since Fcalculated of 78.1536 is greater than Fcritical of 6.61, we reject H0. Therefore, the regression model is
significant. That is, there is a statistically significant linear relationship between X and Y, which supports using the linear model ŷ = 19.67 – 1.58X for prediction.
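
The four steps of the test can be strung together as in the sketch below. It is a minimal Python illustration with assumed variable names; the critical value 6.61 is simply copied from the F table lookup above rather than computed, because the Python standard library has no F distribution.

```python
# Inputs taken from the earlier sections
ssr = 89.8767      # sum of squares regression
sse = 5.7567       # sum of squares error
n = 7              # number of observations
k = 1              # number of independent variables
alpha = 0.05       # level of significance (Step 2)

# Degrees of freedom used to read the F table
df1 = k            # numerator: column of the table
df2 = n - k - 1    # denominator: row of the table

# Step 3: calculated F statistic
msr = ssr / df1
mse = sse / df2
f_calculated = msr / mse

# Step 4: compare with the critical value read from the table (not computed here)
f_critical = 6.61  # F(0.05; df1 = 1, df2 = 5), from the table lookup in the text
reject_h0 = f_calculated > f_critical

print(f"df1 = {df1}, df2 = {df2}")
print(f"F calculated = {f_calculated:.2f}, F critical = {f_critical}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

This version divides by the unrounded MSE, so it reports an F of about 78.06, whereas the text rounds MSE to 1.15 first and gets 78.1536. Either way the statistic is far above 6.61, so the decision to reject H0 is the same.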

Try this

Additional examples

Given the following pairs of points:

X   4    5    6    8    10
Y   13   16   8    3    2

a. Draw a scatter diagram and speculate on the linear relationship between x and y
b. Find the equation of the regression line
c. Compute the coefficient of determination and tell us what that means
d. Find the coefficient of correlation and determine the strength of the relationship between x and y
e. Is the linear relationship significant? Use alpha of 0.05 to test this hypothesis for significance
