Assignment 6- STAT

UNIT VI

LINEAR REGRESSION
Overview: This unit deals with prediction and imperfect relationships; constructing the
least-squares regression line (regression of Y on X and regression of X on Y); measuring
prediction errors with the standard error of estimate; considerations in using linear
regression for prediction; the relation between regression constants and Pearson r; and
multiple regression.

Objectives: At the end of the unit, students should be able to:

1. Define regression, regression line, and regression constant.


2. Specify the relationship between strength of relationship and prediction accuracy.
3. Construct the least-squares regression line for predicting Y given X, specify what the
least-squares regression line minimizes; and explain the difference between
“regression of Y on X” and “regression of X on Y.”
4. Explain what is meant by standard error of estimate, state the relationship between
errors in prediction and the magnitude of $s_{Y|X}$, and define homoscedasticity and
explain its use.
5. Specify the condition(s) that must be met to use linear regression.
6. Specify the relationship between regression constants and Pearson r.
7. Explain the use of multiple variables and their relationship to prediction accuracy.
8. Compute $R^2$ for two variables; specify what $R^2$ stands for and what it measures.
9. Understand the illustrative examples, do the practice problems, and understand the
solutions.

CHAPTER OUTLINE

I. Introduction.
A. Linear Regression. This topic deals with predicting scores on one distribution using
information known about scores on a second distribution. For example, one might
predict your height if one knew your weight and the nature of the relationship
between height and weight in a sample of other people.
B. Correlation. This refers to the magnitude and direction of the relationship between two
variables.
II. Linear Relationships
A. Linear. A linear relationship between two variables is one that can accurately be
represented by a straight line.
B. Curvilinear. When a curved line fits a set of points better than a straight line, the
relationship is called curvilinear.
C. Scatter plots. A scatter plot is a graph of paired X (one variable score) and Y (another
variable score) values. By visually examining the graph one can get a good idea of the
nature of the relationship between the two variables (i.e., linear or not).
III. Straight line equation.
A. General Equation. $Y = bX + a$, where a = the Y intercept and b = the slope of the line.
B. Slope of the straight-line equation (b). The slope tells us how much the Y score changes
for each unit change in the X score. In equation form:
$b = \text{slope} = \frac{\Delta Y}{\Delta X} = \frac{Y_2 - Y_1}{X_2 - X_1}$
The slope is a constant value.
C. Y intercept (a). The Y intercept is the value of Y where the line intersects the Y axis. It is
the value of Y when X=0.
D. Relationships.
1. Positive relationships. This indicates that there is a direct relationship between the
variables. Higher values of X are associated with higher values of Y and vice versa.
2. Negative relationships. This exists when there is an inverse relationship between X
and Y. Low values of X are associated with high values of Y and vice versa.
3. Perfect relationship. This occurs when all the pairs of points fall on a straight line.
4. Imperfect relationships. This is when a positive or negative relationship exists but
not all of the points fall on the line.
IV. Least-Squares Regression Line for Prediction.

A. Least-squares criterion. In an imperfect relationship, no single straight line will hit all the
points. We pick the line that minimizes the total error of prediction, i.e., we construct
the one line that minimizes $\sum (Y - Y')^2$, where Y’ is the predicted value of Y for any value of X.

B. Constructing the regression line of Y on X:

$Y' = b_Y X + a_Y$
where
$b_Y = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sum X^2 - \frac{(\sum X)^2}{N}}$
$a_Y = \bar{Y} - b_Y \bar{X}$
V. Prediction Errors. When relationships between X and Y variables are imperfect, there will be
prediction errors.
A. Standard error of estimate ($s_{Y|X}$). Quantifying the magnitude of the error involves
computing the standard error of estimate, symbolized $s_{Y|X}$. The standard error of estimate
is much like the standard deviation.
1. Definition. It gives a measure of the average deviation of the prediction errors about the
regression line.
2. Equation for standard error of estimate.

$s_{Y|X} = \sqrt{\frac{SS_Y - \frac{\left[\sum XY - \frac{(\sum X)(\sum Y)}{N}\right]^2}{SS_X}}{N - 2}}$
3. Interpretation. The larger the value of $s_{Y|X}$, the less confidence one has in the
prediction of Y given X. The smaller the value of $s_{Y|X}$, the more likely the prediction
will be accurate. If one constructed lines parallel to the regression line at
distances of ±1$s_{Y|X}$, ±2$s_{Y|X}$, and ±3$s_{Y|X}$, one would find about 68%, 95%, and 99%
of the scores, respectively, falling between the lines.
B. Other errors. One must be careful of sources of errors in making predictions. There are two
major considerations in making predictions.
1. Linearity. The original relationship needs to be linear for accurate prediction using linear
regression.
2. Prediction in the range. Generally, one uses a sample to generate the data for calculating
the regression constants ($b_Y$ and $a_Y$). Prediction of Y should be based on values of X within
the range of the sample upon which the constants are based.

CONCEPT REVIEW

It is often useful to use knowledge of one variable to predict a likely value on a second
variable. If there is a relationship between two variables, we can use knowledge of this
relationship for prediction. The topic that covers this material for linear
relationships is called linear regression. The easiest way to determine if a relationship exists
between two variables is to plot the variables on a graph. Such a plot is called a scatter plot. A
scatter plot is a graph of paired X and Y scores. When a straight line accurately describes the
relationship between two variables, the relationship is called linear. Not all relationships are
linear. Those that are not are called curvilinear. In these cases, a curved line fits the points
better than a straight line. Although a graphic solution is sometimes used for prediction, it is
more common to predict Y from the equation of the straight line. The general form of the
equation for a straight line is:

$Y = bX + a$

where a = the Y intercept and b = the slope of the line. The Y intercept is the value of Y where
the line intersects the Y axis. Thus, it is the value of Y when X = 0. The slope of a line measures
its rate of change. The slope tells how much the Y score changes for each unit change in the X
score. In straight-line functions, the slope has a constant value for any points on the line. In
conceptual terms, the equation for the slope is:

$b = \frac{\Delta Y}{\Delta X} = \frac{Y_2 - Y_1}{X_2 - X_1}$
If one had the two points (10, 20) and (15, 30), the slope of the line connecting these
points would be:
$b = \frac{30 - 20}{15 - 10} = 2$
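The slope calculation above can be sketched in a few lines of Python. This is only an illustration: the helper names `slope` and `intercept` are hypothetical and do not come from the unit itself.

```python
def slope(p1, p2):
    """Slope b = (Y2 - Y1) / (X2 - X1) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: the slope is undefined")
    return (y2 - y1) / (x2 - x1)


def intercept(point, b):
    """Y intercept a, found by solving Y = b*X + a for a at a known point."""
    x, y = point
    return y - b * x


# The two points used in the text.
b = slope((10, 20), (15, 30))
a = intercept((10, 20), b)
print(f"Y = {b}X + {a}")
```

For these two points the slope is 2 and the line passes through the origin, so the Y intercept is 0.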

Relationships between two variables may be either positive or negative. If the relationship is
positive, the slope is positive. If the slope is negative, the relationship is negative. In a positive
relationship, higher values of X are associated with higher values of Y. In a negative relationship,
lower values of X are associated with higher values of Y. On a graph, a negative slope runs
downward from left to right. In a negative relationship, as X increases, Y decreases. In a positive
relationship, as X increases, Y increases.
In an imperfect relationship, not all the points fall on the regression line. In an imperfect
relationship one constructs the line that minimizes errors of prediction according to the
least-squares criterion. This is called the least-squares regression line. The vertical distance
between the regression line and each point represents the error in prediction. Y’ equals the
predicted Y value and Y equals the actual value of Y. Y - Y’ equals the error for each point. The
least-squares regression line minimizes $\sum (Y - Y')^2$.
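The least-squares criterion can be checked numerically. The sketch below uses hypothetical paired scores chosen only for illustration, and the function name `sum_squared_errors` is an assumption, not something defined in the unit. It compares the total squared error of the least-squares line for these scores, Y’ = 0.6X + 2.2 (obtained from the computational formulas for the regression constants), with two other candidate lines.

```python
def sum_squared_errors(xs, ys, b, a):
    """Return the sum of (Y - Y')^2 over all points for the line Y' = b*X + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


# Hypothetical paired scores, not taken from the unit.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line for these scores.
least_squares = sum_squared_errors(xs, ys, b=0.6, a=2.2)
# Two other candidate lines, one steeper and one flatter.
steeper = sum_squared_errors(xs, ys, b=0.8, a=1.6)
flatter = sum_squared_errors(xs, ys, b=0.4, a=2.8)

print(round(least_squares, 2), round(steeper, 2), round(flatter, 2))
```

The least-squares line gives a total squared error of about 2.4, while both alternatives give about 2.8. Two alternatives do not prove minimality, but they show what the criterion compares.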

CONSTRUCTING THE REGRESSION LINE

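The terms $a_Y$ and $b_Y$ in the regression equation $Y' = b_Y X + a_Y$ are called regression
constants. The regression line for predicting Y given X is constructed by computing values for
$b_Y$ and $a_Y$. The computational formula for $b_Y$ is:

$b_Y = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sum X^2 - \frac{(\sum X)^2}{N}}$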

N is the number of paired scores. The regression constant $a_Y$ is given by the equation:

$a_Y = \bar{Y} - b_Y \bar{X}$

Since we need to know the value of $b_Y$ to determine $a_Y$, we first find $b_Y$, then $a_Y$. Once
they are both found, they are substituted into the regression equation. The above regression
constants are for the regression line of Y on X. It is sometimes of interest to
predict X given Y. This is called the regression line of X on Y. The linear regression equation for
predicting X given Y is:

$X' = b_X Y + a_X$

This regression line is constructed by calculating values for $b_X$ and $a_X$. The computational
formula for $b_X$ is:

$b_X = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sum Y^2 - \frac{(\sum Y)^2}{N}}$

The equation for $a_X$ is:

$a_X = \bar{X} - b_X \bar{Y}$

The regression line of Y on X will equal the regression line of X on Y only when the relationship
is perfect.
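The computational formulas for the regression constants translate directly into code. The following is a minimal sketch with hypothetical data; the function name `regression_constants` and the variable names are assumptions made for illustration.

```python
def regression_constants(xs, ys):
    """Return (b_Y, a_Y, b_X, a_X) computed from the raw-score formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Shared numerator: sum(XY) - (sum X)(sum Y) / N
    sp = sum_xy - sum_x * sum_y / n
    ss_x = sum_x2 - sum_x ** 2 / n  # denominator for b_Y
    ss_y = sum_y2 - sum_y ** 2 / n  # denominator for b_X

    b_y = sp / ss_x
    a_y = sum_y / n - b_y * (sum_x / n)  # a_Y = mean(Y) - b_Y * mean(X)
    b_x = sp / ss_y
    a_x = sum_x / n - b_x * (sum_y / n)  # a_X = mean(X) - b_X * mean(Y)
    return b_y, a_y, b_x, a_x


# Hypothetical paired scores, not taken from the unit.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_y, a_y, b_x, a_x = regression_constants(xs, ys)
print(f"Y' = {b_y:.2f}X + {a_y:.2f}")
print(f"X' = {b_x:.2f}Y + {a_x:.2f}")
```

For these scores the regression of Y on X is Y’ = 0.60X + 2.20, while the regression of X on Y is X’ = 1.00Y - 1.00. Solving the second for Y gives Y = X + 1, a different line, which illustrates that the two regression lines coincide only for a perfect relationship.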

MEASURING PREDICTION ERRORS


Quantifying prediction errors involves computing the standard error of estimate, which is
symbolized by $s_{Y|X}$. The standard error of estimate gives a measure of the average deviation
of the prediction errors about the regression line. The conceptual formula for $s_{Y|X}$ is:

$s_{Y|X} = \sqrt{\frac{\sum (Y - Y')^2}{N - 2}}$

The computational formula is:

$s_{Y|X} = \sqrt{\frac{SS_Y - \frac{\left[\sum XY - \frac{(\sum X)(\sum Y)}{N}\right]^2}{SS_X}}{N - 2}}$

where $SS_X = \sum X^2 - \frac{(\sum X)^2}{N}$ and $SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{N}$.

These formulas are for predicting Y given X. The standard error of estimate is computed over
all Y scores. For it to be meaningful, one assumes that the variability of Y remains constant
as one goes from one X score to the next. This assumption is called the assumption of
homoscedasticity. In general, one would expect to find about 68% of the points within
±1$s_{Y|X}$ of the regression line, 95% of the points within ±2$s_{Y|X}$, and 99% within
±3$s_{Y|X}$.
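As a rough check, the standard error of estimate can be computed two ways: from the computational (raw-score) formula and directly from the prediction errors Y - Y’ divided by N - 2. The sketch below uses hypothetical data and hypothetical function names; the regression constants 0.6 and 2.2 are the least-squares values for these particular scores.

```python
import math


def standard_error_of_estimate(xs, ys):
    """s_{Y|X} from the computational (raw-score) formula."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 paired scores so that N - 2 > 0")
    sum_x, sum_y = sum(xs), sum(ys)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    if ss_x == 0:
        raise ValueError("all X scores are equal: the regression line is undefined")
    # Guard against a tiny negative value caused by floating-point rounding.
    residual_ss = max(0.0, ss_y - sp ** 2 / ss_x)
    return math.sqrt(residual_ss / (n - 2))


def standard_error_conceptual(xs, ys, b_y, a_y):
    """s_{Y|X} computed directly from the prediction errors Y - Y'."""
    n = len(xs)
    sse = sum((y - (b_y * x + a_y)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))


# Hypothetical paired scores, not taken from the unit.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
computational = standard_error_of_estimate(xs, ys)
conceptual = standard_error_conceptual(xs, ys, b_y=0.6, a_y=2.2)
print(round(computational, 4), round(conceptual, 4))
```

Both routes give the same value (about 0.89 for these scores), since the computational formula is an algebraic rearrangement of the sum of squared prediction errors.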
In general, it is appropriate to use linear regression to predict values only if the
relationship is linear. It is also important that the sample used to compute the regression
constants be representative of the group for which predictions are made. In other words,
the data collected to compute the regression constants should be a random sample from the
population of interest. Finally, the linear regression equation is properly used only within
the range of the variable upon which it is based. This is because we do not know whether the
relationship remains linear outside the range of our sample.
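One way to respect the range restriction in practice is to refuse predictions for X values outside the sample. The following is a minimal sketch under that assumption; `predict_within_range` is a hypothetical helper, and raising an error (rather than, say, issuing a warning) is a design choice made here for illustration.

```python
def predict_within_range(x, b_y, a_y, sample_xs):
    """Predict Y' = b_Y * X + a_Y, but only for X inside the sampled range."""
    low, high = min(sample_xs), max(sample_xs)
    if not (low <= x <= high):
        raise ValueError(
            f"X = {x} is outside the sample range [{low}, {high}]; "
            "the relationship may not be linear there"
        )
    return b_y * x + a_y


# Hypothetical sample X scores and regression constants, not taken from the unit.
sample_xs = [1, 2, 3, 4, 5]
print(predict_within_range(3, b_y=0.6, a_y=2.2, sample_xs=sample_xs))
```

A call such as `predict_within_range(10, 0.6, 2.2, sample_xs)` raises a `ValueError`, because 10 lies beyond the largest X score in the sample.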
EXERCISES

1. X represents aptitude test scores and Y represents grade point average in college. If the
least-squares regression line for the relationship between these two variables is Y’ = .005X + 1.2,
what GPA would you predict for people who obtained each of the following scores on the
aptitude test?
a. 159
b. 300
c. 500
d. 550
2. Draw a graph of aptitude test score versus grade point average and plot the regression
line Y’ = .005X + 1.2.

3. A professor wanted to predict final exam scores from midterm exam scores. He used data
from several different professors teaching the same class. He obtained the following data:
Midterm scores (X): 83, 62, 72, 85, 85
Final exam scores (Y): 89, 58, 70, 92, 84

What are the values for each of the following?

a. $\sum X$ = 387
b. $\sum X^2$ = 30,367
c. N = 5
d. $\sum Y$ = 393
e. $\sum Y^2$ = 31,705
f. $(\sum X)^2$ = 149,769
g. $(\sum Y)^2$ = 154,449
h. $\sum XY$
i. $b_Y$
j. $a_Y$
k. If the professor’s class score on the midterm was 77.4, what score would you predict
the class would receive on the final exam?

l. What is the value of $s_{Y|X}$?

4. A hospital administrator wanted to predict the number of patients her hospital would admit
in 1990. The following data were obtained from past records:
Year: 1960, 1965, 1970, 1975, 1980
Number of admissions: 812, 983, 1127, 904, 1768

a. What would the best prediction be for the number of admissions expected in 1990?
b. What serious caution should the administrator be aware of when making her prediction?

5. A psychologist wanted to use a locus of control test to predict scores on a depression scale.
The following data were summarized for the relationship between the locus of control and
depression scale:

$\sum X = 62$, $\sum X^2 = 1022$, $\sum Y = 70$, $\sum Y^2 = 1234$, $\sum XY = 1107$, $N = 4$

a. What is the value of $b_Y$?
b. What is the value of $a_Y$?
c. What would the psychologist predict for the score on the depression scale if a client scored
an 18 on the locus of control scale?

6. Consider the following set of data points:

X 2 4 8 14 20 23 25
Y 2 6 14 20 12 9 7

a. Construct a scatter plot of the points


b. Is it appropriate to use a least-squares linear regression line to predict Y from X in this case?
Why or why not?

7. Consider the following set of points for variable X and variable Y:

X 21 29 33 40 50
Y 34 36 42 45 58

Assume the relationship is linear in answering the following questions.

a. What is the value of the regression constant $b_X$ for predicting X given Y?
b. What is the value of the regression constant $a_X$ for predicting X given Y?
c. What value of X would you predict for a value of Y = 37?
d. What value of X would you predict for a value of Y = 55?
8. What is the linear regression equation for predicting Y given X for the following pairs of
scores:

X 9 15 25 27 42 50 30
Y 14 11 5 5 0 -8 1

TRUE-FALSE QUESTIONS

1. The easiest way to determine if a relationship is linear is to calculate the regression line.

2. In a linear relationship all the points must fall on a straight line.

3. In a perfect linear relationship all the points must fall on the straight line.

4. The slope of a line is a measure of its rate of change.

5. In a straight line the slope approaches zero as the line comes near the point X.

6. In an inverse relationship as one variable gets larger the other variable gets smaller.

7. In regression analysis we are only concerned with perfect as opposed to imperfect
relationships.

8. If we minimize $\sum (Y - Y')^2$, we will minimize the total error of prediction.

9. The value ay is the X axis intercept for minimizing errors in Y.

10. Generally, one can use the same regression equation for predicting Y given X as for X given
Y.

11. If the relationship between two variables is perfect the standard error of estimate equals 0.

12. If the standard error of estimate for relationship 1 equals 5.26 and for relationship 2 it
equals 8.01 then we can reasonably infer that relationship 2 is less perfect than relationship
1.

13. It is impossible to have a negative value for the standard error of estimate.

14. In general one is less confident in predictions of Y when the value of X used for the
prediction is outside the range of the original data used to construct the regression line.

15. If the regression line is parallel to the X axis then the slope of the regression line equals 0.
16. The regression line will always go through the point ($\bar{X}$, $\bar{Y}$).

MULTIPLE CHOICE.

1. If $s_{Y|X}$ = 0.0, the relationship between the variables is __________.


a. perfect
b. imperfect
c. curvilinear
d. unknown

2. ∑ (Y – Y’) equals ____________.


a. 0
b. 1
c. cannot be determined from information given
d. who cares

3. $\sum (Y - Y')^2$ represents ____________.


a. the standard deviation
b. the variance
c. the standard error of estimate
d. the total error of prediction

4. In a particular relationship N = 80. How many points would you expect, on average, to find
within ±1 $s_{Y|X}$ of the regression line?
a. 40
b. 80
c. 54
d. 0

5. What would you predict for the value of Y for the point where the value of X is $\bar{X}$?
a. cannot be determined from information given
b. 0
c. 1
d. $\bar{Y}$

6. If $s_{Y|X}$ = 4.00 for relationship A and $s_{Y|X}$ = 5.25 for relationship B, in which
relationship would you have the most confidence in a particular prediction?
a. A
b. B
c. it makes no difference
d. cannot be determined from information given

7. If bY is negative, higher values of X are associated with __________.


a. lower values of X’
b. higher values of Y
c. higher values of (Y – Y’)
d. lower values of Y

8. Which of the following statement(s) is (are) an important consideration(s) in applying linear
regression techniques?
a. the relationship should be linear
b. both variables must be measured in the same units
c. predictions for Y should be within the range of the X variable in the sample
d. a and c

9. In the regression equation Y’ = X, the Y-intercept is _________.


a.
b.
c. 0
d. 1

10. If the value for aY is negative, the relationship between X and Y is ____________.
a. positive
b. negative
c. inverse
d. cannot be determined from information given

11. If bY = 0, the regression line is __________.


a. horizontal
b. vertical
c. undefined
d. at a 45⁰ angle to the X axis

12. The least-squares regression line minimizes _________.


a. s
b. $s_{Y|X}$
c. $\sum (Y - \bar{Y})^2$
d. $\sum (Y - Y')^2$
e. b and d

13. The points (0,5) and (5,10) fall on the regression line for a perfect positive linear
relationship. What is the regression equation for this relationship?
a. Y’ = X + 5
b. Y’ = 5X
c. Y’ = 5X + 10
d. cannot be determined from information given
14. For the following points what would you predict to be the value of Y’ when X = 19? Assume
a linear relationship.

X 6 12 30 40
Y 10 14 20 27

a. 16.35
b. 24.69
c. 22.00
d. 17.75

15. If N = 8, $\sum X$ = 160, $\sum X^2$ = 4656, $\sum Y$ = 79, $\sum Y^2$ = 1309, and $\sum XY$ = 2430, what is the value of $b_Y$?
a. .9217
b. -1.8010
c. .5838
d. .7922

16. What is the slope for the points X1 = 30, Y1 = 50 and X2 = 25 and Y2 = 40?
a. 2.00
b. .50
c. -2.00
d. -.50

17. If the regression equation for a set of data is Y’ = 2.650X + 11.250 then the value of Y’ for X =
33 is __________.
a. 87.45
b. 371.25
c. 98.70
d. 76.20
18. If $\bar{X}$ = 57.2, $\bar{Y}$ = 84.6, and $b_Y$ = .37, the value of $a_Y$ = __________
a. 141.80
b. -25.90
c. 63.44
d. 27.40

19. If the regression line for predicting X given Y were X’ = 103Y + 26.2, what would the value of
X’ be if Y = 0.2?
a. 129.2
b. 25.8
c. 5.2
d. 46.8
