5_Chapter9-linear regression


Chapter 9

Simple Linear Relationship & Correlation


Prepared by: Dr. Elias Dabeet

DISPLAYING THE RELATIONSHIP

DEFINITIONS:

Studies are often conducted to attempt to show that some explanatory variable “causes” the values of some response variable to occur.

The response or dependent variable is the response of interest, the variable we want to predict, and is usually denoted by y.

The explanatory or independent variable attempts to explain the response and is usually denoted by x.

A scatter plot shows the relationship between two quantitative variables x and y. The values of the x variable are marked on the horizontal axis, and the values of the y variable are marked on the vertical axis. Each pair of observations (xi, yi) is represented as a point in the plot.

Two variables are said to be positively associated if, as x increases, the values of y tend to increase. Two variables are said to be negatively associated if, as x increases, the values of y tend to decrease.

When a scatter plot does not show a particular direction, neither positive nor negative, we say that there is no linear association.
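
The direction of association can also be checked numerically. The sketch below is one possible way to do this, not a method stated in these notes: it looks at the sign of the sum of products of deviations from the means, which is the numerator of the slope formula used in this chapter. The function name `association_direction` is made up for illustration. The first data set is the five Test 1 and Test 2 pairs from this chapter's by-hand example; the other two data sets are hypothetical.

```python
def association_direction(xs, ys):
    """Classify direction from the sign of sum((x - x_bar) * (y - y_bar))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Positive when large x values tend to pair with large y values,
    # negative when large x values tend to pair with small y values.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "no linear association"


test1 = [8, 10, 12, 14, 16]   # Test 1 scores from the chapter example
test2 = [9, 13, 14, 15, 19]   # Test 2 scores from the chapter example

print(association_direction(test1, test2))          # positive
print(association_direction(test1, test2[::-1]))    # negative (hypothetical: y reversed)
print(association_direction([1, 2, 3], [1, 2, 1]))  # no linear association (hypothetical)
```

With real data the sum is almost never exactly zero, so in practice a value close to zero, together with the scatter plot, is what suggests no linear association.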

Student Number   Midterm Score (X)   Final Score (Y)
1 39 62
2 44 69
3 32 68
4 40 86
5 45 88.5
6 46 88.5
7 33 76
8 39 66.5
9 32.5 75
10 21 38
11 30 71
12 39 88
13 44 96.5
14 28.5 71.5
15 38 96
16 43 82.5
17 42 85
18 25.5 28
19 47 95
20 36 39
21 31.5 58
22 32 49
23 42 62
24 21 59
25 41 90

[Figure: Scatterplot of Final vs Midterm Scores. Horizontal axis: Midterm (0 to 50); vertical axis: Final (0 to 100). The 10th student, at (21, 38), is labeled.]

Notes of Caution
1. An observed relationship between two variables does
not imply that there is some causal link between the
two variables.
For example, consider the following scatter plot of IQ score versus shoe size:

[Figure: scatter plot with IQ on the vertical axis and Shoe Size on the horizontal axis.]

As a person ages, their shoe size increases, as does their IQ. Although there is a positive association, there is no causal link between the two variables shoe size and IQ.

Most studies attempt to show that some explanatory variable "causes" the values of the response to occur. While we can never positively determine whether or not there is a distinct cause-and-effect relationship, we can assess whether there appears to be such a relationship.

2. Sometimes a scatter plot, such as the one in the figure below, shows a curvilinear relationship between the variables. Methods for curvilinear relationships are beyond the scope of this course.

Simple Linear Regression
[Figure: Scatterplot of Final vs Midterm Scores (Midterm on the horizontal axis, Final on the vertical axis) with two candidate lines, labeled Line #1 and Line #2.]

So the question remains: how do we find a “best-fitting” line?

Equation of a Line

y=a+bx where

b = slope - the amount y changes when x is increased by 1 unit.

a = y-intercept - the value of y when x is set equal to zero.

DEFINITION:
The least squares regression line, given by $\hat{y} = a + bx$, is the line that makes the sum of the squared vertical deviations of the data points from the line as small as possible. Performing the regression is often stated as "regress y on x".
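
To make "as small as possible" concrete, the sketch below compares the sum of squared vertical deviations for several candidate lines. It is only an illustration: the function name `sum_squared_deviations` is made up, the data are the five Test 1 and Test 2 pairs from this chapter's by-hand example, the line with a = 0.8 and b = 1.1 is the least squares line obtained in that example, and the other candidate lines are arbitrary choices.

```python
def sum_squared_deviations(a, b, xs, ys):
    """Sum of squared vertical deviations of the points from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


test1 = [8, 10, 12, 14, 16]   # x values (Test 1)
test2 = [9, 13, 14, 15, 19]   # y values (Test 2)

# Least squares line from the by-hand example: y-hat = 0.8 + 1.1 x
best = sum_squared_deviations(0.8, 1.1, test1, test2)

# A few other candidate lines, chosen arbitrarily for comparison
candidates = [(0.8, 1.0), (0.8, 1.2), (0.0, 1.1), (2.0, 1.0)]
others = [sum_squared_deviations(a, b, test1, test2) for a, b in candidates]

print(round(best, 4))                    # 3.6
print([round(s, 4) for s in others])     # [11.2, 11.2, 6.8, 4.0]
print(all(best < s for s in others))     # True
```

Every other candidate gives a larger sum than the least squares line, which is what the definition requires.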

The least squares regression line for regressing final exam scores, y, on midterm exam scores, x, is given by $\hat{y} = 7.5 + 1.75x$.

The estimated slope of b = 1.75 tells us that for a 1-point increase on the midterm we would expect, on average, an increase of 1.75 points on the final exam.

The estimated y-intercept of a = 7.5 tells us that if someone were to score 0 points on the midterm, we would predict they would get 7.5 points on the final exam.

Suppose a new student scores 40 points on the midterm. Based on our model, what would be their predicted final exam score? Plug the value x = 40 into our estimated equation. The predicted final exam score is $\hat{y} = 7.5 + 1.75(40) = 77.5$ points.
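
The same prediction can be written as a small function. This is a minimal sketch that simply encodes the estimated equation above; the function name `predict_final` is made up for illustration.

```python
def predict_final(midterm, a=7.5, b=1.75):
    """Predicted final exam score from the estimated line y-hat = a + b * x."""
    return a + b * midterm


print(predict_final(40))                      # 77.5
print(predict_final(0))                       # 7.5  (the y-intercept)
print(predict_final(41) - predict_final(40))  # 1.75 (the slope)
```

The last two lines restate the interpretations of the intercept and the slope given above.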

Calculating the Least Squares Regression Line

The Least Squares Regression Line

The least squares regression line is given by $\hat{y} = a + bx$, where

slope: $b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2}$

y-intercept: $a = \bar{y} - b\bar{x}$

Example
Test 1 versus Test 2—Obtaining the Regression Line “By Hand”

(a) Look at the relationship graphically with a scatter-plot to confirm initially that a linear model seems appropriate.

(b) Calculate the estimated regression line by completing the calculation table shown below.

$b = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2} = \frac{5(884) - (60)(70)}{5(760) - (60)^2} = \frac{220}{200} = 1.1$

$a = \bar{y} - b\bar{x} = \frac{70}{5} - 1.1 \cdot \frac{60}{5} = 0.8$

Least squares equation: $\hat{y} = 0.8 + 1.1x$.
Slope of the line is b = 1.1.
This means that Test 2 scores are expected to go up by 1.1 points on average for each additional point scored on Test 1.

A student who scored 15 points on Test 1 is predicted to score $\hat{y} = 0.8 + 1.1(15) = 17.3$ points on Test 2.
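
The by-hand calculation can be checked with a few lines of code. The sketch below applies the summation form of the slope formula to the five Test 1 and Test 2 pairs from the calculation table; the function name `least_squares_line` is made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2); intercept: y_bar - b * x_bar
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


test1 = [8, 10, 12, 14, 16]
test2 = [9, 13, 14, 15, 19]

a, b = least_squares_line(test1, test2)
print(round(a, 4), round(b, 4))   # 0.8 1.1
print(round(a + b * 15, 4))       # 17.3
```

The values are rounded before printing because floating-point arithmetic can leave tiny errors in the last digits.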

Example:

Subject   Age (x)   Glucose Level (y)   xy      x²      y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Total     247       486                 20485   11409   40022

$b = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2} = \frac{6(20485) - (247)(486)}{6(11409) - (247)^2} = \frac{2868}{7445} = 0.385225$

$a = \bar{y} - b\bar{x} = \frac{486}{6} - 0.385225 \cdot \frac{247}{6} = 65.1416$

$\hat{y} = 65.1416 + 0.385225x$
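
When only the column totals are available, as in the table above, the slope and intercept can be computed directly from them. The following is a minimal sketch using the totals of the age and glucose table; the function name `line_from_totals` is made up for illustration.

```python
def line_from_totals(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) of the least squares line from the column totals."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


# Totals from the age (x) and glucose level (y) table
a, b = line_from_totals(n=6, sum_x=247, sum_y=486, sum_xy=20485, sum_x2=11409)
print(round(b, 6))   # 0.385225
print(round(a, 4))   # 65.1416
```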

CORRELATION: HOW STRONG IS THE LINEAR RELATIONSHIP?

DEFINITION:

The sample correlation coefficient r measures the strength of the linear relationship between two quantitative variables. It describes the direction of the linear association and indicates how close the points in a scatter-plot are to the least squares regression line.

Features of the correlation coefficient.
1. Range: $-1 \le r \le 1$.

2. Sign: The sign of the correlation coefficient indicates the direction of association: negative [-1, 0) or positive (0, +1].

3. Magnitude: The magnitude of the correlation coefficient indicates the strength of the linear association. If the data follow a straight line, $r = +1$ (if the slope is positive) or $r = -1$ (if the slope is negative), indicating a perfect linear association. If $r = 0$ then there is no linear association.

4. Measures strength: The correlation only measures the strength of the linear association.

5. Unit-less: The correlation is computed using standard scores of the two variables. It has no unit of measure, and the absolute value of r will not change if the units of measurement for x or y are changed. The correlation between x and y is the same as the correlation between y and x.
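
Features 1 and 5 can be checked numerically. The sketch below is an illustration only: it uses the summation formula for r given in this chapter, the age and glucose data from the earlier example, and a change of units (age in months instead of years) that is an assumption made for the demonstration. The function name `correlation` is made up.

```python
from math import sqrt, isclose


def correlation(xs, ys):
    """Sample correlation coefficient r from the summation formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


age_years = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
age_months = [12 * x for x in age_years]   # same ages, different unit (assumed for the demo)

r = correlation(age_years, glucose)
print(-1 <= r <= 1)                                   # True: r stays within its range
print(isclose(r, correlation(age_months, glucose)))   # True: changing units leaves r unchanged
print(isclose(r, correlation(glucose, age_years)))    # True: swapping x and y leaves r unchanged
```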

Some Pictures....

[Figure: scatter plot of y versus x.] Positive, moderate to strong linear association, $r = 0.8$.

[Figure: scatter plot of y versus x.] Negative, weak linear association, $r = -0.2$.

[Figure: scatter plot of y versus x.] A strong association, just not a linear one, $r = 0$.

How to Calculate the Correlation Coefficient r

The formula:

$r = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{n\sum x_i^2 - (\sum x_i)^2}\,\sqrt{n\sum y_i^2 - (\sum y_i)^2}}$

Example
Test 1 versus Test 2: Obtaining the Correlation Coefficient “By Hand”

We have already computed the summation quantities needed for finding r, shown in the calculation table.

Completed Calculation Table
x_i    y_i    x_i²    x_i y_i    y_i²
8      9      64      72         81
10     13     100     130        169
12     14     144     168        196
14     15     196     210        225
16     19     256     304        361

Total: $\sum x_i = 60$, $\sum y_i = 70$, $\sum x_i^2 = 760$, $\sum x_i y_i = 884$, $\sum y_i^2 = 1032$

$r = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{n\sum x_i^2 - (\sum x_i)^2}\,\sqrt{n\sum y_i^2 - (\sum y_i)^2}} = \frac{5(884) - (60)(70)}{\sqrt{5(760) - (60)^2}\,\sqrt{5(1032) - (70)^2}} = 0.965$

The large positive correlation coefficient and the scatter-plot indicate a strong, positive, linear association between Test 1 and Test 2 scores.
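
The value of r can be checked from the totals in the completed calculation table. This is a minimal sketch; the function name `correlation_from_totals` is made up for illustration.

```python
from math import sqrt


def correlation_from_totals(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Sample correlation coefficient r computed from the column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Totals from the completed calculation table for Test 1 (x) and Test 2 (y)
r = correlation_from_totals(n=5, sum_x=60, sum_y=70, sum_xy=884, sum_x2=760, sum_y2=1032)
print(round(r, 3))   # 0.965
```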

Let’s Do It! At-Home Exercise: Birth Rates
We gathered data from 1970 for twelve nations on the percentage of women aged 14 or older who were economically active and the crude birth rate. (We define the crude birth rate as the number of births in a year per 1000 population.) We are interested in the relationship between the crude birth rate (y) and the percentage of women who were economically active (x).

Nation          x     y
Algeria         2     48
Argentina       19    21
Denmark         34    14
E. Germany      40    11
Guatemala       8     41
India           12    37
Ireland         20    22
Jamaica         20    31
Japan           37    19
Philippines     19    42
USA             30    15
Soviet Union    46    18

a. Create the scatter-plot. Determine if there is a positive, negative, or no association between x and y.

b. Find the equation of the regression line. Interpret the slope.

c. Find the correlation coefficient r.

THE SQUARED CORRELATION $r^2$: WHAT DOES IT TELL US?
r = correlation coefficient, gives the strength and the direction of the linear relationship between two quantitative variables x and y; $-1 \le r \le 1$.

Note that when we square r we get $0 \le r^2 \le 1$. The value of $r^2$ is the proportion (usually quoted as a percentage) of the variation in the dependent variable y that is explained by the independent variable x.

$r^2 = 0.75$ means that about 75% of the variation in the response variable y can be explained by the linear relationship between x and y.
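
One way to see what "variation explained" means is to compare the variation left around the least squares line with the total variation of y around its mean. The sketch below does this for the Test 1 and Test 2 data from this chapter. It relies on the standard identity that, for a least squares line, one minus the ratio of these two sums of squares equals $r^2$; that identity is not derived in these notes, and the function name `r_squared` is made up for illustration.

```python
def r_squared(xs, ys):
    """Proportion of the variation in y explained by the least squares line on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Least squares slope and intercept (deviation form of the slope formula)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    # Variation left unexplained: squared vertical deviations from the line
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations of y from its mean
    total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - unexplained / total


test1 = [8, 10, 12, 14, 16]
test2 = [9, 13, 14, 15, 19]
print(round(r_squared(test1, test2), 3))   # 0.931
```

So about 93% of the variation in Test 2 scores can be explained by the linear relationship with Test 1 scores, which agrees with squaring r = 0.965.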

Exercises:
1. A sample of 12 occupational therapy students were subjected to a study by local hospitals
to test whether their knowledge of chemistry depends upon their intelligence test scores.
The twelve students were given a chemistry test and an intelligence test. Determine the
regression equation that can be used for prediction and draw the line.

Intelligence scores: 65 50 55 65 55 70 65 70 55 70 50 55
Chemistry test scores: 85 74 76 90 85 87 94 98 81 91 76 74

2. A study was conducted at the college of orthopedics to determine the correlation between height and weight of female children. A sample of 6 children was selected, and the data are in the following table. Determine the correlation coefficient.

Height Weight
12 18
10 17
14 23
11 19
12 20
9 15

3. Dr. Green (a pediatrician) wanted to test whether there is a correlation between the number of meals consumed by a child per day (X) and the child's weight (Y). The table below contains the information on 4 of the children. Use the table to answer the following:

Child    Number of meals consumed per day (X)    Child weight (Y)    X²    Y²    XY
Ahmad 11 8 121 64 88
Ali 16 11 256 121 176
Osama 12 9 144 81 108
Husien 19 13 361 169 247
Total 58 41 882 435 619

a. Determine the simple linear regression equation.
b. Determine the correlation coefficient. Interpret it in words.
c. Determine the coefficient of determination. Interpret it in words.
d. What is the expected change in the child's weight if the number of meals increases by 2 meals per day?

4. A hospital supervisor wishes to find the relationship between the number of nurses on a job and the number of patients examined during a shift. Listed below are the results for a sample of 4 days. Let the number of patients (y) be the dependent variable.

Nurses (x)   Patients (y)   x²   y²   xy
9            12
3            14
5            11
8            13

a. Compute $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, $\sum xy$.
b. Determine the coefficient of correlation. Interpret its meaning.
c. Determine the estimated simple linear regression equation.
d. Determine the coefficient of determination. Interpret it in words.
e. If the number of nurses on a job changes by 3, what is the corresponding change in the number of examined patients?

5. The data below were obtained in a study of age and systolic blood pressure of six randomly selected subjects. Make a scatter plot to examine the relationship between x = age and y = pressure. Comment on the relationship with respect to form, direction, strength, and any departures or unusual values.

Subject Age (x) Pressure (y)

A 43 128
B 48 120
C 56 135
D 61 143
E 67 141
F 70 152
