5_Chapter9-linear regression
DEFINITIONS:
When a scatter plot does not show a particular direction, neither positive,
nor negative, we say that there is no linear association.
Student Number   Midterm Score (X)   Final Score (Y)
1 39 62
2 44 69
3 32 68
4 40 86
5 45 88.5
6 46 88.5
7 33 76
8 39 66.5
9 32.5 75
10 21 38
11 30 71
12 39 88
13 44 96.5
14 28.5 71.5
15 38 96
16 43 82.5
17 42 85
18 25.5 28
19 47 95
20 36 39
21 31.5 58
22 32 49
23 42 62
24 21 59
25 41 90
Notes of Caution
1. An observed relationship between two variables does
not imply that there is some causal link between the
two variables.
For example, consider the following scatter-plot of IQ score versus shoe size:
[Figure: scatter plot of IQ (vertical axis) versus shoe size (horizontal axis)]
As a person ages, their shoe size increases, and so does their IQ score.
Although there is a positive association, there is no causal link between the
two variables shoe size and IQ.
Most studies attempt to show that some explanatory variable "causes" the
values of the response to occur. While we can never positively determine
whether or not there is a distinct cause-and-effect relationship, we can assess
whether there appears to be such a relationship.
2. Sometimes a scatter plot, such as the one in the figure
below, shows a curvilinear relationship between the variables.
In this situation, a linear model is not appropriate. Methods
for curvilinear relationships are beyond the scope of this course.
Simple Linear Regression
[Figure: Scatterplot of Final vs Midterm Scores, with two candidate lines drawn through the data (Line #1 and Line #2). Horizontal axis: Midterm, 0 to 50. Vertical axis: Final, 0 to 100.]
Equation of a Line
y = a + bx, where a is the y-intercept and b is the slope.
DEFINITION:
The least squares regression line, given by $\hat{y} = a + bx$, is the
line that makes the sum of the squared vertical deviations of the
data points from the line as small as possible. Performing the
regression is often stated as "regress y on x".
The estimated slope of b = 1.75 tells us that for a 1-point increase on
the midterm we would expect, on average, an increase of 1.75
points on the final exam.
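
As a check on the quoted slope, the following minimal Python sketch fits the least squares line to the 25 midterm/final pairs from the score table earlier in this chapter. The function name `fit_line` is an illustrative choice, not part of the course materials. The printed slope should come out close to the 1.75 quoted here.

```python
# Midterm (x) and final (y) scores for the 25 students in the score table.
midterm = [39, 44, 32, 40, 45, 46, 33, 39, 32.5, 21, 30, 39, 44,
           28.5, 38, 43, 42, 25.5, 47, 36, 31.5, 32, 42, 21, 41]
final = [62, 69, 68, 86, 88.5, 88.5, 76, 66.5, 75, 38, 71, 88, 96.5,
         71.5, 96, 82.5, 85, 28, 95, 39, 58, 49, 62, 59, 90]


def fit_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x.

    Hypothetical helper written for this illustration.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line(midterm, final)
print(f"slope b = {b:.2f}")

# Raising the midterm score by 1 point (here from 40 to 41) changes the
# predicted final score by exactly b, whatever the starting score is.
predicted_gain = (a + b * 41) - (a + b * 40)
print(f"predicted gain per midterm point = {predicted_gain:.2f}")
```

The last two lines restate the interpretation of the slope: the difference between predictions at any two midterm scores one point apart is always b.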
Calculating the Least Squares Regression Line
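\[ \text{slope} = b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2} \]
\[ y\text{-intercept} = a = \bar{y} - b\bar{x} \]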
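
The slope can be computed either from deviations about the means or from the column totals alone. The short Python sketch below checks that the two routes agree. It uses the five (x, y) pairs that appear in the correlation calculation table later in this chapter. The variable names are illustrative assumptions.

```python
# The five (x, y) pairs from the correlation calculation table.
xs = [8, 10, 12, 14, 16]
ys = [9, 13, 14, 15, 19]
n = len(xs)

# Deviation form: sum of cross-deviations over sum of squared x-deviations.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b_dev = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))

# Computational form: needs only the column totals.
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
b_comp = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept from the means and the slope.
a = y_bar - b_comp * x_bar
print(b_dev, b_comp, a)
```

The computational form is usually more convenient by hand because it needs only the totals from a calculation table.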
Example
Test 1 versus Test 2—Obtaining the Regression Line “By Hand”
(a) Look at the relationship graphically with a scatter-plot to
confirm initially that a linear model seems appropriate.
Example:
Subject   Age (X)   Glucose Level (Y)   XY     X^2    Y^2
1         43        99                  4257   1849   9801
\[ b = 0.385225 \]
\[ a = \bar{y} - b\bar{x} = \frac{486}{6} - (0.385225)\frac{247}{6} = 65.1416 \]
\[ \hat{y} = 65.1416 + 0.385225x \]
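
The intercept arithmetic can be checked with a few lines of Python. This sketch uses only the quantities that appear in the example: n = 6, the totals 247 (ages) and 486 (glucose levels), and the slope 0.385225. The helper `predict_glucose` is a hypothetical name added for illustration.

```python
n = 6
sum_x = 247      # total of the ages (X)
sum_y = 486      # total of the glucose levels (Y)
b = 0.385225     # slope as given in the example

# a = y-bar - b * x-bar
a = sum_y / n - b * sum_x / n
print(round(a, 4))  # 65.1416


def predict_glucose(age):
    """Predicted glucose level from the fitted line y-hat = a + b*age."""
    return a + b * age
```

Calling `predict_glucose` with an age inside the range of the observed data gives the fitted value on the regression line for that age.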
DEFINITION:
regression line.
Features of the correlation coefficient.
1. Range: $-1 \le r \le 1$.
2. Sign: The sign of the correlation coefficient
indicates the direction of association: negative, $[-1, 0)$,
or positive, $(0, +1]$.
Some Pictures....
[Figures: example scatter plots of y versus x]
The formula:
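\[ r = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left[n \sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]}} \]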
Completed Calculation Table
$x_i$   $y_i$   $x_i^2$   $x_i y_i$   $y_i^2$
8 9 64 72 81
10 13 100 130 169
12 14 144 168 196
14 15 196 210 225
16 19 256 304 361
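Total: $\sum x_i = 60$, $\sum y_i = 70$, $\sum x_i^2 = 760$, $\sum x_i y_i = 884$, $\sum y_i^2 = 1032$
\[ r = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left[n \sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]}} = \frac{5(884) - (60)(70)}{\sqrt{\left[5(760) - (60)^2\right]\left[5(1032) - (70)^2\right]}} = 0.965 \]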
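
The same calculation can be reproduced from the column totals in a few lines of Python. This is a minimal sketch with illustrative variable names. It uses only the totals from the completed table.

```python
from math import sqrt

# Column totals from the completed calculation table.
n = 5
sum_x, sum_y = 60, 70
sum_x2, sum_xy, sum_y2 = 760, 884, 1032

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 3))  # 0.965
```

The value is close to +1, which indicates a strong positive linear association for these five points.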
Let's Do It! At-Home Exercise: Birth Rates
We gathered data from 1970 for twelve nations on the percentage
of women aged 14 or older who were economically active and the
crude birth rate. (We define the crude birth rate as the number of
births in a year per 1000 population.) We are interested in the
relationship between the crude birth rate (y) and the percentage of
women who were economically active (x).

Nation         x    y
Algeria        2    48
Argentina      19   21
Denmark        34   14
E. Germany     40   11
Guatemala      8    41
India          12   37
Ireland        20   22
Jamaica        20   31
Japan          37   19
Philippines    19   42
USA            30   15
Soviet Union   46   18

a. Create the scatter plot. Determine if there is a positive, negative, or no
association between x and y.
THE SQUARED CORRELATION $r^2$: WHAT DOES IT TELL US?
r = correlation coefficient, gives the strength and the direction of
the linear relationship between two quantitative variables x and y;
$-1 \le r \le 1$.
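
A standard interpretation, stated here as background rather than taken from the text, is that $r^2$ equals the proportion of the variation in y that is explained by the least squares line. The Python sketch below illustrates this on the five (x, y) pairs from the correlation calculation table: it computes $r^2$ directly and compares it with one minus the ratio of the residual variation to the total variation. All variable names are illustrative assumptions.

```python
# The five (x, y) pairs from the correlation calculation table.
xs = [8, 10, 12, 14, 16]
ys = [9, 13, 14, 15, 19]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Sums of squared deviations and cross-deviations.
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares line y-hat = a + b*x.
b = sxy / sxx
a = y_bar - b * x_bar

# Squared correlation computed directly.
r_squared = sxy ** 2 / (sxx * syy)

# Variation left unexplained by the line (sum of squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - sse / syy

print(round(r_squared, 4), round(explained_fraction, 4))
```

Both printed numbers agree, so for this data set about 93% of the variation in y is accounted for by the linear relationship with x.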
Exercises:
1. A sample of 12 occupational therapy students were subjected to a study by local hospitals
to test whether their knowledge of chemistry depends upon their intelligence test scores.
The twelve students were given a chemistry test and an intelligence test. Determine the
regression equation that can be used for prediction and draw the line.
Intelligence scores:    65 50 55 65 55 70 65 70 55 70 50 55
Chemistry test scores:  85 74 76 90 85 87 94 98 81 91 76 74
Height   Weight
12 18
10 17
14 23
11 19
12 20
9 15
3. Dr. Green (a pediatrician) wanted to test if there is a correlation between the number of meals
consumed by a child per day (X) and the child's weight (Y). Included you will find a table containing
the information on 5 of the children. Use the table to answer the following:
4. A hospital supervisor wishes to find the relationship between the number of nurses on a
job and the number of patients examined for a shift. Listed below is the result for a sample
of 4 days. Let the number of patients ( y ) be the dependent variable.
Nurses (x)   Patients (y)   x^2   y^2   xy
9 12
3 14
5 11
8 13
a. Compute $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, $\sum xy$.
b. Determine the coefficient of correlation. Interpret its meaning.
c. Determine the estimated simple linear regression equation.
d. Determine the coefficient of determination. Interpret it in words.
e. If the number of nurses on a job changes by 3, what is the corresponding change in the
number of examined patients?
5. The data below were obtained in a study of age and systolic blood pressure of six randomly
selected subjects. Make a scatter plot to examine the relationship between (x) = age and (y) =
pressure. Comment on the relationship with respect to form, direction, strength, and any
departures or unusual values.