Correlation & Regression
Regression
Correlation
Wt. (kg)     67   69   85   83   74   81   97   92   114  85
SBP (mmHg)   120  125  140  160  130  180  150  140  200  130
[Figure: scatter plot of SBP (mmHg) against wt (kg)]
negative relationship
no relationship
Positive relationship
[Figure: height in cm plotted against age in weeks]
Negative relationship
[Figure: reliability plotted against age of car]
No relation
Correlation Coefficient
If r = 1: perfect correlation.
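How to compute the simple correlation coefficient (r)
\[ r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} \]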
Example:
A sample of 6 children was selected, and their age in years and
weight in kilograms were recorded as shown in the following table.
It is required to find the correlation between age and weight.
Serial no.   Age in years (x)   Weight in kg (y)   xy    x²    y²
1            7                  12                 84    49    144
2            6                  8                  48    36    64
3            8                  12                 96    64    144
4            5                  10                 50    25    100
5            6                  11                 66    36    121
6            9                  13                 117   81    169
r = 0.759
Interpretation of r:
Strong direct correlation
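The value of r for this example can be checked with a short script. This is a minimal sketch of the sum-based formula for r applied to the six children in the table; the function name `pearson_r` and the variable names are illustrative choices, not part of the original slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Simple correlation coefficient via the sum-based computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Data from the table: age in years (x) and weight in kg (y)
age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

r = pearson_r(age, weight)
print(f"r = {r:.3f}")
```

The intermediate sums are Σx = 41, Σy = 66, Σxy = 461, Σx² = 291 and Σy² = 742, which give r of roughly 0.76, in line with the 0.759 reported on the slide.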
EXAMPLE: Relationship between Anxiety and
Test Scores
Anxiety (X)   Test score (Y)   X²    Y²    XY
10            2                100   4     20
8             3                64    9     24
2             9                4     81    18
1             7                1     49    7
5             6                25    36    30
6             5                36    25    30
∑X = 32       ∑Y = 32          ∑X² = 230   ∑Y² = 204   ∑XY = 129
Calculating Correlation Coefficient
r = - 0.94
Interpretation of r:
Strong indirect correlation
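The step from the column totals to r = -0.94 is not shown; a worked substitution, assuming the same sum-based formula for r used in the first example with n = 6, would look like this:

```latex
\[
r = \frac{129 - \frac{(32)(32)}{6}}
         {\sqrt{\left(230 - \frac{32^2}{6}\right)\left(204 - \frac{32^2}{6}\right)}}
  = \frac{-41.67}{\sqrt{(59.33)(33.33)}}
  = \frac{-41.67}{44.47}
  \approx -0.94
\]
```

The negative numerator is what makes the correlation indirect: high anxiety scores go with low test scores.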
Regression Analysis
[Figure: scatter plot with Wt (kg) on the horizontal axis]
Get the constants for the
regression line
Now we see how to calculate the constants mathematically: a, the
y-intercept, where the regression line meets the Y axis, and b, the
slope, i.e. the gradient of the straight line.
[Figure: regression line plotted against Wt (kg), with the intercept and slope marked]
Linear Equations
\[ \hat{y} = a + bX \]
b = slope = (change in Y) / (change in X)
a = Y-intercept
[Figure: straight line on X and Y axes, with a and b marked]
Using the Least-Squares Method:
Problem I
The director of a sanitation department is interested in the
relationship between the age of a garbage truck and the
annual repair expense she could expect to incur. In order to
determine this relationship, the director has accumulated
information concerning four of the trucks the city currently
owns.

Truck Number   Age of the Truck in years (X)   Repair Expense during last year in Hundreds of $ (Y)
101            5                               7
102            3                               7
103            3                               6
104            1                               4
b = 0.75
a = 3.75
If the city has a truck that is 4 years old,
predict the annual repair expense for it.
Answer: ŷ = 3.75 + 0.75 (4) = 6.75 hundred dollars, i.e. $675.00
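The constants and the prediction for Problem I can be reproduced with a few lines of code. The slides do not show the least-squares formulas, so this sketch assumes the usual ones, b = (Σxy - ΣxΣy/n) / (Σx² - (Σx)²/n) and a = ȳ - b·x̄; the function name `least_squares_line` is a hypothetical helper for illustration.

```python
def least_squares_line(x, y):
    """Return (a, b) for the fitted line y-hat = a + b*x (assumed standard formulas)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # slope
    a = sum_y / n - b * sum_x / n                                 # y-intercept
    return a, b


# Truck data from the table: age in years (X), repair expense in hundreds of $ (Y)
truck_age = [5, 3, 3, 1]
repair_expense = [7, 7, 6, 4]

a, b = least_squares_line(truck_age, repair_expense)
predicted = a + b * 4  # 4-year-old truck, still in hundreds of dollars
print(f"a = {a}, b = {b}, predicted expense = ${predicted * 100:.2f}")
```

Here Σx = 12, Σy = 24, Σxy = 78 and Σx² = 44, so b = 6/8 = 0.75 and a = 6 - 0.75 × 3 = 3.75, matching the values on the slide.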
Hours studying and grades
Regressing grades on hours
[Figure: linear regression of final grade in course on study]
Final grade in course = 59.95 + 3.17 * study
R-Square = 0.88
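As a small illustration of how the fitted line and R-Square are used, the sketch below predicts a grade and recovers the correlation coefficient from R-Square. It assumes that `study` is measured in hours and relies on the fact that, for a regression with a single predictor, R-Square equals r²; the 5-hour input is an arbitrary example value, not data from the slides.

```python
from math import sqrt

INTERCEPT = 59.95   # from the fitted line on the slide
SLOPE = 3.17        # grade points per unit of study (assumed to be hours)
R_SQUARED = 0.88    # from the slide


def predicted_grade(study):
    """Predicted final grade for a given amount of study."""
    return INTERCEPT + SLOPE * study


# With one predictor, r is the square root of R-Square,
# taking the sign of the slope (positive here).
r = sqrt(R_SQUARED)

print(f"Predicted grade for 5 units of study: {predicted_grade(5):.2f}")
print(f"Implied correlation coefficient r: {r:.3f}")
```

An r of about 0.94 would count as a strong direct correlation under the interpretation used earlier in these slides.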
∑x² = 41678, n = 20
ŷ =112.13 + 0.4547 x
For age 25:
B.P. = 112.13 + 0.4547 * 25 = 123.4975 ≈ 123.5 mmHg
Checking the Estimating Equation
A crude way to verify the accuracy of the estimating equation
is to examine the graph of the sample points
Units 40 42 53 35 56 39 48 30 37 40
Car   Mileage (X)   Yearly maintenance cost (Y)
1     80,000        $1,200
2     29,000        $150
3     53,000        $650
4     13,000        $200
5     45,000        $325
The regression equation is computed as:
Y = 50 + .03 X
For example, if X=50,000 then Y = 50 + .03 (50,000) = $1,550
a = 50: the cost of maintenance when X = 0; if there is no mileage on
the car, then the yearly cost of maintenance is $50.
b = .03: the amount by which Y increases for each unit increase in X;
for each extra mile driven (X), the cost of yearly maintenance
increases by $.03.
s.e.b = .0005: the value of b divided by s.e.b is 60.0; the t-table
indicates that the b coefficient of X is statistically significant
(X is related to Y).
r² = .90: we can explain 90% of the variance in repair costs for
different vehicles if we know the vehicle mileage for each car.
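The arithmetic behind these interpretations can be sketched in a few lines. The coefficients below are the values quoted in the text, not values re-estimated from the table; the names `predicted_cost` and `t_ratio` are illustrative. The critical value from the t-table is not computed, because the degrees of freedom are not given.

```python
A = 50.0        # intercept: yearly maintenance cost at zero mileage, in $
B = 0.03        # slope: extra yearly cost per additional mile, in $
SE_B = 0.0005   # standard error of the slope, as quoted in the text


def predicted_cost(mileage):
    """Predicted yearly maintenance cost in dollars for a given mileage."""
    return A + B * mileage


# Ratio of the slope to its standard error, to be compared with a t-table value
t_ratio = B / SE_B

print(f"Predicted cost at 50,000 miles: ${predicted_cost(50_000):,.2f}")
print(f"t ratio for b: {t_ratio:.1f}")
```

A ratio as large as 60 is far beyond the cut-offs usually read from a t-table, which is why the slope is described as statistically significant.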