Correlation Regression
INTRODUCTION TO REGRESSION
AND CORRELATION
Correlation
Basic idea: Use data to identify relationships between two variables.
i. Graphical Method (Scatter Diagram)
This is the simplest and easiest method for investigating
the nature of the correlation between two variables.
Let (xi, yi), where i = 1, 2, 3, …, n, be n paired values.
Plot the paired values of the two variables x and y on
graph paper and do not join the plotted points in any way.
The resulting scatter diagram shows one of the following
types of relation.
If all the plotted points tend to lie near a straight line, the
correlation is said to be linear. The scatter diagrams show the
following cases:
a. +ve perfect correlation
b. -ve perfect correlation
c. +ve strong correlation
d. [Figure: scatter diagram]
e. +ve weak correlation
f. -ve weak correlation
g. No correlation between x and y
h. [Figure: scatter diagram]
ii. Mathematical Method
The Pearson correlation coefficient, often
referred to as Pearson's r, is a statistic that
measures the strength and direction of the linear
relationship between two variables. To determine how
strong the relationship is, compute the coefficient,
which ranges from -1.00 to +1.00.
The coefficient of correlation is denoted by r:
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \]
Notation for the Linear Correlation Coefficient
n = number of paired values
∑ = denotes summation
∑x = sum of all x values
∑x² = each x is squared and the squares are then added
(∑x)² = the x values are added and the total is then squared
∑xy = each x is first multiplied by its corresponding y;
the products are then added
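The notation above maps directly onto a few running sums. The following is a minimal sketch of the sums-based formula for r; the function name `pearson_r` and the two small data sets are made up for illustration and do not come from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from paired values, using the sums-based formula."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of values")
    n = len(xs)                                   # number of paired values
    sum_x = sum(xs)                               # sum of x
    sum_y = sum(ys)                               # sum of y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x*y
    sum_x2 = sum(x * x for x in xs)               # sum of x squared
    sum_y2 = sum(y * y for y in ys)               # sum of y squared
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up illustrative data (not from the slides)
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # points on a rising line
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # points on a falling line
```

Points lying exactly on a rising line give r = +1 and points on a falling line give r = -1. The sketch does not guard against a constant x or y, for which the denominator is zero.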
Strength of Coefficient of Correlation
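Examples
Height and weight: for a good-looking personality, the correlation
between height and weight is either strong or perfect.
[Figure: Height plotted against Weight]
[Figure: second example, plotted against Weight]
Business: an ideal situation for a business.
[Figure: Profit plotted against No. of Employees]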
Example – 1
A nuclear engineer has been assigned the task of developing a
model to predict peak power load at a nuclear power plant.
Initially, the engineer will model peak power load as a function of
the high temperature for the day, based on the theory that higher
temperatures result in higher peak power loads. The high
temperature and peak power load were observed for a random
sample of six days and are listed in the table.
[Figure: scatter diagram of Peak Power Load (vertical axis, 0 to 300) against High Temp. (horizontal axis, 0 to 120)]
Mathematical Method
    x      y       xy      x²       y²
   92    207    19044    8464    42849
   84    139    11676    7056    19321
   95    211    20045    9025    44521
  102    273    27846   10404    74529
   88    156    13728    7744    24336
   97    244    23668    9409    59536
Total
  558   1230   116007   52102   265092
∑x = 558, ∑y = 1230, ∑xy = 116007, ∑x² = 52102, ∑y² = 265092
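As a sketch of how the column totals feed the formula for r, the snippet below substitutes the Example 1 totals with n = 6. The variable names are illustrative. The slides do not preserve a value of r for this example, so the result here is simply what the totals give.

```python
from math import sqrt

# Column totals from the Example 1 table (n = 6 days)
n = 6
sum_x, sum_y = 558, 1230
sum_xy, sum_x2, sum_y2 = 116007, 52102, 265092

numerator = n * sum_xy - sum_x * sum_y     # 6(116007) - (558)(1230) = 9702
spread_x = n * sum_x2 - sum_x ** 2         # 6(52102) - 558^2 = 1248
spread_y = n * sum_y2 - sum_y ** 2         # 6(265092) - 1230^2 = 77652
r = numerator / sqrt(spread_x * spread_y)
print(round(r, 3))  # about 0.986
```

A value this close to +1 indicates a strong positive linear relation, in line with the theory that higher temperatures go with higher peak loads.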
Example – 2
A windmill is used to generate direct current. Data were collected on 45
different days to determine the relationship between wind speed in
mi/h (x) and current in kA (y). The data are presented in the following
table. Find the relationship between these two variables by the graphical
and mathematical methods, and comment on the results.
Table (Wind Speed x in mi/h, Current y in kA)
Day  Wind Speed  Current   Day  Wind Speed  Current   Day  Wind Speed  Current
1 4.2 1.9 16 3.7 2.1 31 2.6 1.4
2 1.4 0.7 17 5.9 2.2 32 7.7 2.8
3 6.6 2.2 18 6.0 2.6 33 6.1 2.4
4 4.7 2.0 19 10.7 3.2 34 5.5 2.2
5 2.6 1.1 20 5.3 2.3 35 4.7 2.3
6 5.8 2.6 21 5.1 1.9 36 4.0 2.0
7 1.8 0.3 22 4.9 2.3 37 2.3 1.2
8 5.8 2.3 23 8.3 3.1 38 11.9 3.0
9 7.3 2.6 24 7.1 2.3 39 8.6 2.5
10 7.1 2.7 25 9.2 2.9 40 5.6 2.1
11 6.4 2.4 26 4.4 1.8 41 4.2 1.7
12 4.6 2.2 27 8.0 2.6 42 6.2 2.3
13 1.6 1.1 28 10.5 3.0 43 7.7 2.6
14 2.3 1.5 29 5.1 2.1 44 6.6 2.9
15 4.2 1.5 30 5.8 2.5 45 6.9 2.6
Sums: ∑x = 257, ∑y = 98, ∑xy = 618.72, ∑x² = 1718.5, ∑y² = 230.96
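Coefficient of Correlation
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \]
\[ r = \frac{45(618.72) - (257)(98)}{\sqrt{\left[45(1718.5) - (257)^2\right]\left[45(230.96) - (98)^2\right]}} = 0.89 \]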
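The same result can be checked from the raw table rather than from pre-computed sums. This is a minimal sketch: the pairs are transcribed from the Example 2 table in day order, and the variable names are illustrative.

```python
from math import sqrt

# (wind speed in mi/h, current in kA) for days 1 to 45, from the Example 2 table
data = [
    (4.2, 1.9), (1.4, 0.7), (6.6, 2.2), (4.7, 2.0), (2.6, 1.1),
    (5.8, 2.6), (1.8, 0.3), (5.8, 2.3), (7.3, 2.6), (7.1, 2.7),
    (6.4, 2.4), (4.6, 2.2), (1.6, 1.1), (2.3, 1.5), (4.2, 1.5),
    (3.7, 2.1), (5.9, 2.2), (6.0, 2.6), (10.7, 3.2), (5.3, 2.3),
    (5.1, 1.9), (4.9, 2.3), (8.3, 3.1), (7.1, 2.3), (9.2, 2.9),
    (4.4, 1.8), (8.0, 2.6), (10.5, 3.0), (5.1, 2.1), (5.8, 2.5),
    (2.6, 1.4), (7.7, 2.8), (6.1, 2.4), (5.5, 2.2), (4.7, 2.3),
    (4.0, 2.0), (2.3, 1.2), (11.9, 3.0), (8.6, 2.5), (5.6, 2.1),
    (4.2, 1.7), (6.2, 2.3), (7.7, 2.6), (6.6, 2.9), (6.9, 2.6),
]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xy = sum(x * y for x, y in data)
sum_x2 = sum(x * x for x, _ in data)
sum_y2 = sum(y * y for _, y in data)

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(n, round(sum_x, 2), round(sum_y, 2), round(sum_xy, 2), round(r, 2))
```

The sums reproduce the values substituted into the formula, and r rounds to 0.89: a strong positive relation between wind speed and current.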
Case Study
Noise level at London Gatwick Airport
A study was conducted at London Gatwick Airport to investigate the
existing procedures for predicting aircraft noise. The aim was to
predict the perceived noise level (PNL) given the slant distance
(SD) in meters, which is the distance from the point at which the
aircraft starts its take-off to its position when it passes over the noise
recorder located beyond the end of the runway.
i. Plot the following data and comment on the graph.
ii. Find the coefficient of correlation and comment on the results.
[Figure: Analysis of Noise Level on an Airport. Scatter diagram of Noise Level (vertical axis, 0 to 140) against Slant Distance (horizontal axis, 0 to 1200)]
S.No.  SD = x  PNL = y  xy  x²  y²
1 993 107 106251 986049 11449
2 1013 98 99274 1026169 9604
3 977 102 99654 954529 10404
4 182 120 21840 33124 14400
5 275 114 31350 75625 12996
6 96 123 11808 9216 15129
7 93 121 11253 8649 14641
8 994 100 99400 988036 10000
9 136 121 16456 18496 14641
10 204 119 24276 41616 14161
11 1015 97 98455 1030225 9409
12 996 101 100596 992016 10201
13 982 99 97218 964324 9801
14 242 117 28314 58564 13689
15 204 120 24480 41616 14400
16 149 120 17880 22201 14400
17 207 116 24012 42849 13456
18 211 116 24476 44521 13456
19 1037 100 103700 1075369 10000
20 178 115 20470 31684 13225
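Total 10184 2226 1061163 8444878 249462
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \]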
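The totals row can be substituted into the formula as follows. This is a sketch with n = 20 rows and illustrative variable names; the slides do not preserve a numeric value of r for the case study, so the figure below is only what the totals yield.

```python
from math import sqrt

# Totals from the case-study table (n = 20 observations)
n = 20
sum_x, sum_y = 10184, 2226                 # slant distance, perceived noise level
sum_xy, sum_x2, sum_y2 = 1061163, 8444878, 249462

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 3))  # about -0.969
```

The negative sign shows that the perceived noise level falls as slant distance increases, and the magnitude close to 1 shows that the linear relation is strong.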
Regression
Basic idea: Use data to identify relationships among variables and
use these relationships to make predictions.
Regression analysis is the process of constructing a mathematical model
or function that can be used to predict or determine one variable from
another. It describes the relationship between the dependent and independent
variables; that is, it deals with finding the best
relationship between Y and x.
Regression Equations
i. Simple Regression Equation
ii. Parabolic Regression Equation
iii. Multiple Regression
The simple regression equation of y on x is
Y = a + bx
a = y-intercept
b = slope
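We have to estimate ‘a’ (y-intercept) and ‘b’ (slope). To compute a and b, use
the method of least squares.
Slope ‘b’
\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \]
Y-intercept ‘a’
\[ a = \bar{y} - b\,\bar{x} \quad \text{or} \quad a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} \]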
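The two least-squares formulas can be sketched as a small function. The name `least_squares_line` and the sample points are made up for illustration; the points are chosen to lie exactly on y = 1 + 2x so the expected answer is obvious.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(y) - b * mean(x)
    a = sum_y / n - b * sum_x / n
    return a, b

# Made-up points lying exactly on y = 1 + 2x (not from the slides)
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
```

For these points the function recovers a = 1 and b = 2. With real data the points scatter around the line, and a and b are the values that minimize the sum of squared vertical distances.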
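Similarly, the regression equation of x on y is
X = c + dy
Slope ‘d’
\[ d = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} \]
X-intercept ‘c’
\[ c = \bar{x} - d\,\bar{y} \quad \text{or} \quad c = \frac{\sum x}{n} - d\,\frac{\sum y}{n} \]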
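As an illustration of the x-on-y line, the sketch below applies the formulas for c and d to the Example 1 totals. The slides do not work this case, so the numbers are only what those totals give. It also checks the standard identity that the product of the two slopes, b (y on x) and d (x on y), equals r².

```python
# Column totals from the Example 1 table (n = 6 days)
n = 6
sum_x, sum_y = 558, 1230
sum_xy, sum_x2, sum_y2 = 116007, 52102, 265092

cross = n * sum_xy - sum_x * sum_y            # shared numerator
b = cross / (n * sum_x2 - sum_x ** 2)         # slope of y on x
d = cross / (n * sum_y2 - sum_y ** 2)         # slope of x on y
c = sum_x / n - d * sum_y / n                 # x-intercept of x on y

# Square of the correlation coefficient, from the same totals
r_squared = cross ** 2 / (
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(c, 2), round(d, 3), round(b * d, 4), round(r_squared, 4))
```

The two regression lines coincide only when r = ±1; otherwise the line of x on y is not simply the line of y on x solved for x.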
Example – 1
From the Example 1 table, with n = 6:
∑x = 558, ∑y = 1230, ∑xy = 116007, ∑x² = 52102, ∑y² = 265092
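Power Load (y) on High Temperature (x)
Power Load = a + b (High Temperature)
Y = a + bx
\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{6(116007) - (558)(1230)}{6(52102) - (558)^2} = \frac{9702}{1248} \approx 7.774 \]
that is, b = 7.77 to two decimal places.
\[ a = \bar{y} - b\,\bar{x} = \frac{1230}{6} - 7.774\left(\frac{558}{6}\right) \approx -517.98 \]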
For a high temperature of x = 116:
Power Load = -517.98 + 7.77 (116)
Power Load = 383.34
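The fit and the prediction can be reproduced from the Example 1 totals. This is a sketch: `predict_load` is a hypothetical helper name, and the second calculation repeats the slide's arithmetic with its rounded coefficients so the two results can be compared.

```python
# Column totals from the Example 1 table (n = 6 days)
n = 6
sum_x, sum_y = 558, 1230
sum_xy, sum_x2 = 116007, 52102

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = sum_y / n - b * sum_x / n                                   # y-intercept

def predict_load(high_temp):
    """Predicted peak power load from the fitted line (hypothetical helper)."""
    return a + b * high_temp

full_precision = predict_load(116)
slide_rounding = -517.98 + 7.77 * 116   # coefficients as rounded on the slide
print(round(b, 2), round(a, 2), round(full_precision, 2), round(slide_rounding, 2))
```

Carrying full precision gives about 383.80 rather than 383.34; the difference comes only from rounding b to 7.77. Note also that 116 lies above the observed temperatures (84 to 102), so the prediction is an extrapolation and should be treated with caution.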
******************
Probability & Statistics for Engineers & Scientists (Eighth Edition),
by Ronald E. Walpole.
11.2 The grades of a class of 9 students on a midterm report (x) and on the final examination (y)
are as follows:
x: 77 50 71 72 81 94 96 99 67
y: 82 66 78 34 47 85 99 99 68
(a) Estimate the linear regression line.
(b) Estimate the final examination grade of a student who received a grade of 85 on the midterm report.
11.3 A study was made on the amount of converted sugar in a certain process at various
temperatures. The data were coded and recorded as follows:
Temperature, x : 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
Converted Sugar, y: 8.1 7.8 8.5 9.8 9.5 8.9 8.6 10.2 9.3 9.2 10.5
(a) Estimate the linear regression line.
(b) Estimate the mean amount of converted sugar produced when the coded temperature is 1.75.
(c) Plot the residuals versus temperature. Comment.
11.4 In a certain type of metal test specimen, the normal stress on a specimen is known to
be functionally related to the shear resistance. The following is a set of coded experimental
data on the two variables:
Normal Stress, x: 26.8 25.4 28.9 23.6 27.7 23.9 24.7 28.1 26.9 27.4 22.6 25.6
Shear Resistance, y: 26.5 27.3 24.2 27.1 23.6 25.9 26.3 22.5 21.7 21.4 25.8 24.9
(a) Estimate the regression line \( \mu_{Y|x} = \alpha + \beta x \).
(b) Estimate the shear resistance for a normal stress of 24.5 kilograms per square centimeter.
11.5 The amounts of a chemical compound, y, which dissolved in 100 grams of water at various
temperatures, x, were recorded as follows:
x (°C): 0 15 30 45 60 75
y (grams): 8 12 25 31 44 48
6 10 21 33 39 51
8 14 24 28 42 44
(a) Find the equation of the regression line.
(b) Graph the line on a scatter diagram.
(c) Estimate the amount of chemical that will dissolve in 100 grams of water at 50°C.
11.6 A mathematics placement test is given to all entering freshmen at a small college. A
student who receives a grade below 35 is denied admission to the regular mathematics
course and placed in a remedial class. The placement test scores and the final grades for
20 students who took the regular course were recorded as follows:
Placement Test : 50 35 35 40 55 65 35 60 90 35 90 80 60 60 60 40 55 50 65 50
Course Grade : 53 41 61 56 68 36 11 70 79 59 54 91 48 71 71 47 53 68 57 79
(a) Plot a scatter diagram.
(b) Find the equation of the regression line to predict course grades from placement test
scores.
(c) Graph the line on the scatter diagram.
(d) If 60 is the minimum passing grade, below which placement test score should students
in the future be denied admission to this course?
11.49 Compute and interpret the correlation coefficient for the following grades of 6 students
selected at random:
Mathematics grade: 70 92 80 74 65 83
English grade: 74 84 63 87 78 90
11.52 The following data were obtained in a study of the relationship between the weight
and chest size of infants at birth:
Weight (kg) Chest Size (cm)
2.75 29.5
2.15 26.3
4.41 32.2
5.52 36.5
3.21 27.2
4.32 27.7
2.31 28.3
4.30 30.3
3.71 28.7
(a) Calculate r.
(b) What percentage of the variation in the infant chest sizes is explained by
differences in weight?
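The percentage of variation in y explained by x in a simple linear regression is 100·r², the coefficient of determination. As a sketch of that calculation, the snippet below uses the Example 1 totals rather than the exercise data, so the exercise itself is left to the reader; the variable names are illustrative.

```python
# Column totals from the Example 1 table (n = 6 days), used only to illustrate r^2
n = 6
sum_x, sum_y = 558, 1230
sum_xy, sum_x2, sum_y2 = 116007, 52102, 265092

cross = n * sum_xy - sum_x * sum_y
r_squared = cross ** 2 / (
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
percent_explained = 100 * r_squared
print(round(percent_explained, 1))  # about 97.1
```

Read this as: about 97.1% of the variation in peak power load is explained by its linear relation with high temperature, under the assumption that a straight line is the right model.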