
Correlation Regression

This document introduces various types of variables commonly used in statistics such as categorical, continuous, dependent, independent, and provides examples. It then discusses correlation, both graphically using scatter plots and mathematically using the Pearson correlation coefficient. A positive correlation indicates variables increase together, while a negative correlation indicates one variable increases as the other decreases. The strength of correlation is classified based on the coefficient value from perfect (±1) to negligible (0). Two examples are provided to demonstrate finding the correlation between two variables graphically and mathematically using data tables.

Uploaded by

Yumna Saleem

1

INTRODUCTION TO REGRESSION
AND CORRELATION

A “variable” in algebra really just means one thing: an unknown
value. However, in statistics, you’ll come across dozens of types of
variables. In most cases, the word still means that you’re dealing
with something that’s unknown, but, unlike in algebra, that
unknown isn’t always a number. Some variable types are used
more than others.
2

Common Types of Variables
• Categorical variable: a variable whose values can be put into categories. For example, the
variable “Toothpaste Brand” might take the values Colgate, Aquafresh, etc.
• Continuous variable: a variable that can take infinitely many values, like “time” or “weight”.
• Dependent variable: the outcome of an experiment. As you change the independent
variable, there will be some change in the dependent variable.
• Discrete variable: a variable that can only take on a certain number of values. For example,
“number of cars in a parking lot” is discrete because a car park can only hold so many cars.
• Independent variable: a variable that is not affected by anything that you, the researcher,
do. Usually plotted on the x-axis.
• Nominal variable: another name for a categorical variable.
• Ordinal variable: similar to a categorical variable, but there is a clear order. For example,
income levels of low, middle, and high could be considered ordinal.
• Qualitative variable: a broad category for any variable that can’t be counted (i.e., has no
numerical value). Nominal and ordinal variables fall under this umbrella term.
• Quantitative variable: a broad category that includes any variable that can be counted, or
has a numerical value associated with it. Examples of variables that fall into this category
include discrete variables and ratio variables.
• Random variable: a variable associated with a random process that assigns numbers to the
outcomes of random events.
3
Correlation

4
Correlation
Basic idea: Use data to identify relationships between two variables.

Correlation is a numerical measure of the degree of relationship
between two random variables.

Or

A linear association between two random variables.

There are two methods to find this association / relation.

i. Graphical Method (Scatter Diagram )

ii. Mathematical Method ( Coefficient of Correlation )

5
i. Graphical Method (Scatter Diagram )
This is the simplest and easiest method to investigate
the nature of the correlation between two variables.
In this method, ( xi , yi ) are n paired values
(where i = 1, 2, 3, …, n). Plot the paired values of the two
variables x and y on graph paper and do not join the
plotted points in any way. The following types of
relationship can appear.

6
If all the plotted points tend to lie
near a straight line, the correlation is
said to be linear.

[Figures: scatter diagrams illustrating each of the following cases]
a. +ve perfect correlation
b. -ve perfect correlation
c. +ve strong correlation
d. -ve strong correlation
e. +ve weak correlation
f. -ve weak correlation
g. No correlation between x and y
h. Nonlinear correlation between x and y

14
ii) Mathematical Method
The Pearson correlation coefficient, often
referred to as Pearson's r, is a statistical
formula that measures the strength of the linear
relationship between two variables. To determine how
strong the relationship is between two variables,
you need to find the coefficient value, which can
range between -1.00 and 1.00.
This quantity is called the coefficient of correlation and is denoted by r:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

15
Notation for the Linear Correlation Coefficient
n = number of paired values
∑ = denotes summation
∑x = the sum of all x values
∑x² = each x value is squared and then those squares are
added
(∑x)² = the x values are added and the
total is then squared
∑xy = each x is first multiplied
by its corresponding y; after obtaining all
such products, find their sum
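The notation above maps directly onto a few running sums. The following is a minimal Python sketch of the coefficient of correlation computed that way; the function name `pearson_r` and the two tiny data sets are made up for illustration and do not come from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the raw-sums formula."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    n = len(xs)                                   # number of paired values
    sum_x = sum(xs)                               # sum of all x
    sum_y = sum(ys)                               # sum of all y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of the products x*y
    sum_x2 = sum(x * x for x in xs)               # sum of squared x values
    sum_y2 = sum(y * y for y in ys)               # sum of squared y values
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up illustrative data (not from the slides).
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # points on a rising straight line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # points on a falling straight line
```

Note that the sketch divides by zero if all x values (or all y values) are identical, since the coefficient is undefined in that case.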

16
Strength of Coefficient of Correlation

Coefficient of Correlation    Degree of Association
±1                            Perfect
±0.9                          Strong
±0.5 to ±0.8                  Moderate
±0.2 to ±0.49                 Weak
0 to ±0.19                    Negligible
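As a rough sketch, the classification above can be written as a small lookup. The function name `degree_of_association` is hypothetical, and one assumption goes beyond the slide: values that fall between two listed bands (for example 0.85) are assigned to the weaker band.

```python
def degree_of_association(r):
    """Label a correlation coefficient using the bands from the table.

    Assumption (not stated on the slide): a value lying between two
    listed bands is assigned to the weaker band, e.g. 0.85 -> "Moderate".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)            # the sign gives direction, not strength
    if size == 1.0:
        return "Perfect"
    if size >= 0.9:
        return "Strong"
    if size >= 0.5:
        return "Moderate"
    if size >= 0.2:
        return "Weak"
    return "Negligible"

for value in (1.0, -0.95, 0.7, -0.3, 0.05):
    print(value, degree_of_association(value))
```

Taking the absolute value first reflects the ± signs in the table: a coefficient of -0.95 is as strong as +0.95, only in the opposite direction.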

17
Examples
For a good-looking personality, the correlation between height and
weight is either strong or perfect.
[Figure: scatter diagram with axes labelled Height and Weight]
[Figure: the same relationship with the Weight and Height axes swapped]

Business
An ideal situation for a business
[Figure: scatter diagram with axes labelled Profit and No. of Employees]

20
Example – 1
A nuclear engineer has been assigned the task of developing a
model to predict peak power load at a nuclear power plant.
Initially, the engineer will model peak power load as a function of
the high temperature for the day, based on the theory that higher
temperatures result in higher peak power loads. The high
temperature and peak power load were observed for a random
sample of six days and are listed in the table.

High temperature x, °F          92    84    95    102   88    97
Peak power load y, megawatts    207   139   211   273   156   244

i. Draw a scatter plot.
ii. Find the coefficient of correlation and comment on it.
iii. Fit a regression line of y on x and estimate the power load when
the high temperature is 116 °F.

21
22
[Figure: scatter plot of peak power load (vertical axis, 0 to 300) against high temperature (horizontal axis, 0 to 120)]

23
Mathematical Method

x y x y x2 y2
92 207 19044 8464 42849
84 139 11676 7056 19321
95 211 20045 9025 44521
102 273 27846 10404 74529
88 156 13728 7744 24336
97 244 23668 9409 59536
558 1230 116007 52102 265092
x y  x  y  x2 y 2

6  116007 - 558  1230


r  0.985
6  52102 -  558  6  265092 -  1230 
2 2
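The working table for Example 1 can be rebuilt mechanically from the six observations. The sketch below (variable names are illustrative) recomputes the xy, x², and y² columns, their totals, and the resulting coefficient, so the hand calculation can be checked.

```python
from math import sqrt

temps = [92, 84, 95, 102, 88, 97]          # x: high temperature, °F
loads = [207, 139, 211, 273, 156, 244]     # y: peak power load, megawatts

# One row per day: x, y, xy, x squared, y squared.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(temps, loads)]

# Column totals, in the same order as the columns above.
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))
n = len(rows)

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

for row in rows:
    print(*row)
print("totals:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print("r =", round(r, 4))
```

A value this close to +1 indicates a strong positive linear relationship between daily high temperature and peak power load in this sample.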

24
Example - 2
A windmill is used to generate direct current. Data are collected on 45
different days to determine the relationship between wind speed in
mi/h ( x ) and current in kA ( y ). The data are presented in the following
table. Find the relationship between these two variables by the graphical
and mathematical methods, and comment on the results.

Table

25
Day   Wind Speed   Current      Day   Wind Speed   Current      Day   Wind Speed   Current
      (mi/h)       (kA)               (mi/h)       (kA)               (mi/h)       (kA)
1     4.2          1.9          16    3.7          2.1          31    2.6          1.4
2     1.4          0.7          17    5.9          2.2          32    7.7          2.8
3     6.6          2.2          18    6.0          2.6          33    6.1          2.4
4     4.7          2.0          19    10.7         3.2          34    5.5          2.2
5     2.6          1.1          20    5.3          2.3          35    4.7          2.3
6     5.8          2.6          21    5.1          1.9          36    4.0          2.0
7     1.8          0.3          22    4.9          2.3          37    2.3          1.2
8     5.8          2.3          23    8.3          3.1          38    11.9         3.0
9     7.3          2.6          24    7.1          2.3          39    8.6          2.5
10    7.1          2.7          25    9.2          2.9          40    5.6          2.1
11    6.4          2.4          26    4.4          1.8          41    4.2          1.7
12    4.6          2.2          27    8.0          2.6          42    6.2          2.3
13    1.6          1.1          28    10.5         3.0          43    7.7          2.6
14    2.3          1.5          29    5.1          2.1          44    6.6          2.9
15    4.2          1.5          30    5.8          2.5          45    6.9          2.6

Totals: ∑x = 257, ∑y = 98, ∑xy = 618.72, ∑x² = 1718.5, ∑y² = 230.96

26
Coefficient of Correlation

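$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

$$ r = \frac{45 \times 618.72 - 257 \times 98}{\sqrt{\left[45 \times 1718.5 - (257)^2\right]\left[45 \times 230.96 - (98)^2\right]}} = 0.89 $$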
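When only the column totals are at hand, as in the summary row of the wind speed table, the coefficient can be computed from those totals alone. This is a small sketch under that assumption; the helper name `r_from_sums` is illustrative.

```python
from math import sqrt

def r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Correlation coefficient computed from column totals only."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Totals as reported for the 45 days of wind speed (x) and current (y).
r_wind = r_from_sums(45, 257, 98, 618.72, 1718.5, 230.96)
print(round(r_wind, 2))
```

The positive sign means that higher wind speeds go with higher current output in this sample.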

27
Case Study
Noise level at London Gatwick Airport
A study was conducted at London Gatwick Airport to investigate the
existing procedures for prediction of aircraft noise. The aim was to
predict the perceived noise level ( P N L ) given the slant distance
( S D ) in meters, which is the distance from the point at which the
aircraft starts its take-off to its position when it passes over the noise
recorder located beyond the end of the runway.
i. Plot the following data and comment on the graph.
ii. Find the coefficient of correlation and comment on the results.

28
[Figure: “Analysis of Noise Level on an Airport”, scatter plot of noise level (vertical axis, 0 to 140) against slant distance (horizontal axis, 0 to 1200)]
29
S.No.   SD = x   PNL = y   xy        x²        y²
1       993      107       106251    986049    11449
2       1013     98        99274     1026169   9604
3       977      102       99654     954529    10404
4       182      120       21840     33124     14400
5       275      114       31350     75625     12996
6       96       123       11808     9216      15129
7       93       121       11253     8649      14641
8       994      100       99400     988036    10000
9       136      121       16456     18496     14641
10      204      119       24276     41616     14161
11      1015     97        98455     1030225   9409
12      996      101       100596    992016    10201
13      982      99        97218     964324    9801
14      242      117       28314     58564     13689
15      204      120       24480     41616     14400
16      149      120       17880     22201     14400
17      207      116       24012     42849     13456
18      211      116       24476     44521     13456
19      1037     100       103700    1075369   10000
20      178      115       20470     31684     13225
Total   10184    2226      1061163   8444878   249462
30
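$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

$$ r = \frac{20 \times 1061163 - 10184 \times 2226}{\sqrt{\left[20 \times 8444878 - (10184)^2\right]\left[20 \times 249462 - (2226)^2\right]}} = -0.969 $$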
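As a cross-check on the raw-sums calculation, the same coefficient can be obtained from deviations about the means, which is an algebraically equivalent form of Pearson's r rather than the form used on the slides. A minimal sketch with the 20 Gatwick observations (function and variable names are illustrative):

```python
from math import sqrt

# Slant distance (m) and perceived noise level for the 20 observations.
slant = [993, 1013, 977, 182, 275, 96, 93, 994, 136, 204,
         1015, 996, 982, 242, 204, 149, 207, 211, 1037, 178]
noise = [107, 98, 102, 120, 114, 123, 121, 100, 121, 119,
         97, 101, 99, 117, 120, 120, 116, 116, 100, 115]

def pearson_r_deviations(xs, ys):
    """Pearson r from deviations about the means (definitional form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

r_noise = pearson_r_deviations(slant, noise)
print(round(r_noise, 3))
```

The negative sign says that perceived noise level falls as slant distance grows, and the magnitude close to 1 indicates a strong linear relationship.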

31
Regression
Basic idea: Use data to identify relationships among variables and
use these relationships to make predictions.
Regression analysis is the process of constructing a mathematical model
or function that can be used to predict or determine one variable from
another variable. It describes the relationship between a dependent
variable and one or more independent variables. The concept of regression
analysis deals with finding the best relationship between Y and x.

Regression Equations
i. Simple Regression Equation
ii. Parabolic Regression Equation
iii. Multiple Regression

i. Simple Linear Regression Model: The most elementary regression model is
called simple regression, which is bivariate linear regression, meaning
that it involves only two variables. In simple regression analysis, only a
straight-line relationship between two variables is examined.
32
The first step in determining the equation of the regression line that passes
through the sample data is to establish the equation's form. Several different
types of equations of lines are discussed in algebra, finite math, or analytic
geometry courses. Among them are the two-point form, the point-slope
form, and the slope-intercept form. In regression analysis, researchers use
the slope-intercept equation of a line.
The linear regression equation of y on x (Y depends on x) is
Y = a + b x

Y= Dependent Variable ( Response )

a = y-intercept

b = Slope

x = Independent Variable ( Predictor )

33
We have to estimate ‘a’ (y-intercept) and ‘b’ (slope). To compute a and b, use
the method of least squares.

Slope ‘b’

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$

Y-intercept ‘a’

$$ a = \bar{y} - b\,\bar{x} = \frac{\sum y}{n} - b\,\frac{\sum x}{n} $$
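The least-squares formulas for the slope and intercept translate into a few lines of code. This is a minimal sketch; the function name `fit_line` and the four data points are made up for illustration, chosen to lie exactly on y = 1 + 2x so the result is easy to check by eye.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + b x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
    a = sum_y / n - b * sum_x / n                                   # y-intercept
    return a, b

# Made-up points lying exactly on y = 1 + 2x (illustration only).
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

The slope is computed first because the intercept formula depends on it.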

34
Similarly, the regression equation of x on y is
X = c + d y

Slope ‘d’

$$ d = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} $$

X-intercept ‘c’

$$ c = \bar{x} - d\,\bar{y} = \frac{\sum x}{n} - d\,\frac{\sum y}{n} $$
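The two regression lines share the same numerator and differ only in the denominator, which has a useful consequence: the product of the two slopes equals r². The sketch below illustrates this with the temperature and power load data from Example 1; the variable names are illustrative and the identity is a standard algebraic fact rather than something stated on the slides.

```python
from math import sqrt

temps = [92, 84, 95, 102, 88, 97]          # x: high temperature, °F
loads = [207, 139, 211, 273, 156, 244]     # y: peak power load, megawatts

n = len(temps)
sum_x, sum_y = sum(temps), sum(loads)
sum_xy = sum(x * y for x, y in zip(temps, loads))
sum_x2 = sum(x * x for x in temps)
sum_y2 = sum(y * y for y in loads)

shared = n * sum_xy - sum_x * sum_y        # numerator common to b, d and r
b = shared / (n * sum_x2 - sum_x ** 2)     # slope of y on x
d = shared / (n * sum_y2 - sum_y ** 2)     # slope of x on y
c = sum_x / n - d * sum_y / n              # intercept of x on y
r = shared / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

print(round(d, 4), round(c, 2))
print(abs(b * d - r ** 2) < 1e-12)         # product of slopes equals r squared
```

Because the two lines minimise different errors (vertical for y on x, horizontal for x on y), they coincide only when the correlation is perfect.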

35
Example – 1

This continues Example 1 from the correlation section, using the same six
observations of daily high temperature x (°F) and peak power load y
(megawatts).

iii. Fit a regression line of y on x and estimate the power load when
the high temperature is 116 °F.

36
x y x y x2 y2
92 207 19044 8464 42849

84 139 11676 7056 19321 b 


n  xy  x  y
n  x2   x 
2

95 211 20045 9025 44521

102 273 27846 10404 74529 6  142407  558  1230


b   7.77
6  52102   558 
2
88 156 40128 7744 207936

97 244 23668 9409 59536

x y  x  y x 2
y 2

558 1230 142407 52102 448692

37
Power Load ( y ) on High Temperature ( x )
Power Load = a + b (High Temperature)

Y = a + b x

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{6 \times 116007 - 558 \times 1230}{6 \times 52102 - (558)^2} = \frac{9702}{1248} \approx 7.77 $$

$$ a = \bar{y} - b\,\bar{x} = \frac{1230}{6} - 7.774 \times \frac{558}{6} \approx -517.98 $$

Power Load = -517.98 + 7.77 ( Temp. )

38
Power Load = -517.98 + 7.77 ( 116 )
Power Load ≈ 383.34 megawatts
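The fitted line and the estimate at 116 °F can be reproduced as follows. This is a sketch with illustrative names (`predict_load` is not from the slides). It keeps the slope unrounded, so the estimate comes out near 383.8 megawatts rather than 383.34; the difference comes only from rounding the slope to 7.77 before multiplying by 116.

```python
temps = [92, 84, 95, 102, 88, 97]          # x: high temperature, °F
loads = [207, 139, 211, 273, 156, 244]     # y: peak power load, megawatts

n = len(temps)
sum_x, sum_y = sum(temps), sum(loads)
sum_xy = sum(x * y for x, y in zip(temps, loads))
sum_x2 = sum(x * x for x in temps)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
a = sum_y / n - b * sum_x / n                                   # y-intercept

def predict_load(temp_f):
    """Estimated peak power load (MW) at a given high temperature (°F)."""
    return a + b * temp_f

print(round(b, 3), round(a, 2), round(predict_load(116), 1))
```

Since the observed temperatures range from 84 °F to 102 °F, an estimate at 116 °F is an extrapolation beyond the data and should be treated with caution.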

******************
Probability & Statistics for Engineers & Scientists (Eighth Edition),
by Ronald E. Walpole.

Simple Linear Correlation and Regression, Questions: 11.2
to 11.6, 11.49, and 11.52

39
11.2 The grades of a class of 9 students on a midterm report (x) and on the final examination (y)
are as follows:
X : 77 50 71 72 81 94 96 99 67
Y: 82 66 78 34 47 85 99 99 68
(a) Estimate the linear regression line.
(b) Estimate the final examination grade of a student who received a grade of 85 on the midterm report.
11.3 A study was made on the amount of converted sugar in a certain process at various
temperatures. The data were coded and recorded as follows:
Temperature, x : 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
Converted Sugar, y: 8.1 7.8 8.5 9.8 9.5 8.9 8.6 10.2 9.3 9.2 10.5
(a) Estimate the linear regression line.
(b) Estimate the mean amount of converted sugar produced when the coded temperature is 1.75.
(c) Plot the residuals versus temperature. Comment.
11.4 In a certain type of metal test specimen, the normal stress on a specimen is known to
be functionally related to the shear resistance. The following is a set of coded experimental
data on the two variables:
Normal Stress, x: 26.8 25.4 28.9 23.6 27.7 23.9 24.7 28.1 26.9 27.4 22.6 25.6
Shear Resistance, y: 26.5 27.3 24.2 27.1 23.6 25.9 26.3 22.5 21.7 21.4 25.8 24.9
(a) Estimate the regression line μY|x = α + βx.
(b) Estimate the shear resistance for a normal stress of 24.5 kilograms per square centimeter.
40
11.5 The amounts of a chemical compound y, which dissolved in 100 grams of water at various
temperature, x were recorded as follows:
x (°C): 0 15 30 45 60 75
y (grams): 8 12 25 31 44 48
6 10 21 33 39 51
8 14 24 28 42 44
(a) Find the equation of the regression line.
(b) Graph the line on a scatter diagram.
(c) Estimate the amount of chemical that will dissolve in 100 grams of water at 50°C.
11.6 A mathematics placement test is given to all entering freshmen at a small college. A
student who receives a grade below 35 is denied admission to the regular mathematics
course and placed in a remedial class. The placement test scores and the final grades for
20 students who took the regular course were recorded as follows:
Placement Test : 50 35 35 40 55 65 35 60 90 35 90 80 60 60 60 40 55 50 65 50
Course Grade : 53 41 61 56 68 36 11 70 79 59 54 91 48 71 71 47 53 68 57 79
(a) Plot a scatter diagram.
(b) Find the equation of the regression line to predict course grades from placement test
scores.
(c) Graph the line on the scatter diagram.
(d) If 60 is the minimum passing grade, below which placement test score should students
in the future be denied admission to this course?
41
11.49 Compute and interpret the correlation coefficient for the following grades of 6 students
selected at random:
Mathematics grade: 70 92 80 74 65 83
English grade: 74 84 63 87 78 90

11.52 The following data were obtained in a study of the relationship between the weight
and chest size of infants at birth:
Weight (kg) Chest Size (cm)
2.75 29.5
2.15 26.3
4.41 32.2
5.52 36.5
3.21 27.2
4.32 27.7
2.31 28.3
4.30 30.3
3.71 28.7
Calculate r. What percentage of the variation in the infant chest sizes is explained by
differences in weight?

42
