11august2010 - Correlation and Regression

Uploaded by

JOCELYN CAMACHO

11.

CORRELATION AND REGRESSION

Objectives:
At the end of the chapter, given a set of data, the student should be able to:
1. Make a scatter plot;
2. Find the correlation coefficient;
3. Test the significance of r for a given level of significance;
4. Find the coefficients of determination and non-determination; and
5. Find the equation for the regression line.

CORRELATION ANALYSIS

Correlation analysis is used to determine whether there is a relationship, or correlation, between two variables and to determine the strength of that correlation.
A correlation is a relationship between two statistical variables measured from the same population. In this chapter, we will consider only linear correlation, for which there are three cases: positive linear correlation, negative linear correlation, and no linear correlation.
A Positive Linear Correlation indicates that high values for one variable tend to correspond to high values for the second variable; simply put, as one variable increases, so does the other. For example, height vs. weight for adults (for a normal individual, as the height increases, the weight also increases).
A Negative Linear Correlation indicates that high values for one variable tend to correspond to low values for the second variable; that is, as one variable increases, the other decreases. For instance, the age of a vehicle and its resale price (as the vehicle gets older, the resale price becomes lower).
No Linear Correlation means that there is no relationship between the variables, or that the relationship is non-linear. For example, height and number of years of education (the height of a person has no bearing on the number of years he or she has been in school).
Regression analysis is used to determine what type of relationship exists and to make predictions using that relationship.

Simple Correlation
In simple correlation, only two variables are studied at once. The two variables are the independent and dependent variables. The independent variable, x, is the variable that can be controlled or picked. The dependent variable, y, is the variable that you assume to depend on the other variable. The independent variable is used to predict the dependent variable if there is a correlation between the two variables.
One way to determine the type of linear correlation between two variables is by means of a scatter plot. The scatter plot is a graph with the independent variable at the bottom (along the x-axis) and the dependent variable along the side (the y-axis). For each pair of numbers, we plot a point, but the points are not connected with a line.
The scatter plot shows if there is a linear correlation between two variables. We can then
determine the type of linear correlation as follows:
1. Positive Linear Correlation - general trend in the plotted points is from bottom left to top right.
2. Negative Linear Correlation - general trend in the plotted points is from top left to bottom right.
3. No Linear Correlation - No general trend in plotted points, or a non-linear trend.
The strength of the linear correlation can be judged by looking at how closely the points approximate
a straight line.

Example 1: The following table shows the Height (X) vs. Weight (Y) measurements (both in inches) for 10 men:
x   70.8   66.2   71.7   68.7   67.6   69.2   66.5   67.2   68.3   65.6
y   42.5   40.2   44.4   42.8   40.0   47.3   43.4   40.1   42.1   36.0
Prepared by MJDP

Interpretation: The scatter plot (made in Excel) shows a positive linear correlation between the variables.

Example 2: The following table gives the resale value (y, in thousands of pesos) by year (x) of a car bought in 1970 for Php200,000.00.
x (year)      1970   1973   1976   1979   1982   1985   1988   1991   1994   1997
y (Php 000)   200    150    145    135    120    100    79     65     54     35

Interpretation: The diagram (Excel) indicates a negative linear correlation between the variables.

Example 3. Below are the scores in two examinations. Make a scatter plot and interpret the data.
Mid-Term   Final
73         70
86         80
93         96
92         85
72         68
65         68
58         62
75         78
[Scatter plot "Test scores": Mid-Term Score on the x-axis, Final Term Score on the y-axis]
Interpretation: There is a fairly strong positive correlation between scores in the mid-term examination and the final examination.

Coefficient of Correlation
A more precise method of determining the type and strength of a linear correlation is to calculate
the coefficient of linear correlation for the two variables using the formula:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
The coefficient of linear correlation will always be a number between -1.00 and 1.00, with a positive value indicating a positive correlation and a negative value a negative correlation. A coefficient of $r = 1$ for a data set indicates perfect positive linear correlation, and $r = -1$ indicates perfect negative linear correlation, while $r = 0$ would indicate no linear correlation. The closer the value of r is to $\pm 1$, the stronger the correlation, and the closer to zero, the weaker the correlation.
The coefficient of correlation between two variables is most easily calculated by constructing a
table (see example below).
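The table method amounts to accumulating five sums and substituting them into the formula for r. A minimal Python sketch of that computation follows; the function name `pearson_r` is illustrative only, and the data are the mid-term and final scores from Example 3.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Coefficient of linear correlation from the sum-based (table) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    # The column totals of the worksheet table: x, y, x^2, y^2, xy
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Mid-term (x) and final (y) scores from Example 3
midterm = [73, 86, 93, 92, 72, 65, 58, 75]
final = [70, 80, 96, 85, 68, 68, 62, 78]
r = pearson_r(midterm, final)
print(round(r, 3))  # 0.933
```

The function divides by zero if either variable is constant, since r is undefined in that case.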

Coefficient of Determination
The coefficient of determination tells us how much of the variation in the dependent variable is explained by the independent variable. The coefficient of determination is $r^2$ and is usually expressed as a percent.
On the other hand, the coefficient of non-determination, $1 - r^2$, is the variation in the dependent variable that is not explained by the independent variable.

Testing the Coefficient of Correlation

The coefficient of correlation can be tested for significance using a t-test and following the procedures for hypothesis testing, or by comparing r to a value in Table 9. The null hypothesis is that there is no correlation, or that $\rho = 0$. The alternative hypothesis is that there is a correlation, $\rho \neq 0$.
Summarizing, therefore, to test $H_0: \rho = 0$,
1. Find the table value from Table 9 for the sample size n and the desired level of significance.
2. If r is between -(table value) and +(table value), do not reject the null hypothesis. There may not be a correlation.
3. If r is smaller than -(table value) or greater than +(table value), reject the null hypothesis. There is a correlation.
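The table comparison can be sketched in a few lines of Python. This is only an illustration: the dictionary `CRITICAL_R_05` and the function `is_significant` are hypothetical names, the handful of values are copied from Table 9, and the sketch assumes (as Examples 4 and 5 suggest) that the table is looked up by sample size n at the 0.05 level.

```python
# A few 0.05-level entries copied from Table 9, keyed by sample size n.
# Assumption: the first column of Table 9 is n, as used in Examples 4 and 5.
CRITICAL_R_05 = {8: 0.707, 9: 0.666, 10: 0.632, 13: 0.553}

def is_significant(r, n, table=CRITICAL_R_05):
    """Return True when H0 (no correlation) is rejected.

    H0 is rejected when r lies outside the interval from -(table value)
    to +(table value), which is the same as |r| > table value.
    """
    critical = table[n]  # raises KeyError if n is not in this abridged table
    return abs(r) > critical

print(is_significant(0.933, 8))   # True: 0.933 is outside -0.707..0.707
print(is_significant(0.50, 10))   # False: 0.50 is inside -0.632..0.632
```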

Example 4. Using the data in Example 3, we have

n   Mid-Term (x)   Final Term (y)   x²   y²   xy
1 73 70 5,329 4,900 5,110
2 86 80 7,396 6,400 6,880
3 93 96 8,649 9,216 8,928
4 92 85 8,464 7,225 7,820
5 72 68 5,184 4,624 4,896
6 65 68 4,225 4,624 4,420
7 58 62 3,364 3,844 3,596
8 75 78 5,625 6,084 5,850
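Σ 614 607 48,236 46,917 47,500

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{8(47{,}500) - (614)(607)}{\sqrt{8(48{,}236) - (614)^2}\,\sqrt{8(46{,}917) - (607)^2}} = \frac{7{,}302}{\sqrt{8{,}892}\,\sqrt{6{,}887}} \approx 0.933$$

a) Test $H_0: \rho = 0$: In Table 9 at the 0.05 level, the table value for n = 8 is 0.707. Note that 0.933 is not between -0.707 and 0.707, or $|r| > 0.707$. Interpretation: There is a correlation, and 0.933 is a very strong positive correlation.
b) Coefficient of determination: $r^2 = (0.933)^2 \approx 0.87$. Interpretation: 87% of the variation in the final grades can be determined by the variations in the mid-term grades.
c) Coefficient of non-determination: $1 - r^2 \approx 1 - 0.87 = 0.13$. Interpretation: 13% of the variation in the final grades cannot be determined by the variations in the mid-term grades.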

Example 5: Calculate the coefficient of correlation for the vehicle weight (x) and distance travelled in miles per gallon (y) data sets. The table of variables is given below:
n x y x² y² xy
1 3.55 30 12.60 900.00 106.50
2 2.60 32 6.76 1,024.00 83.20
3 3.25 30 10.56 900.00 97.50
4 3.93 24 15.44 576.00 94.32
5 4.00 26 16.00 676.00 104.00
6 3.12 30 9.73 900.00 93.60
7 3.24 33 10.50 1,089.00 106.92
8 3.23 27 10.43 729.00 87.21
9 2.44 37 5.95 1,369.00 90.28
10 3.24 32 10.50 1,024.00 103.68
11 2.29 37 5.24 1,369.00 84.73
12 2.50 34 6.25 1,156.00 85.00
13 4.02 26 16.16 676.00 104.52
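Σ 41.41 398 136.14 12,388.00 1,241.46

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{13(1{,}241.46) - (41.41)(398)}{\sqrt{13(136.14) - (41.41)^2}\,\sqrt{13(12{,}388) - (398)^2}} = \frac{-342.20}{\sqrt{55.03}\,\sqrt{2{,}640}} \approx -0.90$$

a) Test $H_0: \rho = 0$: In Table 9 at the 0.05 level, the table value for n = 13 is 0.553. Since $-0.90 < -0.553$, $|r| > 0.553$. Interpretation: There is a very strong negative correlation between vehicle weight and distance travelled in miles per gallon. As the weight of the vehicle increases, the travel distance per gallon of gasoline decreases.
b) Coefficient of determination: $r^2 = (-0.90)^2 = 0.81$. Interpretation: 81% of the variation in the distance travelled in miles per gallon can be determined by the variations in the weight of the vehicle.
c) Coefficient of non-determination: $1 - r^2 = 1 - 0.81 = 0.19$. Interpretation: 19% of the variation in the distance travelled in miles per gallon cannot be determined by the variations in the weight of the vehicle.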

REGRESSION ANALYSIS

Linear Regression
If a pair of variables has a significant linear correlation, then the relationship between the data
values can be roughly approximated by a linear equation. The regression line is the equation of the line
that best fits the points of the scatter plot.
The process of finding the linear equation which best fits the data values is known as linear
regression and the line of best fit is called the regression line.
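It is a fact of linear algebra and analysis that the least squares line of best fit to a set of data values has an equation of the form $\hat{y} = a + bx$, where $a$ is the y-intercept and $b$ is the slope. $\hat{y}$ will be the predicted value of $y$ for any given $x$ value. If the scatter plot indicates a line that is going up, the slope will be positive and $r$ will be positive. If the scatter plot indicates a line that is going down, the slope will be negative and $r$ will be negative.
To solve for the regression equation $\hat{y} = a + bx$, we use the following:

$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^2 - \left(\sum x\right)^2} \qquad b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$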
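As a rough illustration, the least-squares intercept and slope can be computed from the same column totals used for r. The sketch below writes the line as y_hat = a + b*x (intercept a, slope b); the function name `regression_line` is hypothetical, and the data are the mid-term and final scores from Example 3.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Both coefficients share the denominator n*sum(x^2) - (sum x)^2
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b

# Mid-term (x) and final (y) scores from Example 3
midterm = [73, 86, 93, 92, 72, 65, 58, 75]
final = [70, 80, 96, 85, 68, 68, 62, 78]
a, b = regression_line(midterm, final)
print(round(a, 3), round(b, 4))  # 12.849 0.8212
```

The slope is positive here, matching the positive correlation found for the same data.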
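The standard error of estimate, $s_{est}$, is the standard deviation of the observed $y$ values about the predicted value $\hat{y}$, or roughly the average error in each predicted $\hat{y}$. With $a$ and $b$ the intercept and slope of the regression line $\hat{y} = a + bx$, it can be computed using either of the following:
$$s_{est} = \sqrt{\frac{\sum \left(y - \hat{y}\right)^2}{n - 2}} \quad\text{or}\quad s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
A confidence interval for a predicted y can be found using the standard error of estimate if the sample size is larger than 100. Instead of predicting a single value for $y$, you could be more confident in saying that $y$ would be between two values for a given $x$. That is:
A 95% confidence interval, if $n > 100$, where $\hat{y}$ is the value obtained from the regression equation for the given $x$: $\hat{y} - 1.96\,s_{est} < y < \hat{y} + 1.96\,s_{est}$.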
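The two ideas above can be sketched in Python under stated assumptions: the line is written y_hat = a + b*x, the standard error uses the residual form with n - 2 in the denominator, and the 95% interval uses the large-sample normal multiplier z = 1.96 (an assumption of this sketch). The function names and the values in the interval example are hypothetical.

```python
from math import sqrt

def standard_error_of_estimate(xs, ys, a, b):
    """Standard deviation of the observed y about the line y_hat = a + b*x."""
    n = len(xs)
    # Sum of squared residuals (observed minus predicted)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (n - 2))

def prediction_interval(x, a, b, s_est, z=1.96):
    """Approximate interval y_hat +/- z*s_est (z = 1.96 assumed for 95%, large n)."""
    y_hat = a + b * x
    return y_hat - z * s_est, y_hat + z * s_est

# Scores from Example 3, with unrounded coefficients from the sums in Example 4
midterm = [73, 86, 93, 92, 72, 65, 58, 75]
final = [70, 80, 96, 85, 68, 68, 62, 78]
a, b = 114252 / 8892, 7302 / 8892
s_est = standard_error_of_estimate(midterm, final, a, b)
print(round(s_est, 2))  # 4.31

# Hypothetical large-sample case: y_hat = 2.0 + 0.5x, s_est = 1.5, x = 10
low, high = prediction_interval(10, 2.0, 0.5, 1.5)
print(round(low, 2), round(high, 2))  # 4.06 9.94
```

Working from the residuals avoids the rounding sensitivity of the shortcut formula: substituting coefficients rounded to two decimals (a = 12.85, b = 0.82) into the shortcut gives about 5.28 for the same data instead of 4.31.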

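Example 6. Find the equation for the regression line and the standard error of estimate in Example no. 4.
Solution: a) For the regression equation $\hat{y} = a + bx$, we use the following:
$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{(607)(48{,}236) - (614)(47{,}500)}{8(48{,}236) - (614)^2} = \frac{114{,}252}{8{,}892} \approx 12.849$$
$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{8(47{,}500) - (614)(607)}{8(48{,}236) - (614)^2} = \frac{7{,}302}{8{,}892} \approx 0.8212$$
so the regression line is $\hat{y} = 12.849 + 0.8212x$.

b) The standard error of estimate is:

$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} = \sqrt{\frac{46{,}917 - a(607) - b(47{,}500)}{8 - 2}} \approx \sqrt{\frac{111.34}{6}} \approx 4.31$$
Here $a$ and $b$ are left unrounded. This shortcut formula subtracts nearly equal large numbers, so rounding the coefficients first ($a = 12.85$, $b = 0.82$) inflates the result to about 5.28.
Interpretation: Each predicted final grade will have an error of about 4.31 points.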

Example 7. For a sample of 1 , the regression line is and the standard error of
estimate is . Find the confidence interval for .
Solution: Find the predicted value for y: , at , we get

Interpretation: We can be confident that for , will be between and .

Multiple Regression
Multiple regression is used when there is one dependent variable and two or more independent variables.
The following are the conditions in using multiple regression:
1. The dependent variable y must be normally distributed;
2. The variances of y must be the same for each value of the independent variable;
3. There must be a linear relationship between the dependent and each independent variable;
4. The independent variables must not be correlated with one another; and
5. The values of the dependent variable y must be independent of one another.
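The general form of the multiple regression equation with $k$ independent variables is:

$$\hat{y} = a + b_1x_1 + b_2x_2 + \cdots + b_kx_k$$
The multiple correlation coefficient, $R$, will be between 0 and 1. A value close to 0 indicates a weak relationship and a value close to 1 a strong relationship. $R$ will always be at least as large as the absolute value of each individual correlation coefficient between y and an independent variable.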

Worksheet no. 11

1. Given the following data:

Number of absences 0 1 2 2 3 3 4 5 6
Final grade 96 91 78 83 75 62 70 68 56
a. Draw a scatter plot.
b. Find the correlation coefficient and test its significance at the 0.01 level.
c. Find and interpret the coefficient of determination and non-determination.
d. Find the regression equation and standard error of estimate.
2. For a sample of , the regression equation and the standard error of estimate is
. Find a 95% confidence interval for .
3. Using the data in Example 5,
a. Find and interpret the coefficient of determination and non-determination; and
b. Find the regression equation and standard error of estimate.

Table 9. Critical values of r (n = sample size; columns are levels of significance)

n    0.05    0.02    0.01       n    0.05    0.02    0.01
3    0.997                      18   0.468   0.543   0.590
4    0.950   0.980   0.990      19   0.456   0.529   0.575
5    0.878   0.934   0.959      20   0.444   0.516   0.561
6    0.811   0.882   0.917      21   0.433   0.503   0.549
7    0.754   0.833   0.875      22   0.423   0.492   0.537
8    0.707   0.789   0.834      27   0.381   0.445   0.487
9    0.666   0.750   0.798      32   0.349   0.409   0.449
10   0.632   0.715   0.765      37   0.325   0.381   0.418
11   0.602   0.685   0.735      42   0.304   0.358   0.393
12   0.576   0.658   0.708      47   0.288   0.338   0.372
13   0.553   0.634   0.684      52   0.273   0.322   0.354
14   0.532   0.612   0.661      62   0.250   0.295   0.325
15   0.514   0.592   0.641      72   0.232   0.274   0.302
16   0.497   0.574   0.623      82   0.217   0.256   0.283
17   0.482   0.558   0.606      92   0.205   0.242   0.267
Source: This table was abridged from Table VI of R.A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural, and Medical Research, Longman Group Ltd. London.
