
Correlation and Regression Analysis

The document discusses correlation and regression analysis. It defines correlation as the relationship between two random variables and introduces the Pearson product-moment correlation coefficient (PPMC) as a measure of the linear relationship between variables. It provides formulas to calculate the PPMC and discusses interpreting the correlation coefficient value. It also introduces simple linear regression analysis and the assumptions of the linear regression model. In summary, the document outlines common statistical methods for analyzing relationships between variables through correlation and linear regression.

Uploaded by

Lay

Correlation and Regression Analysis
Correlation

Correlation refers to the departure of two random variables from independence.

The Pearson product-moment correlation (PPMC) is the statistic most widely used to measure the degree of linear relationship between two variables.

The correlation coefficient is defined as the covariance of the two variables divided by the product of their standard deviations.
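The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `correlation_from_covariance` and the five data points are made up, and the sample (n - 1) divisor is assumed for both the covariance and the standard deviations.

```python
from math import sqrt

def correlation_from_covariance(x, y):
    """Sample correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample variances, all with the n - 1 divisor (an assumption).
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return cov_xy / (sqrt(var_x) * sqrt(var_y))

# Made-up data in which y is an exact increasing linear function of x (y = 2x + 1).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
r = correlation_from_covariance(x, y)  # very close to +1 for an exact linear relationship
```

Because the same divisor appears in the numerator and the denominator, using n instead of n - 1 would give the same value of r.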
Pearson product-moment correlation

Pearson's product-moment correlation coefficient, or simply the correlation coefficient (Pearson's r), is a measure of the strength of the linear association between two variables.

It was developed by Karl Pearson.

The value of the correlation coefficient varies between +1 and –1.
Correlation Coefficient

[Figure: scatter plots illustrating Perfect Positive Correlation (r = 1.00), Perfect Negative Correlation (r = -1.00), Positive Correlation (r = 0.80), Negative Correlation (r = -0.80), Zero Correlation (r = 0.00), and a Non-Linear Relationship.]
Pearson product-moment correlation

$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} $$
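As a rough sketch, the computational formula above translates directly into Python. The function name `pearson_r` and the four data points are illustrative assumptions, not taken from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from the raw sums N, sum(X), sum(Y), sum(XY), sum(X^2), sum(Y^2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up data: y falls by 2 for every unit increase in x, so r should be -1.
x = [1, 2, 3, 4]
y = [8, 6, 4, 2]
r = pearson_r(x, y)
```

Note that the denominator is zero when either variable is constant, so a real implementation would need to guard against that case.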

Test of Significance

$$ t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}} $$

df = N – 2
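The test statistic can be computed with a small helper. This is only a sketch: the name `t_statistic` and the input values r = 0.5, N = 27 are hypothetical and chosen so the arithmetic is easy to follow.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r * sqrt(N - 2) / sqrt(1 - r^2), with N - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical inputs: r = 0.5 from N = 27 pairs, so df = 25.
# sqrt(25) = 5 and sqrt(0.75) is about 0.866, giving t of roughly 2.89.
t = t_statistic(0.5, 27)
df = 27 - 2
```

The formula is undefined for r = ±1, where the denominator becomes zero.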
Correlation Coefficient & Strength of Relationships

0.00 – no correlation, no relationship
±0.01 to ±0.20 – slight correlation, almost negligible relationship
±0.21 to ±0.40 – low correlation, definite but small relationship
±0.41 to ±0.70 – moderate correlation, substantial relationship
±0.71 to ±0.90 – high correlation, marked relationship
±0.91 to ±0.99 – very high correlation, very dependable relationship
±1.00 – perfect correlation, perfect relationship
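The scale above can be encoded as a simple lookup. This is a sketch under one stated assumption: the slide's bands leave small gaps (for example between 0.20 and 0.21), and here any value in a gap is assigned to the next band up. The function name `describe_strength` is illustrative.

```python
def describe_strength(r):
    """Map a correlation coefficient to the verbal label used in the scale above."""
    size = abs(r)
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if size == 0:
        return "no correlation"
    if size <= 0.20:
        return "slight correlation"
    if size <= 0.40:
        return "low correlation"
    if size <= 0.70:
        return "moderate correlation"
    if size <= 0.90:
        return "high correlation"
    if size < 1:
        return "very high correlation"
    return "perfect correlation"

label = describe_strength(0.93)
```

The sign of r only indicates direction, so the label depends on the absolute value.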
Assumptions

Subjects are randomly selected and independently assigned to groups.

Both populations are normally distributed.


Procedure for the Pearson Product-Moment Correlation Test

Set up the hypotheses.
H0: ρ = 0 (The correlation in the population is zero.)
H1: ρ ≠ 0, ρ > 0, or ρ < 0 (The correlation in the population is different from, greater than, or less than zero.)

Calculate the value of Pearson's r.

Calculate the t value.

Make the statistical decision (two-tailed case):
If |t computed| < t critical, do not reject H0.
If |t computed| ≥ t critical, reject H0.
Example 1: Pearson r

The owner of a chain of fruit shake stores would like to study the correlation between atmospheric temperature and sales during the summer season. A random sample of 12 days is selected, with the results given as follows:

Day                   1    2    3    4    5    6    7    8    9    10   11   12
Temperature (°F)      79   76   78   84   90   83   93   94   97   85   88   82
Total Sales (Units)   147  143  147  168  206  155  192  211  209  187  200  150

Plot the data on a scatter diagram. Does it appear there is a relationship between atmospheric temperature and sales? Compute the coefficient of correlation. Determine at the 0.05 significance level whether the correlation in the population is greater than zero.
Scatter Plot
Solution 1:
Step 1: State the hypotheses.
H0: ρ = 0
There is no correlation between atmospheric temperature and total sales of fruit shake.
H1: ρ ≠ 0
There is a correlation between atmospheric temperature and total sales of fruit shake.
Step 2: Level of significance is α = 0.05.

Step 3: df = n – 2 = 12 – 2 = 10, and the critical t value is 2.228.

Step 4: Compute Pearson's r.


Table

Day     X       Y       X²       Y²        XY
1       79      147     6,241    21,609    11,613
2       76      143     5,776    20,449    10,868
3       78      147     6,084    21,609    11,466
4       84      168     7,056    28,224    14,112
5       90      206     8,100    42,436    18,540
6       83      155     6,889    24,025    12,865
7       93      192     8,649    36,864    17,856
8       94      211     8,836    44,521    19,834
9       97      209     9,409    43,681    20,273
10      85      187     7,225    34,969    15,895
11      88      200     7,744    40,000    17,600
12      82      150     6,724    22,500    12,300
Total   1,029   2,115   88,733   380,887   183,222

ΣX = 1,029; ΣY = 2,115; ΣX² = 88,733; ΣY² = 380,887; ΣXY = 183,222
Computation of Pearson's r

$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} = \frac{12(183{,}222) - (1{,}029)(2{,}115)}{\sqrt{[12(88{,}733) - (1{,}029)^2][12(380{,}887) - (2{,}115)^2]}} = 0.93 $$
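The column totals and the value of r for Example 1 can be checked with a short script. This is a sketch for verification only; the variable names are illustrative, and the data are the 12 temperature and sales pairs from the example.

```python
from math import sqrt

# Data from Example 1: temperature (°F) and total sales (units) for 12 days.
temperature = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
sales = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

n = len(temperature)
sum_x = sum(temperature)                                 # 1,029
sum_y = sum(sales)                                       # 2,115
sum_x2 = sum(x * x for x in temperature)                 # 88,733
sum_y2 = sum(y * y for y in sales)                       # 380,887
sum_xy = sum(x * y for x, y in zip(temperature, sales))  # 183,222

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 2))  # 0.93
```

The unrounded value is about 0.927, which rounds to the 0.93 reported in the solution.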

Atmospheric temperature and total sales show a very high positive correlation (a very dependable relationship); that is, an increase in atmospheric temperature is strongly associated with an increase in total sales of fruit shake.
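Solution 1:
Step 5: Decision rule.

$$ t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}} = \frac{0.93\sqrt{12-2}}{\sqrt{1-(0.93)^2}} = \frac{0.93(3.16227766)}{\sqrt{1-0.8649}} = \frac{2.940918224}{0.367559519} = 8.00 $$

The rejection region is t < -2.228 or t > +2.228. Because the computed t = 8.00 is greater than +2.228, reject H0.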
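The decision step can be reproduced as follows. This is a sketch: the helper name `t_statistic` is illustrative, the critical value 2.228 is the one quoted in Step 3, and the unrounded r is rebuilt from the sums in the Example 1 table. Using r rounded to 0.93 reproduces the t of about 8.00; carrying the unrounded r gives a slightly smaller t of about 7.82, and the decision is the same either way.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r * sqrt(N - 2) / sqrt(1 - r^2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

n = 12
t_critical = 2.228  # two-tailed critical value for df = 10 at alpha = 0.05, as given in Step 3

# r rounded to two decimals, as used in the worked solution.
t_rounded = t_statistic(0.93, n)

# Unrounded r, rebuilt from the Example 1 sums:
# N*sum(XY) - sum(X)*sum(Y) = 22,329; N*sum(X^2) - sum(X)^2 = 5,955; N*sum(Y^2) - sum(Y)^2 = 97,419.
r_unrounded = 22329 / sqrt(5955 * 97419)
t_unrounded = t_statistic(r_unrounded, n)

# Two-tailed decision: reject H0 when |t| is at least the critical value.
reject_h0 = abs(t_unrounded) >= t_critical
```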

Step 6: Conclusion.
We can conclude that there is evidence of a significant association between atmospheric temperature and total sales of fruit shake.
Example 2: Spearman Rank

The owner of a chain of fruit shake stores would like to study the correlation between atmospheric temperature and sales during the summer season. A random sample of 12 days is selected, with the results given as follows:

Day                   1    2    3    4    5    6    7    8    9    10   11   12
Temperature (°F)      79   76   78   84   90   83   93   94   97   85   88   82
Total Sales (Units)   147  143  147  168  206  155  192  211  209  187  200  150

Plot the data on a scatter diagram. Does it appear there is a relationship between atmospheric temperature and sales? Compute the coefficient of correlation. Determine at the 0.05 significance level whether the correlation in the population is greater than zero.
Scatter Plot
Solution 2:
Step 1: State the hypotheses.
H0: ρ = 0
There is no correlation between atmospheric temperature and total sales of fruit shake.
H1: ρ ≠ 0
There is a correlation between atmospheric temperature and total sales of fruit shake.
Step 2: Level of significance is α = 0.05.

Step 3: df = n – 2 = 12 – 2 = 10, and the critical t value is 2.228.

Step 4: Compute Spearman's ρ.


Table

Day     X     Y     RX    RY     D      D²
1       79    147   10    10.5   –0.5   0.25
2       76    143   12    12     0      0
3       78    147   11    10.5   0.5    0.25
4       84    168   7     7      0      0
5       90    206   4     3      1      1
6       83    155   8     8      0      0
7       93    192   3     5      –2     4
8       94    211   2     1      1      1
9       97    209   1     2      –1     1
10      85    187   6     6      0      0
11      88    200   5     4      1      1
12      82    150   9     9      0      0
Total                            0      ΣD² = 8.5
Computation of ρ

$$ \rho = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6(8.5)}{12(12^2-1)} = 1 - \frac{51}{12(143)} = 1 - 0.03 = 0.97 $$
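The ranking and the rank-difference formula can be sketched in Python. The helper name `average_ranks` is an assumption for illustration. It ranks from largest (rank 1) to smallest and gives tied values the average of the ranks they occupy, which matches the 10.5 assigned to the two sales values of 147 in the table.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the average of their ranks."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # best rank occupied by this value
        count = order.count(v)       # number of observations tied at this value
        ranks.append(first + (count - 1) / 2)
    return ranks

# Data from Example 2.
temperature = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
sales = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

rank_x = average_ranks(temperature)
rank_y = average_ranks(sales)

n = len(temperature)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))  # 8.5
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(round(rho, 2))  # 0.97
```

When ties are present, this shortcut formula is only an approximation to the Pearson correlation of the ranks, so library routines that compute the latter may return a slightly different value.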

Atmospheric temperature and total sales show a very high positive correlation (a very dependable relationship); that is, an increase in atmospheric temperature is strongly associated with an increase in total sales of fruit shake.
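Solution 2:

Step 5: Decision rule.

$$ t = \frac{\rho\sqrt{N-2}}{\sqrt{1-\rho^2}} = \frac{0.97\sqrt{12-2}}{\sqrt{1-(0.97)^2}} = \frac{0.97(3.16227766)}{\sqrt{1-0.9409}} = \frac{3.06740933}{0.243104915} = 12.62 $$

The rejection region is t < -2.228 or t > +2.228. Because the computed t = 12.62 is greater than +2.228, reject H0.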
Step 6: Conclusion.
We can conclude that there is evidence of a significant association between atmospheric temperature and total sales of fruit shake.
Simple Regression Equation
Regression analysis is a statistical tool used to model the dependence of a variable on one or more explanatory variables.
Simple linear regression is the least squares estimator of a linear regression model with a single predictor (one independent variable).
The least squares method determines a regression equation by minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.
Assumptions of Linear Regression Equation

Linearity – The mean of each error component is zero.

Independence of Error Terms – The errors are independent of each other.

Normally Distributed Error Terms – Each error component (random variable) follows an approximately normal distribution.

Homoscedasticity – The variance of the error components is the same for each value of the independent variable.
Estimating the Coefficients

Slope of the regression line

$$ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$

Intercept of the regression line

$$ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} $$

Regression equation: y = a + bx

Where:
y = criterion measure
x = predictor
a = ordinate, or the point where the regression line crosses the y-axis
b = beta weight, or the slope of the line
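As an illustration of the slope and intercept formulas, the sketch below fits a line to the fruit shake data from the correlation examples. The slides do not carry out this fit, so reusing that data here is an assumption made purely for demonstration, and the variable names are illustrative.

```python
# Fruit shake data reused from the correlation examples (illustrative choice).
x = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]              # temperature (°F)
y = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]  # total sales (units)

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

denominator = n * sum_x2 - sum_x ** 2

# Slope: b = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
b = (n * sum_xy - sum_x * sum_y) / denominator

# Intercept: a = (sum(y)*sum(x^2) - sum(x)*sum(xy)) / (n*sum(x^2) - sum(x)^2)
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator

# The intercept can equivalently be written as mean(y) - b * mean(x).
a_check = sum_y / n - b * sum_x / n
```

For this data the slope is about 3.75 units of sales per degree Fahrenheit and the intercept is about -145.28. The intercept is an extrapolation far outside the observed temperature range and should not be read as a sales forecast at 0 °F.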
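Measures of Variation

[Figure: regression line Ŷ = b1X + b0 with an observation (Xi, Yi), showing the total sum of squares, the explained sum of squares, and the unexplained sum of squares relative to the mean Ȳ.]

$$ \sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 $$

SST = SSR + SSE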
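The decomposition SST = SSR + SSE can be verified numerically. The sketch below reuses the fruit shake data from the correlation examples, which is an illustrative choice and not something the slides do; b0 and b1 follow the notation of the regression line Ŷ = b1X + b0.

```python
# Fruit shake data reused from the correlation examples (illustrative choice).
temperature = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
sales = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(sales) / n

# Least squares slope (b1) and intercept (b0) in deviation form.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, sales))
s_xx = sum((x - mean_x) ** 2 for x in temperature)
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

predicted = [b0 + b1 * x for x in temperature]

sst = sum((y - mean_y) ** 2 for y in sales)                        # total sum of squares
ssr = sum((y_hat - mean_y) ** 2 for y_hat in predicted)            # explained sum of squares
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, predicted))  # unexplained sum of squares
```

The identity holds only for a least squares line that includes an intercept; for an arbitrary line the cross-product term does not vanish.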
Standard Error of Estimate

The standard error of estimate is the standard deviation of the observed Y values about the predicted Ŷ values.

$$ s_E = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n-2}} = \sqrt{\frac{\sum Y^2 - b_0 \sum Y - b_1 \sum XY}{n-2}} $$
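Both forms of the formula can be compared in a short sketch. The fruit shake data from the correlation examples is reused here as an illustrative assumption; the slides do not compute a standard error for it.

```python
from math import sqrt

# Fruit shake data reused from the correlation examples (illustrative choice).
x = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
y = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Least squares slope (b1) and intercept (b0).
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# SSE from the residuals directly.
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# SSE from the shortcut: sum(Y^2) - b0*sum(Y) - b1*sum(XY).
sse_shortcut = sum_y2 - b0 * sum_y - b1 * sum_xy

s_e = sqrt(sse_direct / (n - 2))
```

For this data the standard error of estimate is roughly 10.7 units of sales. The shortcut subtracts large, nearly equal quantities, so the residual form is usually the safer one when the coefficients have been rounded.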
Coefficient of Determination
The coefficient of determination is the proportion of the variation in the dependent variable that is explained by the regression line and the independent variable.

$$ r^2 = \frac{\text{total variation} - \text{unexplained variation}}{\text{total variation}} = \frac{\text{explained variation}}{\text{total variation}} $$

$$ r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2} $$

The coefficient of non-determination is the proportion of the variation in the dependent variable that is left unexplained by the independent variable, given by 1 – r².
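A brief sketch of both coefficients, again reusing the fruit shake data from the correlation examples as an illustrative assumption (the slides do not compute r² for it):

```python
# Fruit shake data reused from the correlation examples (illustrative choice).
x = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
y = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares line fitted in deviation form.
slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum((a - mean_x) ** 2 for a in x)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * a for a in x]

sst = sum((b - mean_y) ** 2 for b in y)                   # total variation
sse = sum((b - p) ** 2 for b, p in zip(y, predicted))     # unexplained variation

r_squared = 1 - sse / sst          # coefficient of determination
non_determination = sse / sst      # coefficient of non-determination, 1 - r^2

print(round(r_squared, 2))  # 0.86
```

In simple linear regression this value equals the square of Pearson's r, so it agrees with squaring the unrounded correlation of about 0.927 from Example 1. About 86% of the variation in sales is associated with temperature in this sample, and about 14% is left unexplained.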
