Bivariate Analysis

This document discusses univariate and bivariate analysis. Univariate analysis looks at one variable, while bivariate analysis looks at two variables. Correlation analysis measures the relationship between two variables using methods like scatter diagrams, Pearson's correlation coefficient, Spearman's rank correlation, and concurrent deviation. Pearson's correlation coefficient ranges from -1 to 1, indicating the strength and direction of the linear relationship. The significance of the correlation coefficient can be tested using the t-statistic. Coefficient of determination indicates what proportion of variation in the dependent variable is explained by the independent variable.

Uploaded by

Chef G Krishana

Bivariate Analysis

Correlation and
Regression Analysis

Dr. Shambhu Nath Singh


Univariate & Bivariate Analysis
Univariate Analysis:
In univariate analysis, we analyze only one
variable at a time and base the decision on it.
For example: measures of central tendency and
measures of dispersion.
Bivariate Analysis:
In bivariate analysis, we analyze two
variables at a time and base the decision on
the relationship between them.
For example: correlation and regression
analysis, time series analysis.
Correlation Analysis
Correlation analysis is a
mathematical technique used to
measure the degree and direction of the
relationship between two variables.
For example:
1. Relationship between heights of
fathers and heights of sons.
2. Relationship between quantum
of rainfall and production of
wheat.
3. Relationship between the price of a
commodity and the demand for that
commodity.
Methods of Studying
Correlation
There are mainly four methods for
studying the relationship
between two variables:
1. Scatter Diagram Method
2. Karl Pearson's Coefficient of
Correlation
3. Spearman’s Rank Correlation
Method
4. Concurrent Deviation Method
Scatter Diagram Method
• It is a graphic device for drawing certain
conclusions about the correlation
between two variables. It is a
non-mathematical method and gives only a rough
idea of how the two variables are related.
• The value of the correlation coefficient is
denoted by ‘r’ and lies between -1 and
+1.
• When r = -1, there is perfect negative correlation
between the two variables.
• When r = +1, there is perfect positive correlation
between the two variables.
• When r = 0, there is no linear correlation
between the two variables.
Karl Pearson's Coefficient of
Correlation
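• It is also known as the product-moment
correlation coefficient. It gives an exact
idea of the degree of linear
relationship between two variables. It is
also denoted by ‘r’ and may be defined as:

$$ r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y} $$

Properties:
• It is independent of change of origin and scale.
• It is independent of the units of measurement.
• The coefficient of correlation is the
geometric mean of the two regression
coefficients: $r = \pm\sqrt{b_{YX} \cdot b_{XY}}$, with the sign of the regression coefficients.
Correlation Formula

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}} \qquad \& \qquad r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y} $$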
Question 1-2
1- Find the Karl Pearson correlation
coefficient from the following
data:
X:  10   4   3   5   8   5   6   7
Y:   9   6   4   5   7   4   5   8

2- The covariance between the
variables X and Y is 10. The
variances of the X series and Y series
are 16 and 9 respectively. Find the
coefficient of correlation.
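Solution of Question No. 1
S. No.    X     Y    (X - X̄)   (X - X̄)²   (Y - Ȳ)   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
1        10     9       4         16         3          9            12
2         4     6      -2          4         0          0             0
3         3     4      -3          9        -2          4             6
4         5     5      -1          1        -1          1             1
5         8     7       2          4         1          1             2
6         5     4      -1          1        -2          4             2
7         6     5       0          0        -1          1             0
8         7     8       1          1         2          4             2
N = 8   ΣX = 48   ΣY = 48   Σ(X - X̄)² = 36   Σ(Y - Ȳ)² = 24   Σ(X - X̄)(Y - Ȳ) = 25

$$ \bar{X} = \frac{\sum X}{N} = \frac{48}{8} = 6, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{48}{8} = 6 $$

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}} = \frac{25}{\sqrt{36 \times 24}} \approx 0.85 $$

Hence there is a high positive correlation between the two variables.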


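Solution of Question No. 2
Given in the question: Cov(X, Y) = 10, Var(X) = 16 and Var(Y) = 9.

Since Variance = (Standard Deviation)²,
therefore Standard Deviation = √Variance, so $\sigma_X = \sqrt{16} = 4$ and $\sigma_Y = \sqrt{9} = 3$.
We know that
$$ r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y} = \frac{10}{4 \times 3} \approx 0.83 $$
Hence there is a high positive correlation
between the two variables.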
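
The two calculations above can be checked with a short Python sketch. The function names `pearson_r` and `r_from_covariance` are illustrative choices, not part of the original slides; the data are those of Questions 1 and 2.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sums of squared deviations.
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


def r_from_covariance(cov_xy, var_x, var_y):
    """Pearson's r from the covariance and the two variances."""
    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))


# Data of Question 1.
X = [10, 4, 3, 5, 8, 5, 6, 7]
Y = [9, 6, 4, 5, 7, 4, 5, 8]
print(round(pearson_r(X, Y), 2))               # 0.85

# Figures of Question 2: Cov = 10, Var(X) = 16, Var(Y) = 9.
print(round(r_from_covariance(10, 16, 9), 2))  # 0.83
```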
Spearman’s Rank Correlation
Method
• It uses ranks rather than the actual
observations and makes no assumptions
about the population from which the
actual observations are drawn. The
correlation coefficient between two
series of ranks is called the ‘Rank
Correlation Coefficient’. It is given by the
formula:

$$ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$

where D is the difference between the two ranks of an item and N is the number of pairs.

• In case there is more than one item with the
same value in the series, the average
rank is usually allotted to each of these items,
and the factor $(m^3 - m)/12$ is added to $\sum D^2$ for each
group of tied items, where m is the number of items sharing the value.
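
The tie rule can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, rank 1 is assumed to go to the largest value, and the four-item data set at the end is made up to show one tie.

```python
from collections import Counter


def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share their average rank."""
    positions = {}
    for pos, value in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(xs, ys):
    """Rank correlation with the tie correction added to the sum of D squared."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    sum_d2 += tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data: the value 60 occurs twice in the first series.
scores_a = [50, 60, 60, 70]
scores_b = [1, 2, 3, 4]
print(average_ranks(scores_a))                            # [4.0, 2.5, 2.5, 1.0]
print(round(spearman_with_ties(scores_a, scores_b), 2))   # 0.9
```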
Question 3
• In a beauty competition two judges
rank the entries as follows:
Participants   A   B   C   D   E   F   G   H   I   J   K
Judge I        1   2   3   4   5   6   7   8   9  10  11
Judge II       2   3   1   6   4   5   8   7  10  11   9

Find the degree of agreement
between the ranks given by the two
judges.
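Solution of Question No. 3
Participants     A   B   C   D   E   F   G   H   I   J   K
Judge I (R₁)     1   2   3   4   5   6   7   8   9  10  11
Judge II (R₂)    2   3   1   6   4   5   8   7  10  11   9
D = R₁ - R₂     -1  -1   2  -2   1   1  -1   1  -1  -1   2
D²               1   1   4   4   1   1   1   1   1   1   4

Here N = 11 and $\sum D^2 = 20$. We know that

$$ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 20}{11 \times 120} = 1 - 0.091 = 0.909 $$

Hence there is a high degree of positive agreement
between the two judges.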
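
The same rank calculation can be sketched in Python. The function name is an illustrative choice, and the code assumes the two lists already hold ranks with no ties.

```python
def spearman_rank_correlation(ranks_1, ranks_2):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)) for two untied rank series."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks of participants A to K from Question 3.
judge_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
judge_2 = [2, 3, 1, 6, 4, 5, 8, 7, 10, 11, 9]
print(round(spearman_rank_correlation(judge_1, judge_2), 3))  # 0.909
```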
Concurrent Deviation Method
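• It is based on the direction of change
in the two paired variables. The
correlation coefficient between the two
series of directions of change is called the
coefficient of concurrent deviation. It
is given by the formula:

$$ r_c = \pm\sqrt{\pm\frac{2C - N}{N}} $$

where C is the number of concurrent deviations (pairs that change in the same direction) and N is the number of pairs of deviations. The sign inside and outside the root is that of (2C - N).

• It is very simple to understand and
easy to apply. It is very suitable for
large N.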
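
A minimal Python sketch of the method follows. It assumes the usual textbook form r_c = ±√(±(2C - n)/n), where n is the number of pairs of deviations (one fewer than the number of observations) and C counts the pairs whose direction of change agrees; the function names are illustrative. The data are taken from Questions 9 and 11 of these slides.

```python
import math


def sign(value):
    """Return +1, -1 or 0 according to the direction of change."""
    return (value > 0) - (value < 0)


def concurrent_deviation(xs, ys):
    """Coefficient of concurrent deviation from two paired series."""
    # Direction of change from each observation to the next one.
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    n = len(dx)
    # A pair is concurrent when both series move in the same direction.
    c = sum(1 for a, b in zip(dx, dy) if a == b)
    ratio = (2 * c - n) / n
    # The sign of (2C - n) is applied inside and outside the root.
    return math.copysign(math.sqrt(abs(ratio)), ratio)


# Question 9 data: 3 of the 4 changes are concurrent.
print(round(concurrent_deviation([1, 2, 3, 4, 5], [10, 20, 30, 50, 40]), 2))  # 0.71

# Question 11 data: price and demand never move in the same direction.
price = [5, 7, 6, 6, 8, 7, 5, 4, 3, 9]
demand = [100, 75, 80, 70, 50, 65, 90, 100, 110, 60]
print(concurrent_deviation(price, demand))  # -1.0
```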
Standard Error & Probable Error
Standard Error: The standard error of the
coefficient of correlation is
calculated as follows:

$$ \text{S.E.}(r) = \frac{1 - r^2}{\sqrt{N}} $$

where r = coefficient of correlation
N = number of pairs of observations

Probable Error: The probable error
may be calculated as follows:
Probable error = 0.6745 × Standard error
Interpretation of
correlation coefficient (r)
S. No.   Case                 Interpretation
1.       If |r| < 6 × P.E.    The value of r is not significant.
                              There is no evidence of correlation.
2.       If |r| > 6 × P.E.    The value of r is significant. The
                              existence of correlation is practically certain.

Question 4
If r=-0.8 and N=36, calculate
standard error, probable error and
also state whether the value of r is
significant.
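Solution of Question No. 4
Given that r = -0.8 and N = 36
We know that

$$ \text{S.E.} = \frac{1 - r^2}{\sqrt{N}} = \frac{1 - (-0.8)^2}{\sqrt{36}} = \frac{0.36}{6} = 0.06 $$
$$ \text{P.E.} = 0.6745 \times \text{S.E.} = 0.6745 \times 0.06 \approx 0.0405, \qquad 6 \times \text{P.E.} \approx 0.243 $$

Since |r| = 0.8 is greater than 6 × P.E. ≈ 0.243, the value of r is significant. It means the
existence of correlation is practically certain.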
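
The standard error, probable error and the 6 × P.E. rule can be sketched in Python as below. The function names are illustrative; the figures are those of Question 4.

```python
import math


def standard_error(r, n):
    """S.E. of r = (1 - r^2) / sqrt(N)."""
    return (1 - r * r) / math.sqrt(n)


def probable_error(r, n):
    """P.E. = 0.6745 * S.E."""
    return 0.6745 * standard_error(r, n)


def is_significant(r, n):
    """Rule used in these slides: r is significant when |r| > 6 * P.E."""
    return abs(r) > 6 * probable_error(r, n)


# Question 4: r = -0.8, N = 36.
print(round(standard_error(-0.8, 36), 4))   # 0.06
print(round(probable_error(-0.8, 36), 4))   # 0.0405
print(is_significant(-0.8, 36))             # True
```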
Coefficient of Determination
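• The coefficient of determination is
defined as the ratio of the explained
variance to the total variance:

$$ r^2 = \frac{\text{Explained variance}}{\text{Total variance}} $$

• The coefficient of non-determination is
defined as the ratio of the unexplained
variance to the total variance:

$$ 1 - r^2 = \frac{\text{Unexplained variance}}{\text{Total variance}} $$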
Question 5-7
5- If r=0.8, what is the proportion of
variation in the dependent variable
which is explained and not
explained by the independent
variable?
6- Prove that r is significant if N=16,
r=0.71 and P.E. (r)=0.085.
7- State whether r is significant or
not if N=4 and r=0.3.
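Solution of Question No. 5-7
Solution-5 If r = 0.8, then $r^2 = (0.8)^2 = 0.64$ and $1 - r^2 = 0.36$.
It means that
64% of the variation in the dependent variable is
explained by the independent variable, and 36% is not explained.
Solution-6 Given that r = 0.71, N = 16 and P.E. = 0.085

$$ 6 \times \text{P.E.} = 6 \times 0.085 = 0.51 $$

Since r = 0.71 is greater than 6 × P.E. = 0.51, r is significant. It means the
existence of correlation is practically certain.
Solution-7 Given that r = 0.3 and N = 4

$$ \text{S.E.} = \frac{1 - r^2}{\sqrt{N}} = \frac{1 - 0.09}{\sqrt{4}} = 0.455, \qquad \text{P.E.} = 0.6745 \times 0.455 \approx 0.307, \qquad 6 \times \text{P.E.} \approx 1.84 $$

Since r = 0.3 is less than 6 × P.E. ≈ 1.84, r is not significant. It means there
is no evidence of correlation.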
Testing the significance of
the Correlation Coefficient
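• The statistical test for the significance of
a correlation coefficient is conducted
using a t-statistic. The hypotheses to be
tested are as below:

$$ H_0: \rho = 0 \qquad H_a: \rho \neq 0 $$

where ρ is the population correlation coefficient.

• The test statistic is given by:

$$ t_{n-2} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} $$

with (n - 2) degrees of freedom.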


Question 8
• A random sample of 6 pairs of
observations from a normal
population gives a correlation
coefficient of 0.6. Could the
sample come from a population
with zero correlation? Given that:
Degrees of freedom                        4      5      6
Value of t at 5% level of significance    2.78   2.57   2.45
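Solution of Question No. 8

$$ H_0: \rho = 0 \qquad H_a: \rho \neq 0 $$

Given that r = 0.6 and n = 6; degrees of
freedom = (n - 2) = 6 - 2 = 4

$$ t_4 = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.6 \times \sqrt{4}}{\sqrt{1 - 0.36}} = \frac{1.2}{0.8} = 1.5 $$

Since at the 5% level the calculated value (CV) of t = 1.5 is less than the
table value (TV) of t = 2.78, the null hypothesis is
not rejected, and we may conclude that the sample could have come from a population whose
correlation coefficient is zero.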
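
The test can be sketched in Python as below. The function name is illustrative, and the critical value 2.78 is simply the table value quoted in Question 8 for 4 degrees of freedom.

```python
import math


def t_statistic_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Question 8: r = 0.6 from n = 6 pairs.
t_value = t_statistic_for_r(0.6, 6)
table_value = 2.78  # 5% level, 4 degrees of freedom (from the question)
reject_null = abs(t_value) > table_value

print(round(t_value, 2))  # 1.5
print(reject_null)        # False
```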
Question 9
• From the following data:
X    1    2    3    4    5
Y   10   20   30   50   40

1. Calculate the coefficient of
correlation.
2. Calculate the standard error.
3. Calculate the probable error.
4. Interpret whether r is significant
or not significant.
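Solution of Question No. 9
S. No.    X     Y    (X - X̄)   (X - X̄)²   (Y - Ȳ)   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
1         1    10      -2          4        -20        400           40
2         2    20      -1          1        -10        100           10
3         3    30       0          0          0          0            0
4         4    50       1          1         20        400           20
5         5    40       2          4         10        100           20
N = 5   ΣX = 15   ΣY = 150   Σ(X - X̄)² = 10   Σ(Y - Ȳ)² = 1000   Σ(X - X̄)(Y - Ȳ) = 90

$$ n = 5; \qquad \bar{X} = \frac{\sum X}{n} = \frac{15}{5} = 3 \quad \& \quad \bar{Y} = \frac{\sum Y}{n} = \frac{150}{5} = 30 $$

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}} = \frac{90}{\sqrt{10} \times \sqrt{1000}} = 0.9 $$

1- Hence there is a high positive correlation between the two variables, that is,
r = 0.9.
2- $\text{S.E.} = \dfrac{1 - r^2}{\sqrt{N}} = \dfrac{1 - 0.81}{\sqrt{5}} \approx 0.085$
3- $\text{P.E.} = 0.6745 \times \text{S.E.} = 0.6745 \times 0.085 \approx 0.0573$
4- $6 \times \text{P.E.} \approx 0.344$. Since r = 0.9 is greater than 6 × P.E., r is significant.
It means there is
clear evidence of correlation.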
Regression Analysis
• Regression is the measure of
average relationship between two or
more variables. Regression analysis
is a statistical tool to study the
nature and extent of functional
relationship between two or more
variables and to estimate the
unknown value of dependent
variable on the basis of known value
of independent variable.
For example: sales and
advertisement expenditure.
Fitting the Regression Lines-
Least Squares Method (Y on X)
• The standard form of the regression line of Y on
X is given by: Y = a + b·X ------(1)
where Y = dependent variable
X = independent variable
a and b are constants, calculated by
solving the following normal equations:
ΣY = N·a + b·ΣX
ΣXY = a·ΣX + b·ΣX²
• Calculate the values of a and b from
these two equations and substitute them into
equation (1) to obtain the required
regression line.
Fitting the Regression Lines-
Least Squares Method (X on Y)
• The standard form of the regression line of X on
Y is given by: X = c + d·Y ------(2)
where X = dependent variable
Y = independent variable
c and d are constants, calculated by
solving the following normal equations:
ΣX = N·c + d·ΣY
ΣXY = c·ΣY + d·ΣY²
• Calculate the values of c and d from
these two equations and substitute them into
equation (2) to obtain the required
regression line.
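
Solving the two normal equations can be sketched in Python as below. This is one possible approach (Cramer's rule for the 2 × 2 system); the function name is illustrative, and the data are borrowed from Question 10 of these slides.

```python
def fit_line_normal_equations(xs, ys):
    """Solve  sum(y) = n*a + b*sum(x)  and  sum(xy) = a*sum(x) + b*sum(x^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Cramer's rule on the 2 x 2 system in the unknowns a and b.
    det = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


X = [1, 2, 3, 4, 5]
Y = [10, 20, 30, 50, 40]

a, b = fit_line_normal_equations(X, Y)   # line of Y on X: Y = a + b*X
c, d = fit_line_normal_equations(Y, X)   # line of X on Y: X = c + d*Y
print(a, b)                              # 3.0 9.0
print(round(c, 2), round(d, 2))          # 0.3 0.09
```

Swapping the two arguments gives the line of X on Y, since the second pair of normal equations has the same form with the roles of X and Y exchanged. The product b × d = 0.81 equals r² for these data, in line with the geometric-mean property of r.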
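Regression Line- By Regression
Coefficient Methods (Y on X)
• The standard form of the regression
line of Y on X is given by:

$$ (Y - \bar{Y}) = b_{YX}\,(X - \bar{X}) \qquad (1) $$
$$ b_{YX} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} $$

• Calculate the value of $b_{YX}$ and the means of
X and Y, then put these into
equation (1) to obtain the
required regression line.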
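Regression Line- By Regression
Coefficient Methods (X on Y)
• The standard form of the regression
line of X on Y is given by:

$$ (X - \bar{X}) = b_{XY}\,(Y - \bar{Y}) \qquad (2) $$
$$ b_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} $$

• Calculate the value of $b_{XY}$ and the
means of X and Y, then put
these into equation (2) to obtain
the required regression line.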
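Standard Error of Estimate (S)
• The standard error of estimate measures
the dispersion about an average line
called the regression line. The standard error
of X values from $X_C$:

$$ S_{XY} = \sqrt{\frac{\sum (X - X_C)^2}{N}} $$

• The standard error of Y values from $Y_C$:

$$ S_{YX} = \sqrt{\frac{\sum (Y - Y_C)^2}{N}} $$

Here $X_C$ and $Y_C$ are the values computed from the regression lines. Question 10 divides by N; Question 11 divides by (N - 2), the degrees of freedom of the residuals.

• Correlation coefficient:

$$ r^2 = \frac{\sum (Y_C - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{\sum (Y - Y_C)^2}{\sum (Y - \bar{Y})^2} $$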
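Total Variation in Y and X
• Total variation in $Y = \sum (Y - \bar{Y})^2$
• Total variation in $X = \sum (X - \bar{X})^2$
• Unexplained variation in $Y = \sum (Y - Y_C)^2$
• Unexplained variation in $X = \sum (X - X_C)^2$
• Explained variation in $Y = \sum (Y_C - \bar{Y})^2$
• Explained variation in $X = \sum (X_C - \bar{X})^2$
• Total variation = Unexplained variation + Explained variation

$$ \sum (Y - \bar{Y})^2 = \sum (Y - Y_C)^2 + \sum (Y_C - \bar{Y})^2 $$

$$ \sum (X - \bar{X})^2 = \sum (X - X_C)^2 + \sum (X_C - \bar{X})^2 $$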
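Testing the Significance of β
• The hypothesis to be tested for the slope
coefficient is given as:

$$ H_0: \beta = 0 \qquad H_a: \beta \neq 0 $$

• The test statistic used to test the significance of β
is given by:

$$ t_{n-2} = \frac{\hat{\beta} - \beta}{SE(\hat{\beta})}, \qquad SE(\hat{\beta}) = \frac{S_{YX}}{\sqrt{\sum (X - \bar{X})^2}} $$

where $\hat{\beta}$ = estimated value of β (the
regression coefficient b)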
Goodness of Fit of Regression Line
• The test for goodness of fit is done by using the
F statistic. The hypothesis to be tested is:

$$ H_0: r^2 = 0 \qquad H_a: r^2 \neq 0 $$

• The test statistic F is given by:

$$ F_{(k-1,\; n-k)} = \frac{r^2 / (k - 1)}{(1 - r^2) / (n - k)} $$

• k = number of parameters to be estimated
• n = number of observations
Question 10
The following data relate to the advertisement
expenditure and sales of a company:
Advertisement Expenditure in Lakhs (X)    1    2    3    4    5
Sales (Y)                                10   20   30   50   40

You are required to:
1. Find the regression equation or line (Y on X).
2. Calculate the total variation in Y.
3. Calculate the unexplained variation in Y.
4. Calculate the explained variation in Y.
5. Calculate the standard error of estimate.
6. Calculate sales when advertisement
expenditure is 8 lakhs.
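Solution of Question No.10
S. No.    X     Y    (X - X̄)   (X - X̄)²   (Y - Ȳ)   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
1         1    10      -2          4        -20        400           40
2         2    20      -1          1        -10        100           10
3         3    30       0          0          0          0            0
4         4    50       1          1         20        400           20
5         5    40       2          4         10        100           20
N = 5   ΣX = 15   ΣY = 150   Σ(X - X̄)² = 10   Σ(Y - Ȳ)² = 1000   Σ(X - X̄)(Y - Ȳ) = 90

$$ \bar{X} = \frac{15}{5} = 3, \qquad \bar{Y} = \frac{150}{5} = 30 $$

$$ b_{YX} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{90}{10} = 9 $$

1. Hence the equation will be $(Y - 30) = 9\,(X - 3)$, that is, $Y = 3 + 9X$.

2. Total variation in $Y = \sum (Y - \bar{Y})^2 = 1000$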
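Solution of Question No.10 (continued)
S. No.    X     Y    Y_C = 3 + 9X   (Y - Y_C)   (Y - Y_C)²
1         1    10        12            -2            4
2         2    20        21            -1            1
3         3    30        30             0            0
4         4    50        39            11          121
5         5    40        48            -8           64
Total                                               190

3- Unexplained variation in $Y = \sum (Y - Y_C)^2 = 190$

4- Total variation in Y = Explained variation + Unexplained
variation
Hence explained variation in Y = 1000 - 190 = 810

5- Standard error of estimate $S_{YX} = \sqrt{\dfrac{\sum (Y - Y_C)^2}{N}} = \sqrt{\dfrac{190}{5}} \approx 6.16$

6- Put X = 8 lakhs in the regression line: $Y = 3 + 9 \times 8 = 75$. Hence the estimated sales are 75.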
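
The six parts of Question 10 can be traced in a short Python sketch. Variable and function names are illustrative; the divisor N in the standard error of estimate follows the solution above.

```python
import math

# Data of Question 10.
X = [1, 2, 3, 4, 5]        # advertisement expenditure (lakhs)
Y = [10, 20, 30, 50, 40]   # sales

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Regression coefficient of Y on X and the intercept of the fitted line.
b_yx = (sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
        / sum((x - mean_x) ** 2 for x in X))
a = mean_y - b_yx * mean_x


def predict(x):
    """Computed value Y_C on the regression line of Y on X."""
    return a + b_yx * x


total = sum((y - mean_y) ** 2 for y in Y)
unexplained = sum((y - predict(x)) ** 2 for x, y in zip(X, Y))
explained = sum((predict(x) - mean_y) ** 2 for x in X)
s_yx = math.sqrt(unexplained / n)   # divisor N, as in the solution above

print(a, b_yx)                        # 3.0 9.0
print(total, unexplained, explained)  # 1000.0 190.0 810.0
print(round(s_yx, 2))                 # 6.16
print(predict(8))                     # 75.0
```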


Question 11
• The following are data on the quantity demanded and
the price of a commodity:
Years     2006  2007  2008  2009  2010  2011  2012  2013  2014  2015
Demand     100    75    80    70    50    65    90   100   110    60
Price        5     7     6     6     8     7     5     4     3     9

1. Calculate the value of r between the above two.
2. Test the significance of r at the 5% level of
significance.
3. Find the linear regression line and interpret the
same.
4. Test the significance of the slope coefficient β of the regression line.
5. Compute r² and interpret the same.
6. Test the significance of r² at the 5% level of
significance.
Solution of Question No.11 (1)
Years   Price (X)   Demand (Y)   (X - X̄)   (X - X̄)²   (Y - Ȳ)   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
2006        5          100         -1          1         20        400          -20
2007        7           75          1          1         -5         25           -5
2008        6           80          0          0          0          0            0
2009        6           70          0          0        -10        100            0
2010        8           50          2          4        -30        900          -60
2011        7           65          1          1        -15        225          -15
2012        5           90         -1          1         10        100          -10
2013        4          100         -2          4         20        400          -40
2014        3          110         -3          9         30        900          -90
2015        9           60          3          9        -20        400          -60
N = 10   ΣX = 60   ΣY = 800   Σ(X - X̄)² = 30   Σ(Y - Ȳ)² = 3450   Σ(X - X̄)(Y - Ȳ) = -300

$$ n = 10; \qquad \bar{X} = \frac{\sum X}{n} = \frac{60}{10} = 6 \quad \& \quad \bar{Y} = \frac{\sum Y}{n} = \frac{800}{10} = 80 $$

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}} = \frac{-300}{\sqrt{30} \times \sqrt{3450}} = -0.93 $$
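Solution of Question No.11 (2-3)
To test the statistical significance of r,
we use the following test:

$$ H_0: \rho = 0 \qquad H_a: \rho \neq 0 $$

Given that r = -0.93 and n = 10; degrees of freedom = (n - 2) = 10 - 2 = 8

$$ t_8 = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{-0.93 \times \sqrt{8}}{\sqrt{1 - (-0.93)^2}} \approx -7.16 $$

Since at the 5% level the calculated value (CV) $|t_8| = 7.16$ is greater than the table value (TV) $t_8 = 2.306$,
the null hypothesis is rejected and we may conclude that the
correlation coefficient is significant.
======================================================
The regression line of Y on X is $(Y - \bar{Y}) = b_{YX}\,(X - \bar{X})$, where

$$ b_{YX} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{-300}{30} = -10 $$

Hence the equation will be $(Y - 80) = -10\,(X - 6)$, that is, $Y = 140 - 10X$. On average, demand falls by 10 units for every one-unit rise in price.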
Solution of Question No.11 (4)
Years   Price (X)   Demand (Y)   Y_C = 140 - 10X   (Y_C - Ȳ)²   (Y - Y_C)   (Y - Y_C)²
2006        5          100             90              100           10          100
2007        7           75             70              100            5           25
2008        6           80             80                0            0            0
2009        6           70             80                0          -10          100
2010        8           50             60              400          -10          100
2011        7           65             70              100           -5           25
2012        5           90             90              100            0            0
2013        4          100            100              400            0            0
2014        3          110            110              900            0            0
2015        9           60             50              900           10          100
N = 10   ΣX = 60      ΣY = 800        800             3000            0          450

Standard error of estimate $S_{YX} = \sqrt{\dfrac{\sum (Y - Y_C)^2}{N - 2}} = \sqrt{\dfrac{450}{8}} = 7.5$
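Solution of Question No.11 (4-5)
To test the significance of the slope coefficient, the
following hypothesis is to be tested:

$$ H_0: \beta = 0 \qquad H_a: \beta \neq 0 $$

Standard error $SE(\hat{\beta}) = \dfrac{S_{YX}}{\sqrt{\sum (X - \bar{X})^2}} = \dfrac{7.5}{\sqrt{30}} = 1.37$

The t statistic will be $t_{n-2} = \dfrac{\hat{\beta} - \beta}{SE(\hat{\beta})} = \dfrac{-10 - 0}{1.37} = -7.30$

Since at the 5% level the calculated value (CV) $|t_8| = 7.30$ is greater than the table value (TV)
$t_8 = 2.306$, the null hypothesis is rejected and we
may conclude that the price affects the quantity
demanded significantly.
The value of the coefficient of determination is given by:

$$ r^2 = \frac{\sum (Y_C - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{3000}{3450} = 0.87 $$

This means that 87% of the variation in quantity
demanded is explained by price.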
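
The slope test and the coefficient of determination can be reproduced with a short Python sketch. The function name and the returned dictionary keys are illustrative; the divisor (n - 2) in the standard error of estimate follows the solution above.

```python
import math

# Data of Question 11.
price = [5, 7, 6, 6, 8, 7, 5, 4, 3, 9]
demand = [100, 75, 80, 70, 50, 65, 90, 100, 110, 60]


def slope_test(xs, ys):
    """Fit Y on X and return the slope, its standard error, t and r squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Unexplained variation around the fitted line.
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(unexplained / (n - 2))
    se_b = s_yx / math.sqrt(sxx)
    return {
        "a": a,
        "b": b,
        "s_yx": s_yx,
        "se_b": se_b,
        "t": (b - 0) / se_b,               # tested against beta = 0
        "r_squared": 1 - unexplained / syy,
    }


result = slope_test(price, demand)
print(result["a"], result["b"])      # 140.0 -10.0
print(result["s_yx"])                # 7.5
print(round(result["se_b"], 2))      # 1.37
print(round(result["t"], 2))         # -7.3
print(round(result["r_squared"], 2)) # 0.87
```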
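Solution of Question No.11 (6)
To test the statistical significance of r², we use the F statistic with k = 2 and n = 10:

$$ H_0: r^2 = 0 \qquad H_a: r^2 \neq 0 $$

$$ F_{(k-1,\; n-k)} = \frac{r^2 / (k - 1)}{(1 - r^2) / (n - k)} = \frac{0.87 / 1}{0.13 / 8} = 53.54 $$

The calculated value of F (1, 8) = 53.54 is greater than the table
value of F (1, 8) = 5.32 at the 5% level of significance, hence the null
hypothesis is rejected and we may conclude that r² is
significant at the 5% level of significance.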
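
The F value quoted above can be checked with a minimal Python sketch. It assumes the form F = (r²/(k - 1)) / ((1 - r²)/(n - k)) with k = 2 estimated parameters (intercept and slope), which reproduces the figure 53.54; the function name is illustrative and the critical value 5.32 is the table value quoted in the solution.

```python
def f_statistic(r_squared, n, k=2):
    """F with (k - 1, n - k) degrees of freedom for the fit of a regression line."""
    return (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))


# Question 11: r^2 = 0.87 (rounded, as in the solution) and n = 10.
f_value = f_statistic(0.87, 10)
table_value = 5.32   # F(1, 8) at the 5% level, from the solution
print(round(f_value, 2))       # 53.54
print(f_value > table_value)   # True
```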
With the Best Compliments
Thanking to beloved all…..
By

Dr. Shambhu Nath Singh


M Sc (Physics), MBA (Finance), MA (Economics)
JRF in Management, NET in Economics

Coordinator Ph. D. (Coursework)


Bundelkhand University, JHANSI
Mobile-09450075770; 08299233527 (WhatsApp)