
Regression & Correlation Analysis

The document discusses regression and correlation analysis. It provides an overview of the history and significance of regression and correlation. It describes various methods of studying correlation including scatter plots, Pearson's correlation coefficient, frequency tables, and Spearman's rank correlation coefficient. It also discusses types of correlation, regression analysis, multiple regression, and applications. Examples and case studies are provided to illustrate different techniques.

Uploaded by

Mrinal Sandbhor
Copyright
© Attribution Non-Commercial (BY-NC)

REGRESSION & CORRELATION ANALYSIS
By
ASHWINKUMAR POOJARY - 09
ATANU MANDAL - 10
E.G. PRASANT - 16
JIBIN GEORGE - 22
MRINAL SANDBHOR - 31
RACHIT GOR - 42
SHAUNAK NADIG - 44
Overview

• History & Significance
• Methods of Correlation
• Karl Pearson’s Correlation Coefficient
• Types of Correlation
• Spearman’s Rank Correlation Coefficient
• Regression Analysis
• Multiple Regression & Correlation Analysis
• Applications & Conclusions
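History

• Earliest form of regression: the “Method of Least Squares”, published by Legendre in 1805 and by Gauss in 1809.
• Both applied the method to astronomical observations.
• The Gauss–Markov theorem was proved by Gauss in 1821 and later extended by Andrey Markov.
• The term “regression” was coined by Francis Galton in the 19th century to describe a biological phenomenon: the heights of descendants of tall ancestors tend to regress towards the average.
• His work was later extended by Udny Yule and Karl Pearson.
• Yule and Pearson assumed that the variables are jointly normally distributed. R. A. Fisher weakened this assumption in his works of 1922 and 1925.

[Portraits: Legendre, Carl Friedrich Gauss, Andrey Markov, Francis Galton, Udny Yule, Karl Pearson, R. A. Fisher]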
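Significance of the Study

• Gives the degree of relationship between variables such as price and supply, or income and expenditure.
• Enables estimation of the value of one variable from the known value of another.
• Helps in understanding economic behaviour and in business decision-making.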
Methods of Studying Correlation

• Scatter plots
• Karl Pearson’s coefficient (covariance method)
• Two-way frequency table (bivariate method)
• Rank method
Scatter Plots

• The simplest tool for ascertaining the correlation between two variables.
• Helpful when there is a large number of data points.
• Provide information about the relationship between the two variables: its strength, shape, direction, and the presence of outliers.
[Figures: example scatter plots]

Scatter Plots Contd…

Disadvantages:

• Not suitable if the number of observations is fairly large.
• Does not provide an exact measure of the extent of the relationship between two variables.
• Not a rigorous method of obtaining the line of best fit.
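Covariance Method

Population Correlation Coefficient

$$\rho_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

Sample Correlation Coefficient

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$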
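The sample coefficient r is the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below is a minimal Python implementation of that definition. It was written for illustration and is not taken from the presentation, and the function name `pearson_r` is our own.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r (covariance method)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3], [3, 2, 1]))        # -1.0 (perfect negative)
```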


Properties

• r = +1: perfect positive correlation between the variables.
• r = -1: perfect negative correlation between the variables.
• The correlation coefficient is independent of change of origin (and of change of scale by positive factors).
• Two independent variables are uncorrelated, but the converse is not true: uncorrelated variables need not be independent.
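Two of these properties can be checked numerically. The sketch below uses small made-up data, chosen only for illustration. It shows that shifting the origin and rescaling by a positive factor leave r unchanged. It also shows that y = x² on values symmetric about zero gives r = 0, even though y is completely determined by x.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [2, 4, 5, 7, 9]          # illustrative data only
y = [1, 3, 4, 4, 8]

# Change of origin (subtract 100) and positive change of scale (multiply by 3, add 7)
r_original = pearson_r(x, y)
r_transformed = pearson_r([xi - 100 for xi in x], [3 * yi + 7 for yi in y])
print(math.isclose(r_original, r_transformed))  # True

# Uncorrelated does not imply independent: v depends exactly on u, yet r = 0
u = [-2, -1, 0, 1, 2]
v = [ui ** 2 for ui in u]
print(pearson_r(u, v))  # 0.0
```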
Inference

• One aim is to test the null hypothesis that the true correlation coefficient is ρ, based on the value of the sample correlation coefficient r.
• The other aim is to construct a confidence interval around r that has a given probability of containing ρ.
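The slides do not say which inference procedure was intended. The sketch below uses two standard techniques in their place. The first is Fisher's z-transformation (z = atanh r, standard error ≈ 1/√(n − 3)), which gives an approximate confidence interval for ρ. The second is the t statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom, which tests H0: ρ = 0. The function names and the default 1.96 multiplier (roughly 95% coverage) are our own choices.

```python
import math

def fisher_z_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for rho using Fisher's z-transformation."""
    if n <= 3:
        raise ValueError("need n > 3")
    z = math.atanh(r)              # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)      # approximate standard error of z
    low, high = z - z_crit * se, z + z_crit * se
    return math.tanh(low), math.tanh(high)  # back-transform to the r scale

def t_statistic_rho_zero(r, n):
    """t statistic for H0: rho = 0, to be compared with t on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

low, high = fisher_z_interval(0.9, 10)
t = t_statistic_rho_zero(0.9, 10)
print(round(low, 3), round(high, 3))
print(round(t, 2))
```

The interval is built on the z scale and then transformed back, so it is not symmetric about r.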
Case Study

• Nilesh is a private tutor in Pune. He teaches maths and statistics to 10 students. He took a class test in both subjects and found that the marks in maths are 45, 70, 65, 30, 90, 40, 50, 75, 85, 60 and the marks in statistics are 35, 90, 70, 40, 95, 60, 80, 80, 50. He wanted to find the relationship between the marks obtained by the students in the two subjects, so he asked one of his meticulous students to help him find the correlation between the marks.
Solution

Using the above formula, the meticulous student found that the coefficient of correlation is r = 0.9. The probable error is PE = 0.6745(1 - r²)/√n ≈ 0.0405, so 6 × PE ≈ 0.243. Since r is more than 6 × PE, the correlation is significant.

Interpretation: the higher a student's marks in maths, the higher the score in statistics, and vice versa. This does not mean that every student who is good in maths is also good in statistics. The coefficient of correlation expresses the relationship between the two series, not between the individual items of the series.
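The significance check can be reproduced in a few lines. The sketch below assumes the usual textbook probable-error formula PE = 0.6745(1 − r²)/√n and the rule of thumb that r is significant when it exceeds 6 × PE. It uses the case study's reported r = 0.9 and n = 10, and does not recompute r from the marks.

```python
import math

def probable_error(r, n):
    """Probable error of the correlation coefficient: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

r, n = 0.9, 10            # values reported in the case study
pe = probable_error(r, n)
print(round(pe, 4))       # 0.0405
print(round(6 * pe, 3))   # 0.243
print(r > 6 * pe)         # True: significant under the 6 x PE rule of thumb
```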
Correlation in Bivariate Frequency Table
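The worked table for this slide has not survived. As a sketch, r can be computed from a two-way frequency table by weighting each cell's pair of class mid-points by its frequency:

r = (N Σfxy − Σfx Σfy) / (√(N Σfx² − (Σfx)²) √(N Σfy² − (Σfy)²))

The mid-points and frequencies below are hypothetical and are not data from the presentation.

```python
import math

def r_from_frequency_table(x_mids, y_mids, freq):
    """Pearson r from a bivariate frequency table.

    x_mids: class mid-points of the columns (x variable)
    y_mids: class mid-points of the rows (y variable)
    freq[i][j]: frequency of the cell (x_mids[j], y_mids[i])
    """
    n = sum(sum(row) for row in freq)
    sx = sy = sxx = syy = sxy = 0.0
    for i, y in enumerate(y_mids):
        for j, x in enumerate(x_mids):
            f = freq[i][j]
            sx += f * x
            sy += f * y
            sxx += f * x * x
            syy += f * y * y
            sxy += f * x * y
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

# Hypothetical table: all frequencies on the diagonal -> perfect positive correlation
x_mids = [10, 20, 30]
y_mids = [5, 15, 25]
diagonal = [[2, 0, 0],
            [0, 3, 0],
            [0, 0, 2]]
print(round(r_from_frequency_table(x_mids, y_mids, diagonal), 6))  # 1.0
```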
Types of Correlation

• Positive and Negative
• Simple, Partial and Multiple
• Linear and Non-Linear


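Positive and Negative Correlation

• Correlated: the two variables tend to change together.
• Positively correlated: both variables move in the same direction.
• Negatively correlated: the variables move in opposite directions.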
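• Simple Correlation: the relationship between two variables only.
• Partial Correlation: the relationship between two variables while the effect of the other variables is held constant.
• Multiple Correlation: the relationship between one variable and two or more other variables taken together.

• Dependent Variables
• Independent Variables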
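Partial and multiple correlation can both be computed from the three simple correlations r12, r13 and r23 between variables 1, 2 and 3. The sketch below uses the standard three-variable formulas:

r12.3 = (r12 − r13 r23) / √((1 − r13²)(1 − r23²))

R1.23² = (r12² + r13² − 2 r12 r13 r23) / (1 − r23²)

The input values are hypothetical.

```python
import math

def partial_r(r12, r13, r23):
    """Partial correlation r12.3 between variables 1 and 2, holding variable 3 constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def multiple_r(r12, r13, r23):
    """Multiple correlation R1.23 of variable 1 with variables 2 and 3 taken together."""
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))

# Hypothetical simple correlations
print(round(partial_r(0.5, 0.5, 0.5), 4))   # 0.3333
print(round(multiple_r(0.5, 0.5, 0.5), 4))  # 0.5774
```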
Rank Correlation Method
Spearman’s Correlation Coefficient

• Non-parametric: a perfect Spearman correlation results when X and Y are related by any monotonic function. This can be contrasted with the Pearson correlation, which only gives a perfect value when X and Y are related by a linear function.
• The exact sampling distribution can be obtained without requiring knowledge of the joint probability distribution of X and Y.
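The contrast between Spearman and Pearson can be seen with a monotonic but non-linear relationship. The sketch below computes Spearman's coefficient as the Pearson correlation of the ranks, and its simple ranking helper assumes there are no tied values. With y = x³, the ranks agree perfectly, so Spearman gives 1, but the points do not lie on a straight line, so Pearson gives less than 1.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation as the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [xi ** 3 for xi in x]    # monotonic, not linear
print(spearman_rho(x, y))    # 1.0
print(pearson_r(x, y) < 1)   # True
```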
Case Study

• BMC office has 12 clerks. These long-serving clerks feel that they should have a seniority increment based on length of service built into their salary structure. Their departmental managers and the personnel department assessed their efficiency and produced a ranking of efficiency. This is shown below, together with a ranking of their length of service.

Ranking according to length of service: 1 2 3 4 5 6 7 8 9 10 11 12
Ranking according to efficiency: 2 3 5 1 9 10 11 12 8 7 6 4

• Do the data support the clerks’ claim for a seniority increment?
• Applying the formula, where d is the difference between the two ranks of each clerk (here Σd² = 178 and n = 12):

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 178}{12(144 - 1)} = 1 - \frac{1068}{1716} = 0.378$$

• Since ρ ≈ 0.378, there is only a weak positive relationship between length of service and efficiency.
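The arithmetic can be checked directly from the two rankings in the table. The sketch below applies the rank-difference formula ρ = 1 − 6Σd²/(n(n² − 1)), which is valid when there are no tied ranks. The function names are our own.

```python
def sum_squared_rank_differences(rank_a, rank_b):
    """Sum of d^2, where d is the difference between paired ranks."""
    return sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))

def spearman_from_ranks(rank_a, rank_b):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), assuming no ties."""
    n = len(rank_a)
    return 1 - 6 * sum_squared_rank_differences(rank_a, rank_b) / (n * (n ** 2 - 1))

service = list(range(1, 13))                           # ranking by length of service
efficiency = [2, 3, 5, 1, 9, 10, 11, 12, 8, 7, 6, 4]   # ranking by efficiency

print(sum_squared_rank_differences(service, efficiency))  # 178
print(round(spearman_from_ranks(service, efficiency), 3))  # 0.378
```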
Rank Correlation Method

• Merits
  - Easy to understand and apply.
  - Can be applied to qualitative characteristics that can only be ranked.
  - Does not assume that the parent population from which the observations are drawn is normal.

• Demerits
  - Less suited to quantitative data, because only the ranks are used and the information in the actual values is lost.
  - Not practicable in the case of a bivariate frequency distribution table.
Regression Analysis

• Regression is a measure of the average relationship between two or more variables.
• Regression analysis, in the general sense, means the estimation or prediction of the unknown value of one variable from the known values of one or more other variables.

• SIMPLE REGRESSION
• MULTIPLE REGRESSION
• LEAST SQUARES METHOD
• REGRESSION COEFFICIENT
Simple Regression

Linear regression between two variables.

Least Squares Method

Minimizes the sum of the squares of the vertical distances from the observed points to the line.
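For a straight line y = a + bx, minimizing the sum of squared vertical distances gives the closed-form slope b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)². The intercept a = ȳ − b x̄ makes the line pass through the means. The sketch below fits the line to small illustrative data. It then checks that moving either the intercept or the slope away from the fitted values increases the sum of squared vertical distances.

```python
def least_squares_line(x, y):
    """Fit y = a + b*x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept: the line passes through (mean of x, mean of y)
    return a, b

def sum_squared_residuals(x, y, a, b):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]    # illustrative data only
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(round(a, 2), round(b, 2))  # 2.2 0.6

best = sum_squared_residuals(x, y, a, b)
print(best < sum_squared_residuals(x, y, a + 0.1, b))  # True
print(best < sum_squared_residuals(x, y, a, b - 0.1))  # True
```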
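Regression Coefficient
• Regression coefficient of y on x (byx)
• Regression coefficient of x on y (bxy)

Regression Coefficients - Formulas

$$b_{yx} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}$$

$$b_{xy} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(y)}$$

Regression equation of y on x:

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

Regression equation of x on y:

$$x - \bar{x} = b_{xy}(y - \bar{y})$$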
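Both coefficients share the covariance in their numerators, so their product equals r². Both regression lines also pass through the point of means (x̄, ȳ). The sketch below computes byx and bxy from illustrative data and uses them in the two regression equations. The helper names are our own.

```python
def regression_coefficients(x, y):
    """Return (b_yx, b_xy) = (cov(x, y) / var(x), cov(x, y) / var(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    var_y = sum((yi - my) ** 2 for yi in y) / n
    return cov_xy / var_x, cov_xy / var_y

x = [1, 2, 3, 4, 5]    # illustrative data only
y = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(x, y)
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)

def predict_y(x_new):
    """Regression equation of y on x."""
    return mean_y + b_yx * (x_new - mean_x)

def predict_x(y_new):
    """Regression equation of x on y."""
    return mean_x + b_xy * (y_new - mean_y)

print(round(b_yx, 3), round(b_xy, 3))  # 0.6 1.0
print(round(b_yx * b_xy, 3))           # 0.6, which equals r squared for this data
```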
Case Study
The adjoining table shows the number of motor registrations at Hero Honda, Cochin, from 2006 to 2010, and the sale of motor tyres by MRF in Cochin for the same period.

Estimate the sale of tyres by MRF Cochin when the number of motor registrations at Hero Honda Cochin is known.

Also estimate the sale of tyres by MRF Cochin when there are 850 motor registrations at the Hero Honda Cochin unit.
YEAR X Y
2006 600 1250
2007 630 1100
2008 720 1300
2009 750 1350
2010 800 1500

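• Let x denote the number of motor registrations at the Hero Honda Cochin unit.
• Let y denote the number of tyres sold by the MRF Cochin unit.
• We have to find the regression equation of y on x. From the table, x̄ = 700, ȳ = 1300, Σ(x - x̄)(y - ȳ) = 41500 and Σ(x - x̄)² = 27800, so byx = 41500/27800 ≈ 1.493 and

$$y - 1300 = 1.493(x - 700)$$

• For x = 850, y ≈ 1300 + 1.493 × 150 ≈ 1524. Hence the estimated number of tyres sold by MRF is about 1524.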
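The case-study estimate can be reproduced from the table using the regression equation of y on x, with byx computed as the ratio of the sum of cross-products to the sum of squared x deviations. The variable names below are our own.

```python
# Data from the case-study table (2006-2010)
x = [600, 630, 720, 750, 800]       # motor registrations, Hero Honda Cochin
y = [1250, 1100, 1300, 1350, 1500]  # tyres sold, MRF Cochin

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # sum of cross-products
sxx = sum((xi - mean_x) ** 2 for xi in x)                          # sum of squared x deviations
b_yx = sxy / sxx  # regression coefficient of y on x

def estimate_tyres(registrations):
    """Regression equation of y on x: y = mean_y + b_yx * (x - mean_x)."""
    return mean_y + b_yx * (registrations - mean_x)

print(mean_x, mean_y, sxy, sxx)    # 700.0 1300.0 41500.0 27800.0
print(round(estimate_tyres(850)))  # 1524
```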
Multiple Regression Analysis
The principles of simple regression analysis can be extended to two or more explanatory variables. With two explanatory variables we get the equation

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2$$

It is customary to write it as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

As an example, if a hypotensive agent is administered prior to surgery, the time taken for blood pressure to recover to its normal value will depend on the dose of the hypotensive and on the blood pressure during surgery. This can be modelled as

$$\text{Recovery time} = \beta_0 + \beta_1 \log(\text{dose}) + \beta_2\,(\text{surgery B.P.})$$
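The coefficients β0, β1 and β2 are estimated by least squares, just as in simple regression. The sketch below solves the normal equations (XᵀX)β = Xᵀy with Gaussian elimination, using plain Python only. The data are hypothetical: they were generated without noise from recovery = 20 + 10 × log_dose − 0.5 × surgery_bp, so the fit should recover those coefficients. They are not the study's measurements.

```python
def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot into place to limit rounding error
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def fit_multiple_regression(rows, y):
    """Least-squares estimates [beta0, beta1, ...] from the normal equations."""
    design = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 for beta0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical, noise-free data for illustration only
log_dose = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
surgery_bp = [60, 75, 55, 80, 70, 65]
recovery = [20 + 10 * d - 0.5 * bp for d, bp in zip(log_dose, surgery_bp)]

beta = fit_multiple_regression(list(zip(log_dose, surgery_bp)), recovery)
print([round(bj, 6) for bj in beta])  # recovers approximately 20, 10 and -0.5
```

For real data with more variables, a QR-based solver is numerically safer than forming XᵀX. The normal equations are used here only because they mirror the textbook derivation.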


Categorical Explanatory Variables

• Binary variables are coded 0 and 1. For example, a binary variable x11 (‘Gender’) is coded male = 0, female = 1.
Recovery time for Blood Pressure and dose of hypotensive

[Figure: scatter plot of RecvTime against Logdose with fitted regression line and 95% CI. RecvTime = -14.2576 + 8.00772 Logdose; S = 14.7103, R-Sq = 15.5 %, R-Sq(adj) = 13.8 %]

The scatter plot shows a linear relationship. Blood pressure takes longer to come back to its normal value the larger the dose of the hypotensive. There are many outliers because of the individual variability of subjects and because of the different types of surgical operations.
Recovery time for Blood Pressure and lowest Blood Pressure reading during surgery

[Figure: scatter plot of RecvTime against Bpsurg with fitted regression line and 95% CI. RecvTime = 34.4692 - 0.183546 Bpsurg; S = 15.9386, R-Sq = 0.8 %, R-Sq(adj) = 0.0 %]

The lower the blood pressure achieved during surgery, the longer it takes to reach its normal value during recovery from anaesthesia.
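Both fitted plots report R-Sq and R-Sq(adj). The sketch below shows the standard definitions. R² = 1 − SS_res/SS_tot. Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of explanatory variables, so it penalizes adding variables. The example values are illustrative and are not the study's data.

```python
def r_squared(y, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p explanatory variables."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Illustrative data with its least-squares line y = 2.2 + 0.6x
y = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(y, fitted)
print(round(r2, 4))                                # 0.6
print(round(adjusted_r_squared(r2, n=5, p=1), 4))  # 0.4667
```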
Multiple Regression Analysis

The effects of the two explanatory variables acting jointly are described by the equation

$$\text{Recov. Time} = 22.3 + 10.6 \log(\text{dose}) - 0.740\,(\text{Surg. B.P.})$$

As noted on the scatter plots, several observations were outliers or had larger than expected X values.
Applications of Correlation and Regression
Applications of regression are numerous and occur in almost every field, including:

- Engineering
- Physical sciences
- Economics
- Management
- Life and biological sciences
- Social sciences
Case Study
The General Sales Manager of a Pvt. Ltd. enterprise dealing in the sale of readymade men’s wear is toying with the idea of increasing his sales to ₹80000. On checking the records of sales for the last 10 years, it was found that annual sale proceeds and advertisement expenditure were highly correlated, to the extent of 0.8. It was further noted that annual average sales have been ₹45000 and annual average advertisement expenditure ₹30000, with variances of 1600 and 625 in sales and advertisement expenses respectively.

In view of the above, how much expenditure on advertisement would you suggest the general sales manager of the enterprise incur to meet his sales target?

Assume advertisement expenses “y” as the dependent variable.

Assume sales “x” as the independent variable.

The regression equation of advertisement expenditure on sales is given by

$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$

Solution

Here σx = √1600 = 40 and σy = √625 = 25, so r σy/σx = 0.8 × 25/40 = 0.5. Hence y - 30000 = 0.5(x - 45000). For the target x = 80000, y = 30000 + 0.5 × 35000 = 47500, so an advertisement expenditure of about ₹47500 is suggested.
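The calculation can be checked in a few lines. The regression coefficient of y on x is r × σy/σx, with each standard deviation taken as the square root of the given variance. The variable names below are our own.

```python
import math

r = 0.8
mean_sales, mean_adv = 45000, 30000  # x-bar (sales), y-bar (advertisement)
var_sales, var_adv = 1600, 625       # variances of x and y

# Regression coefficient of y (advertisement) on x (sales): r * sigma_y / sigma_x
b_yx = r * math.sqrt(var_adv) / math.sqrt(var_sales)  # 0.8 * 25 / 40 = 0.5

def advertisement_needed(target_sales):
    """Regression equation of y on x evaluated at the sales target."""
    return mean_adv + b_yx * (target_sales - mean_sales)

print(b_yx)                         # 0.5
print(advertisement_needed(80000))  # 47500.0
```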
Conclusion

• Regression can describe the relationship between several independent variables and a dependent variable.
• Regression can estimate the unknown parameters of a regression model.
• It can also be used to forecast the response variable, and these predictions are helpful in planning.
THANK YOU

PROFESSOR SINIMOLE K R
SFIMAR
ST. FRANCIS INSTITUTE OF MANAGEMENT & RESEARCH, MUMBAI
