Correlation and Regression

This document discusses correlation and regression analysis. It defines correlation as the relationship between two variables, which can be positive if both variables increase/decrease together, or negative if one increases as the other decreases. Methods for determining correlation like scatter diagrams and Karl Pearson's coefficient are introduced. Regression analysis describes the influence of one variable on another using a linear regression model to estimate values. Examples are provided to demonstrate calculating and interpreting correlation coefficients and regression equations.

Uploaded by

tilak chauhan
Copyright
© All Rights Reserved

Correlation: Introduction

More often than not, we are required to study the behavior of, or the relationship between, two
variables, such as weight and cholesterol, weight and height of a person, tallness of parents and
tallness of children, intelligence quotient of brother and sister, advertising expenditure and sales, etc.
In other words, in addition to studying a variable in isolation, it is also necessary to study the
behavior of two variables simultaneously. We say two variables (say X and Y) are related if a
change in one variable is accompanied by a change in the other. Mathematically, this is written as

\[ y = f(x) \]

Further, if the relation is linear, it is written as

\[ y = f(x) = a + bx \]

and we say the variables are correlated. Correlation analysis is a method for measuring the strength
of this linear relationship.

Correlation: Types

If both variables increase (or decrease) together, the correlation is said to be positive, e.g.
height and weight, income and expenditure, supply and price, etc.
• The more time you spend running on a treadmill, the more calories you will burn.
• The less time I spend marketing my business, the fewer new customers I will have.
• As the temperature goes up, ice cream sales also go up.
On the other hand, if the variables change in opposite directions, so that an increase in one is
accompanied by a decrease in the other and vice versa, the correlation is said to be negative, e.g.
demand and price, number of strikes and industrial production, etc.
• The more you eat out at restaurants, the less you'll cook food at home.
• The more time you spend at work, the less time you'll have to pursue your extracurricular hobbies.
• The more cigarettes a person smokes, the less healthy they become.

Method of determining correlation: Scatter Diagram

The pairs of values of X and Y are represented by points plotted on graph paper. The graph so
obtained is called a scatter diagram. By studying the pattern of the points, conclusions can be
drawn about the direction and strength of the correlation.

Method of determining correlation: Karl Pearson’s Coefficient

The coefficient gives a numerical measure of the nature and extent of correlation. For random variables
X and Y, the correlation coefficient for a sample of n paired observations is given by

\[
r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - \left(\sum x\right)^2}\,\sqrt{n \sum y^2 - \left(\sum y\right)^2}}
\]

It is a pure number, independent of the units of measurement of X and Y. It always lies between -1
and 1 (inclusive).
Depending upon the value of r, we draw the following conclusions.
If r = 1, the correlation is perfect positive.
If 0 < r < 1, the correlation is positive.
If r = 0, there is no linear correlation.
If -1 < r < 0, the correlation is negative.
If r = -1, the correlation is perfect negative.

Note: The correlation coefficient has the following properties.

• Independent of the scale of measurement (for a change of scale by a positive factor)
• Independent of the origin of measurement
• Symmetric, i.e., the correlation of x and y is the same as the correlation of y and x.

Example (1) Zippy cola is studying the effect of its latest advertising campaign. People chosen at
random were called and asked how many cans of Zippy cola they had bought in the past week
and how many Zippy cola advertisements they had either read or seen in the past week.

Number of ads  : 2   3   7   4   2   0   4   1
Cans purchased : 8   11  8   9   4   7   6   3

Draw a scatter diagram. Compute the correlation coefficient. Interpret the correlation coefficient
within the context of this problem.
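
As a rough sketch of the computation asked for in Example (1), the snippet below evaluates Karl Pearson's formula from the raw sums and then checks the three properties listed in the note. The function name `pearson_r` and the variable names are illustrative choices, not part of the original notes, and the scatter diagram itself is not drawn here.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's coefficient computed from raw sums (illustrative helper)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

# Data from Example (1)
ads = [2, 3, 7, 4, 2, 0, 4, 1]
cans = [8, 11, 8, 9, 4, 7, 6, 3]

r = pearson_r(ads, cans)

# Property checks: change of scale (positive factor), change of origin, symmetry
r_scaled = pearson_r([10 * a for a in ads], cans)
r_shifted = pearson_r([a + 100 for a in ads], cans)
r_swapped = pearson_r(cans, ads)

print(round(r, 3), round(r_scaled, 3), round(r_shifted, 3), round(r_swapped, 3))
```

For this data the sums are n = 8, Σx = 23, Σy = 56, Σxy = 176, Σx² = 99 and Σy² = 440, which gives r of roughly 0.38: a weak positive association between advertisements seen and cans purchased.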

Example (2) A Department of Transportation study on driving speed and mileage for midsize
automobiles resulted in the following data.

Driving speed : 30  50  40  55  30  25  60  25  50  55
Mileage       : 28  25  25  23  30  32  21  35  26  25

Compute and interpret the correlation coefficient within the context of this problem.
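
One way to lay out the working for Example (2), using Karl Pearson's formula with x as driving speed and y as mileage (this assignment of x and y is a choice made here; by symmetry it does not affect r):

```latex
\[
n = 10,\quad \sum x = 420,\quad \sum y = 270,\quad \sum xy = 10865,\quad
\sum x^2 = 19300,\quad \sum y^2 = 7454
\]
\[
r = \frac{10(10865) - (420)(270)}
         {\sqrt{10(19300) - 420^2}\,\sqrt{10(7454) - 270^2}}
  = \frac{-4750}{\sqrt{16600}\,\sqrt{1640}}
  \approx -0.91
\]
```

The value is close to -1, indicating a strong negative linear association: in this sample, higher driving speeds go with lower mileage.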

Method of determining correlation: Spearman's Rank Coefficient


Suppose that a random sample of n pairs of observations is taken. If the values of each characteristic
are ranked in ascending or descending order, these ranks can be used to calculate a correlation
coefficient, called the Spearman rank correlation coefficient, given by the formula

\[
R = 1 - \frac{6 \sum d^2}{n\left(n^2 - 1\right)}
\]

where d denotes the difference between the two ranks of each pair.

Example (3) A survey was conducted in 9 areas of the USA to investigate the relationship between
divorce rate (y) and residential mobility (x). Divorce rate is the annual number of divorces per 1000 of
the population, and residential mobility is measured by the percentage of the population who
have moved house in the last 5 years. Calculate the rank correlation coefficient.

Residential Mobility : 40   38   46   49   47   43   51   57   55
Divorce Rate         : 3.9  3.4  5.2  4.8  5.6  5.8  6.6  7.6  5.8

Example (4) The following data give the IQ of each adolescent in a sample together with the
number of hours they listen to rock music per month. Determine the strength of the
correlation between IQ and rock music using Spearman’s rank correlation. Comment on the
results.

IQ    : 99  120  98  102  123  105  85  110  117  90
Hours : 2   0    25  45   14   20   15  19   22   4
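
The ranking procedure and the formula for R can be sketched as follows. The helper names are illustrative. Tied values are given the average of the ranks they occupy, which is a common convention but an assumption here, since the notes do not say how ties are treated; with ties, the simple formula is only an approximation.

```python
def average_ranks(values):
    """Ascending ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rank_coefficient(x, y):
    """R = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d the rank differences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Example (3): the divorce rate 5.8 occurs twice, so average ranks are used
mobility = [40, 38, 46, 49, 47, 43, 51, 57, 55]
divorce = [3.9, 3.4, 5.2, 4.8, 5.6, 5.8, 6.6, 7.6, 5.8]
R3 = spearman_rank_coefficient(mobility, divorce)

# Example (4): no ties
iq = [99, 120, 98, 102, 123, 105, 85, 110, 117, 90]
hours = [2, 0, 25, 45, 14, 20, 15, 19, 22, 4]
R4 = spearman_rank_coefficient(iq, hours)

print(round(R3, 3), round(R4, 3))
```

Under these assumptions, Example (3) gives Σd² = 24.5 and R of about 0.80 (a strong positive rank association), while Example (4) gives Σd² = 184 and R of about -0.12 (a very weak negative association).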

Regression Analysis: Introduction

Correlation coefficients provide a measure of the strength of any linear association
between a pair of random variables. Regression is a statistical model that describes the
influence of an independent variable (the cause) on a dependent variable (the effect).
Let the independent (cause) variable be denoted by X and the dependent (effect) variable by Y. The
estimated linear regression model is given by

\[ \hat{Y} = a + bX \]

where a and b are to be estimated from a sample of bivariate data on the characteristics X and Y:

\[
b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}
\]

and

\[
a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}
\]

Example (1) The level of infant mortality (y) is represented by the number of baby deaths for every
1000 births. For 12 areas these are shown in the following table. For each area, the percentage (x)
of babies born into families earning at least £25,000 is also shown.

Area                 : A   B   C   D   E   F   G   H   I   J   K   L
Percentage (x)       : 20  6   10  21  12  36  6   19  26  13  21  16
Infant mortality (y) : 5   17  16  8   15  5   25  12  11  11  7   12

Calculate the regression equation by the least squares method and use it to estimate infant
mortality for an area in which the percentage (x) of babies born into families earning at least
£25,000 is 20.
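
A minimal sketch of the least squares computation for the infant mortality example, applying the formulas for b and a directly to the raw sums. The function and variable names are illustrative, not from the original notes.

```python
def least_squares_line(x, y):
    """Return (a, b) for the fitted line y_hat = a + b * x (illustrative helper)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(u * v for u, v in zip(x, y))
    sxx = sum(u * u for u in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a = sy / n - b * sx / n                        # intercept
    return a, b

# Data for the 12 areas A..L
percentage = [20, 6, 10, 21, 12, 36, 6, 19, 26, 13, 21, 16]
mortality = [5, 17, 16, 8, 15, 5, 25, 12, 11, 11, 7, 12]

a, b = least_squares_line(percentage, mortality)
estimate_at_20 = a + b * 20

print(round(a, 3), round(b, 3), round(estimate_at_20, 2))
```

With Σx = 206, Σy = 144, Σxy = 2036 and Σx² = 4356, this gives approximately Ŷ = 21.13 - 0.532X, so the estimated infant mortality at x = 20 is about 10.5 deaths per 1000 births.
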
Example (2) The marketing manager of a large supermarket chain would like to use shelf space to
predict the sales of pet food. A random sample of 12 equal-sized stores is selected, with the
following results.

Store                  : 1    2    3    4    5    6    7    8    9    10   11   12
Shelf space (X) (feet) : 5    5    5    10   10   10   15   15   15   20   20   20
Weekly sales (Y) ($)   : 160  220  140  190  240  260  230  270  280  260  290  310

Calculate the regression equation by the least squares method and use it to estimate the weekly sales
of a store with shelf space of 25 feet.
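
The working for this shelf space example can be laid out as follows, using the least squares formulas for b and a:

```latex
\[
n = 12,\quad \sum x = 150,\quad \sum y = 2850,\quad \sum xy = 38400,\quad \sum x^2 = 2250
\]
\[
b = \frac{12(38400) - (150)(2850)}{12(2250) - 150^2} = \frac{33300}{4500} = 7.4,
\qquad
a = \frac{2850}{12} - 7.4 \cdot \frac{150}{12} = 237.5 - 92.5 = 145
\]
\[
\hat{Y} = 145 + 7.4X, \qquad \hat{Y}(25) = 145 + 7.4(25) = 330
\]
```

The estimated weekly sales are therefore $330. Note that 25 feet lies outside the observed range of 5 to 20 feet, so this estimate is an extrapolation and should be read with some caution.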

Properties of Regression

(1) Generally, there are two regression equations. The first is

\[ \hat{Y} = a + bX \]

known as the regression of Y on X (b is known as the regression coefficient of Y on X). Here
X is taken as the independent variable and Y as the dependent variable, and the equation is used
to predict the value of Y when the value of X is known.
Similarly, we can define the other regression equation:

\[ \hat{X} = a' + b'Y \]

known as the regression of X on Y (b' is known as the regression coefficient of X on Y). Here
Y is taken as the independent variable and X as the dependent variable, and the equation is used
to predict the value of X when the value of Y is known.

Example (4) If the line X = 7.83 − 0.68Y is the least squares regression line of the weight of
newborn babies (X) on the number of cigarettes smoked by mothers (Y), predict the weight of a baby
whose mother smoked 50 cigarettes a day.
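
Substituting Y = 50 into the regression line as printed gives:

```latex
\[
\hat{X} = 7.83 - 0.68(50) = 7.83 - 34 = -26.17
\]
```

A negative weight is not physically meaningful. Taking the coefficients exactly as printed, this suggests that Y = 50 lies well outside the range of data from which the line was fitted, and it illustrates the risk of using a regression line to extrapolate.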

(2) Both regression lines pass through the point given by the means of X and Y. Thus, the two
regression lines intersect each other at the means of X and Y.

The arithmetic means of X and Y can therefore be obtained by solving the two regression
equations simultaneously.

Example (5) Given the two regression lines 5X + 6Y = 160 and X + 2Y = 40, find the
arithmetic means of X and Y.
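
Because both lines pass through the point of means, Example (5) reduces to solving the pair of equations. One way to do this is by substitution:

```latex
\[
X + 2Y = 40 \;\Rightarrow\; X = 40 - 2Y
\]
\[
5(40 - 2Y) + 6Y = 160 \;\Rightarrow\; 200 - 4Y = 160 \;\Rightarrow\; Y = 10,
\qquad X = 40 - 2(10) = 20
\]
\[
\bar{X} = 20, \qquad \bar{Y} = 10
\]
```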

(3) The regression coefficients b and b' always take the same sign, which is also the sign of the
correlation coefficient between X and Y. Also,

\[ r = \pm\sqrt{b \times b'} \]

where r takes the common sign of b and b'.

Example (6) The regression of Y on X is given by Y = 4.16X + 397.33 and the regression of
X on Y is given by X = 0.065Y − 6.35. Find the correlation coefficient between X and Y.
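
A small sketch of property (3) applied to Example (6): the magnitude of r is the square root of the product of the two regression coefficients, and its sign is taken from their common sign. The function name is illustrative, not from the original notes.

```python
from math import copysign, sqrt

def r_from_regression_coefficients(b, b_prime):
    """Correlation coefficient from the two regression coefficients b and b'."""
    if b * b_prime < 0:
        # Property (3): both coefficients must have the same sign
        raise ValueError("regression coefficients must share the same sign")
    return copysign(sqrt(b * b_prime), b)

# Example (6): b = 4.16 (Y on X), b' = 0.065 (X on Y)
r = r_from_regression_coefficients(4.16, 0.065)
print(round(r, 2))
```

Here b × b' = 0.2704, so r = 0.52, positive because both regression coefficients are positive.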
