
e-notes on Correlation and regression

Prepared by: T. Loidang Chanu


Assistant Professor, Statistics
BEAS department, CAEPHT, CAU, Ranipool, Sikkim
Covariance: Covariance indicates how two variables are related. A positive covariance
indicates that the two variables are positively related, and a negative covariance means the
variables are inversely related. The formula for calculating the covariance of sample data is

        Cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ)/(n − 1) = (Σxᵢyᵢ − n x̄ ȳ)/(n − 1)

where xᵢ and yᵢ refer to the values of the two variables for the ith observation, x̄ and ȳ are the
means of the two variables, and "n" is the number of data points in the sample.
Note: a) If Cov(x,y) > 0, then there is a +ve relationship.
b) If Cov(x,y) < 0, then there is a -ve relationship.
c) Cov(x,y) does not tell us much about the strength of such a relationship because it is
affected by changes in the units of measurement. To avoid this disadvantage of the
covariance, we standardize the data.
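The sample covariance formula above can be sketched in a few lines of Python; the function and data below are illustrative, not part of the notes.

```python
# Minimal sketch of the sample covariance formula:
# Cov(x, y) = sum((xi - xbar)(yi - ybar)) / (n - 1)

def sample_cov(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

# Positive covariance: y rises with x; negative: y falls as x rises.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]
print(sample_cov(x, y_up))    # 5.0 (positive relationship)
print(sample_cov(x, y_down))  # -5.0 (inverse relationship)
```

Note that scaling x or y (say, converting inches to centimetres) rescales the covariance, which is exactly why the standardized version (the correlation coefficient, below) is preferred for judging strength.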
Correlation coefficient: Let x and y be any two random variables (discrete or
continuous) with standard deviations σx and σy respectively. The correlation coefficient of
x and y, denoted by cor(x,y) or ρxy, is defined as
        ρxy = cor(x, y) = cov(x, y)/(σx σy) = σxy/(σx σy)

            = [(1/n) Σ(xᵢ − x̄)(yᵢ − ȳ)] / {√[(1/n) Σ(xᵢ − x̄)²] · √[(1/n) Σ(yᵢ − ȳ)²]}

            = [Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n] / √{[Σxᵢ² − (Σxᵢ)²/n] [Σyᵢ² − (Σyᵢ)²/n]}

            = [Σxᵢyᵢ − n x̄ ȳ] / √[(Σxᵢ² − n x̄²)(Σyᵢ² − n ȳ²)]

since Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/n = Σxᵢ² − n x̄², and similarly
Σ(xᵢ − x̄)(yᵢ − ȳ) = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n = Σxᵢyᵢ − n x̄ ȳ.
Note: 1. The covariance between the standardized x and y data is called the correlation
between x and y.
2. It may be noted that cor(x,y) or r(x,y) measures only the linear relationship between x and y.
3. Karl Pearson's correlation coefficient is also called the "product-moment" correlation
coefficient since
        Cov(x,y) = E[{x − E(x)}{y − E(y)}] = μ₁₁
Scatter diagram: It is the simplest way of diagrammatic representation of bivariate data.
Thus, for the bivariate distribution (xᵢ, yᵢ); i = 1, 2, …, n, if the values of the variables x and y
are plotted along the x-axis and y-axis respectively in the x-y plane, the diagram so obtained is
known as a "scatter diagram". From the scatter diagram we can form a fairly good, though
vague, idea of whether the variables are correlated or not; e.g., if the points are very dense, i.e.,
very close to each other, we should expect a fairly good amount of correlation between the
variables, and if the points are widely scattered, a poor correlation is expected. This method,
however, is not suitable if the number of observations is fairly large.
Scatter diagram and correlation coefficient:

[Figure: scatter diagrams illustrating typical patterns for r near +1, r near −1, and r near 0]

Testing the significance of correlation coefficient (𝝆 = 𝟎)


To test the hypothesis that the correlation coefficient of the bivariate normal population is
zero, we can use the observed correlation coefficient (r) in a sample of n pairs of
observations in the t-statistic

        t = [r/√(1 − r²)] √(n − 2)   with (n − 2) d.f.

Note: H₀: ρ = 0
      H₁: ρ ≠ 0
If the null hypothesis (H₀) is accepted, we conclude that the variables may be regarded as
uncorrelated in the population.
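The statistic above is easy to compute by hand or in code. The following sketch uses made-up values of r and n for illustration; the result is then compared against a t-table at (n − 2) degrees of freedom.

```python
import math

# Significance test for an observed correlation coefficient r:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), with (n - 2) degrees of freedom.

def t_for_r(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.6, 27            # hypothetical observed r and sample size
t = t_for_r(r, n)
print(round(t, 2))        # 3.75, to be compared with the t-table at 25 d.f.
```

Since 3.75 exceeds the usual 5% two-tailed critical value for 25 d.f. (about 2.06), H₀: ρ = 0 would be rejected for this hypothetical sample.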
Properties of correlation coefficient:
(a) The value of correlation coefficient is a pure number which is independent of the units
of measurement of the two variables.
(b) The value of the correlation coefficient lies between -1 and +1.
(c) r², the square of the correlation coefficient, is referred to as the coefficient of
determination.
(d) The quantity (1 − r²) is referred to as the coefficient of non-determination.
(e) The square root √(1 − r²) gives the proportion of variance not shared between the two
variables and is called the coefficient of alienation.
Q. Calculate the correlation coefficient for the following heights (in inches) of fathers (x) and
their sons (y):
X: 65 66 67 67 68 69 70 72
Y: 67 68 65 68 72 72 69 71
Solution:
        x      y      x²       y²       xy
        65     67     4225     4489     4355
        ….     ….     ….       ….       ….

        x̄ = ………….., ȳ = …………..

        Cor(x, y) = cov(x, y)/(σx σy) = [(1/n) Σ(xᵢ − x̄)(yᵢ − ȳ)]/(σx σy)
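Since the question gives the full data, the answer can be checked numerically. The sketch below uses the short-cut (computational) formula derived earlier; the helper code itself is not part of the notes.

```python
import math

# Father/son heights from the question, using
# r = (Σxy − n·x̄·ȳ) / sqrt((Σx² − n·x̄²)(Σy² − n·ȳ²)).
x = [65, 66, 67, 67, 68, 69, 70, 72]
y = [67, 68, 65, 68, 72, 72, 69, 71]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n                           # 68.0, 69.0
sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar  # 24.0
sxx = sum(xi ** 2 for xi in x) - n * xbar ** 2                # 36.0
syy = sum(yi ** 2 for yi in y) - n * ybar ** 2                # 44.0
r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # 0.603
```

The computed r ≈ 0.603 indicates a moderate positive correlation between fathers' and sons' heights.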

Regression Analysis: In correlation analysis we discussed the degree of relationship
without considering which variable is the cause and which the effect. In regression analysis there
are two types of variables. The variable whose value is influenced or is to be predicted is called
the dependent variable, and the variable which influences the values or is used for prediction is
called the independent variable.
In regression analysis the independent variable is also known as the regressor, predictor or
explanatory variable, while the dependent variable is also known as the regressed or explained
variable. In regression analysis we find an algebraic function of the form y = f(x), i.e., we
express the dependent variable as a function of the independent variable. Thus regression
analysis makes it possible to estimate or predict the unknown values of the dependent variable
for known values of the independent variable.
        The term regression literally means "stepping back towards the average". It was
first used by the British biometrician Sir Francis Galton (1822-1911), a cousin of Charles
Darwin, in connection with the inheritance of stature.

He was interested in predicting the height of a son based on the height of his father.


Looking at the scatter plot of these heights, Galton saw that the trend was linear and
increasing. After fitting a line to these data, he observed that fathers taller than the
average tended to have sons shorter than themselves, while fathers shorter than the
average tended to have taller sons: there is a regression towards the mean.

Linear Regression: Linear regression analysis is a method for predicting the
values of one or more response variables (dependent variables) from one or more
predictor variables (or independent variables).

        The independent variable is also known as the regressor, predictor or
explanatory variable, while the dependent variable is also known as the
response or explained variable.

An economist may want to investigate the relationship between expenditure
and income. Let us see what factors a household considers when deciding how
much money it should spend on food every day, every week or every month. The
income of the household is the main factor, and many other variables also affect the
food expenditure, e.g., family size, preferences for food and other household items, etc.
These are all explanatory variables because they all vary independently and they
explain the variation in expenditure among different households.

Types of regression models: There are two basic types of regression. They
are given below:

i) Simple linear regression


ii) Multiple linear regression
i) Simple linear regression model: Let us tentatively assume that the regression line
of variable Y on X has the form β₀ + β₁X. Then we can write the linear regression model
        Y = β₀ + β₁X + ε,
where:
Y is called the response variable (or dependent variable),
X is called the predictor variable (or independent variable),
β₀ and β₁ are structural parameters, and
ε is the random component.
If it is appropriate to think of X as determined independently of the equation and
of Y as explained by the equation, then X is often called an exogenous variable
and Y an endogenous variable; in this case X is called an explanatory variable. More
generally, X is also referred to as a regressor. When we fit the line y = a + bx to data,
we speak of regressing y on X.

For n observations on Y and X the model is written as follows:

        Yᵢ = β₀ + β₁Xᵢ + εᵢ,   i = 1, 2, …, n
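Fitting the line y = a + bx by least squares uses b = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)² and a = ȳ − b·x̄. A minimal sketch, with made-up data chosen so the fit is exact:

```python
# Least-squares estimates for the simple linear model Y = b0 + b1*X + e:
# b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²,  b0 = ȳ - b1*x̄

def fit_line(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]          # exactly y = 1 + 2x, so the fit recovers it
b0, b1 = fit_line(x, y)
print(b0, b1)             # 1.0 2.0
```

With real data the points will not lie exactly on a line, and the residuals eᵢ = yᵢ − (b0 + b1·xᵢ) carry the unexplained variation.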

Variables may be quantitative (e.g., income) or qualitative (e.g., rural or urban
residence). A quantitative variable may be either discrete (e.g., number of children ever born)
or continuous (e.g., time). Qualitative variables are also called categorical variables or
factors. In regression analysis all variables, whether quantitative or qualitative, are
represented numerically. Qualitative variables are represented by dummy variables.
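As a small illustration of the dummy-variable idea (the example values are made up), a two-level factor such as rural/urban residence becomes a 0/1 column that can enter the regression like any numeric variable:

```python
# Dummy coding for a qualitative (categorical) variable: each category
# level is mapped to a 0/1 indicator so it can be used numerically.
residence = ["rural", "urban", "urban", "rural"]
dummy = [1 if r == "urban" else 0 for r in residence]
print(dummy)  # [0, 1, 1, 0]
```

A factor with k levels needs k − 1 dummy columns, one level being left out as the reference category.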

Generally speaking, a model is a description of the relationships connecting the
variables of interest. The process of model-building consists of putting together a
set of formal expressions of these relationships to the point where the behaviour of the
model adequately mimics the behaviour of the system.

The choice of the mathematical form of the model (including the choice of variables
and the statement of underlying assumptions) is referred to as model specification.
The term specification error is used to indicate an incorrect model specification.

Simplifying assumptions:

i) Linearity: The random error term εᵢ has a mean equal to zero for each x.
When the mean value of ε is zero, the mean value of y for a given x is
        E(yᵢ) = β₀ + β₁x
ii) Homoscedasticity: This equal-variance assumption means that in the
underlying population, the variance of the variable yᵢ, denoted by σ², is the
same at each X = xᵢ. Equivalently, the variance of εᵢ is σ² at each X = xᵢ.
iii) Independence: The error terms εᵢ are statistically independent. We
call yᵢ = β₀ + β₁xᵢ + εᵢ the underlying population model and yᵢ = b₀ + b₁xᵢ + eᵢ
the estimated model.
iv) Normality: This assumption specifies that the distribution of the εᵢ values
should be normal.

Coefficient of determination:

We may ask, "How well does the independent variable explain the dependent variable in
the regression model?" The coefficient of determination is one concept that answers
this question. The coefficient of determination is a way to measure the contribution
of the independent variable in predicting the dependent variable. It is denoted by the
symbol r². Its value lies between zero and one (0 ≤ r² ≤ 1). If x contributes
information for predicting y, r² will be greater than zero. When x contributes no
information for predicting y, r² will be near zero. In regression with a single
independent variable, r² is the same as the square of the correlation between the
dependent and independent variables.

PROPERTIES:

(1) The correlation coefficient between two variables x and y is the geometric mean of the two
    regression coefficients bxy and byx. This is known as the fundamental property of
    regression coefficients, i.e., ρ² = βyx · βxy

        ⇒ ρ = √(βyx · βxy)

    For a sample,
        r = √(bxy · byx)
(2) The signs of the regression coefficients and the correlation coefficient are always the same.
    This is known as the signature property of regression coefficients.
(3) If βyx > 1, then βxy < 1, and conversely (since βyx · βxy = ρ² ≤ 1).
(4) If the variables x and y are independent, the regression coefficients are zero. This is
    known as the independence property of regression coefficients.
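The fundamental property (1) can be verified numerically. The sketch below uses illustrative data and the sums-of-products notation Sxy, Sxx, Syy (assumed names, not from the notes), with byx = Sxy/Sxx and bxy = Sxy/Syy:

```python
import math

# Check r = sqrt(byx * bxy), the geometric-mean (fundamental) property.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
byx, bxy = sxy / sxx, sxy / syy    # the two regression coefficients
r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
print(math.isclose(r, math.sqrt(byx * bxy)))  # True
```

Both regression coefficients share the sign of Sxy, which is why the signature property (2) holds: r, byx and bxy are all positive or all negative together.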
Notes: (1) Multiple regression may suffer from multicollinearity, autocorrelation and
heteroscedasticity.

(2) Linear regression is very sensitive to outliers. They may badly affect the regression line
and eventually the forecasted values.

(3) Multicollinearity exists when two or more of the predictors in the regression model
are moderately or highly correlated.

(4) Autocorrelation: It is also known as serial correlation. It is the similarity between
observations as a function of the time lag between them.

t-test for testing the significance of an observed regression coefficient: Here the problem is
to test whether a random sample (xᵢ, yᵢ), (i = 1, 2, …, n) has been drawn from a bivariate normal
population in which the regression coefficient of y on x is β. The regression line of y on x (for
the given sample) is:

        Y − ȳ = b(X − x̄),   where b = μ₁₁/σx²

The estimate of Y for a given value xᵢ of X, as given by the line, is

        ŷᵢ = ȳ + b(xᵢ − x̄)

Under H₀ that the regression coefficient is β, Prof. R.A. Fisher proved that the statistic

        t = (b − β) [(n − 2) Σᵢ(xᵢ − x̄)² / Σᵢ(yᵢ − ŷᵢ)²]^(1/2)

follows the t-distribution with (n − 2) d.f.
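The statistic above can be computed directly from the data. The sketch below uses made-up observations and tests the hypothesised slope β = 0 (i.e., no linear relationship); the residual sum of squares is formed from the fitted values ŷᵢ = ȳ + b(xᵢ − x̄):

```python
import math

# t-statistic for H0: slope = beta0, with (n - 2) degrees of freedom:
# t = (b - beta0) * sqrt((n - 2) * Σ(xi - x̄)² / Σ(yi - ŷi)²)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]   # illustrative data
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
resid_ss = sum((yi - (ybar + b * (xi - xbar))) ** 2
               for xi, yi in zip(x, y))
beta0 = 0.0                             # hypothesised slope under H0
t = (b - beta0) * math.sqrt((n - 2) * sxx / resid_ss)
print(round(t, 2))  # compare with the t-table at n - 2 = 4 d.f.
```

Here the points lie nearly on a line, so the t value is far beyond any tabled critical value and H₀: β = 0 is emphatically rejected.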
