Correlation and Linear Regression
BHAVYA BHUVNESH
BHAVYA VAID
BHUPENDER SINGH
DIVYA RAMACHANDRAN
GAURAV KASHYAP
CORRELATION ANALYSIS
The advantages of measuring association (or correlation) between two or more variables are
as follows:
1. It helps identify the critically important variables on which other variables depend.
2. It reduces the range of uncertainty in our predictions. Predictions based on correlation
analysis are more reliable and closer to reality.
3. Economic theory involves several variables that are related to one another, e.g.
price, supply, and quantity demanded.
4. Convenience, amenities, and service are related to customer satisfaction.
5. In health care, it shows how health problems are related to certain biological or
environmental factors.
Correlation analysis is a statistical technique used to analyze the
strength (magnitude) and direction of the relationship between two
quantitative variables.
Coefficient of correlation: a number that indicates the
strength and direction of the statistical relationship between two
variables.
1. r: both the x and y variables are measured on an interval or ratio
scale (numeric data).
The correlation between two such variables is represented
by the letter “r”, which takes values between -1 and +1 only.
This measure is also called the Pearson product-moment
correlation coefficient, or simply the correlation coefficient.
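As a rough illustration of how r can be computed from deviations about the means, here is a minimal Python sketch. The function name `pearson_r` and the sample values are assumptions made for this example; they are not the data from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only (hypothetical, not taken from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

A value of r near +1 or -1 would suggest a strong linear relationship, and a value near 0 a weak one; the sign gives the direction.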
Methods of correlation analysis
Methods of finding the correlation coefficient between two
variables X and Y:
1. Scatter diagram method
2. Karl Pearson’s coefficient of correlation method
Question
The following data relate to the age of employees and the number of days they
reported sick in a month. Calculate Karl Pearson’s coefficient of correlation and
interpret it.
Testing procedure
Example
REGRESSION ANALYSIS
Regression analysis reveals the average relationship between two or more variables and provides
a mechanism for prediction or forecasting.
There are two types of variables: independent and dependent.
A simple linear relationship has the form Y = a + bX.
REGRESSION LINES
If we take two variables X and Y, we have two regression lines:
the regression line of X on Y and the regression line of Y on X.
The regression line of Y on X gives the most probable values of Y for given
values of X, in the form Y = a + bX.
The regression line of X on Y gives the most probable values of X for
given values of Y, in the form X = a + bY (with its own constants a and b,
which generally differ from those of the line of Y on X).
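The two regression lines can be sketched with ordinary least squares, which is assumed here as the fitting criterion. The helper name `regression_line` and the sample values are hypothetical and chosen only for illustration.

```python
def regression_line(u, v):
    """Least-squares line of v on u; returns (a, b) such that v is about a + b*u."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    # Slope = sum of products of deviations / sum of squared deviations of u
    suv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    suu = sum((ui - mean_u) ** 2 for ui in u)
    b = suv / suu
    a = mean_v - b * mean_u  # the line passes through the point of means
    return a, b

# Illustrative values only (hypothetical, not taken from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = regression_line(x, y)  # Y on X: Y = a_yx + b_yx * X
a_xy, b_xy = regression_line(y, x)  # X on Y: X = a_xy + b_xy * Y
print(a_yx, b_yx, a_xy, b_xy)
```

The two lines are fitted separately and in general are not the same line; they coincide only when the correlation is perfect.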
TYPES OF REGRESSION MODELS
SIMPLE AND MULTIPLE REGRESSION MODEL
SIMPLE: If a regression model relates a dependent variable y to only one independent variable x,
then it is a simple regression model.
MULTIPLE: If more than one independent variable is involved, then it is a multiple regression model.
y^ = b0 + b1 x
where y^ (called "y hat") is the value lying on the fitted regression line for a
given x value, and ei = yi – y^i is called the residual, which describes the
error in fitting the regression line to the observation yi. The fitted
value y^ is also called the predicted value of y because, if the actual value of y is
not known, it can be predicted for a given value of x using the
estimated regression line.
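To make fitted values and residuals concrete, the following sketch fits a line by least squares (an assumed fitting method) and computes y^ and ei = yi - y^i for each observation. The names `fit_line`, `b0`, `b1` and the sample values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y^ = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Illustrative values only (hypothetical, not taken from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]                  # y^ for each observed x
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # ei = yi - y^i

# Predict y for an x value whose actual y is not known
x_new = 6
y_pred = b0 + b1 * x_new
print(fitted, residuals, y_pred)
```

With a least-squares fit that includes an intercept, the residuals sum to zero up to rounding, which is a quick sanity check on the computation.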
Assumptions for a simple linear
regression model
y^ = a + bx
where y^ = estimated average value of the dependent variable y for a given
value of the independent variable x
a (or b0) = y-intercept, which represents the average value of y when x = 0
b (or b1) = slope of the regression line, which represents the expected change
in the value of y for a unit change in the value of x.
Properties of regression
coefficients
1. The correlation coefficient is the geometric mean of the two regression
coefficients, that is, r = ±√(byx × bxy), where r takes the sign shared by byx and bxy.
2. If one regression coefficient is greater than one in absolute value, then the other
regression coefficient must be less than one in absolute value, because the absolute
value of the correlation coefficient r cannot exceed one.
3. Both regression coefficients must have the same sign (either
positive or negative); they can never have opposite signs.
4. The correlation coefficient has the same sign (either positive
or negative) as the two regression coefficients.
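These properties can be checked numerically. The sketch below computes the two least-squares slopes and recovers r as their signed geometric mean; the helper name `slope` and the sample values are assumptions made for this example.

```python
import math

def slope(u, v):
    """Least-squares slope of the regression of v on u."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    suu = sum((ui - mean_u) ** 2 for ui in u)
    return suv / suu

# Illustrative values only (hypothetical, not taken from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b_yx = slope(x, y)  # regression coefficient of Y on X
b_xy = slope(y, x)  # regression coefficient of X on Y

# Property 1: r is the geometric mean of the two coefficients,
# carrying their common sign
r_from_slopes = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Properties 3 and 4: both coefficients share one sign (their product is non-negative)
same_sign = (b_yx * b_xy) >= 0
print(b_yx, b_xy, r_from_slopes, same_sign)
```

Because the product byx × bxy equals r squared, it can never exceed one, which is the reason behind property 2.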
METHODS TO DETERMINE
REGRESSION COEFFICIENTS
Deviation Method