
CORRELATION AND REGRESSION ANALYSIS

Correlation
It is a statistical measure that shows the relationship between two or more variables, whether they move in the same direction or in opposite directions. With correlation, two or more variables may be compared to determine whether a relationship exists and to measure the strength of that relationship. The correlation coefficient expresses the strength of the relationship between the variables.

 Correlation gives the degree and direction of a relationship
 Correlation does not require an independent (predictor) variable
 Correlation results do not explain why the relationship occurs

The correlation may be either positive, negative or zero. The first role of correlation is to determine the strength of the relationship between the two variables represented on the x-axis and y-axis. The measure of this magnitude is called the correlation coefficient. The data required to compute this coefficient are two continuous measurements (x, y) obtained on the same entity.
If there is a perfect relationship, a straight line can be drawn through all the data points. The greater the change in y for a constant change in x, the steeper the slope of the line. In a less-than-perfect relationship between two variables, the closer the data points lie to a straight line, the stronger the relationship and the greater the correlation coefficient. In contrast, a zero correlation indicates no linear relationship at all between the two variables.

Positive Correlation
One variable increases as the other increases, or decreases as the other decreases. Eg: body temperature and pulse.
Negative Correlation
One variable increases as the other decreases, or decreases as the other increases. Eg: insulin and blood sugar.
Zero Correlation
There is no relationship between the variables.
The Coefficient of Correlation
A measure of the strength of the linear relationship between two variables, defined as the covariance of the variables divided by the product of their standard deviations.
Correlation coefficient, r = Covariance(x, y) / [(S.D. of x) × (S.D. of y)]
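As a quick numerical check of this formula, the following is a minimal sketch (assuming NumPy is available; the paired measurements are hypothetical) that computes r from the covariance and standard deviations and compares it with NumPy's built-in corrcoef:

```python
import numpy as np

# Hypothetical paired measurements (x, y) obtained on the same entities
x = np.array([36.8, 37.2, 37.9, 38.5, 39.1, 39.6])   # e.g. body temperature
y = np.array([72.0, 75.0, 81.0, 88.0, 94.0, 99.0])   # e.g. pulse rate

# r = Covariance(x, y) / [(S.D. of x) * (S.D. of y)]
cov_xy = np.cov(x, y, ddof=1)[0, 1]                   # sample covariance
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))  # sample standard deviations

print(r)                        # computed from the definition
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in value; the two should match
```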

The Spearman rank correlation coefficient, computed from the ranks of the observations rather than their raw values, is given by:

Spearman rank correlation, rs = 1 − (6 Σd²) / [n(n² − 1)]

where d is the difference between the ranks of each pair of observations and n is the number of pairs.

Whether a correlation is statistically significant depends on both its magnitude and the sample size. For example:
 r = 0.85 with n = 5 is not a statistically significant correlation.
 r = 0.55 with n = 40 is a statistically significant correlation.
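The sketch below (assuming SciPy is available; the data are hypothetical) shows how the p-value reported alongside the coefficient, rather than the size of r alone, is what establishes significance, and how the Spearman version works on ranks:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical samples: a small one (n = 5) and a larger one (n = 40),
# each with a linear trend plus random noise
x_small = np.arange(5)
y_small = 2 * x_small + rng.normal(0, 2, size=5)

x_large = np.arange(40)
y_large = 2 * x_large + rng.normal(0, 25, size=40)

# Pearson r with its p-value: significance depends on both r and n
r1, p1 = stats.pearsonr(x_small, y_small)
r2, p2 = stats.pearsonr(x_large, y_large)
print(f"n = 5 : r = {r1:.2f}, p = {p1:.3f}")
print(f"n = 40: r = {r2:.2f}, p = {p2:.3f}")

# Spearman rank correlation uses ranks, so it also suits ordinal data
rho, p_rho = stats.spearmanr(x_large, y_large)
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.3f}")
```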
Regression Analysis
In regression analysis, researchers often control the values of at least one of the variables and assign objects at random to different levels of those variables. Where correlation simply describes the strength and direction of a relationship, regression analysis provides a method for describing the nature of the relationship between two or more continuous variables. The correlation coefficient can support the interpretation associated with regression. If a linear relationship is established, the magnitude of a change in the independent variable can be used to predict the corresponding magnitude of the change in the dependent variable.
Regression analysis is a predictive modeling technique that investigates the relationship between a dependent (response) variable and one or more independent (predictor) variables. The technique is used for forecasting, time series modeling and studying causal relationships between variables.
In other words, regression analysis is a statistical method to estimate or predict the values of one variable (the dependent variable) for given values of the independent variable.
 The dependent variable is the one to be estimated or predicted (response)
 The independent variable is the given variable (predictor)
Example: the weight of a baby depends on its age, so age is the independent variable and weight is the dependent variable.

Uses of Regression Analysis

 Describe one variable in terms of the level of another
 Understand an association, eg : birth wt. & gestation
 Identify the variables which influence a particular variable
 Predict the dependent variable for given values of the independent variable
 Identify abnormal values or outliers (see the sketch after this list)
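As a concrete illustration of the last use, here is a minimal sketch (assuming NumPy; the data are made up, with one deliberately abnormal point) that fits a simple regression line and flags observations whose residuals are unusually large as potential outliers:

```python
import numpy as np

# Hypothetical (x, y) data containing one deliberately abnormal observation
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 30.0, 14.1, 16.2])

# Fit the straight line y = a + b*x by least squares
b, a = np.polyfit(x, y, 1)          # polyfit returns [slope, intercept]
residuals = y - (a + b * x)         # distance of each point from the fitted line

# Flag points whose residual exceeds two standard deviations of the residuals
threshold = 2 * residuals.std(ddof=1)
outliers = np.where(np.abs(residuals) > threshold)[0]
print("possible outliers at indices:", outliers)
```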

Types
 Simple Linear Regression (1 response – 1 predictor)
 Multiple Regression (1 response – Many predictors)
 Logistic Regression (Any response or predictors – Nominal / Ordinal)

1. Simple Linear Regression (1 response – 1 predictor)

The dependent variable is continuous, the independent variable can be continuous or discrete, and the nature of the regression line is linear. 'Linear' denotes that the relationship between the two variables can be described by a straight line. With linear regression, a relationship is established between the two variables and a value of the dependent variable can be predicted from a given value of the independent variable. For example, the Injury Severity Score can be used to predict length of hospital stay.
Linear Regression Line
Linear regression is concerned with the characteristics of a straight line, or linear function. A regression line is computed that best fits the data points, and this line can be estimated from sample data. In the simple linear regression design there are only two variables (x and y): the x-axis represents the independent variable and the y-axis the dependent outcome. The first step is to fit a straight line through the points such that the distances between the data points and the line are at a minimum. The slope of the line and its intercept on the y-axis are then used for the regression calculation, and the fitted line can be pictured as a straight line drawn through the scatter of data points.

The calculations involved in the regression line equation y = a + bx can be performed using the following values:

b (slope) = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
a (y-intercept) = ȳ − b x̄

where x̄ and ȳ are the means of x and y. The above equations can also be represented in terms of the correlation coefficient r and the standard deviations of x and y:

Regression equation of x on y : (x − x̄) = bxy (y − ȳ), where bxy = r (S.D. of x / S.D. of y)

Regression equation of y on x : (y − ȳ) = byx (x − x̄), where byx = r (S.D. of y / S.D. of x)

 To predict x (response) given y (predictor), use the regression line of x on y.
 To predict y (response) given x (predictor), use the regression line of y on x.
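A minimal sketch of these calculations (assuming NumPy; the age and weight figures are hypothetical), using the earlier baby weight and age example to fit the regression line of y on x and predict a new value:

```python
import numpy as np

# Hypothetical data: age of a baby in months (x) and weight in kg (y)
age = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
weight = np.array([4.2, 5.0, 5.7, 6.3, 6.9, 7.4, 7.8, 8.2])

# Slope and intercept of the regression line of y on x:
#   b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),  a = y_bar - b * x_bar
x_bar, y_bar = age.mean(), weight.mean()
b = np.sum((age - x_bar) * (weight - y_bar)) / np.sum((age - x_bar) ** 2)
a = y_bar - b * x_bar

# Equivalent form of the slope using r and the standard deviations: byx = r * (sd_y / sd_x)
r = np.corrcoef(age, weight)[0, 1]
b_alt = r * weight.std(ddof=1) / age.std(ddof=1)

print(f"y = {a:.2f} + {b:.2f} x   (slope from r and S.D.s: {b_alt:.2f})")
print("predicted weight at 9 months:", round(a + b * 9, 2))
```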
2. Multiple Regression (1 response – Many predictors)
The dependent variable (response) is predicted using several independent variables (predictors). For example, multiple regression could be used to understand whether exam performance can be predicted from revision time, test anxiety and lecture attendance.
The difference between simple linear regression and multiple linear regression is that multiple linear regression has more than one (> 1) independent variable, whereas simple linear regression has only one independent variable.
A multiple regression model that relates a y-variable to n − 1 predictor variables is written as

yi = β0 + β1 xi,1 + β2 xi,2 + … + βn−1 xi,n−1 + εi

The β coefficients indicate the relative importance of the various independent predictor variables; yi is the dependent (response) variable and the xi's are the independent (predictor) variables.
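A minimal sketch of fitting such a model by ordinary least squares (assuming NumPy; the exam data and the choice of predictors are hypothetical):

```python
import numpy as np

# Hypothetical predictors: revision time (hours), test anxiety score, lectures attended
X = np.array([
    [10.0, 7.0, 18.0],
    [15.0, 5.0, 20.0],
    [ 8.0, 9.0, 12.0],
    [20.0, 4.0, 22.0],
    [12.0, 6.0, 16.0],
    [18.0, 3.0, 21.0],
])
y = np.array([62.0, 74.0, 55.0, 85.0, 66.0, 81.0])  # exam performance (%)

# Add a column of ones so the model includes the intercept beta_0
X_design = np.column_stack([np.ones(len(X)), X])

# Solve for beta = [beta_0, beta_1, ..., beta_{n-1}] by least squares
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("coefficients:", np.round(beta, 3))

# Predict exam performance for a new student (hypothetical values)
new_student = np.array([1.0, 14.0, 5.0, 19.0])  # [1, revision, anxiety, lectures]
print("predicted performance:", round(float(new_student @ beta), 1))
```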

3. Logistic Regression (Any response or predictors – Nominal / Ordinal)

This is the regression model in which the dependent variable is not continuous, ie, it is categorical. Independent variables can be continuous or discrete, and the model is linear in the log-odds (logit) of the outcome rather than in the outcome itself. For example, smoking habit (Yes/No) can be used to predict COPD (Yes/No).
Binomial Logistic Regression
When the dependent variable (the variable to predict) is binary (only two levels), eg : Yes/No
Multinomial Logistic Regression
When the dependent variable (the variable to predict) has more than two levels,
eg : Opinion : Agree/Disagree/Neutral
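A minimal sketch of a binomial logistic regression (assuming scikit-learn is available; the smoking/COPD data are entirely made up, and note that scikit-learn's LogisticRegression applies a small amount of regularization by default):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: smoking habit (1 = yes, 0 = no) as the single predictor,
# COPD status (1 = yes, 0 = no) as the binary outcome
smoking = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0]).reshape(-1, 1)
copd = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0])

model = LogisticRegression()
model.fit(smoking, copd)

# The fitted coefficient is on the log-odds scale; exponentiating it gives an odds ratio
odds_ratio = np.exp(model.coef_[0][0])
print("odds ratio for smoking:", round(odds_ratio, 2))

# Predicted probability of COPD for a smoker and a non-smoker
print("P(COPD | smoker):    ", round(model.predict_proba([[1]])[0, 1], 2))
print("P(COPD | non-smoker):", round(model.predict_proba([[0]])[0, 1], 2))
```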
