Correlation and Regression
Correlation analysis measures the association between two or more variables. In this section, we consider only the association between two variables. The association is broadly of two types:
1. Two variables are positively associated when large values of one variable tend to occur with large values of the other, and small values tend to occur together as well. As one variable increases in value, the other tends to increase; as one decreases, the other tends to decrease.
2. Two variables are negatively associated when large values of one variable tend to occur with small values of the other, and vice versa. As one variable increases in value, the other tends to decrease, and as one decreases, the other tends to increase.
Strength of association:
◼ If there is a strong linear association, the scatterplot points will tend to fall along a straight line.
◼ If there is a weak linear association, the scatterplot points will be highly variable about the possible
trend line.
◼ There is no linear association if the trend line appears to be horizontal.
Correlation coefficient:
The Pearson correlation coefficient, denoted by r, measures the direction and strength of the linear relationship between two quantitative variables. It is computed as

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}}$$
◼ The value of r is always between –1 and +1.
◼ A value of –1 indicates a perfect negative linear relationship between the variables.
◼ A value of +1 indicates a perfect positive linear relationship.
Measuring the strength:
◼ Values of r near 0 indicate a weak linear association.
◼ The strength of the association increases as you move away from 0 toward either –1 or +1.
◼ Values close to –1 or +1 indicate that the points in the scatterplot lie close to a straight line.
[Figure: four scatterplots of y versus x illustrating r = 1 (perfect positive), r > 0 (positive), r = –1 (perfect negative), and r < 0 (negative) linear relationships.]
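For readers who want to check the formula outside SPSS, here is a minimal Python sketch that computes r directly from the sums above. The height/weight arrays are hypothetical illustrative values, not the tutorial's data set.

import numpy as np

# Hypothetical paired measurements (illustrative only)
x = np.array([160.0, 165.0, 170.0, 172.0, 178.0, 181.0])  # e.g., height
y = np.array([120.0, 130.0, 128.0, 145.0, 150.0, 160.0])  # e.g., weight

n = len(x)
xbar, ybar = x.mean(), y.mean()

# r = (sum(x*y) - n*xbar*ybar) / sqrt((sum(x^2) - n*xbar^2) * (sum(y^2) - n*ybar^2))
num = np.sum(x * y) - n * xbar * ybar
den = np.sqrt((np.sum(x**2) - n * xbar**2) * (np.sum(y**2) - n * ybar**2))
r = num / den
print(f"r = {r:.3f}")  # agrees with np.corrcoef(x, y)[0, 1]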
Now we want to test whether there is any relation between height and weight. To solve this problem, we follow these steps:
Step 1: Analyze→Correlate→Bivariate
Step 2: Bring the height and weight into the Variables box→Click OK
Output:
[SPSS output: Correlations table for height and weight.]
Here we see that the Pearson correlation coefficient is 0.672. This implies that there exists a moderate positive linear relationship between the variables height and weight. The row Sig. (2-tailed) gives the p-value for the test that the population correlation equals zero. A small p-value indicates that there exists a significant correlation between weight and height in the population.
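The same r and two-tailed p-value can be reproduced outside SPSS; a minimal sketch with scipy, again on hypothetical data:

import numpy as np
from scipy import stats

height = np.array([160.0, 165.0, 170.0, 172.0, 178.0, 181.0])  # hypothetical
weight = np.array([120.0, 130.0, 128.0, 145.0, 150.0, 160.0])  # hypothetical

# pearsonr returns r and the two-tailed p-value for H0: population correlation = 0
r, p_two_tailed = stats.pearsonr(height, weight)
print(f"Pearson r = {r:.3f}, Sig. (2-tailed) = {p_two_tailed:.4f}")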
Regression Analysis
The primary difference between correlation and regression analysis is that in the former we do not designate any dependent and independent variables. In regression analysis, however, we consider one dependent variable and one or more independent variables, and we would like to find the relationship between these two types of variables. That is, we would like to find the changing pattern of the dependent variable (denoted y) for changes in the independent variable(s) (denoted x).
Other names for independent variables are predictors and regressors.
We express the relationship between x and y in terms of a model, which we call a regression model. To explain this, let us consider one independent variable and one dependent variable.
The scatterplot of x and y is given below. We would like to draw a line that represents the relationship. This straight line is called the fitted regression line and is expressed as $\hat{y} = a + bx$.
[Figure: scatterplot of y versus x with the fitted regression line $\hat{y} = a + bx$ drawn through the points.]
We obtain the desired line by the method of Least Squares, and thus the intercept and slope obtained are called least squares estimates. The estimates are given by the formulas:
$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$$
and
$$a = \bar{y} - b\bar{x}$$
After finding the values of a and b, we can predict the average value of y for a given value of x.
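As a quick check of these formulas, a small Python sketch (hypothetical x/y values; x in cm and y in pounds, mirroring the height/weight example):

import numpy as np

# Hypothetical data: x = height (cm), y = weight (pounds)
x = np.array([160.0, 165.0, 170.0, 172.0, 178.0, 181.0])
y = np.array([120.0, 130.0, 128.0, 145.0, 150.0, 160.0])

n = len(x)
xbar, ybar = x.mean(), y.mean()

# b = (sum(x*y) - n*xbar*ybar) / (sum(x^2) - n*xbar^2); a = ybar - b*xbar
b = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x**2) - n * xbar**2)
a = ybar - b * xbar

# Predict the average y for a given x using y_hat = a + b*x
x_new = 175.0
print(f"a = {a:.2f}, b = {b:.3f}, predicted y at x = {x_new}: {a + b * x_new:.1f}")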
For practical implementation, let us consider the same data set as in the correlation analysis. Now we consider height as the independent variable and weight as the dependent variable. We perform regression analysis using the following steps:
Step 1: Analyze → Regression → Linear
Step 2: Bring weight into the Dependent box and height into the Independent(s) box → Click OK
Output:
[SPSS output: Model Summary, ANOVA (Total sum of squares 2327.690 on 39 df), and Coefficients tables for the regression of weight on height.]
Let us interpret the above three tables, namely Model Summary, ANOVA and Coefficients.
R Square, shown in the first table, is called the coefficient of determination. Here R² = 0.215 indicates that 21.5% of the variation in the dependent variable (weight) has been explained by the independent variable (height). Therefore, other factors not included in the model are responsible for the remaining 1 − 0.215 = 0.785, or 78.5%, of the variation in the dependent variable.
The second table indicates that the regression model is significant, as the p-value (Sig.) is smaller than 0.05. In other words, height has a significant effect on weight.
The third table reports the estimated intercept and slope coefficients (column B) as 113.744 and 0.727, respectively. The estimated slope of 0.727 indicates that if height increases by one unit (cm), then on average weight increases by 0.727 units (pounds).
The last column of the third table gives the p-values (Sig.) for testing whether the population intercept and slope are different from zero. As both p-values (here 0.000 and 0.003) are smaller than 0.05, we conclude that the population slope and intercept are significantly different from zero.
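The three tables have direct counterparts in a statsmodels fit; a minimal sketch on hypothetical data (the numbers will not match the SPSS output above):

import numpy as np
import statsmodels.api as sm

height = np.array([160.0, 165.0, 170.0, 172.0, 178.0, 181.0])  # hypothetical
weight = np.array([120.0, 130.0, 128.0, 145.0, 150.0, 160.0])  # hypothetical

X = sm.add_constant(height)      # adds the intercept column
model = sm.OLS(weight, X).fit()

print(model.rsquared)            # R Square (Model Summary table)
print(model.f_pvalue)            # overall F-test p-value (ANOVA table)
print(model.params)              # intercept and slope (column B, Coefficients table)
print(model.pvalues)             # Sig. for intercept and slope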
Next, we extend the model by adding age as a second independent variable (multiple regression), following the same steps but bringing both height and age into the Independent(s) box.
Output:
[SPSS output: Model Summary, ANOVA (Total sum of squares 2327.690 on 39 df), and Coefficients tables for the regression of weight on height and age.]
The Model Summary table indicates that the R Square value is 0.263. Therefore, 26.3% of the variation in the variable weight has been explained by the independent variables height and age.
The second table (ANOVA) indicates that the regression is significant, as the p-value is 0.004. In other words, at least one of the independent variables has a significant effect on the dependent variable (weight).
The third table (Coefficients) shows the regression coefficients with p-values and other relevant statistics. We can see from the last column (Sig.) that height has a significant effect, as its p-value of 0.045 is less than 0.05. However, age does not have a significant effect on weight, as its p-value (0.129) is greater than 0.05. The regression coefficient of height is 0.53, which indicates that a one-unit (cm) increase in height results in a 0.53-unit (pounds) increase in weight when the other variable (age) is held fixed in the model.
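Fitting the two-predictor model outside SPSS only requires stacking a second column; a brief sketch, again with hypothetical data:

import numpy as np
import statsmodels.api as sm

height = np.array([160.0, 165.0, 170.0, 172.0, 178.0, 181.0])  # hypothetical
age = np.array([21.0, 25.0, 23.0, 30.0, 28.0, 35.0])           # hypothetical
weight = np.array([120.0, 130.0, 128.0, 145.0, 150.0, 160.0])  # hypothetical

X = sm.add_constant(np.column_stack([height, age]))
model = sm.OLS(weight, X).fit()

# Coefficients and per-predictor p-values, ordered: intercept, height, age
print(model.params)
print(model.pvalues)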
Selecting a subset of regressors:
Sometimes our data set contains many independent variables (predictors), and we would like to choose the best subset of predictors. This process is called subset selection. There are three common ways of doing so, namely forward selection, backward elimination, and stepwise regression; a forward-selection sketch is given below.
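For illustration, a greedy forward-selection loop might look like the following sketch (forward_select is a hypothetical helper, not an SPSS or statsmodels function):

import numpy as np
import statsmodels.api as sm

def forward_select(y, candidates, alpha=0.05):
    # Greedy forward selection: repeatedly add the candidate predictor with the
    # smallest p-value, stopping when no remaining candidate is significant.
    selected = {}
    while candidates:
        best_name, best_p = None, alpha
        for name, col in candidates.items():
            X = sm.add_constant(np.column_stack(list(selected.values()) + [col]))
            p = sm.OLS(y, X).fit().pvalues[-1]  # p-value of the newest predictor
            if p < best_p:
                best_name, best_p = name, p
        if best_name is None:
            break  # nothing left below alpha
        selected[best_name] = candidates.pop(best_name)
    return list(selected)

Backward elimination works in reverse (start with all predictors and drop the least significant one at a time), and stepwise regression alternates the two moves.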
Since the present data set has very few independent variables, we do not implement the method here. However, we can demonstrate the process: while running the regression analysis as before, simply select Method: Stepwise in the following window:
Residual plots
We need residual analysis to make a final assessment of the fitted model. We follow the same process of regression analysis, but need to produce some plots as shown below:
Step 1: Analyze → Regression → Linear
Step 2: Choose the dependent and independent variables, then click on the Plots tab.
Step 3: Bring ZPRED into the X box and ZRESID into the Y box. Also select Normal Probability Plot under Standardized Residual Plots. Click Continue, then OK.
Output:
[SPSS output: Normal P-P plot of the standardized residuals, and a scatterplot of the standardized residuals against the standardized predicted values.]
If the data come from a Normal distribution, the points in the Normal P-P plot should lie on the straight line. For a small data set, some deviation is acceptable. In the second plot, of standardized residuals versus predicted values, there should be no pattern for a well-fitted model. Here there seem to be two possible outliers.
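Equivalent diagnostic plots can be produced in Python; a sketch using a Q-Q plot in place of SPSS's P-P plot (both compare the residuals to a Normal reference), again on hypothetical data:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy import stats

height = np.array([160.0, 165.0, 170.0, 172.0, 178.0, 181.0])  # hypothetical
weight = np.array([120.0, 130.0, 128.0, 145.0, 150.0, 160.0])  # hypothetical

fit = sm.OLS(weight, sm.add_constant(height)).fit()
std_resid = fit.get_influence().resid_studentized_internal  # standardized residuals

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
stats.probplot(std_resid, dist="norm", plot=ax1)  # Normal Q-Q plot of residuals
ax1.set_title("Normal Q-Q plot of residuals")
ax2.scatter(fit.fittedvalues, std_resid)          # residuals vs predicted values
ax2.axhline(0, linestyle="--")
ax2.set_xlabel("Predicted values")
ax2.set_ylabel("Standardized residuals")
plt.tight_layout()
plt.show()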