Correlation and Regression Analysis Using SPSS: December 2019
Sarad Chandra Kafle
Abstract
The objective of this study is to share knowledge of how to carry out correlation and regression analysis using the Statistical Package for the Social Sciences (SPSS). The study uses secondary data to demonstrate, for novice researchers, how to apply these very popular statistical tools. Among the many statistical tools available, correlation and regression analysis are the most widely used in research, e.g. in management, medicine, social science and education. However, not every researcher knows whether the tools fit a given problem, how to carry out the analysis, or how to interpret the results. The results show that novice researchers need proper knowledge and skill to analyse quantitative data. The implication of this study is to share knowledge of correlation and regression analysis and of how to carry it out in the very popular software package SPSS.
Keywords: Statistical tools, Test of significance, p-value, Hypothesis, Dependent and independent variables
1. Introduction
In quantitative studies, researchers are often willing to use the very popular statistical tools of regression and correlation; however, for lack of sufficient knowledge of regression and correlation analysis, their aims are not fulfilled, or, even when they do use the tools, the tools are not applied properly. This article has been prepared to provide a clear idea of correlation and regression, their use, and the interpretation of the output of the analysis. The relationship between two or more variables can be studied using correlation and regression. Two variables are said to be related if a change in the value of one variable changes the value of the other; here the term change implies either an increase or a decrease in the value of a variable. Such an analysis of relationships can be carried out for quantitative or qualitative variables; this paper, however, covers only relationships between quantitative variables, i.e. variables that are measurable and therefore have units. The study of the relationship between two quantitative variables at a time is simple regression or simple correlation, while the relationship among more than two quantitative variables may be studied by partial correlation, multiple correlation or multiple regression, according to the objective and nature of the study and the variables included (Sthapit, Yadav, Khanal, & Dangol, 2017).
The strength of the relationship between two or more variables is studied using correlation. Correlation is a statistical tool that measures how strong a relationship exists between variables. The value of the correlation coefficient lies between -1 and +1: the nearer the value is to zero, the weaker the relationship between the variables, while a value close to one (in absolute value) implies a stronger relationship. Hence correlation is a value that tries to explain the degree of association between variables, whereas regression tries to explain the relationship between variables using a mathematical function (Gupta & Kapoor, 2014).
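As a quick illustration of these bounds, the sketch below (made-up data, not from this paper) computes Pearson's r for a perfectly linear pair of variables, for which the coefficient attains its maximum value of 1:

```python
import numpy as np

# Two small quantitative variables that move together exactly
# (hypothetical data for illustration only).
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is Pearson's r between x and y.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))  # perfectly linear data, so r = 1.0
```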
OCEM Journal of Management, Technology & Social Sciences
1.1 Correlation Analysis:
Correlation analysis describes the degree of relationship between variables, but it does not explain which variable is the cause and which is the effect. The study of correlation between two variables is called simple correlation; among more than two variables it may be partial or multiple correlation.
Correlation can be studied by two methods: the diagrammatic method and the mathematical method. Diagrammatically it is studied with the help of a scatter diagram, which cannot provide an exact value of the correlation in every case. Mathematically, many methods and formulae exist, but Karl Pearson's method is the most widely used (Magnello, 2009).
[Figure: two scatter diagrams of Y against X, illustrating perfect negative correlation (r = -1) and no correlation (r = 0). Fig. source: Shrestha, Khanal, & Kafle, 2014]
1.3 Karl Pearson’s correlation coefficient:
This is a mathematical method of studying the degree of association between two variables. It is used to study the correlation between two quantitative variables and is denoted by r. The formula for Karl Pearson's correlation coefficient is as follows (Sthapit, Yadav, Khanal, & Dangol, 2017):
    r = cov(X, Y) / (σx · σy)

or,

    r = (nΣXY − ΣX·ΣY) / (√(nΣX² − (ΣX)²) · √(nΣY² − (ΣY)²))

When ranks are repeated, Spearman's rank correlation coefficient is

    ρ = 1 − 6{Σd² + (m₁³ − m₁)/12 + (m₂³ − m₂)/12 + …} / (n³ − n)   (Magnello, 2009)

where d is the difference between paired ranks and m₁, m₂, … are the numbers of times each repeated rank occurs.
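Outside SPSS, both coefficients are easy to verify programmatically. The sketch below (hypothetical data, not the paper's dataset) computes Pearson's r directly from the formula above and cross-checks it against scipy, which also provides Spearman's coefficient with the repeated-rank correction built in:

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations for illustration only.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2, 1, 4, 3, 7, 8, 6, 9], dtype=float)

# Karl Pearson's r from the formula above, written out explicitly.
n = len(x)
r_manual = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (
    np.sqrt(n * np.sum(x**2) - np.sum(x)**2)
    * np.sqrt(n * np.sum(y**2) - np.sum(y)**2)
)

# The same value from scipy, plus Spearman's rank correlation.
r_scipy, _ = stats.pearsonr(x, y)
rho, _ = stats.spearmanr(x, y)
print(round(r_manual, 4), round(r_scipy, 4), round(rho, 4))
```

The manual and library values agree, which is a useful sanity check when learning the formula.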
1.7 Partial correlation:
The correlation between two variables keeping the effect of the remaining variables constant is partial correlation. If we are interested in the relationship between two variables X1 and X2 while a third variable X3 also exists, then the correlation between X1 and X2 keeping the value of X3 constant is the partial correlation between X1 and X2, denoted by r12.3. The value of a partial correlation lies between -1 and +1. In terms of the simple correlations,

    r12.3 = (r12 − r13·r23) / (√(1 − r13²) · √(1 − r23²))
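The formula can be applied directly once the three pairwise correlations are known. A minimal sketch, using illustrative (hypothetical) correlation values:

```python
import math

def partial_corr(r12, r13, r23):
    """Partial correlation r12.3 computed from the three pairwise correlations."""
    return (r12 - r13 * r23) / (math.sqrt(1 - r13**2) * math.sqrt(1 - r23**2))

# Hypothetical pairwise correlations, for illustration only.
r12, r13, r23 = 0.8, 0.5, 0.6
print(round(partial_corr(r12, r13, r23), 4))  # (0.8 - 0.3) / (0.8660 * 0.8) ≈ 0.7217
```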
1. 8 Multiple Correlation:
Multiple correlation is the correlation between the predicted and actual values of the dependent variable in a linear regression model that includes an intercept. In other words, it is the relationship between the dependent variable and the joint effect of the independent variables on it. In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. If X1 is the dependent variable, described by X2 and X3, then the correlation between the actual and predicted values of X1 is denoted by R1.23; equivalently, it is the correlation between the dependent variable X1 and the joint effect of X2 and X3 on X1. The value of a multiple correlation lies between 0 and 1. In terms of the simple correlations,

    R1.23 = √[(r12² + r13² − 2·r12·r13·r23) / (1 − r23²)]
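As with the partial correlation, R1.23 follows mechanically from the three pairwise correlations. A sketch with the same hypothetical values as before:

```python
import math

def multiple_corr(r12, r13, r23):
    """Multiple correlation R1.23 computed from the three pairwise correlations."""
    return math.sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))

# Hypothetical pairwise correlations, for illustration only.
print(round(multiple_corr(0.8, 0.5, 0.6), 4))  # sqrt(0.41 / 0.64) ≈ 0.8004
```

Note that the result is non-negative, consistent with the 0-to-1 range stated above.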
1.11 Multiple Regression:
Let y be the dependent variable and x1, x2, x3, …, xk be the k independent variables. Then the multiple regression model is defined as

    y = β0 + β1·x1 + β2·x2 + … + βk·xk + e

Where,
    y = dependent variable and x1, x2, …, xk are independent variables,
    β0 = y-intercept,
    β1 = slope of y with variable x1 holding the remaining variables x2, x3, …, xk constant, i.e. the regression coefficient of y on x1 holding the remaining variables constant; and so on,
    e = random error term.
(Dendukuri & Reinhold, 2005)
Some prerequisites for a linear regression model are:
- There is a linear relationship between the quantitative dependent and independent variables.
- There is no autocorrelation of the residuals.
- The mean of the residuals is zero.
- The residuals have equal variance (homoscedasticity).
- The independent variables are uncorrelated with the errors.
- There is no multicollinearity. (Zaid, 2015)
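Two of these prerequisites are easy to check numerically once a model is fitted. The sketch below simulates data (the intercept and slope are borrowed from this paper's later fitted equation purely for flavour; the data are not the paper's), fits a simple regression by ordinary least squares, and checks the zero-mean-residual and no-autocorrelation conditions, the latter via the Durbin-Watson statistic (values near 2 suggest no autocorrelation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated illustration: height (cm) as a linear function of
# age (months) plus independent noise. Not the paper's dataset.
age = rng.uniform(1, 60, size=200)
height = 59.35 + 0.811 * age + rng.normal(0, 5, size=200)

# Fit y = b0 + b1*x by ordinary least squares.
X = np.column_stack([np.ones_like(age), age])
(b0, b1), *_ = np.linalg.lstsq(X, height, rcond=None)

residuals = height - (b0 + b1 * age)

# Check two prerequisites from the list above:
# 1) the mean of the residuals is (numerically) zero,
# 2) a Durbin-Watson statistic near 2 suggests no autocorrelation.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals**2)
print(round(residuals.mean(), 10), round(dw, 2))
```

With an intercept in the model, OLS forces the residual mean to zero exactly (up to floating-point error), so the first check is really a sanity check on the fitting code.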
1.12 SPSS
SPSS stands for Statistical Package for the Social Sciences. It is statistical software that makes it easy to compile and analyze data. Primary or secondary data can be entered much as in Microsoft Excel, and the menu bar makes it easy to analyze the data thus entered. Many statistical analyses can be carried out using SPSS (Arkkelin, 2014).
Many researchers have applied correlation and regression analysis in their theses, articles and other documents; however, they are not always confident about the appropriate use of correlation and regression analysis or about how these statistical tools fit into their research. In some cases their interpretations may mislead their studies. Many novice researchers are willing to use correlation and regression analysis but do not know how to apply these tools during data analysis. The primary objective of this study is to share knowledge of regression and correlation analysis and of the conditions required to use them in a research paper.
Table 2 shows the coefficient of determination (R square) as 0.728, which means that 72.8% of the variation in the dependent variable (height) is explained by the independent variable (age).
Table 3. ANOVA(a)

Model           Sum of Squares    df     Mean Square    F           Sig.
1  Regression   66311.832         1      66311.832      1124.755    .000(b)
   Residual     24820.758         421    58.957
   Total        91132.590         422

a. Dependent Variable: Height of respondent in cm
b. Predictors: (Constant), Age of respondent in month
Table 3 tests the overall goodness of fit of the fitted regression model. From the table it can be concluded that the fitted model is significant, as the p-value of the F statistic is 0.00, which is less than the level of significance (α = 5%).
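The F statistic in Table 3 can be reproduced from the sums of squares and degrees of freedom, which is a useful exercise for understanding what SPSS reports. A sketch:

```python
from scipy import stats

# Sums of squares and degrees of freedom taken from Table 3.
ss_reg, df_reg = 66311.832, 1
ss_res, df_res = 24820.758, 421

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res     # ≈ 58.957, the residual "Mean Square"
f_stat = ms_reg / ms_res     # ≈ 1124.755, matching the table's F
p_value = stats.f.sf(f_stat, df_reg, df_res)
print(round(f_stat, 2), p_value < 0.05)
```

The p-value is far below 0.05, which is why SPSS displays it as .000.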
Table 4. Coefficient table(a)

                               Unstandardized Coefficients    Standardized Coefficients
Model                          B         Std. Error           Beta                         t         Sig.
(Constant)                     59.350    0.679                                             87.373    0.000
Age of respondent in month     0.811     0.024                0.853                        33.537    0.000

a. Dependent Variable: Height of respondent in cm
The coefficient table helps to determine the regression equation: the 'B' sub-column of the Unstandardized Coefficients column provides the regression coefficients. The first is the constant, or y-intercept, and the second is the regression coefficient of height (Y) on age (X). Hence the regression equation from the coefficient table is

    Y = 59.35 + 0.811 X
The regression coefficient of height on age is found to be 0.811, which implies that a child one month older than another is expected to be 0.811 centimetres taller. Also, the regression coefficient is significant, as the p-value (0.00) is less than the level of significance (α = 5%).
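Once the equation is in hand, prediction is simple arithmetic. A sketch of using the fitted equation, including the "0.811 cm per month" interpretation above:

```python
def predicted_height_cm(age_months):
    """Predicted height from the fitted equation Y = 59.35 + 0.811*X."""
    return 59.35 + 0.811 * age_months

# A child 12 months older is predicted to be 12 * 0.811 cm taller.
diff = predicted_height_cm(36) - predicted_height_cm(24)
print(round(diff, 3))  # 9.732
```

Such predictions are only meaningful within the age range of the data the model was fitted on; extrapolating far outside it is not justified by the regression.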
4. Discussion
The results show that using correlation and regression via SPSS is useful for novice researchers. The results also highlight that correlation and regression apply only to quantitative data. In practice, researchers encounter many quantitative variables that are related to each other; their degree of relationship can be measured by correlation, and how two or more variables are related can be described by an equation, namely the regression equation. Manual calculation of the regression equation and the correlation coefficient is very laborious for large data sets, so software such as SPSS, which is easy and fast, is required.
The results also highlight that correlation and regression are two key data analysis tools in the quantitative approach. When the dependent variable is dichotomous, the logistic regression model helps in predicting the probability of occurrence of the dependent variable y from the independent variables x. Researchers can use dichotomous variables, e.g. health status (sick or not), employment status (employed or unemployed), labour force participation (in or out of the labour force) and family planning method (which type).
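To make the dichotomous case concrete, the sketch below simulates an employment-status outcome driven by one continuous predictor (the variable names, coefficients and data are all hypothetical, not from this paper) and fits the logistic model by maximising its log-likelihood; SPSS or a dedicated library would do the same via maximum likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical dichotomous outcome (1 = employed) driven by one
# continuous predictor (e.g. years of schooling); simulated data.
x = rng.uniform(0, 16, size=500)
p_true = 1 / (1 + np.exp(-(-4.0 + 0.5 * x)))
y = (rng.random(500) < p_true).astype(float)

def neg_log_likelihood(beta):
    """Negative log-likelihood of P(y=1|x) = 1/(1+exp(-(b0+b1*x)))."""
    b0, b1 = beta
    eta = b0 + b1 * x
    # np.logaddexp(0, eta) = log(1 + exp(eta)), computed stably.
    return np.sum(np.logaddexp(0.0, eta)) - np.sum(y * eta)

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="BFGS")
b0_hat, b1_hat = res.x
print(round(b0_hat, 2), round(b1_hat, 2))  # near the true -4.0 and 0.5
```

The fitted slope is interpreted on the log-odds scale: each one-unit increase in x multiplies the odds of the outcome by exp(b1).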
The results also suggest that logistic regression analysis is a more flexible method because it makes no assumptions about the nature of the relationship between the independent and dependent variables. The limitations of this study are its reliance on secondary data analysis, limited research materials, limited knowledge of statistical tools, a limited literature review, and limited coverage of correlation and regression analysis. Owing to these limitations, the current research cannot guarantee the reliability and validity of the data and findings. It is recommended that future research focus on a rich literature review and on primary research into how correlation and regression can be used effectively in quantitative data analysis. It is also recommended that future studies set out the detailed steps of correlation and regression analysis to help novice researchers.
References
Arkkelin, D. (2014). Using SPSS to Understand Research and Data Analysis. Valparaiso: Valparaiso University.
Dendukuri, N., & Reinhold, C. (2005). Correlation and regression. American Journal of Roentgenology, 3-18.
Draper, N. R., & Smith, H. (2011). Applied Regression Analysis. Noida: Wiley India Pvt. Ltd.
Gujarati, D. N., Porter, D. C., & Gunasekar, S. (2015). Basic Econometrics. New Delhi: McGraw Hill Education (India) Pvt. Ltd.
Gupta, S. C., & Kapoor, V. K. (2014). Fundamentals of Mathematical Statistics. Mumbai: Sultan Chand and Sons.
Magnello, M. E. (2009). Karl Pearson and the establishment of mathematical statistics. International Statistical Review / Revue Internationale de Statistique, 3-29.
Mehta, B. C., & Kapoor, K. (2005). Fundamentals of Econometrics. Mumbai: Himalaya Publishing House.
Montgomery, D. (1982). Introduction to Linear Regression Analysis. New Delhi: Wiley.
Shrestha, M. P., Khanal, P. R., & Kafle, S. C. (2014). Business Statistics. Kathmandu: Sabdartha Publication.
Sthapit, A. B., Yadav, R. P., Khanal, S. P., & Dangol, P. M. (2017). Fundamentals of Statistics. Kathmandu: Asmita Publication.
Zaid, M. A. (2015). Correlation and Regression Analysis. Statistical, Economic and Social Research and Training Centre for Islamic Countries.