Path Analysis
What is Path Analysis?
Path analysis is a statistical technique used to assess hypothesized patterns of causal relationships among
a set of variables. It shares features with Multiple Linear Regression (directed relationships among
variables and the interpretation of coefficients), with Confirmatory Factor Analysis (both are confirmatory
techniques) and with Structural Equation Modeling (in both, the model is expressed as a series of equations).
It is considered a special case of Structural Equation Modeling in which only observed variables are used;
there are no latent variables.
As already mentioned, Path Analysis is mainly used to test causal patterns derived from prior theory, so
assessing the overall goodness of fit of the model is key. This is done by testing the null hypothesis that
the model fits the data against the alternative that it does not. Under the null hypothesis the test statistic
follows a Chi-squared distribution and, unlike in most tests, we do not want to reject the null hypothesis.
Moreover, Path Analysis can be used to assess the statistical significance of each parameter included in
the model, following a logic very similar to that of linear regression models.
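The goodness-of-fit test described above is usually written as follows. This is the standard textbook formulation and is not taken from the original text: Σ is the population covariance matrix of the observed variables, Σ(θ) the covariance matrix implied by the model, F_ML the minimized maximum-likelihood discrepancy, N the sample size, p the number of observed variables and q the number of free parameters. Some software scales the statistic by N instead of N - 1.

```latex
\[
H_0:\ \Sigma = \Sigma(\theta)
\qquad \text{vs.} \qquad
H_1:\ \Sigma \neq \Sigma(\theta)
\]
\[
T = (N - 1)\, F_{ML}(\hat{\theta}) \ \overset{H_0}{\sim}\ \chi^2_{df},
\qquad
df = \frac{p(p+1)}{2} - q
\]
```

A large p-value means the model-implied covariances are close to the observed ones, which is why failing to reject the null hypothesis is the desired outcome.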
Michele Russo 20X4004
Implementation in R
General overview of the method
Path analysis can be implemented in R using the lavaan package. The model syntax is very similar to the
formula syntax of linear regression; the only difference is that all the relationships among the variables
(both exogenous and endogenous) are coded into a single model. The equations that describe the model are
enclosed in quotation marks (' '), a regression (causal) relationship is coded with the symbol ~ and a
covariance (correlation) is coded with ~~.
Unless otherwise specified, correlation among all the exogenous variables is assumed.
To estimate the model the sem function is used. Calling summary on the fitted object returns essentially
four results: the overall goodness of fit of the model, the estimates of the regression parameters, the
estimates of the residual (error) variances of the endogenous variables and the estimates of the covariances
among the exogenous variables (correlations if the variables are standardized). Three arguments of summary
enrich the analysis: rsquare, fit.measures and standardized. When set to TRUE they add, respectively, the
R² of each fitted equation, some additional goodness-of-fit measures and the standardized coefficients of the
regressions (very helpful when variables are measured in different units). Finally, a nice way to draw
the path diagram is the semPaths function, available in the semPlot package.
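The workflow described above can be sketched on simulated data as follows. This is a minimal illustration, not the author's code: the variable names (x1, x2, m, y), the coefficients and the choice of fixed.x = FALSE are assumptions made for the example, and the plot is drawn only if semPlot happens to be installed.

```r
# Minimal sketch of the lavaan workflow on simulated data.
# Variable names (x1, x2, m, y) and coefficients are hypothetical.
library(lavaan)

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
m  <- 0.6 * x1 + 0.3 * x2 + rnorm(n)
y  <- 0.5 * m + 0.2 * x1 + rnorm(n)
toy <- data.frame(x1, x2, m, y)

# One model string holds every equation:
#   ~   regression (directed path)
#   ~~  covariance
model <- '
  m  ~ x1 + x2
  y  ~ m + x1
  x1 ~~ x2
'

# fixed.x = FALSE treats the exogenous variables as random, so their
# covariance is estimated and printed (an assumption of this sketch).
fit <- sem(model, data = toy, fixed.x = FALSE)

# The three optional arguments add fit indices, standardized
# coefficients and R-squared values to the printed output.
summary(fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)

# Draw the path diagram only when semPlot is available.
if (requireNamespace("semPlot", quietly = TRUE)) {
  semPlot::semPaths(fit, what = "paths", whatLabels = "std")
}
```

The model omits the path from x2 to y, which leaves one degree of freedom and makes the Chi-squared test of fit meaningful; a model containing every possible path would fit perfectly by construction.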
Example
The dataset consists of 30 observations on 7 variables. Respondents (employees of a big financial company)
were asked to rate their satisfaction with several aspects of the workplace. The variables considered are
measured on a scale from 0 to 100 and can be summarized as follows.
Variable      Description
rating        Overall rating
complaints    Handling of employee complaints
privileges    Does not allow special privileges
learning      Opportunity to learn
raises        Raises based on performance
critical      Too critical
advance       Advancement
As already mentioned, Path Analysis is a confirmatory technique, so it can be used to check whether
theoretical frameworks match the data. Suppose that we want to understand the underlying relationships among
the above ratings in order to ask our HR department to draft specific policies to improve the workplace
environment. From past experience, psychological research or similar experiments we suspect that advance,
raises and rating are endogenous variables and that complaints, critical, privileges and learning are
exogenous variables. Moreover, we believe that advance and raises are determined solely by critical and
learning and that the overall rating is impacted by all the other variables. Please note that critical and
learning impact rating both directly, through p16 and p14, and indirectly, through the products p76·p17,
p56·p15, p74·p17 and p54·p15. Finally, it is assumed that all the exogenous variables are correlated
(double-headed arrows). The overall representation of the relationships we want to assess is depicted in
Figure 1.

[Figure 1: path diagram of the hypothesized model]
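The direct and indirect paths listed above combine into total effects. The decomposition below is a sketch under two assumptions that the text does not state explicitly: p_ij denotes the path from variable j to variable i, and variable 1 is rating, with variables 5 and 7 being the two intermediate endogenous variables.

```latex
\[
\text{total effect}_{6 \to 1} = \underbrace{p_{16}}_{\text{direct}}
  + \underbrace{p_{76}\,p_{17} + p_{56}\,p_{15}}_{\text{indirect}}
\]
\[
\text{total effect}_{4 \to 1} = \underbrace{p_{14}}_{\text{direct}}
  + \underbrace{p_{74}\,p_{17} + p_{54}\,p_{15}}_{\text{indirect}}
\]
```

Each indirect term is the product of the coefficients along one route through an intermediate variable, so an exogenous variable with a small direct coefficient can still have a sizeable total effect.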
We then fit the model using the lavaan package to check whether our initial assumptions were correct and
also to estimate the magnitude of the effect of each variable on the overall rating (as in regression
analysis). The fit output reports the Chi-squared statistic and the associated p-value for the goodness of
fit of the model.
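A fit of this kind could look like the sketch below. The model string is reconstructed from the equations listed in this example and the data are assumed to be the attitude data frame that ships with base R; the author's exact call is not shown in the text, so the numbers printed here are not guaranteed to match the reported ones.

```r
library(lavaan)

# attitude ships with base R (package datasets): 30 rows, 7 variables.
# The model string is reconstructed from the text, not copied from the author.
model <- '
  advance ~ learning + critical
  raises  ~ learning + critical
  rating  ~ learning + critical + privileges + raises + complaints + advance
'
fit <- sem(model, data = attitude)

# Chi-squared statistic, degrees of freedom and p-value of the fit test.
gof <- fitMeasures(fit, c("chisq", "df", "pvalue"))
print(gof)

# Compare the p-value with the two significance levels used in the text.
p_value <- as.numeric(gof["pvalue"])
for (alpha in c(0.01, 0.05)) {
  decision <- if (p_value < alpha) "reject H0" else "do not reject H0"
  cat(sprintf("alpha = %.2f: %s\n", alpha, decision))
}
```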
Assuming alpha = 1% we cannot reject the null hypothesis, so we conclude that the model fits the data well.
However, this result should be handled with care: fixing alpha = 5%, we must reject the null hypothesis,
i.e. the model does not fit the data well. With the argument rsquare = TRUE it is possible to see that the
equation for rating (as a function of all the other variables) shows R² = 0.719, which is quite good,
whereas the equations for advance and raises have R² equal to 0.332 and 0.503 respectively.
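The R² values can also be pulled out of the fitted object directly instead of being read from the printed summary. The sketch below refits the model reconstructed from the text on the built-in attitude data (both are assumptions, as the author's code is not shown) and uses lavInspect.

```r
library(lavaan)

# Model reconstructed from the equations in the text (an assumption).
model <- '
  advance ~ learning + critical
  raises  ~ learning + critical
  rating  ~ learning + critical + privileges + raises + complaints + advance
'
fit <- sem(model, data = attitude)

# One R-squared per endogenous variable, returned as a named vector.
r2 <- lavInspect(fit, "rsquare")
print(round(r2, 3))
```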
Further inspection of the results shows that, assuming alpha = 5%, the only statistically significant
variables in the model are:
- For advance ~ learning + critical, only learning is statistically significant
- For raises ~ learning + critical, both exogenous variables are statistically significant
- For rating ~ learning + critical + privileges + raises + complaints + advance, only complaints
  and learning are statistically significant
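The significance check summarized in the list above can be reproduced with a table of the regression paths. This is a sketch that again assumes the model string reconstructed from the text and the built-in attitude data; the column named significant is added here for illustration and is not part of lavaan's output.

```r
library(lavaan)

# Model reconstructed from the equations in the text (an assumption).
model <- '
  advance ~ learning + critical
  raises  ~ learning + critical
  rating  ~ learning + critical + privileges + raises + complaints + advance
'
fit <- sem(model, data = attitude)

# Keep only the regression paths (op == "~") with their tests.
pe    <- parameterEstimates(fit, standardized = TRUE)
paths <- as.data.frame(
  pe[pe$op == "~", c("lhs", "rhs", "est", "se", "z", "pvalue", "std.all")]
)

# Flag the paths whose p-value falls below alpha = 5%.
alpha <- 0.05
paths$significant <- paths$pvalue < alpha
print(paths)
```

The std.all column holds the standardized coefficients, which makes the paths comparable even when the variables are measured in different units.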
References
Pedhazur, Elazar J. Multiple Regression in Behavioral Research: Explanation and Prediction. Third Edition.
Chapter 18.
https://advstats.psychstat.org/book/path/index.php
https://core.ecu.edu/wuenschk/MV/SEM/Path.pdf
https://www.publichealth.columbia.edu/research/population-health-methods/path-analysis
https://youtu.be/ezT7VgPZJdk
Dataset
https://vincentarelbundock.github.io/Rdatasets/doc/datasets/attitude.html