
Path Analysis

Path analysis is a statistical technique used to test hypothesized causal relationships between variables. It is a special case of structural equation modeling that uses only observable variables. Path analysis involves classifying variables as exogenous or endogenous, depicting relationships in a path diagram, and estimating direct and indirect effects between variables. The technique can be implemented in R using the lavaan package to estimate path models and test how well a hypothesized model fits the data.


Michele Russo 20X4004

Path Analysis
What is Path Analysis?
Path analysis is a statistical technique used to assess hypothesized patterns of causal relationships among a set of variables. It has features in common with Multiple Linear Regression (directional relationships among variables, interpretation of the coefficients, etc.), with Confirmatory Factor Analysis (both are confirmatory techniques) and with Structural Equation Modeling (in both, the model is expressed as a system of equations). It can be regarded as a special case of Structural Equation Modeling in which only observed variables are used (no latent variables).

When should it be used?


Path Analysis is a confirmatory data analysis technique. It should be used to assess and test hypothesized
causal models that could be derived either from researchers’ intuition or from theoretical frameworks.

When should it not be used?


Path Analysis is not an exploratory data analysis technique. Unlike Principal Component Analysis or exploratory Factor Analysis, it is not suitable for discovering relationships among variables.

How does it work?


In path modeling, every variable is classified as either exogenous or endogenous. Exogenous variables always act as independent variables. Endogenous variables are dependent variables, but they can also act as independent variables for other endogenous variables. More specifically, exogenous variables are not caused by any other variable in the model (their causes lie outside it), whereas endogenous variables are at least partially explained by other variables inside the model. The variance of an endogenous variable is usually not completely explained by the model, so an error term is added for each endogenous variable. Finally, the exogenous variables are usually allowed to correlate with one another.
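The classification above can be written compactly in matrix form. The symbols below are the standard SEM notation and were not used in the original. Here y collects the endogenous variables, x the exogenous ones, ζ the error terms (one per endogenous variable), B the effects among endogenous variables and Γ the effects of exogenous on endogenous variables.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{B}\,\mathbf{y} + \boldsymbol{\Gamma}\,\mathbf{x} + \boldsymbol{\zeta},
\qquad \operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Phi},\quad
\operatorname{Cov}(\boldsymbol{\zeta}) = \boldsymbol{\Psi},\quad
\operatorname{Cov}(\mathbf{x},\boldsymbol{\zeta}) = \mathbf{0},\\[4pt]
\boldsymbol{\Sigma}_{xx} &= \boldsymbol{\Phi},\qquad
\boldsymbol{\Sigma}_{yx} = (\mathbf{I}-\mathbf{B})^{-1}\boldsymbol{\Gamma}\boldsymbol{\Phi},\qquad
\boldsymbol{\Sigma}_{yy} = (\mathbf{I}-\mathbf{B})^{-1}\bigl(\boldsymbol{\Gamma}\boldsymbol{\Phi}\boldsymbol{\Gamma}^{\top}+\boldsymbol{\Psi}\bigr)(\mathbf{I}-\mathbf{B})^{-\top}.
\end{aligned}
```

Leaving Φ unrestricted lets the exogenous variables correlate freely, and Ψ is usually diagonal (uncorrelated errors). When there are no feedback loops, the endogenous variables can be ordered so that B is strictly lower triangular. The second line gives the covariance matrix implied by the model, and this is the matrix that estimation tries to match to the sample covariances.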
Because the technique can become complex (the number of relationships to be estimated can grow very quickly), the hypothesized model is usually depicted in a so-called path diagram. This is a graphical representation of the relationships we want to test. By convention, latent variables (unobservable, not directly measurable) are drawn as ovals and observed variables as rectangles. In path analysis only rectangles are used, because there are no latent factors. Variables are connected in two ways: a single-headed arrow from one variable to another denotes a hypothesized causal effect, whereas a double-headed arrow denotes a correlation. Moreover, path analysis allows only one-way causal relationships (no feedback loops). Finally, all the assumptions of multiple regression analysis must hold (linearity in the parameters, no autocorrelation of the errors, homoskedasticity, etc.).
Path analysis is also used to assess the direct effects and the indirect effects (i.e., effects mediated by other variables) of exogenous and endogenous variables on endogenous ones.
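Here is a minimal worked example. The variables X, M and Y are hypothetical and do not come from the original. X affects Y directly and also through a mediator M, and p_ij denotes the path coefficient from variable j to variable i.

```latex
\begin{aligned}
M &= p_{MX}\,X + e_M,\\
Y &= p_{YX}\,X + p_{YM}\,M + e_Y
   = \bigl(p_{YX} + p_{MX}\,p_{YM}\bigr)X + p_{YM}\,e_M + e_Y,\\[4pt]
\text{direct effect of } X \text{ on } Y &= p_{YX},\qquad
\text{indirect effect} = p_{MX}\,p_{YM},\qquad
\text{total effect} = p_{YX} + p_{MX}\,p_{YM}.
\end{aligned}
```

More generally, in a model without feedback loops the indirect effect along one route is the product of the path coefficients along that route. The total indirect effect is the sum over all such routes.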
Path models usually have many parameters to estimate. They are typically estimated by Maximum Likelihood, although alternative estimation methods can also be applied.
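For reference, the textbook maximum likelihood discrepancy function for covariance-structure models is shown below. It is background material rather than something taken from the original. S is the sample covariance matrix of the p observed variables and Σ(θ) is the covariance matrix implied by the parameter vector θ.

```latex
F_{\mathrm{ML}}(\boldsymbol{\theta}) =
\ln\lvert\boldsymbol{\Sigma}(\boldsymbol{\theta})\rvert
+ \operatorname{tr}\!\bigl(\mathbf{S}\,\boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1}\bigr)
- \ln\lvert\mathbf{S}\rvert - p,
\qquad
\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} F_{\mathrm{ML}}(\boldsymbol{\theta}).
```

When the implied matrix reproduces the sample matrix exactly (Σ(θ) = S), the trace equals p and the function is zero. Larger values indicate a larger discrepancy.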

As already mentioned, path analysis is mainly used to test previously hypothesized causal patterns, so assessing the overall goodness of fit of the model is key. This is done by testing the following hypotheses:

$$
\begin{cases}
H_0: \text{the model fits the data well}\\
H_1: \text{the model does not fit the data well}
\end{cases}
$$

Under the null hypothesis, the test statistic (asymptotically) follows a chi-squared distribution. Unlike in most tests, here we hope not to reject the null hypothesis, since H0 states that the model fits the data. Path analysis can also be used to assess the statistical significance of each parameter included in the model, following a logic very similar to that of linear regression models.
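The two tests can be written out as follows, using standard textbook notation that was not given in the original. N is the sample size, p the number of observed variables, q the number of free parameters, and F_ML(θ̂) the minimized maximum likelihood discrepancy between the sample and model-implied covariance matrices (for a model fitted to covariances only).

```latex
\begin{aligned}
T &= (N-1)\,F_{\mathrm{ML}}(\hat{\boldsymbol{\theta}})
\;\overset{a}{\sim}\; \chi^2_{df}\ \text{under } H_0,
\qquad df = \frac{p(p+1)}{2} - q,\\[4pt]
z_j &= \frac{\hat{\theta}_j}{\widehat{\mathrm{SE}}(\hat{\theta}_j)}
\;\overset{a}{\sim}\; \mathcal{N}(0,1)\ \text{under } H_0\colon \theta_j = 0.
\end{aligned}
```

The model is rejected when the p-value of T falls below the chosen significance level. Some software multiplies by N instead of N − 1, and lavaan's default is one such case. The z statistics play the same role as the t statistics of a regression.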

Implementation in R
General overview of the method

Path analysis can be implemented in R using the lavaan package. Its syntax is very similar to that of R's linear regression formulas. The main difference is that all the relationships among the variables (both exogenous and endogenous) are specified together in a single model. The model equations are written inside one quoted string ('...'), a causal relationship is coded with the symbol ~, and a correlation (covariance) is coded with ~~. Unless otherwise specified, correlations among all the exogenous variables are allowed.
The model is estimated with the sem function. The summary of the fitted model reports essentially four blocks of results:
- the overall goodness of fit;
- the estimates of the regression parameters;
- the estimated variances of the error terms of the endogenous variables;
- the estimated covariances among the exogenous variables (correlations if the variables are standardized).

Three useful arguments of summary enrich the analysis: rsquare, fit.measures and standardized. When set to TRUE, they add, respectively, the R² of each regression equation, additional goodness-of-fit measures, and the standardized regression coefficients. Standardized coefficients are very helpful when the variables are measured in different units. Finally, a convenient way to draw the path diagram is the semPaths function, available in the semPlot package.
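Below is a minimal sketch of the syntax just described. It fits a toy model chosen only for illustration, not the model analysed in the example below. It uses R's built-in attitude data frame and assumes the lavaan package is installed.

```r
library(lavaan)  # install.packages("lavaan") if needed

# The whole model is a single quoted string (toy model, for illustration only):
# "~" codes a regression (single-headed arrow),
# "~~" codes a (residual) covariance (double-headed arrow).
toy_model <- '
  advance ~ critical
  raises  ~ learning
  advance ~~ raises
'

toy_fit <- sem(toy_model, data = attitude)

# rsquare, fit.measures and standardized are arguments of summary(),
# adding R-squared values, extra fit indices and standardized coefficients
summary(toy_fit, rsquare = TRUE, fit.measures = TRUE, standardized = TRUE)
```

All relationships, whether regressions or covariances, live in the same model string. That is the main practical difference from fitting separate lm() calls.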

Example

The dataset consists of 30 observations on 7 variables. The data come from a survey of the clerical employees of a large financial organization. Each observation is a department, and each value is the percentage of favourable responses given by the roughly 35 employees of that department to a question about the workplace. The variables therefore range from 0 to 100 and can be summarized as follows:
- rating: Overall rating
- complaints: Handling of employee complaints
- privileges: Does not allow special privileges
- learning: Opportunity to learn
- raises: Raises based on performance
- critical: Too critical
- advance: Advancement
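The data are available in R as the built-in attitude data frame (datasets package). A quick check of its size, its variable names and the pairwise correlations therefore needs no download:

```r
# 'attitude' ships with base R (datasets package): one row per department
data(attitude)
dim(attitude)            # number of observations and of variables
names(attitude)          # rating, complaints, privileges, learning, raises, critical, advance
round(cor(attitude), 2)  # pairwise correlations, rounded to two decimals
```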
As already mentioned, path analysis is a confirmatory technique, so it can be used to check whether theoretical frameworks match the data. Suppose that we want to understand the underlying relationships among the above ratings so that we can ask our HR department to draft specific policies to improve the workplace environment.

From past experience, psychological research or similar experiments, we suspect that advance, raises and rating are endogenous variables, and that complaints, critical, privileges and learning are exogenous variables. Moreover, we believe that advance and raises are determined solely by critical and learning, and that the overall rating is affected by all the other variables. Note that critical and learning affect rating in two ways:
- directly, through p16 (critical) and p14 (learning);
- indirectly, through the compound paths p76·p17 and p56·p15 (critical) and p74·p17 and p54·p15 (learning).

Figure 1: path diagram of the hypothesized model.

Finally, it is assumed that there is correlation among all the exogenous variables (double-headed arrows). The
overall representation of the relationships we want to assess is depicted in Figure 1.
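The hypothesized model can be sketched in lavaan syntax as follows. The path labels rest on an assumption: that Figure 1 numbers the variables in the column order of the attitude data frame (1 rating, 2 complaints, 3 privileges, 4 learning, 5 raises, 6 critical, 7 advance). This ordering is consistent with the labels quoted in the text. The labels p12 and p13 are inferred the same way. The := lines define the indirect and total effects of learning and critical on rating as functions of the labelled paths.

```r
library(lavaan)

# Labels p_ij = path from variable j to variable i, ASSUMING Figure 1 numbers the
# variables in the column order of 'attitude' (1 rating, ..., 7 advance).
path_model <- '
  advance ~ p74*learning + p76*critical
  raises  ~ p54*learning + p56*critical
  rating  ~ p12*complaints + p13*privileges + p14*learning + p15*raises + p16*critical + p17*advance

  # indirect effects on rating: sum of products of the paths along each route
  ind_learning := p74*p17 + p54*p15
  ind_critical := p76*p17 + p56*p15

  # total effects = direct + indirect
  tot_learning := p14 + p74*p17 + p54*p15
  tot_critical := p16 + p76*p17 + p56*p15
'

fit <- sem(path_model, data = attitude)

# Estimates, standard errors and p-values of the defined (":=") effects
est <- parameterEstimates(fit)
est[est$op == ":=", c("lhs", "est", "se", "pvalue")]
```

Defining the effects inside the model means lavaan reports their standard errors (delta method) together with the point estimates. Computing the products by hand from the printed coefficients would not give those standard errors.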
We then fit the model with the lavaan package for two purposes. The first is to check whether our initial assumptions are correct. The second is to estimate the magnitude of the effect of each variable on the overall rating (as in regression analysis). The goodness of fit of the model is first assessed through the chi-squared statistic and its associated p-value.
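The following sketch shows how the chi-squared statistic, its degrees of freedom and its p-value can be extracted with fitMeasures(). It then applies the decision rule at alpha = 1% and alpha = 5%. The model is the same as in Figure 1, written here without path labels.

```r
library(lavaan)

path_model <- '
  advance ~ learning + critical
  raises  ~ learning + critical
  rating  ~ complaints + privileges + learning + raises + critical + advance
'
fit <- sem(path_model, data = attitude)

# Chi-squared test of H0: "the model fits the data well"
fm <- fitMeasures(fit, c("chisq", "df", "pvalue"))
fm

# Decision at alpha = 1% and alpha = 5%
alpha <- c(0.01, 0.05)
setNames(ifelse(fm[["pvalue"]] < alpha, "reject H0", "do not reject H0"),
         paste0("alpha = ", alpha))
```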

At alpha = 1% we cannot reject the null hypothesis, so we conclude that the model fits the data. However, this result should be handled with care: at alpha = 5% we would reject the null hypothesis and conclude that the model does not fit the data well. Setting rsquare = TRUE shows that the equation for rating (rating as a function of all the other variables) has R² = 0.719, which is quite good. The equations for advance and raises have R² equal to 0.332 and 0.503, respectively.
Further inspection of the results shows that, at alpha = 5%, the only statistically significant coefficients in the model are:
- advance ~ learning + critical: only learning is statistically significant
- raises ~ learning + critical: both exogenous variables are statistically significant
- rating ~ learning + critical + privileges + raises + complaints + advance: only complaints and learning are statistically significant
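The R² values and the list of significant coefficients can be obtained along the following lines. This is a sketch, and the 5% threshold mirrors the text.

```r
library(lavaan)

path_model <- '
  advance ~ learning + critical
  raises  ~ learning + critical
  rating  ~ complaints + privileges + learning + raises + critical + advance
'
fit <- sem(path_model, data = attitude)

# R-squared of each endogenous variable's equation (advance, raises, rating)
round(lavInspect(fit, "rsquare"), 3)

# Regression coefficients with z = est / se and two-sided p-values
est <- parameterEstimates(fit)
reg <- est[est$op == "~", c("lhs", "rhs", "est", "se", "z", "pvalue")]
reg[reg$pvalue < 0.05, ]  # paths significant at alpha = 5%
```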

Figure 2 reports the path diagram of the estimated model, with the unstandardized coefficients on the arrows. Correlations among all the exogenous variables are allowed.

Figure 2: estimated path diagram.
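A diagram like Figure 2 can be drawn with semPaths() from the semPlot package. The layout and size options below are illustrative choices and are not necessarily those used for the original figure. The sketch assumes that both lavaan and semPlot are installed.

```r
library(lavaan)
library(semPlot)  # install.packages("semPlot") if needed

path_model <- '
  advance ~ learning + critical
  raises  ~ learning + critical
  rating  ~ complaints + privileges + learning + raises + critical + advance
'
fit <- sem(path_model, data = attitude)

# Arrows labelled with unstandardized estimates (whatLabels = "std" would show
# standardized ones); semPaths() also returns the underlying qgraph object.
diagram <- semPaths(fit, whatLabels = "est", layout = "tree", rotation = 2,
                    edge.label.cex = 0.9, sizeMan = 8)
```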

Finally, summary was run with fit.measures = TRUE to display additional goodness-of-fit measures for the model. Two informative measures are the Tucker-Lewis Index (TLI) and the Comparative Fit Index (CFI). Both should be high, i.e. close to 1, with 0.9 commonly taken as a good value. For this model the CFI is 0.866 and the TLI is 0.599. These results are in line with what we found before: the model fits the data to some extent, but the fit is not strong, since neither index reaches 0.9. A possible explanation is the small number of observations, as just 30 departments may not be enough to guarantee strong and reliable estimates. Collecting further observations would help to check whether this model really fits the data.

In conclusion, if we assume the above model to be reliable, I recommend that the company act mainly on the complaints and learning variables. For instance, it could offer extra learning opportunities (e.g., MBA programs) and manage employees' complaints better.
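CFI and TLI both compare the model's chi-squared with that of a baseline (independence) model. The sketch below recomputes both indices from their usual textbook formulas, using quantities reported by fitMeasures(), and prints them next to lavaan's own values. It assumes the standard (non-robust) ML test statistic, which is lavaan's default.

```r
library(lavaan)

path_model <- '
  advance ~ learning + critical
  raises  ~ learning + critical
  rating  ~ complaints + privileges + learning + raises + critical + advance
'
fit <- sem(path_model, data = attitude)

fm <- fitMeasures(fit, c("chisq", "df", "baseline.chisq", "baseline.df", "cfi", "tli"))
chi_m <- fm[["chisq"]];          df_m <- fm[["df"]]           # hypothesized model
chi_b <- fm[["baseline.chisq"]]; df_b <- fm[["baseline.df"]]  # baseline model

# CFI: noncentrality (chi-squared minus df) of the model relative to the baseline
cfi <- 1 - max(chi_m - df_m, 0) / max(chi_m - df_m, chi_b - df_b, 0)
# TLI: compares chi-squared/df ratios of the model and the baseline
tli <- (chi_b / df_b - chi_m / df_m) / (chi_b / df_b - 1)

round(c(cfi = cfi, cfi_lavaan = fm[["cfi"]], tli = tli, tli_lavaan = fm[["tli"]]), 3)
```

When neither max() in the CFI formula is truncated, the two formulas imply 1 − TLI = (df_b / df_m)(1 − CFI). A model with far fewer degrees of freedom than its baseline therefore gets a TLI noticeably further from 1 than its CFI. This fits the gap between the two values reported above.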

References

Pedhazur, Elazar J. Multiple Regression in Behavioral Research: Explanation and Prediction. Third Edition. Chapter 18.

https://advstats.psychstat.org/book/path/index.php

https://core.ecu.edu/wuenschk/MV/SEM/Path.pdf

https://www.publichealth.columbia.edu/research/population-health-methods/path-analysis

https://youtu.be/ezT7VgPZJdk

Dataset

https://vincentarelbundock.github.io/Rdatasets/doc/datasets/attitude.html
