
Factorial Analysis of Variance

One-way Analysis of Variance (ANOVA)


Example:
It is assumed that credit_card_debt differs according to Job_categories.
Test this assumption at the 95% confidence level.

Independent Variable (IV): Job_Categories Var_Type: Categorical Called: Factor


Dependent Variable (DV): Credit_card_debt Var_Type: Interval

SPSS:
Choose from menu: [Analyze] → Compare Means → One-Way ANOVA
Steps:
1. Add the IV & DV
2. Choose {Options} → select (Descriptive) → press {Continue}
3. Choose {Post Hoc} → select (LSD) → press {Continue}
4. Press {OK}
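For reference outside SPSS, here is a minimal sketch of the same one-way ANOVA in Python with statsmodels; the file name bank_customers.csv and the column names credit_card_debt and job_category are assumptions for illustration.

```python
# One-way ANOVA: does credit card debt differ across job categories?
# File and column names below are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("bank_customers.csv")  # hypothetical data file

# DV ~ categorical factor; C() marks job_category as categorical
model = smf.ols("credit_card_debt ~ C(job_category)", data=df).fit()

# ANOVA table with the F test for the factor (compare the p-value to 0.05)
print(sm.stats.anova_lm(model, typ=2))
```

The LSD post-hoc option in SPSS corresponds to unadjusted pairwise t tests between the group means, with no multiple-comparison correction.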

Two-way Analysis of Variance (ANOVA)


Example:
It is assumed that Credit_card_debt differs according to Gender, Age, and the interaction between them.
Test this assumption at the 95% confidence level.

Independent Variable (IV): Gender Var_Type: Categorical Called: Factor 1

Independent Variable (IV): Age_cat Var_Type: Categorical Called: Factor 2

Dependent Variable (DV): Credit_card_debt Var_Type: Interval

SPSS:
Choose from menu: [Analyze] → General Linear Model → Univariate
Steps:
1. Add Credit_card_debt to the Dependent Variable box (DV)
2. Add Age_cat & Gender to the Fixed Factor(s) list (IVs)
3. Choose {Options} → select (Descriptive) → press {Continue}
4. Choose {Post Hoc} → move the factors from the left box to the (Post Hoc Tests for:) box (in this example we add only Age_cat, because Gender has only two categories) → select (LSD) → press {Continue}
5. Choose {Plots} → move the factor with more categories to the (Horizontal Axis) box and the factor with fewer categories to the (Separate Lines) box → click {Add} → press {Continue}
6. Press {OK}
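A comparable two-way ANOVA with interaction can be sketched in Python as below; the data file and the column names credit_card_debt, gender, and age_cat are again assumptions.

```python
# Two-way ANOVA: credit card debt by gender, age category, and their interaction.
# File and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
from statsmodels.graphics.factorplots import interaction_plot

df = pd.read_csv("bank_customers.csv")  # hypothetical data file

# 'C(gender) * C(age_cat)' expands to both main effects plus the interaction term
model = smf.ols("credit_card_debt ~ C(gender) * C(age_cat)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Interaction plot, mirroring the SPSS {Plots} step: the factor with more
# categories on the horizontal axis, the other factor as separate lines
interaction_plot(df["age_cat"], df["gender"], df["credit_card_debt"])
plt.show()
```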

MANOVA: Multivariate Analysis of Variance


MANOVA is a technique which determines the effects of independent categorical variables on multiple
continuous dependent variables. It is usually used to compare several groups with respect to multiple continuous
variables.

Example:
It is assumed that Beginning_Salary and Current_Salary differ according to Gender, Minority, and the interaction between them.
Test this assumption at the 95% confidence level.
Independent Variable (IV): Gender Var_Type: Categorical Called: Factor 1
Independent Variable (IV): Minority Var_Type: Categorical Called: Factor 2

Dependent Variable (DV): Beginning_Salary Var_Type: Interval


Dependent Variable (DV): Current_Salary Var_Type: Interval

SPSS:
Choose DB: Employee Data
Choose from menu: [Analyze] → General Linear Model → Multivariate
Steps:
1. Add Beginning_Salary to the Dependent Variables list (DV)
2. Add Current_Salary to the Dependent Variables list (DV)
3. Add Gender & Minority to the Fixed Factor(s) list (IVs)
4. Choose {Options} → select (Descriptive) → select (Residual SSCP Matrix) → press {Continue}
5. Choose {Plots} → move Minority to the (Horizontal Axis) box and Gender to the (Separate Lines) box → click {Add} → press {Continue}
6. Press {OK}
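An equivalent MANOVA can be sketched in Python with statsmodels; the CSV export and the column names beginning_salary, current_salary, gender, and minority are assumptions.

```python
# MANOVA: beginning and current salary by gender, minority, and their interaction.
# File and column names are hypothetical.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("employee_data.csv")  # hypothetical export of the Employee Data file

mv = MANOVA.from_formula(
    "beginning_salary + current_salary ~ C(gender) * C(minority)", data=df
)

# mv_test() reports Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, and
# Roy's largest root for each effect, like the SPSS Multivariate Tests table
print(mv.mv_test())
```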

Check the Residual SSCP Matrix:
Use it to check the correlation between the two DVs (the residual correlations among the dependent variables).

Check the Multivariate Tests table:
Check the significance on the Wilks' lambda rows.

Check the Tests of Between-Subjects Effects table:
Use it to see which independent variables have a significant effect on each of the DVs.

What is Wilks’ Lambda?


Wilks' lambda (Λ) is a test statistic that's reported in results from MANOVA, discriminant analysis, and other multivariate procedures. Other similar test statistics include Pillai's trace criterion and Roy's greatest characteristic root (gcr) criterion.
- In MANOVA, Λ tests if there are differences between group means for a particular combination of dependent variables. It is similar to the F-test statistic in ANOVA. Lambda is a measure of the percentage of variance in the dependent variables not explained by differences in levels of the independent variable. A value of zero means that there is no variance left unexplained by the independent variable (which is ideal). In other words, the closer the statistic is to zero, the more the variable in question contributes to the model. You would reject the null hypothesis when Wilks' lambda is close to zero, although this should be done in combination with a small p-value.
- In discriminant analysis, Wilks' lambda tests how well each level of the independent variable contributes to the model. The scale ranges from 0 to 1, where 0 means total discrimination and 1 means no discrimination. Each independent variable is tested by putting it into the model and then taking it out, generating a Λ statistic. The significance of the change in Λ is measured with an F-test; if the F-value is greater than the critical value, the variable is kept in the model. This stepwise procedure is usually performed using software such as Minitab, R, or SPSS, whose output shows which variables (from a list of a dozen or more) were kept in by this procedure.
Source: https://www.statisticshowto.com/wilks-lambda/
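In matrix form (standard notation, not taken from the source page), Wilks' lambda compares the error and hypothesis sums-of-squares-and-cross-products (SSCP) matrices:

```latex
\Lambda = \frac{\det(\mathbf{E})}{\det(\mathbf{H} + \mathbf{E})}
```

where E is the error (within-groups) SSCP matrix and H is the hypothesis (between-groups) SSCP matrix; the smaller Λ is, the more of the multivariate variance is attributable to the effect being tested.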

Why Use ANCOVA & MANCOVA?


https://youtu.be/p7fU02WRQ7Y
1- To increase the sensitivity of the test for main effects and interactions by reducing the error term.
- The error term is adjusted for, and hopefully reduced by, the relationship between the DV and the CV(s).
- CVs are used to assess the "noise", where noise is the undesirable variance in the DV that is estimated by scores on the CV(s).
2- To adjust the means on the DV to what they would be if all subjects scored equally on the CV(s).
- Differences between subjects on the CV(s) are removed so that, presumably, the only differences that remain are related to the effects of the grouping IV(s).
- The CV(s) enhance prediction of the DV, but there is no implication of causality.
After adjusting the means with the ANCOVA, the remaining group differences reflect the grouping IV(s) rather than the covariate(s).
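As a minimal sketch of an ANCOVA in Python (the file name and the column names salary, gender, and months_since_hire are hypothetical), the covariate simply enters the model alongside the factor:

```python
# ANCOVA sketch: compare group means on a DV while controlling for a covariate (CV).
# File and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("employee_data.csv")  # hypothetical data file

# The covariate's share of the DV variance is removed from the error term
# before the grouping factor is tested
model = smf.ols("salary ~ C(gender) + months_since_hire", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```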

Extra Sources:
Link: https://www.ibm.com/support/pages/corrected-model-sums-squares-unianova-and-glm-multivariate

Problem
What is the meaning of the 'Corrected Model' term in the 'Tests of Between-Subjects Effects' Table in output for
SPSS UNIANOVA or GLM Multivariate?

Resolving The Problem


Before defining the corrected model, it may be helpful to provide some context for the term "Corrected" by noting
that your 'Between-Subjects Effects' table should have rows for each of "Total" and "Corrected Total". The Total
Sums of Squares are sums of squares around 0, i.e. if you simply squared each value of the dependent variable and
summed these squares, you would get the Total SS. The corrected Sums of Squares are the sums of squares around
the grand mean of the dependent variable. If you subtracted the grand mean from each observation of the dependent
variable and squared that deviation, the sum of these squared deviations would be the Corrected Total SS. It is
'corrected' for the grand mean.
The Corrected model SS are sums of squares that can be attributed to the set of all the between-subjects effects,
excluding the intercept, i.e. all the fixed and random factors and covariates and their interactions that are listed on
the Between-Subjects table. These are sums of squares around the grand mean as well, and therefore the 'corrected
model' SS. The F-test for the corrected model is a test of whether the model as a whole accounts for any variance in
the dependent variable.
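Restating the three sums of squares in formulas (notation added here; y_i are the observed values of the dependent variable and ȳ is its grand mean):

```latex
SS_{\text{total}} = \sum_{i} y_i^{2}, \qquad
SS_{\text{corrected total}} = \sum_{i} (y_i - \bar{y})^{2}, \qquad
SS_{\text{corrected model}} = SS_{\text{corrected total}} - SS_{\text{error}}
```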

Regression Analysis

Example: Study the relationship of Beginning Salary and Months since Hire (IVs) with Current Salary (DV).

Independent Variable (IV): Beginning Salary Var_Type: Interval


Independent Variable (IV): Months since Hire Var_Type: Interval

Dependent Variable (DV): Current Salary Var_type: Interval

Step 1: State the hypotheses: H0: βi = 0; H1: βi ≠ 0


Step 2: Decide on a level of significance: α = 0.05
Step 3: Select the appropriate test statistic:
The test statistic is the t distribution with n-(k+1) degrees of freedom.
Step 4: Formulate a decision rule:
Reject H0 if t > t/2,n-(k+1) or t < -t/2,n-(k+1)
Reject H0 if t > 2.120 or t < -2.120
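Assuming the critical value 2.120 corresponds to α = 0.05 and n − (k + 1) = 16 degrees of freedom, it can be reproduced with scipy:

```python
# Two-tailed critical t value for alpha = 0.05 and 16 residual degrees of freedom
from scipy import stats

t_crit = stats.t.ppf(1 - 0.05 / 2, df=16)
print(round(t_crit, 3))  # ~2.12
```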

A general rule is: if the correlation between two independent variables is between -0.70 and 0.70,
multicollinearity between the two variables is most likely not a problem.
If the VIF for an independent variable is more than 10, multicollinearity is likely and the independent
variable should be removed from the analysis.

SPSS:
Choose DB: Employee Data
Choose from menu: [Analyze] → Regression → Linear
Steps:
1. Add Current_Salary to the Dependent variable box
2. Add Beginning Salary & Months since Hire to the Independent(s) box (IVs)
3. Choose {Statistics} → check (Collinearity diagnostics) & (Durbin-Watson) → press {Continue}
4. Press {OK}

Check the Model Summary table: check R².
Check the Durbin-Watson statistic (should be roughly between 1.5 and 2.5).
Check the ANOVA table: check the significance of the regression as a whole (p < 0.05).
Check the Coefficients table: check the significance of IV 1 (p < 0.05).
Check the significance of IV 2 (p < 0.05).
Check the VIF for both IVs (should be less than 5; higher values indicate multicollinearity).
Write the regression equation using the unstandardized B values from the Coefficients table:
Current_Salary = 1.914 Beginning_Salary + 172.297 Months_Since_Hire − 12120.813

Coefficients(a)
Model                   Unstandardized B   Std. Error   Standardized Beta        t     Sig.
1  (Constant)                 -12120.813     3082.981                       -3.932    <.001
   Beginning Salary                 1.914         .046              .882    41.271    <.001
   Months since Hire              172.297       36.276              .102     4.750    <.001
a. Dependent Variable: Current Salary
If you want to check the relative importance of the two IVs, compare their standardized Beta coefficients.
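A minimal Python sketch of the same regression run, including the Durbin-Watson and VIF checks above, is given below; the file name and column names are assumptions.

```python
# Multiple regression: Current Salary on Beginning Salary and Months since Hire.
# File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("employee_data.csv")  # hypothetical export of the Employee Data file

model = smf.ols("current_salary ~ beginning_salary + months_since_hire", data=df).fit()
print(model.summary())  # R-squared, overall F test, coefficients, t tests

# Durbin-Watson on the residuals (rule of thumb: roughly 1.5 to 2.5 is acceptable)
print("Durbin-Watson:", durbin_watson(model.resid))

# VIF for each predictor (skip the intercept column)
X = model.model.exog
for i, name in enumerate(model.model.exog_names):
    if name != "Intercept":
        print(name, variance_inflation_factor(X, i))
```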

Extra Resource:
What is the Durbin-Watson Test?
The Durbin-Watson test is a measure of autocorrelation (also called serial correlation) in residuals from regression analysis. Autocorrelation is the similarity of a time series over successive time intervals. It can lead to underestimates of the standard error and can cause you to think predictors are significant when they are not.
The Durbin-Watson test reports a test statistic, with a value from 0 to 4, where:

 2 is no autocorrelation.
 0 to <2 is positive autocorrelation (common in time series data).
 >2 to 4 is negative autocorrelation (less common in time series data).

A rule of thumb is that test statistic values in the range of 1.5 to 2.5 are relatively normal. Values outside of this
range could be cause for concern.

When an important independent variable is omitted from the regression model, the residuals in the scatter plot tend to follow a pattern, which means the regression model needs revisiting to find and include the missing IV. (From Shady's notes)

What is Serial Correlation / Autocorrelation?

Serial correlation (also called Autocorrelation) is where error terms in a time series transfer from one period to
another. In other words, the error for one time period a is correlated with the error for a subsequent time period b.
For example, an underestimate for one quarter’s profits can result in an underestimate of profits for subsequent
quarters.

Types of Autocorrelations:
The most common form of autocorrelation is first-order serial correlation, which can either be positive or
negative.
 Positive serial correlation is where a positive error in one period carries over into a positive error for
the following period.
 Negative serial correlation is where a negative error in one period carries over into a negative error
for the following period.
Second-order serial correlation is where an error affects data two time periods later. This can happen when your
data has seasonality. Orders higher than second-order do happen, but they are rare.
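A small simulation (entirely hypothetical data, using statsmodels' durbin_watson function) connects first-order serial correlation to the Durbin-Watson ranges described earlier:

```python
# Simulate first-order (AR(1)) errors and see how the Durbin-Watson statistic responds.
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(42)

def ar1_errors(rho, n=500):
    """Each error carries over a fraction rho of the previous period's error."""
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + rng.normal()
    return e

print(durbin_watson(ar1_errors(0.0)))   # near 2: no autocorrelation
print(durbin_watson(ar1_errors(0.7)))   # well below 2: positive serial correlation
print(durbin_watson(ar1_errors(-0.7)))  # well above 2: negative serial correlation
```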
What is the difference between collinearity and interaction?
https://stats.stackexchange.com/questions/113733/what-is-the-difference-between-collinearity-and-interaction

An interaction may arise when considering the relationship among three or more variables, and describes a
situation in which the simultaneous influence of two variables on a third is not additive. Most commonly,
interactions are considered in the context of regression analyses.

The presence of interactions can have important implications for the interpretation of statistical models. If two
variables of interest interact, the relationship between each of the interacting variables and a third "dependent
variable" depends on the value of the other interacting variable. In practice, this makes it more difficult to predict
the consequences of changing the value of a variable, particularly if the variables it interacts with are hard to
measure or difficult to control.

Collinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model
are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of
accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to
small changes in the model or the data. Collinearity does not reduce the predictive power or reliability of the model
as a whole, at least within the sample data themselves; it only affects calculations regarding individual predictors.
That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors
predicts the outcome variable, but it may not give valid results about any individual predictor, or about which
predictors are redundant with respect to others.
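The contrast can be illustrated with a small simulated example (all variable names hypothetical): an interaction is a property of the model formula, while collinearity is a property of the predictors themselves.

```python
# Interaction vs. collinearity on simulated data (all names hypothetical).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)          # x2 is nearly collinear with x1
y = 1 + 2 * x1 + 3 * x2 + 1.5 * x1 * x2 + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# Interaction: 'x1 * x2' adds the product term, so the effect of x1 on y
# depends on the value of x2
inter_model = smf.ols("y ~ x1 * x2", data=df).fit()
print(inter_model.params)

# Collinearity: x1 and x2 are highly correlated, which inflates the standard
# errors of their individual coefficients even though the overall fit is fine
print(df[["x1", "x2"]].corr())
print(inter_model.bse)  # standard errors of the coefficients
```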
