What Is Wilks' Lambda?

Wilks' lambda is a measure used in discriminant analysis to assess how well a function separates cases into groups. Smaller values indicate greater discriminatory ability. It is calculated from the ratio of the determinants of the error and total sums of squares and cross products matrices. In discriminant analysis, Wilks' lambda tests how well each independent variable contributes to the model, with smaller values indicating better discrimination between groups. A stepwise procedure is used to determine which variables significantly improve discrimination and should be included in the model.


What is Wilks' lambda? How is it computed? What is its role in a discriminant analysis?
What is Wilks’ Lambda?
Wilks' lambda is a measure of how well each function separates cases into groups. It
is equal to the proportion of the total variance in the discriminant scores not
explained by differences among the groups. Smaller values of Wilks' lambda indicate
greater discriminatory ability of the function.
It is given by the formula:

Λ = |E| / |T| = |E| / |H + E|

Here, the determinant of the error sums of squares and cross products matrix E is
divided by the determinant of the total sum of squares and cross products matrix T =
H + E. If H is large relative to E, then |H + E| will be large relative to |E|. Thus, we will
reject the null hypothesis if Wilks' lambda is small (close to zero).
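Given the two matrices, the statistic is a single determinant ratio. A minimal sketch in Python (the E and H matrices below are invented purely for illustration):

```python
import numpy as np

# Hypothetical error (E) and hypothesis (H) sums of squares and
# cross products matrices for P = 2 dependent variables.
E = np.array([[8.0, 2.0],
              [2.0, 6.0]])   # within-group (error) SSCP matrix
H = np.array([[12.0, 4.0],
              [4.0, 10.0]])  # between-group (hypothesis) SSCP matrix

T = H + E                    # total SSCP matrix
wilks_lambda = np.linalg.det(E) / np.linalg.det(T)
print(round(wilks_lambda, 4))
```

Because H here is large relative to E, the resulting lambda is small, which is the pattern that leads to rejecting the null hypothesis.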
Wilks’ lambda assesses the differences between two or more groups on multiple
variables at once. It is the multivariate version of the F-test statistic in one-way
ANOVA, which examines the differences between multiple groups on one variable.
In other words, Wilks’ lambda tests for differences between groups on a vector of
variables. Before introducing the method formally, it is important to understand the
assumptions behind it.
Assumptions behind the Method
All statistical tests rely on some underlying assumptions. If your data violate any of
the assumptions, you might still be able to perform the test, but the test results may
not have desirable statistical properties (unbiasedness, etc.) and so should be taken
with caution. Moreover, understanding the assumptions of a statistical test will help you
improve your research-design and data-collection efforts. There
are six critical assumptions underlying the use of Wilks' lambda:
• The dependent variables in question are measured on an ordinal or
continuous scale. A common example of ordinal variables is the Likert scale, e.g., a
5-point scale from “strongly disagree” to “strongly agree.” Continuous variables are
quite common, such as body weight, exam scores, blood pressure, heart rate, etc.
• The subjects are independent of each other. This relates to the sampling design of
the data collection process and hence cannot be directly evaluated with the data.
• Relationships between the continuous variables of interest, if they exist, are linear.
This can be visually inspected by plotting every pair of the variables with a
scatterplot.
• Variances and covariances of the continuous dependent variables are equal
across groups.
• There are no outliers in the sample. This can be visually inspected using boxplots
or identified quantitatively using Mahalanobis distance.
• The dependent variables are normally distributed within groups. In practice, it is
common that this condition is not met, but Wilks’ lambda is fairly robust to moderate
violations of the normality assumption.
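As an illustration of the outlier check mentioned in the list above, a small sketch using squared Mahalanobis distances. The data are randomly generated for illustration, and the cutoff is the chi-square 0.999 quantile with 2 degrees of freedom, hardcoded here:

```python
import numpy as np

# Generate 50 well-behaved bivariate points, then plant one outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X = np.vstack([X, [8.0, 8.0]])        # obvious outlier at index 50

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mean
# Squared Mahalanobis distance of each observation from the centroid.
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# Chi-square(df=2) 0.999 quantile, a common cutoff for flagging outliers.
cutoff = 13.82
outliers = np.where(d2 > cutoff)[0]
print(outliers)   # the planted point should be flagged
```

Observations whose squared distance exceeds the chi-square cutoff are candidate multivariate outliers and should be inspected before running the test.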
Calculating Wilks’ Lambda
Wilks’ lambda assesses the proportion of variance in the dependent variables that is
not accounted for by the intergroup variations. If this proportion is small, then it
suggests that the dependent variables vary a lot across groups and the groups might
have different mean values for the dependent variables.
To calculate Wilks’ lambda, we first need to introduce a few notations. Suppose that
there are n subjects in total, divided into m groups with n_j subjects in group j, and
P dependent variables. Denote the value of variable p for subject i in group j as
x_ijp, and let X_ij be the column vector (x_ij1, ⋯ , x_ijP)^T. Wilks’ lambda can be
calculated as follows:
• Calculate the between-group cross products matrix H (the variation across
groups):

H = Σ_{j=1}^{m} n_j (X̄_j − X̄)(X̄_j − X̄)^T,

where X̄_j is the mean vector of group j and X̄ is the grand mean vector over all subjects.
• Calculate the within-group (error) cross products matrix E (the residual variation):

E = Σ_{j=1}^{m} Σ_{i=1}^{n_j} (X_ij − X̄_j)(X_ij − X̄_j)^T.

• Compute Λ = |E| / |H + E|.
Wilks’ lambda is small when |H| is large relative to |E|, that is, when the between-group
variation is large relative to the residual variation. The distribution of Wilks’ lambda is
complicated, but it can be approximated by an F-distribution. One tends to reject the null
hypothesis of no difference if the resulting Wilks’ lambda is small, with a significant p-value.
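The steps above can be carried out on a small data set. A sketch with two groups and two dependent variables (all numbers are invented for illustration):

```python
import numpy as np

# Two hypothetical groups (m = 2), three subjects each, P = 2 variables.
groups = [
    np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]),   # group 1
    np.array([[6.0, 7.0], [7.0, 9.0], [8.0, 8.0]]),   # group 2
]

grand_mean = np.vstack(groups).mean(axis=0)

H = np.zeros((2, 2))   # between-group (hypothesis) SSCP matrix
E = np.zeros((2, 2))   # within-group (error) SSCP matrix
for Xj in groups:
    mj = Xj.mean(axis=0)
    d = (mj - grand_mean).reshape(-1, 1)
    H += len(Xj) * (d @ d.T)        # n_j * outer product of mean deviation
    centered = Xj - mj
    E += centered.T @ centered      # residual cross products within group

wilks_lambda = np.linalg.det(E) / np.linalg.det(H + E)
print(round(wilks_lambda, 4))
```

The two groups are well separated, so H dominates E and the resulting lambda is close to zero.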

What is its role in a discriminant analysis?


Introduction
In discriminant analysis, Wilks’ lambda tests how well each independent
variable contributes to the model. The scale ranges from 0 to 1, where 0 means total
discrimination and 1 means no discrimination. Each independent variable is tested
by putting it into the model and then taking it out, generating a Λ statistic. The
significance of the change in Λ is measured with an F-test; if the F-value is greater
than the critical value, the variable is kept in the model. This stepwise procedure is
usually performed using software such as Minitab, R, or SPSS, whose output shows
which variables (from a list of a dozen or more) were kept in using this procedure.
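One step of this procedure can be sketched by computing Wilks' lambda for each candidate variable on its own and entering the one with the smallest (most discriminating) value; the full procedure would then test the change in Λ with an F-test. The data and the helper function `wilks` below are hypothetical:

```python
import numpy as np

# Two made-up groups, two candidate variables (columns).
groups = [np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 6.0]]),
          np.array([[7.0, 5.5], [8.0, 4.5], [9.0, 5.0]])]

def wilks(groups, cols):
    """Wilks' lambda restricted to the variables in `cols`."""
    data = [g[:, cols] for g in groups]
    grand = np.vstack(data).mean(axis=0)
    p = len(cols)
    H = np.zeros((p, p))
    E = np.zeros((p, p))
    for Xj in data:
        mj = Xj.mean(axis=0)
        d = (mj - grand).reshape(-1, 1)
        H += len(Xj) * (d @ d.T)
        c = Xj - mj
        E += c.T @ c
    return np.linalg.det(E) / np.linalg.det(H + E)

lams = [wilks(groups, [v]) for v in range(2)]
print([round(l, 4) for l in lams])
```

Here the first variable separates the groups strongly (lambda near 0) while the second does not (lambda equal to 1, since its group means coincide), so the first variable would enter the model first.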
Discriminant function analysis is used to determine which continuous variables
discriminate between two or more naturally occurring groups. For example, a
researcher may want to investigate which variables discriminate between fruits eaten
by (1) primates, (2) birds, or (3) squirrels. For that purpose, the researcher could
collect data on numerous fruit characteristics of those species eaten by each of the
animal groups. Most fruits will naturally fall into one of the three categories.
Discriminant analysis could then be used to determine which variables are the best
predictors of whether a fruit will be eaten by birds, primates, or squirrels.
Logistic regression answers the same questions as discriminant analysis. It is often
preferred to discriminant analysis, as it is more flexible in its assumptions and types
of data that can be analyzed. Logistic regression can handle both categorical and
continuous variables, and the predictors do not have to be normally distributed,
linearly related, or of equal variance within each group (Tabachnick and Fidell 1996).
Discriminant function analysis is multivariate analysis of variance (MANOVA)
reversed. In MANOVA, the independent variables are the groups and the dependent
variables are the predictors. In DA, the independent variables are the predictors and
the dependent variables are the groups. As previously mentioned, DA is usually
used to predict membership in naturally occurring groups. It answers the question:
can a combination of variables be used to predict group membership? Usually,
several variables are included in a study to see which ones contribute to the
discrimination between groups.
Discriminant function analysis is broken into a 2-step process:
(1) Testing significance of a set of discriminant functions, and;
(2) Classification.
The first step is computationally identical to MANOVA. There is a matrix of total
variances and covariances; likewise, there is a matrix of pooled within-group
variances and covariances. The two matrices are compared via multivariate F tests
in order to determine whether or not there are any significant differences (with regard
to all variables) between groups. One first performs the multivariate test, and, if
statistically significant, proceeds to see which of the variables have significantly
different means across the groups. Once group means are found to be statistically
significant, classification of variables is undertaken. DA automatically determines
some optimal combination of variables so that the first function provides the most
overall discrimination between groups, the second provides second most, and so on.
Moreover, the functions will be independent or orthogonal, that is, their contributions
to the discrimination between groups will not overlap. The first function picks up the
most variation; the second function picks up the greatest part of the unexplained
variation, and so on. Computationally, a canonical correlation analysis is performed that
will determine the successive functions and canonical roots. Classification is then
possible from the canonical functions. Subjects are classified in the groups in which
they had the highest classification scores. The maximum number of discriminant
functions will be equal to the number of groups minus one or the number of
variables in the analysis, whichever is smaller.
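The classification step above can be sketched with scikit-learn's LinearDiscriminantAnalysis; the fruit measurements and group labels below are invented for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical fruit measurements (size, sugar content) for two groups.
X = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],    # eaten by birds (0)
              [6.0, 7.0], [6.5, 7.5], [7.0, 6.8]])   # eaten by primates (1)
y = np.array([0, 0, 0, 1, 1, 1])

# Fit the discriminant functions, then classify new fruits: each is
# assigned to the group with the highest classification score.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
predictions = lda.predict([[1.2, 2.1], [6.8, 7.2]])
print(predictions)
```

With two groups and two predictors, at most one discriminant function is formed (groups minus one), matching the rule stated above.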
