5th Module SDS
2) Discriminant Analysis:
1) Discriminant analysis works by classifying observations into two
or more known groups.
2) A linear discriminant function is built from the predictor
variables to classify the observations.
3) It lets us see which variables have the most impact on the
discriminant function.
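As a rough illustration of points 1) to 3), here is a minimal sketch of
a two-group linear discriminant (Fisher's method) in plain Python. It
assumes exactly two groups and two variables; the function names and
the data are made up for illustration and are not from these notes.

```python
from statistics import mean

def fisher_discriminant(group_a, group_b):
    """Return (weights, threshold) for a two-group linear discriminant.

    Each group is a list of (x1, x2) observations.
    """
    def centroid(rows):
        return [mean(r[0] for r in rows), mean(r[1] for r in rows)]

    def scatter(rows, m):
        # 2x2 within-group scatter: sum of outer products of deviations.
        s = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    ma, mb = centroid(group_a), centroid(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    # Pooled within-group scatter W = S_a + S_b.
    w = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    inv = [[w[1][1] / det, -w[0][1] / det],
           [-w[1][0] / det, w[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Discriminant weights: W^-1 (mean_a - mean_b).
    weights = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
               inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold: discriminant score of the midpoint between the centroids.
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    threshold = weights[0] * mid[0] + weights[1] * mid[1]
    return weights, threshold

def classify(x, weights, threshold):
    """Assign x to group "A" if its score exceeds the threshold, else "B"."""
    score = weights[0] * x[0] + weights[1] * x[1]
    return "A" if score > threshold else "B"

# Made-up illustrative data: two groups measured on two variables.
group_a = [(5.0, 3.0), (6.0, 3.5), (5.5, 2.5), (6.5, 3.0)]
group_b = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (3.5, 1.0)]
weights, threshold = fisher_discriminant(group_a, group_b)
print(classify((6.0, 3.0), weights, threshold))
print(classify((2.0, 1.0), weights, threshold))
```

The size of each weight gives a rough idea of how much that variable
contributes to the discriminant function, but the comparison is only
fair when the variables are measured on similar scales.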
5) CLUSTER ANALYSIS:
Cluster analysis helps us to reduce a very large data set to a small
number of groups (clusters) of similar data elements.
The grouping is based on similarity and other shared characteristics.
It is very popular in market segmentation analysis.
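One common way to form such groups is k-means clustering. The notes do
not name a specific algorithm, so the sketch below is only an
assumption-based illustration: the function name, the starting centres
(the first k points) and the customer data are all hypothetical.

```python
from math import dist
from statistics import mean

def k_means(points, k, iterations=10):
    """Group points into k clusters by Euclidean distance (basic k-means)."""
    # Assumption: start from the first k points so the result is repeatable.
    centres = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: dist(p, centres[c]))
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = tuple(mean(axis) for axis in zip(*members))
    return labels, centres

# Made-up customers described by (spending score, visits per month).
customers = [(1.0, 1.0), (8.0, 9.0), (1.5, 2.0),
             (9.0, 8.0), (1.0, 0.5), (8.5, 9.5)]
labels, centres = k_means(customers, k=2)
print(labels)
```

In a market segmentation setting, each resulting cluster would be read
as one customer segment, described by its centre.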
ADVANTAGES OF MULTIVARIATE ANALYSIS:
There are numerous advantages, but here are the highlights:
1) Arriving at accurate, data-driven conclusions and insights.
2) Checking for data anomalies and consistency (that is, unexpected or
irregular patterns, inconsistencies, or errors within a dataset).
3) Feature engineering (the process of creating or selecting relevant
and informative features from raw data to improve the performance of
a machine learning model).
4) Data cleaning (preprocessing).
5) Handling underfitting and overfitting.
MANOVA:
MANOVA, or Multivariate Analysis of Variance, is a statistical
technique used to analyze the relationships between multiple dependent
variables and one or more independent variables. It extends the
analysis of variance (ANOVA) to cases where there are two or more
dependent variables.
In a MANOVA model, the dependent variables are typically
continuous variables and the independent variables are categorical (or
grouping) variables. The main objective of MANOVA is to determine
whether there are significant differences between groups on the
combination of dependent variables. It allows researchers to
investigate whether the groups differ not only on each individual
dependent variable but also on their joint relationship.
The MANOVA model assumes that the dependent variables are
multivariate normally distributed within each group and have equal
covariance matrices across groups. It also assumes that there is a linear
relationship between dependent variables and independent variables.
The analysis begins with hypothesis testing.
Procedure for conducting MANOVA
Set up the Hypotheses:
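For a one-way design, the hypotheses are usually written in terms of
the group mean vectors. The notation here is an assumption introduced
for illustration: g is the number of groups and each mean vector holds
the means of all the dependent variables in that group.

```latex
H_0:\ \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_g
\qquad \text{vs.} \qquad
H_1:\ \boldsymbol{\mu}_k \neq \boldsymbol{\mu}_l
\ \text{for at least one pair } k \neq l
```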
Step 4: Assumptions:
i) Independence: Observations within each group and between groups
should be independent.
ii) Multivariate normality: The dependent variables should be
multivariate normally distributed within each group.
iii) Homogeneity of variance-covariance matrices: The
variance-covariance matrices of the dependent variables should be
equal across groups.
iv) Homogeneity of regression: The relationship between the
independent variables and the dependent variables should be linear
and homogeneous across groups.
v) Select an appropriate test statistic: Wilks’ Lambda (Λ) is the most
common test statistic used in MANOVA. It measures the proportion of
variance in the dependent variables that is not accounted for by the
group differences, so smaller values point to stronger group
differences.
Other test statistics, such as Pillai’s trace, the Hotelling-Lawley
trace and Roy’s largest root, can also be used depending on the
specific research question and assumptions.
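Wilks’ Lambda can be computed as the determinant of the within-group
sums-of-squares-and-cross-products (SSCP) matrix divided by the
determinant of the total SSCP matrix. The sketch below assumes exactly
two dependent variables so that 2x2 determinants are enough; the
function names and data are hypothetical, and it returns only the
statistic, not a significance test.

```python
from statistics import mean

def centroid(rows):
    """Mean vector of a list of (y1, y2) observations."""
    return (mean(r[0] for r in rows), mean(r[1] for r in rows))

def sscp(rows, centre):
    """2x2 sums-of-squares-and-cross-products matrix around a centre."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - centre[0], r[1] - centre[1])
        for i in range(2):
            for j in range(2):
                m[i][j] += d[i] * d[j]
    return m

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_lambda(groups):
    """Wilks' Lambda = det(W) / det(T) for groups of (y1, y2) rows."""
    everything = [r for g in groups for r in g]
    # T: variation of all observations around the grand mean.
    total = sscp(everything, centroid(everything))
    # W: variation of each observation around its own group mean, summed.
    within = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        s = sscp(g, centroid(g))
        for i in range(2):
            for j in range(2):
                within[i][j] += s[i][j]
    return det2(within) / det2(total)

# Made-up data: two well-separated groups on two dependent variables.
group_1 = [(5.0, 3.0), (6.0, 3.5), (5.5, 2.5), (6.5, 3.0)]
group_2 = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (3.5, 1.0)]
lam_separated = wilks_lambda([group_1, group_2])
# Two groups with identical values share a centroid, so nothing is
# explained by group membership.
lam_identical = wilks_lambda([group_1, group_1])
print(lam_separated, lam_identical)
```

For the separated groups the value comes out close to 0, while for the
identical groups it is 1. In practice the statistic is converted to an
approximate F value to decide whether the difference is significant.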