MVDA Unit 5
In multiple linear regression, the objective is to model one quantitative variable (called the
dependent variable) as a linear combination of other variables (called the independent
variables). The purpose of discriminant analysis is to obtain a model to predict a single
qualitative variable from one or more independent variable(s). In most cases the dependent
variable consists of two groups or classifications, such as high versus normal blood pressure,
loan defaulting versus non-defaulting, or use versus non-use of internet banking. The choice
between three candidates, A, B or C in an election is an example where the dependent
variable consists of more than two groups.
The discriminant function can be written as:
F = β0 + β1X1 + β2X2 + … + βpXp + ε
where F is a latent variable formed by a linear combination of the independent variables, X1,
X2, …, Xp are the p independent variables, ε is the error term, and β0, β1, β2, …, βp are the
discriminant coefficients.
The objective of discriminant analysis is to test whether the classification of groups in a
variable Y depends on at least one of the Xi's.
In terms of hypotheses, it can be written as:
H0: β1 = β2 = … = βp = 0 (none of the Xi's discriminates the groups)
H1: at least one βi ≠ 0
Assumptions
The key assumptions are that the independent variables follow a multivariate normal
distribution within each group, that the variance-covariance matrices of the groups are equal
(assessed with Box's M, discussed below), that observations are independent, and that there is
no severe multicollinearity among the independent variables.
Discriminant function
The number of functions computed is one less than the number of groups in the dependent
variable. That is, for two groups there is one function, for three groups two functions, and so on.
When there are two functions, the first function maximizes the differences between the
groups in the dependent variable. The second function is orthogonal to the first (uncorrelated
with it) and maximizes the differences between the groups in the dependent variable,
controlling for the first function. Though mathematically distinct, each discriminant function
is a dimension that differentiates cases into the groups of the dependent variable based on
their values on the independent variables. In discriminant analysis, the first function is the
most powerful in differentiating the groups, and the subsequent functions may or may not
represent additional significant differentiation.
Discriminant Coefficient
The discriminant function coefficients are partial coefficients that reflect the unique
contribution of each variable to the classification of the groups in the dependent variable. A
discriminant score, which belongs to a latent variable, can be obtained for each case by applying
the coefficients to the values in the respective independent variables. The standardized
discriminant coefficients, like beta weights in regression, are used to assess the relative
classifying importance of the independent variables. Structure coefficients are the
correlations between a given independent variable and the discriminant scores. The higher the
value, the stronger the association between the independent variable and the discriminant
function. Looking at all the structure coefficients for a function allows the researcher to
assign a label to the dimension it measures.
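To make the relationship between coefficients, scores, and centroids concrete, here is a minimal two-group sketch using Fisher's linear discriminant in Python. The data, group labels, and variable meanings are all invented for illustration; a real analysis would use dedicated statistical software.

```python
import numpy as np

# Hypothetical two-group data: rows are cases, columns are two
# independent variables (e.g. income and debt for loan defaulters
# vs. non-defaulters). All numbers are made up for illustration.
rng = np.random.default_rng(0)
group0 = rng.normal([5.0, 2.0], 0.5, size=(20, 2))   # non-defaulters
group1 = rng.normal([7.0, 3.5], 0.5, size=(20, 2))   # defaulters

m0, m1 = group0.mean(axis=0), group1.mean(axis=0)

# Pooled within-group SSCP matrix (Sw).
Sw = np.cov(group0, rowvar=False) * (len(group0) - 1) \
   + np.cov(group1, rowvar=False) * (len(group1) - 1)

# Fisher's discriminant coefficients: w = Sw^{-1} (m1 - m0).
w = np.linalg.solve(Sw, m1 - m0)

# Discriminant score for each case = linear combination of its
# independent-variable values with the coefficients.
scores0 = group0 @ w
scores1 = group1 @ w

# Group centroids = mean discriminant score of each group.
print("coefficients:", w)
print("centroid group 0:", scores0.mean())
print("centroid group 1:", scores1.mean())
```

With two groups there is a single function, so the two centroids lie on one dimension, as described above.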
Group centroid
Group centroids are the mean discriminant scores for each group in the dependent variable
for each of the discriminant functions. For two groups in the dependent variable there is a
single discriminant function. The centroids are in a unidimensional space, one center for each
group. For three groups in the dependent variable there are two discriminant functions.
Hence, the centroids are in a two dimensional space. By connecting the centroids a canonical
plot can be created depicting a discriminant function space.
Eigenvalue
The eigenvalue, also called the characteristic root, is the ratio of the explained to the
unexplained variation in a model. For a good model the eigenvalue should be more than one.
In discriminant analysis there is one eigenvalue for each discriminant function; the bigger
the eigenvalue, the stronger the discriminating power of the function. In an analysis with
three groups, the ratio between two eigenvalues indicates the relative discriminating power of
the one discriminant function over the other. For example, if the ratio of two eigenvalues is
1.6, the first discriminant function accounts for 60% more of the between-group variance for
the three groups in the dependent variable compared to the second discriminant function.
Relative percentage of a discriminant function is the function's eigenvalue divided by the sum
of all eigenvalues of all discriminant functions in the model. It represents the percent of
discriminating power for the model associated with a given discriminant function. Usually,
the relative percentage of the first function will be high. If the values for the subsequent
functions are small, then a single function is as good as two or more functions in the
classification.
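The relative-percentage calculation described above is simple arithmetic. A short sketch, using made-up eigenvalues that reproduce the 1.6 ratio from the example:

```python
# Hypothetical eigenvalues for the two discriminant functions in a
# three-group analysis (invented values giving a 1.6 ratio).
eigenvalues = [1.6, 1.0]

total = sum(eigenvalues)
relative_pct = [100 * ev / total for ev in eigenvalues]

# The first function carries about 61.5% of the model's
# discriminating power, the second about 38.5%.
print(relative_pct)
```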
Canonical correlation
The canonical correlation is a measure of the association between the groups in the dependent
variable and the discriminant function. A high value implies a high level of association
between the two and vice-versa.
Wilks's lambda
In discriminant analysis, Wilks's lambda is used to test the significance of the discriminant
functions. Mathematically, it is one minus the explained variation, and its value ranges from
0 to 1. Unlike the F-statistic in linear regression, a function is significant when its lambda
value is small.
Classification matrix
The classification matrix is a simple cross tabulation of the observed and predicted
memberships. For a good prediction, the values in the diagonal must be high and the values
off the diagonal must be close to 0.
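A classification matrix can be built with a plain cross tabulation. The observed and predicted group labels below are invented for illustration:

```python
# Observed vs. predicted group memberships for ten hypothetical cases.
observed  = ["high", "high", "high", "normal", "normal",
             "normal", "high", "normal", "high", "normal"]
predicted = ["high", "high", "normal", "normal", "normal",
             "normal", "high", "high", "high", "normal"]

labels = ["high", "normal"]
# matrix[i][j] = number of cases observed in group i and
# predicted to be in group j.
matrix = [[sum(1 for o, p in zip(observed, predicted)
               if o == a and p == b) for b in labels] for a in labels]

# Diagonal entries are correct classifications; the hit rate is
# their sum divided by the total number of cases.
hit_rate = sum(matrix[i][i] for i in range(len(labels))) / len(observed)
print(matrix)
print(hit_rate)
```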
Box's M
As in other multivariate data analyses, Box's M tests the assumption of equality of the
variance-covariance matrices across groups. A large Box's M, indicated by a small p-value,
signals a violation of this assumption. However, when the sample size is large, Box's M is
usually significant. In such situations, the log determinants of the group variance-covariance
matrices are compared instead.
Sample size
As a rule, the sample size of the smallest group should exceed the number of independent
variables. Though the general agreement is that there should be at least 5 cases for each
independent variable, it is best to model with at least 20 cases for each independent variable.
Multivariate Analysis of Variance (MANOVA)
ANOVA allows you to assess the impact of one or more factors on a single dependent
variable at a time. For example, you could examine how different tire models (the factor)
affect fuel efficiency (the dependent variable). With MANOVA, on the other hand, you can
simultaneously explore the effects on two or more dependent variables. Here, you could
analyze how different tire models (the factor) collectively influence multiple performance
indicators such as fuel efficiency and tire durability (the dependent variables).
MANOVA allows you to explore whether significant differences exist between groups across
a combination of dependent variables. By considering multiple dependent variables
simultaneously, MANOVA provides a more comprehensive understanding of group
differences and patterns. Conducting separate ANOVAs on multiple dependent variables can
increase the chance of false positives (Type I error). MANOVA manages this error rate while
analyzing the effect of independent variables on multiple dependent variables simultaneously.
MANOVA helps you address questions such as:
How do different wing configurations in aircraft designs impact factors such as structural
strength, weight, and aerodynamic efficiency?
Are there notable differences in certain car characteristics—such as fuel efficiency or safety
ratings—based on the country of manufacture?
Assumptions of MANOVA
In general, when conducting a MANOVA test, the following assumptions are made regarding
the input data:
Assumption of Normality: The data within each group follows a normal distribution.
Assumption of Independence: The observations within and between groups are mutually
independent.
Assumption of Homogeneity of Covariances: The variance-covariance matrices of the
dependent variables are equal across groups (this can be checked with Box's M).
These assumptions are important to ensure the validity, accuracy, and reliability of the
MANOVA analysis and results.
In multiple regression, the terms univariate and multivariate refer to the number of
independent variables, but for ANOVA and MANOVA the terminology applies to the use of
single or multiple dependent variables. The univariate techniques for analyzing group
differences are the t test (two groups) and analysis of variance (ANOVA) for two or more
groups. The multivariate equivalent procedures are Hotelling's T² and multivariate
analysis of variance (MANOVA), respectively.
Example Problem
A researcher is studying the effects of diet type, exercise type, and age group on weight
loss and muscle gain. The independent variables are diet type (A, B, C), exercise type (X, Y,
Z), and age group (1, 2, 3); the dependent variables are weight loss (kg) and muscle gain (kg).
Data Collected
Participant Diet Type Exercise Type Age Group Weight Loss (kg) Muscle Gain (kg)
1 A X 1 3 1.5
2 A Y 2 4 2
3 A Z 3 5 1.8
4 B X 1 3.5 2
5 B Y 2 4.2 2.5
6 B Z 3 5.5 2.8
7 C X 1 4 1.9
8 C Y 2 4.5 2.1
9 C Z 3 6 2.7
Here, we have three independent variables (diet type, exercise type, and age group) and two
dependent variables (weight loss and muscle gain), with nine participants in total.
Step 1: Compute Cell Means
To begin, we compute the means for each combination of the independent variables.
Diet Type A
Exercise Type Age Group Weight Loss (kg) Muscle Gain (kg)
X 1 3 1.5
Y 2 4 2
Z 3 5 1.8
Diet Type B
Exercise Type Age Group Weight Loss (kg) Muscle Gain (kg)
X 1 3.5 2
Y 2 4.2 2.5
Z 3 5.5 2.8
Diet Type C
Exercise Type Age Group Weight Loss (kg) Muscle Gain (kg)
X 1 4 1.9
Y 2 4.5 2.1
Z 3 6 2.7
Step 2: Compute Grand Means
Next, we compute the grand mean across all participants for both dependent variables.
Step 3: Compute SSCP Matrices
Now, we compute the within-group and between-group SSCP (sums of squares and
cross-products) matrices. These matrices capture the variability within and between the
groups for both dependent variables.
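As a simplified sketch of these computations, the snippet below groups the nine participants by diet type only (the full example also crosses exercise type and age group) and computes the grand mean and the one-way within- and between-group SSCP matrices with NumPy:

```python
import numpy as np

# Dependent variables from the table: (weight loss, muscle gain),
# grouped by diet type (three participants per diet).
diet_A = np.array([[3.0, 1.5], [4.0, 2.0], [5.0, 1.8]])
diet_B = np.array([[3.5, 2.0], [4.2, 2.5], [5.5, 2.8]])
diet_C = np.array([[4.0, 1.9], [4.5, 2.1], [6.0, 2.7]])
groups = [diet_A, diet_B, diet_C]

all_data = np.vstack(groups)
grand_mean = all_data.mean(axis=0)   # grand mean of both DVs

# Within-group SSCP: sum over groups of centred cross-products.
W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

# Between-group SSCP: group size times the outer product of
# (group mean - grand mean), summed over groups.
B = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                          g.mean(axis=0) - grand_mean) for g in groups)

print("grand mean:", grand_mean)     # approx. [4.41, 2.14]
print("within-group SSCP:\n", W)
print("between-group SSCP:\n", B)
```

For a one-way layout like this, W + B equals the total SSCP about the grand mean, which is a useful sanity check.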
Conjoint Analysis
Conjoint analysis is typically conducted via a specialized survey that asks consumers to rank
the importance of the specific features in question. Analyzing the results allows the firm to
assign a value to each feature.
Conjoint analysis can take various forms. Some of the most common include:
• Choice-Based Conjoint (CBC) Analysis: This is one of the most common forms of
conjoint analysis and is used to identify how a respondent values combinations of features.
• Adaptive Conjoint Analysis (ACA): This form of analysis customizes each respondent's
survey experience based on their answers to early questions. It’s often leveraged in studies
where several features or attributes are being evaluated to streamline the process and
extract the most valuable insights from each respondent.
• Full-Profile Conjoint Analysis: This form of analysis presents the respondent with a series
of full product descriptions and asks them to select the one they’d be most inclined to buy.
• MaxDiff Conjoint Analysis: This form of analysis presents multiple options to the
respondent, which they’re asked to organize on a scale of “best” to “worst” (or “most
likely to buy” to “least likely to buy”).
The type of conjoint analysis a company uses is determined by the goals driving its
analysis (i.e., what does it hope to learn?) and, potentially, the type of product or
service being evaluated. It is possible to combine multiple conjoint analysis types into
"hybrid models" to take advantage of the benefits of each.
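One simple way to assign a value to each feature, as described above, is to regress respondents' ratings on dummy-coded attribute levels; the resulting coefficients are the part-worth utilities. The product profiles and ratings below are invented for illustration:

```python
import numpy as np

# Hypothetical ratings for four chocolate-product profiles built from
# two attributes: quality (standard/premium) and a sustainability
# donation (no/yes). All ratings are invented.
# Columns: intercept, premium quality (1=yes), donation (1=yes).
X = np.array([
    [1, 0, 0],   # standard, no donation
    [1, 0, 1],   # standard, donation
    [1, 1, 0],   # premium, no donation
    [1, 1, 1],   # premium, donation
])
ratings = np.array([3.0, 5.0, 6.0, 8.0])

# Least-squares fit: the coefficients are the part-worth utilities of
# each attribute level relative to the base profile.
utilities, *_ = np.linalg.lstsq(X, ratings, rcond=None)
print("baseline rating:", utilities[0])
print("part-worth of premium quality:", utilities[1])
print("part-worth of donation:", utilities[2])
```

Here premium quality adds 3 rating points and the donation adds 2, so quality is the more valued feature in this made-up data.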
The insights a company gleans from conjoint analysis of its product features can be leveraged
in several ways. Most often, conjoint analysis impacts pricing strategy, sales and marketing
efforts, and research and development plans.
Conjoint Analysis in Sales & Marketing
Conjoint analysis can inform more than just a company's pricing strategy; it can also shape
how the company markets and sells its offerings. When a company knows which features its
customers value most, it can lean into them in its advertisements, marketing copy, and
promotions.
On the other hand, a company may find that its customers aren’t uniform in assigning value
to different features. In such a case, conjoint analysis can be a powerful means of segmenting
customers based on their interests and how they value features—allowing for more targeted
communication.
For example, an online store selling chocolate may find through conjoint analysis that its
customers primarily value two features: Quality and the fact that a portion of each sale goes
toward funding environmental sustainability efforts. The company can then use that
information to send different messaging that appeals to each segment's specific values.
Conjoint Analysis in Research & Development
Conjoint analysis can also inform a company’s research and development pipeline. The
insights gleaned can help determine which new features are added to its products or services,
along with whether there’s enough market demand for an entirely new product.
For example, consider a smartphone manufacturer that conducts a conjoint analysis and
discovers its customers value larger screens over all other features. With this information, the
company might logically conclude that the best use of its product development budget and
resources would be to develop larger screens. If, however, future analyses reveal that
customer value has shifted to a different feature—for example, audio quality—the company
may use that information to pivot its product development plans.
Additionally, a company may use conjoint analysis to narrow down its product or service’s
features. Returning to the smartphone example: There’s only so much space within a
smartphone for components. How a phone manufacturer’s customers value different features
can inform which components make it into the end product—and which are cut.
One example is Apple’s 2016 decision to remove the headphone jack from the iPhone to free
up space for other components. It’s reasonable to assume this decision was reached after
analysis revealed that customers valued other features above a headphone jack.
Conjoint analysis has its roots in academic research from the 1960s and has been used
commercially since the 1970s. In 1964, two mathematicians, Duncan Luce and John
Tukey published a rather indigestible (by modern standards) article called ‘Simultaneous
conjoint measurement: A new type of fundamental measurement’. In abstract terms, they
sketched the idea of “measuring the intrinsic goodness of certain characteristics of objects by
measuring the goodness of an object as a whole”.
The article did not mention data collection, products, features, prices, or other elements that
we associate with conjoint analysis today, but it spurred academic interest in the topic and
perhaps gave rise to the name "conjoint". It not only kick-started the topic but also set the
tone for future developments in the area, which over time became technical to the point of
inaccessibility for most people, led by American academics with a strong emphasis on the
statistical workings of survey research.
Green and Srinivasan (1978) agree that the theory of conjoint measurement was developed in
Luce and Tukey’s paper but that “the first detailed, consumer-orientated” approach was
Green and Rao’s (1971) ‘Conjoint Measurement for Quantifying Judgmental Data’. In
1974, Professor Paul E. Green penned ‘On the Design of Choice Experiments Involving
Multifactor Alternatives’, cementing the impact of conjoint analysis in market research.
Over the next few decades, conjoint analysis became an increasingly popular method across
the globe with notable studies in the 1980s and 90s highlighting its growing adoption and
development during this time.
Conjoint surveys are continuously developing on a range of software platforms, through
which many different flavours of conjoint analysis can be enjoyed. Today, conjoint analysis
thrives as a widespread tool built on a robust methodology and is used by market researchers
daily as an indispensable tool for understanding consumer trade-offs.
Conjoint analysis is one of the most effective models for extracting consumer preferences
during the purchasing process. This data is then turned into a quantitative measurement using
statistical analysis. It evaluates products or services in a way no other method can.
Researchers consider conjoint analysis to be the best survey method for determining customer
values. It consists of creating, distributing, and analyzing surveys among customers to model
their purchasing decision based on response analysis.
Logistic Regression
Logistic regression is used for binary classification; it applies the sigmoid function, which
takes the independent variables as input and produces a probability value between 0 and 1.
For example, given two classes, Class 0 and Class 1, if the value of the logistic function for
an input is greater than 0.5 (the threshold value), the input belongs to Class 1; otherwise it
belongs to Class 0. It is referred to as regression because it is an extension of linear
regression, but it is mainly used for classification problems.
• Logistic regression predicts the output of a categorical dependent variable. Therefore, the
outcome must be a categorical or discrete value.
• It can be Yes or No, 0 or 1, True or False, etc., but instead of giving an exact value of 0
or 1, it gives probabilistic values which lie between 0 and 1.
• In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic
function, which predicts two maximum values (0 or 1).
Logistic Function – Sigmoid Function
• The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
• It maps any real value into another value within the range of 0 and 1. The output of
logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms a
curve like the "S" shape.
• The S-form curve is called the Sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which defines the boundary
between the two classes: values above the threshold are mapped to 1, and values below the
threshold are mapped to 0.
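A minimal sketch of the sigmoid function and the threshold rule described above:

```python
import math

def sigmoid(z):
    """Map any real value into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Assign Class 1 if the probability exceeds the threshold."""
    return 1 if sigmoid(z) > threshold else 0

print(sigmoid(0))      # 0.5 — the midpoint of the S-curve
print(classify(2.0))   # 1
print(classify(-2.0))  # 0
```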
Types of Logistic Regression
On the basis of the categories, Logistic Regression can be classified into three types:
1. Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variables, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered
types of the dependent variable, such as "cat", "dog", or "sheep".
3. Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of
the dependent variable, such as "low", "medium", or "high".
Assumptions of Logistic Regression
Understanding the assumptions of logistic regression is important to ensure that we apply the
model appropriately. The assumptions include:
1. Independent observations: Each observation is independent of the others, meaning there is
no correlation between any pair of observations.
2. Binary dependent variable: The dependent variable must be binary or dichotomous,
meaning it can take only two values. For more than two categories, the softmax function is
used (multinomial logistic regression).
3. Linearity relationship between independent variables and log odds: The relationship
between the independent variables and the log odds of the dependent variable should be
linear.
4. No outliers: There should be no outliers in the dataset.
5. Large sample size: The sample size is sufficiently large
Terminologies involved in Logistic Regression
Here are some common terms involved in logistic regression:
• Independent variables: The input characteristics or predictor factors applied to the
dependent variable’s predictions.
• Dependent variable: The target variable in a logistic regression model, which we are trying
to predict.
• Logistic function: The formula used to represent how the independent and dependent
variables relate to one another. The logistic function transforms the input variables into a
probability value between 0 and 1, which represents the likelihood of the dependent
variable being 1 or 0.
• Odds: The ratio of the chance of something occurring to the chance of it not occurring. This
is different from probability, which is the ratio of something occurring to everything that
could possibly occur.
• Log-odds: The log-odds, also known as the logit function, is the natural logarithm of the
odds. In logistic regression, the log odds of the dependent variable are modeled as a linear
combination of the independent variables and the intercept.
• Coefficients: The logistic regression model's estimated parameters, which show how the
independent and dependent variables relate to one another.
• Intercept: A constant term in the logistic regression model, which represents the log odds
when all independent variables are equal to zero.
• Maximum likelihood estimation: The method used to estimate the coefficients of the
logistic regression model, which maximizes the likelihood of observing the data given the
model.
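The relationship between probability, odds, and log-odds can be checked with a few lines of arithmetic:

```python
import math

# A probability of 0.8 means the event occurs 8 times for every
# 2 times it does not.
p = 0.8
odds = p / (1 - p)         # 8:2, i.e. 4 to 1
log_odds = math.log(odds)  # the logit of p

# The logit is invertible: applying the sigmoid recovers p.
recovered = 1 / (1 + math.exp(-log_odds))
print(odds, log_odds, recovered)
```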
The general steps for building a logistic regression model are:
• Prepare the data: The data should be in a format where each row represents a single
observation and each column represents a different variable. The target variable (the
variable you want to predict) should be binary (yes/no, true/false, 0/1).
• Train the model: We teach the model by showing it the training data. This involves
finding the values of the model parameters that minimize the error in the training data.
• Evaluate the model: The model is evaluated on the held-out test data to assess its
performance on unseen data.
• Use the model to make predictions: After the model has been trained and assessed,
it can be used to forecast outcomes on new data.
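The steps above can be sketched end to end with a from-scratch gradient-descent fit on synthetic data. This is a teaching sketch, not a production implementation; in practice a library such as scikit-learn or statsmodels would be used, and the data here is invented:

```python
import numpy as np

# Prepare the data: one synthetic feature; larger values make
# class 1 more likely.
rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 0.5, size=200) > 0).astype(float)

# Hold out part of the data for evaluation.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Train the model: gradient descent on the cross-entropy loss.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    p = sigmoid(w * X_train[:, 0] + b)
    w -= lr * np.mean((p - y_train) * X_train[:, 0])
    b -= lr * np.mean(p - y_train)

# Evaluate the model: accuracy on held-out data at the 0.5 threshold.
pred = (sigmoid(w * X_test[:, 0] + b) > 0.5).astype(float)
accuracy = np.mean(pred == y_test)
print("accuracy:", accuracy)

# Use the model to make predictions on new data.
print("P(class 1 | x = 1.5):", sigmoid(w * 1.5 + b))
```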
Multidimensional Scaling (MDS)
• The technique strives to maintain the original proximities between objects: objects
that are similar are positioned closer together, while dissimilar objects are placed
further apart in the reduced space.
• In psychology, MDS helps researchers understand how people perceive similarities or
differences between stimuli, for example words, images, or sounds.
• Market research applies MDS to the tasks of brand positioning, product positioning,
and market segmentation.
• Marketers employ MDS to visualize and interpret consumer perceptions of brands,
products, or services, which helps them make strategic decisions and design marketing
campaigns.
• MDS is employed in geography and cartography to see and learn the spatial
relationships between places, areas, or geographical features.
• It permits cartographers to make maps that are true to the actual nature of geographical
entities and their proximity to each other.
• MDS is utilized in sociology and the social sciences to analyze social networks,
intergroup relationships, and cultural differences.
• Sociologists apply MDS to survey data, questionnaire responses, or relational data to
understand social structures and dynamics.
• The adaptable nature of the technique makes it suitable for various disciplines and data
types, allowing it to fit into many research settings.
• It assists in discovering hidden structures inside the data, revealing underlying patterns
and relationships which may not be easily noticed.
• It supports hypothesis testing and clustering analysis, and thus the data-driven
decision-making that builds on the resulting scales.
• Sensitivity to outliers: MDS results can be distorted by outliers, which in turn can
affect the configuration and the interpretation of the relationships.
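A minimal sketch of classical (Torgerson) MDS, one common variant of the technique, using an invented dissimilarity matrix for four stimuli (which here happen to lie on a line, so the distances are exactly recoverable):

```python
import numpy as np

# Invented pairwise dissimilarities between four hypothetical stimuli.
D = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [3.0, 2.0, 1.0, 0.0],
])

n = D.shape[0]
# Double-centre the squared dissimilarities (Torgerson's method).
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J

# Coordinates come from the leading eigenvectors, scaled by the
# square roots of their (non-negative) eigenvalues.
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]
top = order[:2]
coords = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Similar stimuli end up close together in the 2-D configuration.
print(coords)
```

The pairwise distances among the recovered coordinates reproduce the original dissimilarities, which is exactly the "maintain the original proximities" property described above.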