
Business Research Methods

Correlation
Factor analysis
&
Regression
GROUP 3
Table of Contents

I - Technique Overview
1. Correlation
2. Factor Analysis
3. Regression

II - Technique Practices
Technique Overview

Correlation Analysis

Correlation?

Correlation analysis is a statistical method that identifies the strength of a relationship between two or more variables.
Product Moment Correlation

- Summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y.
- An index used to determine whether a linear or straight-line relationship exists between X and Y.
- Originally proposed by Karl Pearson; also known as the Pearson correlation coefficient, simple correlation, bivariate correlation, or merely the correlation coefficient.
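As a minimal illustration, the coefficient can be computed in Python with scipy; the two small lists below are hypothetical, not the slide's data.

```python
from scipy import stats

# Hypothetical metric (interval/ratio scaled) variables, for illustration only
duration = [2, 4, 6, 8, 10, 12, 14, 18]
attitude = [2, 3, 4, 5, 6, 8, 9, 11]

# Pearson product moment correlation and its two-tailed p-value
r, p_value = stats.pearsonr(duration, attitude)
print(f"r = {r:.4f}, p = {p_value:.4f}")
```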
Example: Product Moment Correlation

A researcher wants to explain attitudes toward a respondent's city of residence in terms of duration of residence in the city.

r = 0.9361: respondents' duration of residence in the city is strongly associated with their attitude toward the city. The positive sign of r implies a positive relationship.
When it is computed for a population rather than a sample, the product moment correlation is denoted by ρ, the Greek letter rho. The coefficient r is an estimator of ρ.

In the example, the critical value of t for a two-tailed test at α = 0.05 is 2.228. Since the test statistic exceeds this critical value, the null hypothesis of no relationship between X and Y is rejected, and the high r indicates that the relationship is strong.
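A hedged sketch of the test behind these numbers, assuming n = 12 respondents (implied by the critical value 2.228 with df = 10); the test statistic is t = r·sqrt(n − 2)/sqrt(1 − r²).

```python
import math
from scipy import stats

r, n = 0.9361, 12                              # r from the example; n = 12 assumed (df = 10)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # two-tailed critical value at alpha = 0.05

print(f"t = {t_stat:.2f}, critical value = {t_crit:.3f}")
# t (about 8.4) exceeds 2.228, so the null hypothesis of no linear
# relationship between X and Y is rejected.
```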
Partial Correlation

Measures the association between two variables after controlling for, or adjusting for, the effects of one or more additional variables.

Conceptually, it correlates what is left of X and Y after removing the predicted values of X based on a knowledge of Z and the predicted values of Y based on a knowledge of Z.
Example: The researcher wanted to calculate the association between attitude toward the city and duration of residence, after controlling for a third variable, the importance attached to weather.

The simple correlations between the variables are shown in the slide output. The partial correlation indicates that controlling for the effect of importance attached to weather has little effect on the association between attitude toward the city and duration of residence.
Partial correlations have an order associated with them. The order indicates how many variables are being adjusted or controlled. The simple correlation coefficient, r, has a zero order, as it does not control for any additional variables when measuring the association between two variables.
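A sketch of how a first-order partial correlation can be computed by correlating residuals, using plain numpy; the variable names are illustrative.

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation r(xy.z): correlate the parts of x and y
    that remain after removing the values predicted from z."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    Z = np.column_stack([np.ones_like(z), z])
    resid_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # x minus its prediction from z
    resid_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # y minus its prediction from z
    return np.corrcoef(resid_x, resid_y)[0, 1]

# Hypothetical usage: attitude (y), duration (x), importance of weather (z)
# print(partial_corr(duration, attitude, weather))
```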
Part Correlation Coefficient

The correlation between Y and X when the linear effects of the other independent variables have been removed from X but not from Y.

Example: calculate the part correlation between attitude toward the city and duration of residence, when the linear effects of the importance attached to weather have been removed from the duration of residence.
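For reference, the part (semipartial) correlation can be obtained directly from the three simple correlations; the values passed below are placeholders, not the slide's output.

```python
import math

def part_corr(r_xy, r_yz, r_xz):
    """Part correlation of Y with X, removing Z's linear effect from X but not from Y."""
    return (r_xy - r_yz * r_xz) / math.sqrt(1 - r_xz ** 2)

# Plug in the three simple correlations from the output (placeholder values here)
print(round(part_corr(r_xy=0.9, r_yz=0.5, r_xz=0.6), 3))
```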
Nonmetric Correlation

If the nonmetric variables are ordinal and numeric, Spearman's rho (ρs) and Kendall's tau (τ) are two measures of nonmetric correlation that can be used to examine the correlation between them. Both measures use rankings rather than the absolute values of the variables.

In the absence of ties, Spearman's ρs yields a closer approximation to the Pearson product moment correlation coefficient, ρ, than Kendall's τ; in these cases, the absolute magnitude of τ tends to be smaller than Pearson's. On the other hand, when the data contain a large number of tied ranks, Kendall's τ seems more appropriate.
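Both rank-based coefficients are available in scipy; a small sketch with made-up rankings:

```python
from scipy import stats

# Ordinal (ranked) data, e.g. two judges ranking the same set of brands (hypothetical)
rank_a = [1, 2, 3, 4, 5, 6, 7, 8]
rank_b = [2, 1, 4, 3, 6, 5, 8, 7]

rho, p_rho = stats.spearmanr(rank_a, rank_b)
tau, p_tau = stats.kendalltau(rank_a, rank_b)
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3f}), Kendall tau = {tau:.3f} (p = {p_tau:.3f})")
```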
Situations to use

- Customer Satisfaction and Loyalty
- Price and Demand
- Advertising Exposure and Brand Awareness
- Social Media Engagement and Sales
Information collected

- Strength of Relationship
- Direction of Relationship
- Patterns and Trends
Answers to
Product Moment Correlation
- How strongly are sales related to advertising expenditures?
- Is there an association between market share and size of the sales force?
- Are consumers’ perceptions of quality related to their perceptions of prices?

Partial Correlation
- How strongly are sales related to advertising expenditures when the effect of price
is controlled?
- Is there an association between market share and size of the sales force after
adjusting for the effect of sales promotion?
- Are consumers’ perceptions of quality related to their perceptions of prices when
the effect of brand image is controlled?
Factor Analysis

Definition
Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization.

Factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables.
Objectives

Factor analysis is used in the following circumstances:
- To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
- To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
- To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis.
Applications

The technique has numerous applications in marketing research:

Market Segmentation
Identifying the underlying variables on which to group the customers
New car buyers might be grouped based on the relative emphasis they place on economy,
convenience, performance, comfort, and luxury ---> 5 segments: economy seekers, convenience
seekers, performance seekers, comfort seekers, and luxury seekers.

Product Research
Determine the brand attributes that influence consumer choice
Toothpaste brands might be evaluated in terms of protection against cavities, whiteness of
teeth, taste, fresh breath, and price.

Advertising Studies
Understand the media consumption habits of the target market
The users of frozen foods may be heavy viewers of cable TV, see a lot of movies, and listen to
country music.
Benefits
Spotting trends
Example:
A retail company conducts factor analysis on
customer purchase data.
They discover that customers who buy certain
products also tend to purchase related
accessories.
This insight allows the company to create targeted product bundles and improve cross-selling strategies.
Benefits
Pinpoint the number of factors in a data set
Example:
A company studies its employees' performance. By using factor analysis, managers identify key factors such as working habits, attendance, and overtime involvement.
This helps managers focus on these critical
aspects to improve overall performance
outcomes.
Benefits
Streamlines segmenting data
Example:
An e-commerce business planning a customer
segmentation study uses factor analysis to
streamline data inputs.
They find that customer preferences for fast
delivery, product variety, and pricing structure
are interconnected ---> the creation of distinct
customer segments for targeted marketing
campaigns and personalized services.
Business Questions

Market Segmentation: What are the key factors influencing customer preferences in our market?
Product Development: What attributes are most important to our customers when considering a new product?
Employee Satisfaction: What factors contribute most to employee satisfaction in the workplace?
Brand Perception: What factors contribute to how our brand is perceived in the market?
Customer Satisfaction: Which factors have the most significant impact on customer satisfaction with our products/services?
Marketing Effectiveness: What factors contribute most to the success of our marketing campaigns?
Conducting Factor Analysis

Formulate the Problem
- Identify the objectives of the factor analysis.
- Determine the variables for factor analysis based on past research, theory, and the researcher's judgment.
- Verify that variables are measured on an interval or ratio scale for accurate analysis.
- Use an appropriate sample size, ideally at least four or five times the number of variables.
- In situations with small sample sizes, interpret results cautiously, as the ratio may be considerably lower in some marketing research scenarios.

Example of Factor Analysis:


A researcher aims to identify
underlying consumer preferences
for toothpaste benefits.
30 respondents were interviewed
using mall-intercept
interviewing.
Respondents rated agreement on
statements using a 7-point scale.
Construct the Correlation Matrix

The analytical process is based on a matrix of correlations between the variables.

Bartlett's test of sphericity:
- Can be used to test the null hypothesis that the variables are uncorrelated in the population; in other words, the population correlation matrix is an identity matrix.
- If this hypothesis cannot be rejected, then the appropriateness of factor analysis should be questioned.

Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy:
- Small values of the KMO statistic indicate that the correlations between pairs of variables cannot be explained by the other variables and that factor analysis may not be appropriate.
- A KMO value greater than 0.5 is generally desirable.
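A sketch of both diagnostics using the third-party factor_analyzer package; the file name and column layout are assumptions.

```python
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Hypothetical data file: one column per rating statement, one row per respondent
ratings = pd.read_csv("toothpaste_ratings.csv")

chi_square, p_value = calculate_bartlett_sphericity(ratings)
kmo_per_item, kmo_total = calculate_kmo(ratings)

print(f"Bartlett's test: chi2 = {chi_square:.2f}, p = {p_value:.4f}")  # want p < 0.05
print(f"Overall KMO = {kmo_total:.3f}")                                # want > 0.5
```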
Determine the Method of Factor Analysis

Once it has been determined that factor analysis is suitable for analyzing the data, an appropriate method must be selected.

Principal Components Analysis:
- Considers the total variance in the data.
- The diagonal of the correlation matrix contains unities, bringing the full variance into the factor matrix.
- Used to find the minimum number of factors accounting for maximum variance for subsequent multivariate analysis.
- The factors are termed principal components.

Common Factor Analysis (Principal Axis Factoring):
- Estimates factors based only on the common variance.
- Communalities appear in the diagonal of the correlation matrix.
- Suitable when identifying the underlying dimensions is the focus and the common variance is of interest.
Determine the Number of Factors

A Priori Determination:
- Specify the number of factors based on prior knowledge.
- Extraction stops when the desired number is reached.

Determination Based on Eigenvalues:
- Retain factors with eigenvalues greater than 1.0.
- Factors with variance less than 1.0 are excluded.
- This approach is conservative when there are fewer than 20 variables.

Determination Based on Scree Plot:
- Plot the eigenvalues against the number of factors.
- Look for a break (scree) indicating the true number of factors.
- Usually one or a few more factors than the eigenvalue criterion.
Determination Based on Percentage of Variance:
- Extract factors until a satisfactory cumulative percentage of variance is reached.
- Suggested minimum: the retained factors should account for at least 60 percent of the variance.

Determination Based on Split-Half Reliability:
- Split the sample in half and perform factor analysis on each half.
- Retain factors with high correspondence of loadings across both subsamples.

Determination Based on Significance Tests:
- Test the statistical significance of the eigenvalues.
- Retain only statistically significant factors.
- Caution: with large samples (more than 200 observations), many factors may be statistically significant even though they account for only a small proportion of the variance.
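A sketch of the eigenvalue and scree-plot criteria, continuing with the hypothetical ratings DataFrame and the factor_analyzer package:

```python
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(rotation=None)
fa.fit(ratings)                                   # ratings: DataFrame of item scores (assumed)
eigenvalues, _ = fa.get_eigenvalues()

n_retained = sum(ev > 1.0 for ev in eigenvalues)  # eigenvalue-greater-than-1 criterion
print(f"Factors with eigenvalue > 1: {n_retained}")

# Scree plot: look for the break (elbow) where the curve flattens out
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.show()
```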
Rotate Factors

Rotation simplifies the factor matrix for easier interpretation.

Desired factor-variable relationships:
- Each factor should have significant loadings on only some of the variables.
- Each variable should have significant loadings on only a few factors, ideally just one.

Orthogonal Rotation:
- The axes are maintained at right angles.
- The most commonly used method is varimax.
- Varimax minimizes the number of variables with high loadings on a factor, enhancing interpretability.
- Results in uncorrelated factors.

Oblique Rotation:
- The axes are not at right angles, allowing for correlated factors.
- Useful when factors in the population are likely to be strongly correlated.
- Can simplify the factor pattern matrix by permitting correlations among factors.
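A sketch of a varimax-rotated solution with factor_analyzer; the choice of two factors here is an assumption for illustration:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Extract a chosen number of factors (two here, as an assumption) with varimax rotation
fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(ratings)                                   # ratings: DataFrame of item scores (assumed)

loadings = pd.DataFrame(fa.loadings_, index=ratings.columns,
                        columns=["Factor 1", "Factor 2"])
print(loadings.round(2))   # interpret each factor via the variables loading high on it
```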
Factor Matrix Before and After Rotation (table shown on the slide)
Interpret Factors

A factor can be interpreted in terms of the variables that load high on it. Another useful aid in interpretation is to plot the variables, using the factor loadings as coordinates. Variables at the end of an axis are those that have high loadings on only that factor, and hence describe the factor.

Factor Loading Plot (shown on the slide)
Calculate Factor Scores

The factor score for the ith factor may be estimated as follows:

Fi = Wi1 X1 + Wi2 X2 + Wi3 X3 + . . . + Wik Xk

where
Fi = estimate of the ith factor
Wij = weight, or factor score coefficient, of variable j on factor i
k = number of variables
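Continuing the fitted model from the rotation sketch above, factor scores can be obtained as the weighted sums the formula describes:

```python
import numpy as np

# fa.transform applies the estimated factor score coefficients (the weights Wij)
# to the standardized variables, giving Fi for every respondent and factor
factor_scores = fa.transform(ratings)
print(np.round(factor_scores[:5], 3))   # scores of the first five respondents
```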
Select Surrogate Variables
Selecting Surrogate Variables:
Choose the variable with the highest loading on each
factor from the factor matrix.
Decision Challenges:
If multiple variables have similar high loadings, decide
based on theoretical and measurement considerations.
Determine the Model Fit
Correlation Examination:
Reproduce correlations between variables using
estimated correlations with factors.
Model Fit Assessment:
Compare observed correlations from the input matrix
with reproduced correlations from the factor matrix.
Differences are called residuals and are examined to
assess model fit.
Regression Analysis
Regression analysis is a powerful and flexible procedure for analyzing associative relationships between a metric dependent variable and one or more independent variables.

It can be used to determine:
- whether a relationship exists
- how strong the relationship is
- the mathematical equation that illustrates the relationship
- predictions of future values
- control for other independent variables
The Use of Regression Analysis

Linear Relationship: Regression analysis, specifically linear regression, is fitting when anticipating consistent, proportional changes in the dependent variable associated with variations in the independent variables, assuming a linear relationship between them.

Quantitative Data: Regression is appropriate for quantitative variables on an interval or ratio scale. For instance, predicting salary based on experience, education, and age exemplifies a fitting application of regression analysis.

Multivariate Relationships: If you have more than one independent variable and you want to understand how they collectively influence the dependent variable, multiple regression analysis can be employed.
Bivariate Regression

Bivariate regression is a procedure for deriving a mathematical relationship, in the form of an equation, between a single metric dependent or criterion variable and a single metric independent or predictor variable. The analysis is similar in many ways to determining the simple correlation between two variables.

- Can variation in sales be explained in terms of variation in advertising expenditures?
- Can the variation in market share be accounted for by the size of the sales force?
- Are consumers' perceptions of quality determined by their perceptions of price?
Conducting Bivariate Regression Analysis

Scatter Diagram
A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations. A scatter diagram is useful for determining the form of the relationship between the variables.
Bivariate Regression Formula

In the bivariate regression model, the general form of a straight line is Y = β0 + β1X, where β0 is the intercept and β1 is the slope.

In deterministic regression, Y is solely determined by X, but real-world relationships are seldom deterministic. To accommodate the probabilistic or stochastic nature, the regression model incorporates an error term: Yi = β0 + β1Xi + ei.
Ordinary Least Squares (OLS) Regression

This technique determines the best-fitting line by minimizing the sum of the squared vertical distances of all the points from the line. The best-fitting line is called the regression line.
Parameter Estimation

In most cases, the slope and intercept are unknown and are estimated from the sample observations as

b1 = Σ(Xi − mean(X))(Yi − mean(Y)) / Σ(Xi − mean(X))²  and  b0 = mean(Y) − b1·mean(X)
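A minimal sketch of these estimators in Python; the data are invented for illustration:

```python
import numpy as np

def ols_bivariate(x, y):
    """Least-squares estimates of intercept b0 and slope b1 for Y = b0 + b1*X + e."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Hypothetical duration/attitude data (for illustration only)
duration = [2, 4, 6, 8, 10, 12, 14, 18]
attitude = [2, 3, 4, 5, 6, 8, 9, 11]
b0, b1 = ols_bivariate(duration, attitude)
print(f"Y_hat = {b0:.3f} + {b1:.3f} X")
```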
Multiple Regression

Multiple regression involves a single dependent variable and two or more independent variables. The questions raised in the context of bivariate regression can also be answered via multiple regression by considering additional independent variables.

- Can variation in sales be explained in terms of variation in advertising expenditures, prices, and level of distribution?
- What is the contribution of advertising expenditures in explaining the variation in sales when the levels of prices and distribution are controlled?
Multiple Regression

Estimated Regression Equation: the fitted model takes the form Y_hat = a + b1 X1 + b2 X2 + . . . + bk Xk, where a is the intercept and b1, ..., bk are the partial regression coefficients.

Coefficient of Multiple Determination: R² and Adjusted R²

The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, R², which is also called the coefficient of multiple determination.

It is good practice to use adjusted R² rather than R², because adjusted R² only increases when a new independent variable contributes significantly to explaining the total variation in the dependent variable.
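A sketch of a multiple regression fit with statsmodels, reporting both measures; the file and column names are assumptions:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("sales_data.csv")                 # hypothetical file and column names
X = sm.add_constant(df[["advertising", "price", "distribution"]])
model = sm.OLS(df["sales"], X).fit()

print(f"R2 = {model.rsquared:.3f}, adjusted R2 = {model.rsquared_adj:.3f}")
print(model.params)      # intercept and partial regression coefficients
```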
Collinearity and
Multicollinearity
When the independent variables X1, X2, . . . , Xm are related to
each other instead of being independent, we have a condition
known as multicollinearity. If only two predictors are correlated,
we have collinearity.
Multicollinearity may be detected through “variance inflation
factors (VIF)”
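A sketch of the VIF check with statsmodels, reusing the same hypothetical data as the previous sketch:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("sales_data.csv")                 # same hypothetical file as above
X = sm.add_constant(df[["advertising", "price", "distribution"]])

for i, col in enumerate(X.columns):
    print(col, round(variance_inflation_factor(X.values, i), 2))
# Rule of thumb: VIF values well above 10 are often taken to signal problematic multicollinearity
```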
Significance Testing

The purpose of significance testing in regression analysis is to determine how an independent variable affects the dependent variable (positively, negatively, or not at all).

Common hypotheses are:
- H0: the variable has no effect (βj = 0)
- H1: the variable has an effect (βj ≠ 0): positive if βj > 0, negative if βj < 0
Testing Parameters

To test these hypotheses conveniently, we use a t-test.

t-statistic (or t-ratio): the estimated coefficient divided by its standard error, t = b / SE(b).

Degrees of freedom: typically n − k − 1 for a model with an intercept, where n is the number of observations and k is the number of explanatory variables in the model.

The greater the number of observations n, the closer the sampling distribution of the estimated parameter is to the standard normal distribution, and the easier it is to reject the null hypothesis.
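A sketch of the t-ratio and the critical values; the coefficient, standard error, and sample size below are hypothetical:

```python
from scipy import stats

b_hat, se_b = 0.50, 0.12        # hypothetical coefficient estimate and its standard error
n, k = 50, 1                    # hypothetical sample size and number of explanatory variables
df_resid = n - k - 1

t_ratio = b_hat / se_b
c_two_sided = stats.t.ppf(1 - 0.05 / 2, df_resid)   # critical value for a two-sided test
c_one_sided = stats.t.ppf(1 - 0.05, df_resid)       # critical value for a one-sided test

print(f"t = {t_ratio:.2f}, two-sided c = {c_two_sided:.3f}, one-sided c = {c_one_sided:.3f}")
```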
Two-sided Test

Hypotheses: H0: βj = 0 versus H1: βj ≠ 0.

We reject the null hypothesis if the absolute value of the t-statistic is greater than the critical value; otherwise we fail to reject.

Reject H0 if |t-ratio| > c(α/2)
One-sided Test (Positive Effect)

Hypotheses: H0: βj = 0 versus H1: βj > 0.

We reject the null hypothesis if the value of the t-statistic is greater than the critical value; otherwise we fail to reject.

Reject H0 if t-ratio > c(α) ---> Xj may affect Y positively
One-sided Test (Negative Effect)

Hypotheses: H0: βj = 0 versus H1: βj < 0.

We reject the null hypothesis if the value of the t-statistic falls below the negative critical value; otherwise we fail to reject.

Reject H0 if t-ratio < −c(α) ---> Xj may affect Y negatively
Technique Practices

Correlation & Regression Analysis
1. Are there any relationships among EMP, SAL, TRA, EXP, MAS, SUP and HWL? Which of them are significant?

To identify the relationships, we use correlation analysis and put all the variables together.

Pearson Correlation: values range from −1 to 1; values different from 0 indicate a relationship, and 0 indicates no relationship.
Significance: Sig. < 0.05 means the correlation is significant.
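If the data were available as a flat file rather than in SPSS, the same correlation matrix could be produced with pandas and scipy; the file and column names are assumptions:

```python
import pandas as pd
from scipy import stats

data = pd.read_csv("employees.csv")   # hypothetical file with EMP, SAL, TRA, EXP, MAS, SUP, HWL
cols = ["EMP", "SAL", "TRA", "EXP", "MAS", "SUP", "HWL"]

print(data[cols].corr())              # Pearson correlation matrix for all variables together

r, p = stats.pearsonr(data["EMP"], data["SAL"])
print(f"EMP-SAL: r = {r:.3f}, p = {p:.3f}")   # p < 0.05 -> significant
```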
2. EMP is potentially influenced by both SAL and MAS. If this is the case, to what extent is the variation of EMP uniquely due to SAL? And uniquely due to MAS?

The output shows that EMP is significantly influenced by SAL and MAS, and the extent to which the variation of EMP is uniquely due to each. Variables with a higher absolute Beta coefficient have a stronger unique influence on EMP.
3. Is increasing salary a good solution for improving employees' performance? To what extent does employees' salary predict their performance?

R Square = .387 indicates that 38.7% of the variation in EMP can be explained by SAL.
Sig. = .000 < .05 ---> EMP can be predicted based on SAL.

EMP = 2.027 + 0.586 SAL
4. Is providing more support from managers a good solution for improving employees' performance? Identify the extent to which employees' performance is explained by managers' support.

R Square = .009 indicates that only 0.9% of the variation in EMP can be explained by SUP.
Sig. = .045 < .05 ---> EMP can be predicted based on SUP.

EMP = 4.734 + 0.077 SUP

---> Even if managers spend more time and resources on employees, their performance increases only slightly.
5. Do salary, training, experience, managers' support, management style and heavy workload predict employees' performance? To what extent is employees' performance explained by all these factors? Among the influencing factors, which is the most and which is the least important in determining employees' performance?

R Square = .555 indicates that 55.5% of the variation in EMP can be predicted by HWL, TRA, SAL, EXP, SUP and MAS.
Sig. = 0.000 < 0.05 ---> the research model can help predict EMP.
The most important factor is training (TRA, 0.269); the least important factor is heavy workload (HWL, 0.021).

EMP = 1.442 + 0.184 SAL + 0.247 TRA + 0.164 EXP + 0.197 MAS

Even though HWL and SUP have Sig. > 0.05, we do not remove them from the model and re-run the analysis.
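A sketch of the full model in statsmodels, including standardized (Beta) coefficients for comparing the relative importance of predictors; the file name is an assumption:

```python
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("employees.csv")                       # hypothetical file, as above
predictors = ["SAL", "TRA", "EXP", "MAS", "SUP", "HWL"]

model = sm.OLS(data["EMP"], sm.add_constant(data[predictors])).fit()
print(f"R2 = {model.rsquared:.3f}")
print(model.pvalues.round(3))                             # significance of each coefficient

# Standardized (Beta) coefficients: re-fit on z-scored variables so the
# coefficients are comparable and indicate relative importance
z = (data[["EMP"] + predictors] - data[["EMP"] + predictors].mean()) / data[["EMP"] + predictors].std()
beta_model = sm.OLS(z["EMP"], sm.add_constant(z[predictors])).fit()
print(beta_model.params.drop("const").sort_values(ascending=False))
```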
Factor Analysis Practice
1. Theoretically, how many attributes should the managers pay attention to?
Theoretically, the manager should pay
attention to all the attributes:
A1: Acceleration
A2: Engine size
A3: Sporty shape
A4: Modern design
A5: Airbag
A6: Safety rating
A7: Stability control
A8: Max speed
A9: Colour
A10: ABS brake
A11: Entertainment systems
A12: Number of cylinders
A13: Panoramic sunroof
A14: Number of seats
A15: Voice recognition
2. Practically, what can be the disadvantage if the managers pay attention to all those attributes?

Practically, the manager should only focus on the attributes with a mean closer to 5, such as:
A5: Airbag
A6: Safety rating
A10: ABS brake
A11: Entertainment systems
A14: Number of seats
A15: Voice recognition
3.Because of the resource constraints, the managers can only spend their resources on few
key factors that consumers expect from the car they intend to buy. How to identify those
key factors? What are they?

Sig. = 0.000 < 0.05 and KMO = .856 > 0.5 ---> appropriate to perform factor analysis.

The 15 variables are reduced to 4 factors.



Component 1: Safety
- A5: Airbag
- A7: Stability control
- A10: ABS brake

Component 2: Engines
- A1: Acceleration
- A2: Engine size
- A8: Max speed

Component 3: Utilities
- A11: Entertainment systems
- A14: Number of seats
- A15: Voice recognition

Component 4: Design
- A3: Sporty shape
- A4: Modern design
- A9: Colour
- A13: Panoramic sunroof
Business Research Methods

Thank you very much!
GROUP 3
