Factor Analysis
and
Principal Component Analysis (PCA)
Factor Analysis
Factor analysis is a general name denoting a class of procedures primarily
used for data reduction and summarization.
Variables are not classified as either dependent or independent. The whole
set of interdependent relationships among variables is examined in order to
define a set of common dimensions called factors.
A procedure used to reduce a large number of questions to a few variables
(factors) according to their relevance
Factor analysis is commonly used for:
Data reduction
Scale development
The evaluation of the psychometric quality of a measure
In business research, factor analysis is significant for:
Dimensionality Reduction: It helps in reducing the number of variables to a
smaller set of factors that capture the essential information in the data. This
simplification aids in data interpretation and understanding.
Identifying Hidden Patterns: Factor Analysis can uncover hidden patterns or
structures within data that may not be apparent from individual variables
alone. This can provide valuable insights into underlying relationships among
variables.
Variable Selection: It assists in selecting a subset of variables that are most
representative of the underlying factors, thus aiding in variable selection for
further analysis or modeling.
Market Segmentation: Factor Analysis can be used in market research to
identify customer segments based on underlying preferences or behavior
patterns, allowing businesses to tailor their marketing strategies accordingly.
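As a sketch of factor analysis used for data reduction, the following fits scikit-learn's FactorAnalysis to synthetic questionnaire data; the item count, the loadings, and the choice of two factors are illustrative assumptions, not part of the text above.

```python
# Minimal factor-analysis sketch: six survey items generated from two
# latent factors, then recovered with scikit-learn's FactorAnalysis.
# All numbers here are made up for illustration.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 2))            # two underlying factors
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                     [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
items = latent @ loadings.T + 0.3 * rng.normal(size=(n, 6))  # 6 observed items

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(items)            # factor scores per respondent
print(scores.shape)                         # (200, 2): 6 items reduced to 2 factors
print(fa.components_.shape)                 # (2, 6): estimated loadings
```

Items loading strongly on the same factor can then be summarized by that factor alone, which is the data-reduction use described above.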
Principal Component Analysis (PCA)
PCA was invented in 1901 by Karl Pearson
Independently developed by Harold Hotelling in the 1930s
Principal component analysis (PCA) is a technique used to emphasize
variation and bring out strong patterns in a dataset.
Process (steps) involved in PCA
Standardization: The first step is to standardize the data by subtracting the mean and dividing by the
standard deviation for each variable. This ensures that all variables are on the same scale and have
equal weight in the analysis.
Covariance Matrix Computation: Next, the covariance matrix of the standardized data is computed.
This matrix represents the pairwise covariances between all pairs of variables.
Eigenvector Decomposition: PCA then decomposes the covariance matrix into its eigenvectors and
eigenvalues. Eigenvectors represent the directions (or principal components) of maximum variance
in the data, while eigenvalues indicate the amount of variance explained by each eigenvector.
Selection of Principal Components: Principal components are selected based on their corresponding
eigenvalues. The principal components with the highest eigenvalues explain the most variance in
the data and are retained for further analysis.
Projection: Finally, the original data is projected onto the selected principal components, resulting
in a new set of variables (principal component scores) that capture most of the variability in the
data.
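The five steps above (standardization, covariance matrix, eigenvector decomposition, component selection, projection) can be sketched directly with NumPy; the correlated sample data and the choice to keep two components are illustrative assumptions.

```python
# PCA implemented step by step, following the process described above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated toy data

# 1. Standardization: subtract the mean, divide by the standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data.
C = np.cov(Z, rowvar=False)

# 3. Eigenvector decomposition (eigh, since C is symmetric).
eigvals, eigvecs = np.linalg.eigh(C)

# 4. Select the principal components with the largest eigenvalues.
order = np.argsort(eigvals)[::-1]
k = 2
components = eigvecs[:, order[:k]]
explained = eigvals[order[:k]] / eigvals.sum()

# 5. Projection: principal component scores for each observation.
scores = Z @ components
print(scores.shape)        # (100, 2)
print(explained.sum())     # fraction of total variance the 2 components explain
```

Each eigenvalue divided by the sum of all eigenvalues gives the proportion of variance that component explains, which is the usual basis for deciding how many components to retain.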
Chi-Square Test
It was developed by Karl Pearson in 1900
The chi-square test is a non-parametric test; it is used for hypothesis
testing, not for estimation.
The chi-square test is a useful measure for comparing experimentally obtained
results with those expected theoretically under a given hypothesis.
It is a mathematical expression representing the discrepancy between the
experimentally obtained (observed) frequencies (O) and the theoretically
expected frequencies (E) under a given hypothesis:
χ² = Σ (O − E)² / E, summed over all categories.
It uses data in the form of frequencies.
If there is no difference between the observed and expected frequencies, the
value of chi-square is zero.
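A small worked example of the chi-square goodness-of-fit test, using SciPy's chisquare; the die-roll counts are made up for illustration.

```python
# Compare observed die-roll frequencies (O) with the frequencies
# expected (E) under the hypothesis of a fair die.
from scipy.stats import chisquare

observed = [18, 22, 16, 14, 12, 18]   # O: counts from 100 rolls (illustrative)
expected = [100 / 6] * 6              # E: fair-die expectation

stat, p = chisquare(observed, expected)
print(round(stat, 3))   # chi-square statistic, sum((O - E)^2 / E): 3.68
print(round(p, 3))      # p-value for 5 degrees of freedom

# When observed frequencies equal expected ones, the statistic is zero:
print(chisquare(expected, expected).statistic)  # 0.0
```

A large statistic (small p-value) indicates the observed frequencies depart from the hypothesized ones more than chance alone would explain.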