STAT502
On
Principal component analysis, Factor analysis and
Multidimensional scaling
Definition
The central idea of principal component analysis (PCA) is to reduce the dimensionality of a data
set consisting of a large number of interrelated variables, while retaining as much as possible of
the variation present in the data set. This is achieved by transforming to a new set of variables, the
principal components (PCs), which are uncorrelated, and which are ordered so that the first few
retain most of the variation present in all of the original variables.
"Or"
It is a way of identifying patterns in data, and expressing the data in such a way as to highlight
their similarities and differences. Since patterns in data can be hard to find in data of high
dimension, where the luxury of graphical representation is not available, PCA is a powerful tool
for analyzing data.
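The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available and using the eigendecomposition of the sample covariance matrix; the function name `pca` and the simulated data are hypothetical and not taken from these notes.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]  # coordinates on the leading PCs
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, explained

rng = np.random.default_rng(0)
# Hypothetical data: three interrelated variables driven by one underlying signal.
signal = rng.normal(size=200)
X = np.column_stack([
    signal + 0.1 * rng.normal(size=200),
    2.0 * signal + 0.1 * rng.normal(size=200),
    -signal + 0.1 * rng.normal(size=200),
])
scores, explained = pca(X, n_components=2)
```

Here `scores` holds the new, uncorrelated variables (the PCs), and `explained` gives the share of total variance each one retains, in decreasing order. Because the three simulated variables are strongly interrelated, the first component should carry nearly all of the variation.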
Goals of PCA
The goals of PCA are to:
1. extract the most important information from the data table;
2. compress the size of the data set by keeping only this important information;
3. simplify the description of the data set;
4. analyze the structure of the observations and the variables;
5. reduce the number of dimensions without much loss of information; and
6. support applications such as image compression, where this technique is used.
Principal Component Analysis (PCA) is used to reduce the dimensionality of a data set by
finding a new set of variables that is smaller than the original set. This new set retains most of the
sample's information and is useful for the regression and classification of data.
Overall, PCA is a powerful tool for data analysis and can help to simplify complex datasets,
making them easier to understand and work with.
Step 1: Standardization
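First, we need to standardize our dataset to ensure that each variable has a mean of 0 and a
standard deviation of 1:
$$Z = \frac{X - \mu}{\sigma}$$
where $\mu$ is the mean and $\sigma$ is the standard deviation of the variable $X$.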
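As a small illustration of the standardization step, the sketch below computes z-scores for a single variable using only the Python standard library. The function name `standardize` and the sample values are hypothetical. It assumes the population standard deviation (`pstdev`); using the sample standard deviation instead would be an equally reasonable choice.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mu) / sigma for x in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical values
z = standardize(heights_cm)
```

After this step the variable `z` has mean 0 and standard deviation 1, so variables measured on different scales contribute comparably to the analysis.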
Factor analysis is a statistical tool used to examine the interrelationships among various
variables. It investigates several variables simultaneously and tries to place them in
a small number of dimensions, referred to as factors.
Factor analysis is a very useful and popular multivariate technique, mostly
used in the social and behavioral sciences. The technique is applicable when there is a
systematic interdependence among a set of observed (manifest) variables, and the
researcher is interested in finding something more fundamental or latent that
creates this communality (commonness). For example, we may have data on farmers'
education, occupation, land, house, farm power, material possession, social
participation, etc.
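The idea of latent factors creating communality among observed variables is commonly written as the common factor model. The notation below is an assumption and does not come from these notes: $x$ is the vector of $p$ observed variables, $f$ is the vector of $m < p$ latent factors, $\Lambda$ is the $p \times m$ matrix of factor loadings, and $\varepsilon$ is the vector of unique errors with diagonal covariance $\Psi$.

```latex
x = \mu + \Lambda f + \varepsilon,
\qquad
\operatorname{Cov}(x) = \Lambda \Lambda^{\top} + \Psi,
\qquad
h_i^2 = \sum_{j=1}^{m} \lambda_{ij}^2
```

This form assumes the factors are uncorrelated with unit variance and uncorrelated with the errors. The quantity $h_i^2$ is the communality of variable $i$, that is, the part of its variance explained by the common factors.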
Factor analysis relies on several assumptions in order to produce its results:
➢ There are no outliers in the data.
➢ The sample size is larger than the number of factors.
➢ Since the method analyzes interdependence, there is no perfect multicollinearity between
any of the variables.
➢ Homoscedasticity, the condition in which all the random variables in a sequence have the
same finite variance, is not required between variables, since factor analysis works as a
linear function.
➢ The relationships are assumed to be linear. Non-linear variables can still be
used, but once transformed they become linear variables.
1. The number and nature of dimensions consumers use to perceive different brands
in the marketplace
2. The positioning of current brands on these dimensions
3. The positioning of consumers' ideal brand on these dimensions
1. Psychology and Cognitive Science: MDS is applied to the study of the decision-making
process. It also helps psychologists understand how people perceive the
similarities or differences between stimuli, for example words, images, or
sounds.
2. Market Research and Marketing: Market research applies MDS to brand
positioning, product positioning, and market segmentation. Marketers use
MDS to visualize and interpret consumer perceptions of brands, products, or
services, which helps them make strategic decisions and plan marketing
campaigns.