Factor analysis is a statistical method used to identify underlying relationships among variables, simplifying data interpretation by reducing complexity through latent factors. It is widely applied in various fields to condense large datasets into smaller sets of factors that capture most of the variance. The process involves data collection, correlation analysis, factor extraction, and interpretation, with methods including exploratory and confirmatory factor analysis, as well as various extraction and rotation techniques.
Factor Analysis
What is Factor Analysis?
• Factor analysis is a statistical method used to identify patterns or underlying relationships among a set of variables. Its primary objective is to reduce the complexity of a dataset by identifying and explaining the underlying structure (latent factors) that influences the observed variables.
• This method is widely used in fields such as psychology, sociology, economics, and other social sciences, as well as in market research and data analysis.
• Factor analysis helps in dimensionality reduction, where a large number of variables are condensed into a smaller set of factors that capture most of the variance or information present in the original dataset. This simplifies data interpretation and facilitates understanding of the underlying structure influencing the observed variables.

Process
• The process of factor analysis involves analyzing the correlation matrix or covariance matrix of the observed variables to uncover latent factors. These latent factors are unobservable variables that help explain the relationships and variations observed in the data.

Example
• Imagine you have a dataset on the academic performance of students, consisting of variables such as:
1. Math test score
2. Science test score
3. English test score
4. Attendance percentage
5. Study hours per week
6. Extracurricular activities participation
• You are interested in exploring whether there are underlying factors that contribute to academic performance. You suspect that performance might not depend only on the individual subjects (math, science, English) but might be influenced by factors like study habits, attendance, and engagement in extracurricular activities.
• You decide to perform factor analysis on this dataset to uncover potential latent factors that explain the correlations among these variables.

Steps
1. Data Collection: Gather data on the variables above for a sample of students.
2. Correlation or Covariance Matrix: Compute the correlation or covariance matrix of these variables. This matrix represents how each variable correlates or covaries with every other variable in the dataset.
3. Factor Extraction: Use factor analysis techniques to extract the underlying factors. For instance, an exploratory factor analysis (EFA) could reveal that the observed variables (test scores, attendance, study hours, etc.) are associated with a few underlying factors, say 'academic diligence', 'participation', and 'subject mastery'.
4. Interpretation: Examine the factor loadings, which show how strongly each variable is associated with each factor. Higher loadings indicate a stronger relationship between the variable and the factor. In our example, you might find that 'study hours' and 'attendance' load highly on the 'academic diligence' factor, while 'math test score' and 'science test score' load highly on the 'subject mastery' factor.
5. Naming and Understanding Factors: Based on the variables with high loadings on each factor, you can interpret and name the factors. For instance, 'academic diligence' might represent study habits and attendance, while 'subject mastery' might represent performance in specific subjects.
6. Further Analysis or Application: Once these factors are identified, you can use them to understand what influences academic performance, design interventions to improve student outcomes, or build predictive models.

Exploratory FA (EFA)
• In EFA, the aim is to explore and uncover the underlying structure or patterns within the data without prior assumptions about the number of factors or how variables relate to them. It helps identify the number of factors and their potential meaning based on the patterns observed in the data.
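The steps above (data, correlation matrix, extraction, loadings) can be sketched in NumPy. This is a minimal illustration, not the full procedure: the data are synthetic, the variable roles are invented to mirror the student example, and extraction is done by the principal component method (eigendecomposition of the correlation matrix).

```python
import numpy as np

# Hypothetical illustration of steps 1-4; the synthetic data and the
# latent-factor names are assumptions, not real student records.
rng = np.random.default_rng(0)
n = 300

# Step 1: simulate two latent factors driving six observed variables.
diligence = rng.normal(size=n)        # latent "academic diligence"
mastery = rng.normal(size=n)          # latent "subject mastery"
data = np.column_stack([
    0.8 * mastery + rng.normal(scale=0.5, size=n),    # math score
    0.7 * mastery + rng.normal(scale=0.5, size=n),    # science score
    0.6 * mastery + rng.normal(scale=0.6, size=n),    # english score
    0.8 * diligence + rng.normal(scale=0.5, size=n),  # attendance
    0.7 * diligence + rng.normal(scale=0.5, size=n),  # study hours
    rng.normal(size=n),                               # extracurriculars
])

# Step 2: correlation matrix of the observed variables.
R = np.corrcoef(data, rowvar=False)

# Step 3: factor extraction by eigendecomposition of R
# (the principal component method).
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: loadings of each variable on the first two factors.
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])
print(np.round(loadings, 2))
```

In the printed loadings, the first three rows (the test scores) load together on one factor and the next two (attendance, study hours) on the other, mirroring the 'subject mastery' and 'academic diligence' interpretation in step 5.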
Confirmatory FA (CFA)
• CFA is used to confirm or test a hypothesized factor structure based on prior theory or expectations. It evaluates how well the observed variables relate to the pre-defined factors and assesses the fit of the model to the data.

Purpose
1. To reduce a large number of variables to a smaller number of factors for modeling purposes, where the large number of variables precludes modeling all the measures individually. As such, factor analysis is integrated into structural equation modeling (SEM), helping create the latent variables that SEM models.
• Meaning: When there are too many variables to analyze individually, factor analysis condenses or summarizes them into a smaller set of underlying factors. These factors, called latent variables, capture the shared variance among the original variables. This condensed representation is useful for structural equation modeling (SEM), a statistical method that examines complex relationships between variables.
2. To select a subset of variables from a large set, based on which original variables have the highest correlations with the principal component factors.
• Meaning: When dealing with a large number of variables, factor analysis identifies which original variables have the strongest relationships or correlations with each other. It helps select a subset of variables that are highly correlated with the principal factors or underlying dimensions. This selection simplifies the analysis by focusing on the most relevant variables.
3. To create a set of factors to be treated as uncorrelated variables, as one approach to handling multicollinearity in regression.
• Meaning: In regression analysis, multicollinearity occurs when predictor variables are highly correlated with each other. Factor analysis can create new factors that are uncorrelated with one another. These factors can then be used in regression models instead of the original variables, reducing multicollinearity and improving the stability of the regression coefficients.

Assumptions
• Factor analysis is part of the Multiple General Linear Hypothesis (MGLH) family of procedures and makes many of the same assumptions as multiple regression: linear relationships, interval or near-interval data, untruncated data, proper specification (relevant variables included, extraneous ones excluded), lack of high multicollinearity, and multivariate normality for purposes of significance testing.
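Purpose 3 above, replacing collinear predictors with uncorrelated factor scores, can be sketched with NumPy. The data and variable names are invented for illustration; scores are obtained here by projecting standardized predictors onto the eigenvectors of their correlation matrix (the principal component approach).

```python
import numpy as np

# Synthetic example of handling multicollinearity: x2 and x3 are
# deliberately constructed to be strongly collinear with x1.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)
x3 = 0.8 * x1 + rng.normal(scale=0.4, size=n)
X = np.column_stack([x1, x2, x3])

# Correlations among the raw predictors are high (multicollinearity).
print(np.round(np.corrcoef(X, rowvar=False), 2))

# Project the standardized predictors onto the eigenvectors of their
# correlation matrix; the resulting scores are mutually uncorrelated.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
scores = Z @ eigvecs

# Off-diagonal correlations of the scores are (numerically) zero, so the
# scores can replace the original predictors in a regression model.
C = np.corrcoef(scores, rowvar=False)
print(np.round(C, 6))
```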
• Factor analysis generates a table in which the rows are the observed indicator variables and the columns are the factors, or latent variables, which explain as much of the variance in those variables as possible. The cells in this table are factor loadings, and the meaning of the factors must be induced by seeing which variables load most heavily on which factors. This inferential process can be fraught with difficulty, as different researchers may interpret the same table differently.

Methods
1. Principal component method
2. Principal axes method
3. Summation method
4. Centroid method

Principal Component Method
• The principal component method is a popular technique in factor analysis that identifies new variables, called principal components, which are linear combinations of the original variables. These components are ordered by the amount of variation they explain in the data, with the first component capturing the maximum variance and subsequent components capturing decreasing amounts of variance.

Principal Axes Method
• The principal axes method is another approach that aims to extract factors or dimensions that maximize the variance of the observed variables. It considers the correlations among variables and aims to find factors that account for the most variance in the dataset along orthogonal (uncorrelated) axes.

Summation Method
• The summation method is a simpler technique. It creates factor scores by summing or averaging the standardized scores of the observed variables. Factors are derived by combining the variables into linear combinations without considering the underlying structure of the data or the correlations between variables.

Centroid Method
• The centroid method calculates the centroids (means) of the variables in a multidimensional space. It constructs factors based on the distances between the variables and these centroids, aiming to identify factors that are as distant as possible from each other while still summarizing the data effectively.

Eigenvalue Index
• In factor analysis, eigenvalues play a significant role in determining the number of factors to retain from a dataset.
• When extracting factors from a set of observed variables, the eigenvalue represents the amount of variance explained by each factor. Specifically, when the initial correlation matrix of variables is decomposed into its constituent factors, the eigenvalues represent the variances of these factors.
• Higher eigenvalues indicate that the associated factor explains a larger proportion of the total variance in the original variables.
• The eigenvalue index is often used to determine the number of factors to retain. The Kaiser criterion, for example, suggests retaining factors with eigenvalues greater than 1.0. This criterion assumes that factors with eigenvalues exceeding 1.0 explain more variance than a single original variable and are therefore worth retaining.
• Percentage of Variance: an alternative criterion retains enough factors to account for a specified cumulative percentage of the total variance.

Orthogonal Rotation
• Orthogonal rotation is a technique used to rotate or reorient the extracted factors so that they become uncorrelated, or orthogonal, to each other. This rotation aims to simplify the interpretation of factors by maximizing the clarity of the underlying structure of the variables.
• Orthogonal rotation methods produce factors that are easier to interpret and explain because they reduce the interrelatedness between factors.
• Here the factors are rotated such that the original factors as well as the rotated factors are orthogonal; the angle between the factor axes remains 90°.

Types
1. Varimax Rotation: Varimax maximizes the variance of the squared factor loadings within each factor and minimizes the number of variables that have high loadings on each factor. This simplifies interpretation by creating more distinct and interpretable factors.
2. Quartimax Rotation: Quartimax minimizes the number of factors that contribute to each variable, leading to factors that are simpler and more specialized.
3. Orthomax Rotation: Orthomax is a flexible rotation method that allows researchers to adjust the level of orthogonality (uncorrelatedness) between factors, balancing simplicity of interpretation against the degree of orthogonality.

Promax Rotation
• Promax rotation is a popular and effective technique used to rotate the extracted factors and make them more interpretable. Unlike orthogonal rotation methods, which produce uncorrelated (orthogonal) factors, Promax is an oblique rotation method that allows factors to be correlated with each other.
• The factors are rotated such that the angle between the original and rotated factor axes is more or less than 90°.
• Promax rotation is particularly beneficial when the factors are expected to be related or correlated in real-world scenarios, as it allows a more realistic representation of the relationships between them.

Key features of Promax rotation:
1. Correlated Factors: It recognizes that in real-world situations factors may be related or correlated. Promax allows the resulting factors to be correlated, acknowledging the possibility of interrelatedness among the underlying constructs.
2. Simplicity and Interpretability: While permitting correlation between factors, Promax still aims to simplify the factor structure and enhance interpretability. It tries to identify a simpler structure by grouping variables into correlated factors that are easier to understand and explain.
3. Varied Degrees of Correlation: Promax allows researchers to adjust the degree of inter-factor correlation, giving them flexibility in determining how correlated the factors should be based on the research context.
4. Mathematical Approach: Promax uses an oblique transformation to rotate the factors and obtain a new factor structure in which the factors are correlated in a meaningful way, promoting easier interpretation of the underlying relationships between variables.

Principal Component Method (Extraction)
• Principal Component Analysis (PCA) is a factor extraction technique that identifies a set of linearly uncorrelated variables called principal components. These components are derived by maximizing the variance of the observed variables and are ordered by the amount of variance they explain. PCA does not model the underlying common-factor structure; it aims to capture the maximum amount of variance in the dataset.

Principal Axis Method
• Principal Axis Factoring (PAF) extracts factors based on the correlations among the observed variables. It identifies factors that account for the common variance among the variables, capturing the shared variance while minimizing unique or error variance.

Unweighted Least Squares (ULS)
• Unweighted Least Squares is a factor extraction method that minimizes the differences between the observed and reproduced covariance matrices. It finds the factor solution that best fits the observed data by minimizing the discrepancies between the observed correlations and those implied by the factor model. ULS is less sensitive to deviations from normality than other methods.
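The varimax rotation described earlier can be sketched in NumPy. The iterative SVD-based algorithm below is a standard formulation of Kaiser's varimax; the function name and the example loading matrix are illustrative, not from the source.

```python
import numpy as np

# Sketch of varimax rotation: repeatedly find the orthogonal rotation
# that increases the variance of the squared loadings within each factor.
def varimax(loadings, tol=1e-8, max_iter=200):
    """Orthogonally rotate a p x k loading matrix toward simple structure."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion (variance of squared loadings).
        grad = loadings.T @ (rotated**3
                             - rotated @ np.diag((rotated**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt          # nearest orthogonal matrix to the gradient
        var_new = s.sum()
        if var_new - var_old < tol:
            break
        var_old = var_new
    return loadings @ rotation

# Example: a 4-variable, 2-factor loading matrix with a muddled structure.
L = np.array([[0.7, 0.3],
              [0.8, 0.2],
              [0.3, 0.7],
              [0.2, 0.8]])
L_rot = varimax(L)
print(np.round(L_rot, 2))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; the rotation only redistributes how that variance is apportioned across the factors, pushing each variable toward a single dominant loading.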
Maximum Likelihood
• Maximum Likelihood estimation estimates factor loadings and variances by maximizing the likelihood function. It assumes that the observed data follow a multivariate normal distribution and estimates the parameters that maximize the likelihood of observing the given data. Maximum Likelihood is commonly used when the data meet the assumption of normality.

Inter-Factor Correlation Matrix
• After extracting factors from a set of observed variables, the inter-factor correlation matrix shows the correlations between the extracted factors, indicating the degree of association between different factors. If the factors are orthogonal (uncorrelated), the off-diagonal elements of the matrix will ideally be close to zero.

Factor Rotation Methods
• Factor rotation is applied after factor extraction to reorient the factors for better interpretability. Rotation methods such as Varimax, Quartimax, or Promax aim to produce a simpler and clearer structure by changing the orientation of the factors. These methods either keep factors uncorrelated (orthogonal rotation) or allow them to be correlated (oblique rotation), depending on the research context.

Factor Extraction Methods
• Factor extraction methods are techniques used to identify underlying factors from a set of observed variables. Methods such as Principal Component Analysis (PCA), Principal Axis Factoring (PAF), Maximum Likelihood, and Unweighted Least Squares aim to extract factors that explain the common variance among the variables while minimizing unique or error variance. Each extraction method has its own assumptions and procedures for deriving factors from the observed data.

Cattell's Scree Test (Scree Plot)
• Cattell's Scree Test is a graphical method proposed by Raymond Cattell for determining the number of factors to retain. It involves plotting the eigenvalues in descending order against the factor number and visually identifying the "elbow", the point where the eigenvalues stop dropping sharply, resembling the scree at the foot of a hill. The factors before this point are retained, assisting researchers in choosing the optimal number of factors based on the steepness of the plot.

Oblique Factors (Correlated)
• Imagine studying student performance in two subjects, Math and Physics. In an oblique factor analysis, you expect that doing well in Math might also relate to doing well in Physics: a student who excels in Math may also perform well in Physics, as these subjects could be related. In this case the factors are correlated.

Orthogonal Factors (Uncorrelated)
• Now consider two completely unrelated attributes, such as shoe size and reading speed. In an orthogonal factor analysis, you expect no relationship between these factors: a person's shoe size does not affect their reading speed, and vice versa. These factors are orthogonal, or uncorrelated, meaning they are independent.
• In summary, oblique factors represent factors that are related or correlated, like subjects that might influence each other's performance, while orthogonal factors represent factors that are unrelated or independent, like shoe size and reading speed. The choice between oblique and orthogonal factors depends on whether you expect the factors being studied to be related or independent in your research context.

Methods for Obtaining Oblique (Correlated) Factors
1. Promax Rotation: An oblique rotation method that allows factors to be correlated. It simplifies the factor structure while allowing factors to be related, acknowledging interrelatedness among the underlying constructs.
2. Oblimin Rotation: Another oblique rotation method that allows correlated factors. It provides control over the degree of correlation between factors, letting researchers adjust the inter-factor relationships to the research context.
3. Direct Oblimin Rotation: A variant of the Oblimin method that restricts the correlations between factors to a specific pattern, such as zeroing out particular correlations or constraining the amount of inter-factor correlation.

Methods for Obtaining Orthogonal (Uncorrelated) Factors
1. Varimax Rotation: Maximizes the variance of the squared factor loadings within each factor and minimizes the number of variables with high loadings on each factor, producing factors that are uncorrelated.
2. Quartimax Rotation: Minimizes the number of factors that contribute to each variable, leading to simpler, more specialized factors with minimal inter-factor correlations.
3. Equamax Rotation: An orthogonal rotation method that strikes a balance between maximizing factor simplicity and ensuring orthogonality among factors, providing control over the amount of variance explained by each factor.

Previous Year Question (Psychopedia)
• Scree plot and eigenvalue
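The scree-plot and eigenvalue question above can be illustrated numerically. This sketch assumes a hypothetical set of eigenvalues (already sorted in descending order) from a six-variable correlation matrix; it applies the Kaiser criterion and a simple largest-drop proxy for the scree elbow.

```python
import numpy as np

# Assumed eigenvalues of a hypothetical 6-variable correlation matrix,
# in descending order (they sum to 6, the number of variables).
eigvals = np.array([3.1, 1.4, 0.6, 0.4, 0.3, 0.2])

# Kaiser criterion: retain factors whose eigenvalue exceeds 1.0.
kaiser_k = int((eigvals > 1.0).sum())
print("Kaiser criterion retains", kaiser_k, "factors")

# Scree logic: look for the "elbow", i.e. the largest drop between
# successive eigenvalues; factors before the elbow are retained.
drops = eigvals[:-1] - eigvals[1:]
elbow_k = int(np.argmax(drops)) + 1
print("Largest drop occurs after factor", elbow_k)

# Percentage of total variance explained by each factor.
pct = 100 * eigvals / eigvals.sum()
print(np.round(pct, 1))
```

Note that the two criteria need not agree: with these assumed eigenvalues, Kaiser retains two factors while the largest drop falls after the first, so the analyst's judgment (and the interpretability of the factors) settles the choice.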