How to Calculate Correlation in R with Missing Values

Last Updated: 08 May, 2025

When calculating correlation in R, missing values are not excluded by default. cor() uses use = "everything" unless told otherwise, so any pair of variables that contains an NA produces NA. To get a numeric result, you have to choose how missing values are handled. The two built-in strategies are listwise deletion (drop every row that has an NA in any column) and pairwise deletion (for each pair of variables, drop only the observations where one of the two is missing).

There are several ways to calculate correlation in R when the data contains missing values:

- Using cor() with "complete.obs" to exclude any rows with missing data.
- Using cor() with "pairwise.complete.obs" to calculate the correlation for each variable pair using all available data for that pair.
- Manually handling missing values using data cleaning techniques.
- Applying imputation and then using cov() or cor() on the cleaned data.

1. Using cor() with complete.obs

In this example, we use the cor() function to calculate the correlation matrix for columns A, B, and C. Specifying use = "complete.obs" makes cor() use only the rows that have no missing value in any column. Here only rows 1 and 3 are complete, so every coefficient is computed from just two observations and is therefore exactly 1. The resulting correlation matrix is then printed to the console.

    data <- data.frame(
      A = c(1, 2, 3, NA, 5),
      B = c(5, NA, 7, 8, 9),
      C = c(10, 11, 12, 13, NA)
    )

    # Keep only rows without any NA, then correlate
    correlation_matrix <- cor(data, use = "complete.obs")
    print(correlation_matrix)

Output:

      A B C
    A 1 1 1
    B 1 1 1
    C 1 1 1

2. Using cor() with pairwise.complete.obs

In this example, we use the cor() function again, this time with use = "pairwise.complete.obs". Each entry of the correlation matrix is computed from the rows in which both variables of that pair are observed. Here rows 1, 3, and 5 are complete for x and y, and those three points lie on a straight line, so the correlation is 1. The resulting correlation matrix is then printed to the console.

    df <- data.frame(
      x = c(1, 2, 3, NA, 5),
      y = c(4, NA, 6, 7, 8)
    )

    # For each pair of columns, use the rows where both values are present
    correlation_matrix <- cor(df, use = "pairwise.complete.obs")
    print(correlation_matrix)

Output:

      x y
    x 1 1
    y 1 1
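To see how the use argument changes the result, the following sketch runs cor() three ways on a small made-up data frame. The name scores and its values are assumptions chosen for illustration, not data from this article. It also counts how many observations stand behind each pairwise coefficient, which is worth checking because pairwise deletion can base different entries of the same matrix on different rows.

```r
# Hypothetical example data (not taken from the article's examples)
scores <- data.frame(
  a = c(1, 2, 3, 4, NA, 6),
  b = c(2, 1, NA, 3, 5, 6),
  c = c(6, 5, 4, NA, 2, 1)
)

# Default behaviour (use = "everything"): a pair involving an NA yields NA
default_cor <- cor(scores)

# Listwise deletion: only rows with no NA in any column are used
listwise_cor <- cor(scores, use = "complete.obs")

# Pairwise deletion: each pair uses the rows where both variables are present
pairwise_cor <- cor(scores, use = "pairwise.complete.obs")

# Number of complete rows used by listwise deletion (3 for this data)
n_listwise <- sum(complete.cases(scores))

# Number of observations behind each pairwise coefficient:
# a 0/1 indicator matrix of observed cells, multiplied with itself
observed <- (!is.na(scores)) * 1
pairwise_n <- crossprod(observed)

print(default_cor)
print(listwise_cor)
print(pairwise_cor)
print(pairwise_n)
```

In this made-up data each pair of columns shares four observed rows while only three rows are fully complete, so the two deletion strategies return different coefficients for the same pair.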
3. Calculate Correlation with Missing Values by Handling Missing Values Manually

In this approach, missing values are handled manually: na.omit() removes every row that contains a missing value before the correlation matrix is calculated. This ensures that only complete rows are used, and it gives the same result as calling cor() with use = "complete.obs".

    data <- data.frame(
      x = c(1, 2, 3, NA, 5),
      y = c(3, NA, 4, 5, 6)
    )

    # Drop rows 2 and 4, which each contain an NA
    complete_data <- na.omit(data)
    correlation_matrix <- cor(complete_data)
    correlation_matrix

Output:

              x         y
    x 1.0000000 0.9819805
    y 0.9819805 1.0000000

4. Using the cov() and cor() Functions with Imputation

In this method, we replace each missing value with the mean of its column and then calculate the covariance and correlation matrices on the filled-in data, so all five rows are used. Note that apply() returns a matrix rather than a data frame, which cov() and cor() both accept. For this data the correlation is lower than in the previous example (0.89 instead of 0.98), because the imputed values sit at the column means.

    data <- data.frame(
      x = c(1, 2, 3, NA, 5),
      y = c(3, NA, 4, 5, 6)
    )

    # Replace each NA with the mean of the observed values in its column
    imputed_data <- apply(data, 2, function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))

    covariance_matrix <- cov(imputed_data)
    correlation_matrix <- cor(imputed_data)
    correlation_matrix

Output:

              x         y
    x 1.0000000 0.8882165
    y 0.8882165 1.0000000

In this article, we explored different ways to handle missing values when computing correlation.

Author: abhaystriver
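As a rough side-by-side check of row removal and mean imputation, the sketch below reuses the x/y data from the manual-handling and imputation examples. The helper impute_mean is a hypothetical name introduced here for illustration and is not part of base R. It fills the columns with lapply() instead of apply(), one possible way to keep the result as a data frame.

```r
data <- data.frame(
  x = c(1, 2, 3, NA, 5),
  y = c(3, NA, 4, 5, 6)
)

# Hypothetical helper (not part of base R): mean-impute one numeric vector
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# Assigning into imputed[] keeps the data frame structure and column names
imputed <- data
imputed[] <- lapply(data, impute_mean)

# Correlation of x and y under each strategy
r_complete <- cor(na.omit(data))["x", "y"]  # rows with an NA removed
r_imputed  <- cor(imputed)["x", "y"]        # NAs replaced by column means

print(c(complete = r_complete, imputed = r_imputed))
```

The two coefficients correspond to the 0.9819805 and 0.8882165 shown in the outputs of those two examples. Which strategy is appropriate depends on how much data is missing and why, so treat this as a comparison aid rather than a recommendation.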