How to Create a Correlation Matrix using Pandas?
Last Updated: 27 May, 2025
A correlation matrix is a table of pairwise correlation coefficients, where each cell measures the relationship between two variables. Using Pandas, you can easily generate a correlation matrix to see how features relate: whether they move together, move in opposite directions, or show no clear trend. Let's explore several ways to create a correlation matrix using Pandas, NumPy and SciPy.
Using DataFrame.corr()
By default, this method computes the Pearson correlation coefficient, which measures the linear relationship between each pair of columns. Values range from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship. The diagonal is always 1 because each column correlates perfectly with itself.
Python
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [5, 4, 3, 2, 1],
'C': [2, 3, 2, 3, 2]
})
res = df.corr()
print(res)
Output
     A    B    C
A  1.0 -1.0  0.0
B -1.0  1.0  0.0
C  0.0  0.0  1.0
Explanation: Columns A and B have a perfect negative correlation (-1) because B decreases by exactly 1 each time A increases by 1. Column C shows no linear correlation with the others, indicated by values of 0.
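To make the coefficient less of a black box, the Pearson value can be recomputed by hand as the sum of products of deviations divided by the square root of the product of the summed squared deviations. The following is only a minimal sketch: the helper name `pearson_manual` is hypothetical and is not part of Pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

def pearson_manual(x, y):
    # Hypothetical helper: r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

r_ab = pearson_manual(df['A'], df['B'])
r_ac = pearson_manual(df['A'], df['C'])

# Compare the hand-computed values with the matching cells of df.corr()
builtin = df.corr()
print(r_ab, builtin.loc['A', 'B'])
print(r_ac, builtin.loc['A', 'C'])
```

Up to floating-point rounding, the hand-computed values should agree with the corresponding cells of `df.corr()`.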
Using DataFrame.corr(method='spearman') or 'kendall'
These methods compute rank-based correlations instead of using raw values. Spearman measures how well a monotonic relationship fits the data (useful for non-linear but consistent trends), while Kendall's tau compares how many pairs of observations are ordered the same way in both columns against how many are ordered oppositely. Both work well with monotonic non-linear or ordinal data.
Python
import pandas as pd
df = pd.DataFrame({
'X': [10, 20, 30, 40, 50],
'Y': [50, 40, 30, 20, 10],
'Z': [5, 7, 6, 8, 7]
})
# Spearman correlation matrix
a = df.corr(method='spearman')
print(a)
# Kendall correlation matrix
b = df.corr(method='kendall')
print(b)
Explanation: X and Y have a perfect negative correlation (-1) in both matrices. Z shows a moderate positive correlation with X (about 0.67 for Spearman and 0.53 for Kendall) and a moderate negative correlation of the same size with Y, reflecting consistent but not perfect monotonic relationships.
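One way to see what "rank-based" means is to rank each column first and then take the ordinary Pearson correlation of the ranks, which is how the Spearman coefficient is defined. The sketch below assumes the default `DataFrame.rank()` behaviour (tied values receive their average rank); the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'X': [10, 20, 30, 40, 50],
    'Y': [50, 40, 30, 20, 10],
    'Z': [5, 7, 6, 8, 7]
})

# Replace each value by its rank within the column (ties share the average rank)
ranked = df.rank()

# Pearson correlation of the ranks
spearman_via_ranks = ranked.corr(method='pearson')

# Built-in Spearman for comparison
spearman_builtin = df.corr(method='spearman')

print(spearman_via_ranks.round(3))
print(spearman_builtin.round(3))
```

Both printed matrices should agree, which is a quick sanity check when the rank-based result looks surprising.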
Using numpy.corrcoef()
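numpy.corrcoef() calculates the Pearson correlation matrix directly on NumPy arrays. It is fast but returns a plain array without row or column labels, so the result is usually wrapped back into a DataFrame for readability. By default it treats each row as a variable, which is why the array is transposed below.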
Python
import numpy as np
import pandas as pd
df = pd.DataFrame({
'M': [7, 9, 5, 8, 6],
'N': [1, 2, 3, 4, 5],
'O': [10, 9, 8, 7, 6]
})
a = np.corrcoef(df.values.T)
b = pd.DataFrame(a, index=df.columns, columns=df.columns)
print(b)
Explanation: M and N have a weak negative correlation (-0.3), M and O have a weak positive correlation (0.3), and N and O have a perfect negative correlation (-1) because O decreases by exactly 1 each time N increases by 1.
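As an alternative to transposing the array, `np.corrcoef` accepts a `rowvar` argument. The sketch below passes `rowvar=False` so that each column is treated as a variable, and uses `DataFrame.to_numpy()` to obtain the array; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'M': [7, 9, 5, 8, 6],
    'N': [1, 2, 3, 4, 5],
    'O': [10, 9, 8, 7, 6]
})

# rowvar=False tells NumPy that each column (not each row) is a variable
matrix = np.corrcoef(df.to_numpy(), rowvar=False)

# Re-attach the column names so the result reads like df.corr()
labeled = pd.DataFrame(matrix, index=df.columns, columns=df.columns)
print(labeled.round(2))
```

Which form to use is mostly a matter of taste; `rowvar=False` avoids having to remember the transpose.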
Using scipy.stats.pearsonr
scipy.stats.pearsonr computes the Pearson correlation for one pair of columns at a time, returning both the coefficient and a p-value for significance. It gives more statistical detail than DataFrame.corr(), but building a full matrix requires looping over the column pairs manually.
Python
import pandas as pd
from scipy.stats import pearsonr
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [5, 4, 3, 2, 1],
'C': [2, 3, 2, 3, 2]
})
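# Empty matrix to fill with pairwise coefficients
a = pd.DataFrame(index=df.columns, columns=df.columns)
for c1 in df.columns:
    for c2 in df.columns:
        # pearsonr returns (coefficient, p-value); keep only the coefficient
        corr, _ = pearsonr(df[c1], df[c2])
        a.loc[c1, c2] = corr
# Cells were stored as generic objects, so convert them to float
b = a.astype(float)
print(b)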
Explanation: A and B have a perfect negative correlation (-1), reflecting their opposite linear trends. A and C, as well as B and C, show no linear correlation (0). The result matches the DataFrame.corr() matrix from the first example.
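Since the main reason to reach for `pearsonr` is the p-value, it may be useful to collect the p-values into a matrix of their own. The following is a sketch under a few assumptions: the name `pvals` is illustrative, the diagonal is left as NaN because testing a column against itself is not meaningful, and each pair is computed once and mirrored.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

# Start from an all-NaN float matrix; the diagonal stays NaN on purpose
pvals = pd.DataFrame(np.nan, index=df.columns, columns=df.columns)

# Visit each unordered pair of columns once and fill both symmetric cells
for c1, c2 in combinations(df.columns, 2):
    _, p = pearsonr(df[c1], df[c2])
    pvals.loc[c1, c2] = p
    pvals.loc[c2, c1] = p

print(pvals)
```

With only five rows per column, these p-values are illustrative rather than statistically meaningful; on real data the same loop applies unchanged.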