How to Create a Correlation Matrix using Pandas?
Last Updated :
27 May, 2025
A correlation matrix is a table of pairwise correlation coefficients that shows how strongly each pair of variables in a dataset is related. Using Pandas, you can easily generate one to understand how features relate: whether they move together, move in opposite directions, or show no clear trend. Let’s explore several effective methods to create a correlation matrix using Pandas, NumPy and SciPy.
Using DataFrame.corr()
This method computes the Pearson correlation coefficient, measuring the linear relationship between columns. Values range from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship. The diagonal is always 1 because each column perfectly correlates with itself.
Python
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [5, 4, 3, 2, 1],
'C': [2, 3, 2, 3, 2]
})
res = df.corr()
print(res)
Output
     A    B    C
A  1.0 -1.0  0.0
B -1.0  1.0  0.0
C  0.0  0.0  1.0
Explanation: Columns A and B have a perfect negative correlation (-1) because B decreases by exactly 1 each time A increases by 1. Column C shows no linear correlation with either A or B, as indicated by the 0 entries.
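To make the numbers above less of a black box, here is a minimal sketch that computes the Pearson coefficient by hand and compares it with the result of DataFrame.corr(). The helper name pearson_manual is hypothetical and introduced only for illustration; it is not part of Pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

def pearson_manual(x, y):
    # Hypothetical helper: Pearson r from its textbook definition
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    # Sum of co-deviations divided by the product of the deviation norms
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_ab = pearson_manual(df['A'], df['B'])
r_ac = pearson_manual(df['A'], df['C'])
print(r_ab, r_ac)
```

Both values should agree with the corresponding entries of df.corr() up to floating-point rounding.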
Using DataFrame.corr(method='spearman') or 'kendall'
These compute rank-based correlations instead of using raw values. Spearman measures how well a monotonic relationship fits (useful for non-linear but consistent trends), while Kendall counts how many pairs of observations are ordered the same way (concordant) versus the opposite way (discordant) in the two columns. Both work well with non-linear or ordinal data.
Python
import pandas as pd
df = pd.DataFrame({
'X': [10, 20, 30, 40, 50],
'Y': [50, 40, 30, 20, 10],
'Z': [5, 7, 6, 8, 7]
})
# Spearman correlation matrix
a = df.corr(method='spearman')
print(a)
# Kendall correlation matrix
b = df.corr(method='kendall')
print(b)
Explanation: X and Y have a perfect negative correlation (-1) in both matrices. Z shows a moderate positive correlation with X (roughly 0.67 under Spearman and 0.53 under Kendall) and a moderate negative correlation of the same size with Y, reflecting consistent but not perfect monotonic relationships.
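One way to see what "rank-based" means is to note that the Spearman coefficient is the Pearson coefficient applied to the ranks of the data. The sketch below checks this on the same DataFrame; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'X': [10, 20, 30, 40, 50],
    'Y': [50, 40, 30, 20, 10],
    'Z': [5, 7, 6, 8, 7]
})

# Replace each value with its rank within its column
# (tied values receive the average of their ranks by default)
ranks = df.rank()

# Pearson correlation of the ranks
spearman_via_ranks = ranks.corr(method='pearson')

# Built-in Spearman correlation for comparison
spearman_builtin = df.corr(method='spearman')

print(np.allclose(spearman_via_ranks, spearman_builtin))
```

The two matrices should match, which is why Spearman is insensitive to any transformation of a column that preserves its ordering.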
Using numpy.corrcoef()
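Calculates the Pearson correlation matrix directly on NumPy arrays. It’s fast but returns an unlabeled array, so you would typically wrap the result in a DataFrame to restore the column names. By default numpy.corrcoef() treats each row as a variable, which is why the example transposes the data first.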
Python
import numpy as np
import pandas as pd
df = pd.DataFrame({
'M': [7, 9, 5, 8, 6],
'N': [1, 2, 3, 4, 5],
'O': [10, 9, 8, 7, 6]
})
a = np.corrcoef(df.values.T)
b = pd.DataFrame(a, index=df.columns, columns=df.columns)
print(b)
Explanation: M and N have a weak negative correlation (-0.3), M and O have an equally weak positive correlation (0.3), and N and O have a perfect negative correlation (-1) because O decreases by exactly 1 each time N increases by 1.
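As an alternative to transposing the array, numpy.corrcoef() accepts rowvar=False to indicate that each column is a variable. The following sketch shows that both forms give the same matrix; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'M': [7, 9, 5, 8, 6],
    'N': [1, 2, 3, 4, 5],
    'O': [10, 9, 8, 7, 6]
})

# rowvar=False tells NumPy that each column (not each row) is a variable
via_rowvar = np.corrcoef(df.to_numpy(), rowvar=False)

# Equivalent result obtained by transposing the data instead
via_transpose = np.corrcoef(df.to_numpy().T)

# Restore the column labels for readability
labeled = pd.DataFrame(via_rowvar, index=df.columns, columns=df.columns)
print(labeled.round(2))
```

Which form to use is mostly a matter of taste; rowvar=False keeps the data in its usual "one column per feature" layout.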
Using scipy.stats.pearsonr
Computes Pearson correlation for each pair of columns individually, returning both the coefficient and a p-value for significance. Offers detailed stats but requires manual looping over pairs.
Python
import pandas as pd
from scipy.stats import pearsonr
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [5, 4, 3, 2, 1],
'C': [2, 3, 2, 3, 2]
})
a = pd.DataFrame(index=df.columns, columns=df.columns)
for c1 in df.columns:
    for c2 in df.columns:
        corr, _ = pearsonr(df[c1], df[c2])
        a.loc[c1, c2] = corr
b = a.astype(float)
print(b)
Explanation: A and B have a perfect negative correlation (-1), reflecting their opposite linear trends. A and C, as well as B and C, show no correlation (0), indicating no linear relationship.
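Since pearsonr also returns a p-value, the same loop can fill a second matrix of p-values instead of discarding them. This is a minimal sketch under that assumption; the name pvals is illustrative, and with only five rows per column the p-values are shown purely to demonstrate the mechanics.

```python
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

cols = df.columns

# Empty float matrix that will hold one p-value per pair of columns
pvals = pd.DataFrame(index=cols, columns=cols, dtype=float)

for c1 in cols:
    for c2 in cols:
        # pearsonr returns (coefficient, p-value); keep the p-value here
        _, p = pearsonr(df[c1], df[c2])
        pvals.loc[c1, c2] = p

print(pvals.round(3))
```

A small p-value for a pair suggests the observed linear correlation is unlikely under the hypothesis of no correlation, while a value near 1 (as for A and C) is consistent with no linear relationship.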