How to Create a Correlation Matrix using Pandas?

Last Updated : 27 May, 2025

A correlation matrix is a table of correlation coefficients, one for each pair of variables in a dataset. Using Pandas, you can easily generate one to see how features relate: whether they move together, move in opposite directions, or show no clear trend. Let’s explore several ways to create a correlation matrix using Pandas, NumPy and SciPy.

Using DataFrame.corr()

By default, DataFrame.corr() computes the Pearson correlation coefficient, which measures the linear relationship between each pair of columns. Values range from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship. The diagonal is 1 because each column correlates perfectly with itself.

Python

import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

res = df.corr()
print(res)

Output

     A    B    C
A  1.0 -1.0  0.0
B -1.0  1.0  0.0
C  0.0  0.0  1.0

Explanation: Columns A and B have a perfect negative correlation (-1) because B decreases by exactly 1 each time A increases by 1. Column C shows no linear correlation with either of them, indicated by the values of 0.
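As a cross-check on what corr() reports, the Pearson coefficient can be rebuilt from the covariance and the standard deviations of two columns. The snippet below is only a minimal sketch of that idea; the helper name pearson_by_hand is made up for illustration and is not part of Pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

def pearson_by_hand(x, y):
    # Hypothetical helper: covariance divided by the product of
    # the two sample standard deviations
    return x.cov(y) / (x.std() * y.std())

manual = pearson_by_hand(df['A'], df['B'])
builtin = df.corr().loc['A', 'B']

# Both values should agree up to floating-point rounding
print(manual, builtin)
```

Both numbers should come out at -1 (up to rounding), matching the A/B entry in the matrix above.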

Using DataFrame.corr(method='spearman') or 'kendall'

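Passing method='spearman' or method='kendall' computes rank-based correlations instead of using raw values. Spearman is the Pearson correlation of the ranked data, so it measures how well a monotonic relationship fits (useful for non-linear but consistent trends). Kendall instead compares how many pairs of observations are ordered the same way in both columns versus the opposite way. Both work well with monotonic non-linear or ordinal data.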

Python

import pandas as pd
df = pd.DataFrame({
    'X': [10, 20, 30, 40, 50],
    'Y': [50, 40, 30, 20, 10],
    'Z': [5, 7, 6, 8, 7]
})

# Spearman correlation matrix
a = df.corr(method='spearman')
print(a)

# Kendall correlation matrix
b = df.corr(method='kendall')
print(b)

Output

Explanation: X and Y have a perfect negative correlation (-1) under both methods. Z shows a moderate positive correlation with X (about 0.67 for Spearman and 0.53 for Kendall) and a negative correlation of the same size with Y, reflecting a consistent but imperfect monotonic relationship.
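To make the "rank-based" idea concrete, the sketch below ranks each column first and then takes the ordinary Pearson correlation of those ranks. Under the assumption that ties receive the average rank (the default of DataFrame.rank()), this should reproduce the Spearman matrix. The variable names via_ranks and spearman are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'X': [10, 20, 30, 40, 50],
    'Y': [50, 40, 30, 20, 10],
    'Z': [5, 7, 6, 8, 7]
})

# Replace each value by its rank within its column (ties share the average rank)
ranked = df.rank()

# Pearson correlation of the ranks
via_ranks = ranked.corr(method='pearson')

# Built-in Spearman correlation for comparison
spearman = df.corr(method='spearman')

# Check whether the two matrices agree up to floating-point rounding
print(np.allclose(via_ranks, spearman))
```

Kendall has no equally short equivalent, since it is defined by counting concordant and discordant pairs rather than by correlating ranks.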

Using numpy.corrcoef()

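numpy.corrcoef() calculates the Pearson correlation matrix directly on NumPy arrays. It’s fast but discards column labels, so you need to wrap the result in a DataFrame to restore them. By default it treats each row as a variable, which is why the example transposes the data before passing it in.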

Python

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'M': [7, 9, 5, 8, 6],
    'N': [1, 2, 3, 4, 5],
    'O': [10, 9, 8, 7, 6]
})

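# Transpose so that each row passed to corrcoef is one variable (column of df)
a = np.corrcoef(df.values.T)
# Re-attach the column names as row and column labels
b = pd.DataFrame(a, index=df.columns, columns=df.columns)
print(b)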

Output

Explanation: M and N have a weak negative correlation (-0.3), M and O have a weak positive correlation (0.3), and N and O have a perfect negative correlation (-1) because O decreases by exactly 1 each time N increases by 1.
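As an alternative to transposing the array, np.corrcoef() accepts rowvar=False, which tells it that variables are stored in columns. The sketch below uses that option on the same data and compares the result with DataFrame.corr(); it is one possible variation, not a replacement for the approach above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'M': [7, 9, 5, 8, 6],
    'N': [1, 2, 3, 4, 5],
    'O': [10, 9, 8, 7, 6]
})

# rowvar=False: each column is a variable, each row an observation
a = np.corrcoef(df.to_numpy(), rowvar=False)

# Restore the labels that NumPy dropped
b = pd.DataFrame(a, index=df.columns, columns=df.columns)
print(b.round(2))

# Check agreement with the Pandas result for this data
print(np.allclose(b, df.corr()))
```

Note that this agreement holds for complete data like this example; the two functions are not guaranteed to behave the same way when values are missing.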

Using scipy.stats.pearsonr

scipy.stats.pearsonr computes the Pearson correlation for one pair of columns at a time, returning both the coefficient and a p-value for significance. This gives more detail than DataFrame.corr(), but building a full matrix requires looping over the column pairs manually.

Python

import pandas as pd
from scipy.stats import pearsonr
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

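# Empty matrix with column names as both row and column labels
a = pd.DataFrame(index=df.columns, columns=df.columns)
for c1 in df.columns:
    for c2 in df.columns:
        # pearsonr returns (coefficient, p-value); the p-value is ignored here
        corr, _ = pearsonr(df[c1], df[c2])
        a.loc[c1, c2] = corr

# Cells were filled as generic objects, so convert them to floats
b = a.astype(float)
print(b)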

Output

Explanation: A and B have a perfect negative correlation (-1), reflecting their opposite linear trends. A and C, as well as B and C, show no correlation (0), indicating no linear relationship. The p-value that pearsonr returns for each pair is discarded in this example.
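Since the p-value is the main reason to reach for pearsonr, it can be collected into a second matrix in the same loop style. The following is a minimal sketch under that assumption; the name pvals is illustrative, and with only five rows per column the p-values should be read with caution.

```python
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

# Float matrix (initially NaN) to hold one p-value per pair of columns
pvals = pd.DataFrame(index=df.columns, columns=df.columns, dtype=float)

for c1 in df.columns:
    for c2 in df.columns:
        # Keep the p-value this time and ignore the coefficient
        _, p = pearsonr(df[c1], df[c2])
        pvals.loc[c1, c2] = p

print(pvals)
```

For this data, the p-value for A and B should be close to 0 (a perfect linear relationship) and the p-value for A and C close to 1 (no evidence of a linear relationship).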

romy421kumari
