Open In App

How to Create a Correlation Matrix using Pandas?

Last Updated : 27 May, 2025
Comments
Improve
Suggest changes
Like Article
Like
Report

Correlation Matrix is a statistical technique used to measure the relationship between two variables. Using Pandas, you can easily generate a correlation matrix to understand how features relate whether they move together, in opposite directions, or show no clear trend. Let’s explore various effective methods to create a correlation matrix using Pandas, NumPy and SciPy.

Using DataFrame.corr()

This method computes the Pearson correlation coefficient, measuring the linear relationship between columns. Values range from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship. The diagonal is always 1 because each column perfectly correlates with itself.

Python
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

res = df.corr()
print(res)

Output
     A    B    C
A  1.0 -1.0  0.0
B -1.0  1.0  0.0
C  0.0  0.0  1.0

Explanation: Columns A and B have a perfect negative correlation (-1) because as A increases, B decreases. Column C shows no linear correlation with others, indicated by values near 0.

Using DataFrame.corr(method='spearman') or 'kendall'

These compute rank-based correlations instead of using raw values. Spearman measures how well a monotonic relationship fits (useful for non-linear but consistent trends), while Kendall compares data orderings differently. Both work well with non-linear or ordinal data.

Python
import pandas as pd
df = pd.DataFrame({
    'X': [10, 20, 30, 40, 50],
    'Y': [50, 40, 30, 20, 10],
    'Z': [5, 7, 6, 8, 7]
})

# Spearman correlation matrix
a = df.corr(method='spearman')
print(a)

# Kendall correlation matrix
b = df.corr(method='kendall')
print(b)  

Output

Output
Using DataFrame.corr(method='spearman') or 'kendall'

Explanation: X and Y have a perfect negative correlation (-1). Z shows moderate positive correlation with X and moderate negative with Y, reflecting consistent but not perfect monotonic relationships in both Spearman and Kendall matrices.

Using numpy.corrcoef()

Calculates the Pearson correlation matrix directly on NumPy arrays. It’s fast but doesn’t handle labels, you’d convert results back to a DataFrame for clarity.

Python
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'M': [7, 9, 5, 8, 6],
    'N': [1, 2, 3, 4, 5],
    'O': [10, 9, 8, 7, 6]
})

a = np.corrcoef(df.values.T)
b = pd.DataFrame(a, index=df.columns, columns=df.columns)
print(b)

Output

Output
Using numpy.corrcoef()

Explanation: M and N have a weak negative correlation, M and O show moderate positive correlation and N and O have a strong negative correlation. The matrix reflects these varying linear relationships.

Using scipy.stats.pearsonr

Computes Pearson correlation for each pair of columns individually, returning both the coefficient and a p-value for significance. Offers detailed stats but requires manual looping over pairs.

Python
import pandas as pd
from scipy.stats import pearsonr
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 2, 3, 2]
})

a = pd.DataFrame(index=df.columns, columns=df.columns)
for c1 in df.columns:
    for c2 in df.columns:
        corr, _ = pearsonr(df[c1], df[c2])
        a.loc[c1, c2] = corr

b = a.astype(float)
print(b)

Output

Output
Using scipy.stats.pearsonr

Explanation: A and B have a perfect negative correlation (-1), reflecting their opposite linear trends. A and C, as well as B and C, show no correlation (0), indicating no linear relationship. The matrix clearly captures these relationships between the variables.

Related articles:


Next Article
Practice Tags :

Similar Reads