
Exp 2 A

The document outlines a program to compute and visualize the correlation matrix of the California Housing dataset, aiming to understand relationships between features. It describes the correlation matrix, its applications, and the formula for calculating correlation coefficients. The program includes generating a heatmap and a pair plot to represent the correlations visually.

Uploaded by

Shobha Hiremath
Copyright
© All Rights Reserved

Term work 2: Develop a program to compute the correlation matrix to understand the relationships between pairs of features. Visualize the correlation matrix using a heatmap to see which variables have strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use the California Housing dataset.
Objective:

i. To find the correlation between the attributes in the dataset
ii. To represent the correlation obtained between each pair of attributes/features
iii. To represent the correlation matrix using a heatmap

Output:

The correlation calculated for each of the attributes in the California Housing dataset

A heatmap representing the correlation between all attributes

Graphs plotted to represent the correlation between each pair of attributes

Description:
i. Introduction to correlation matrix
A correlation matrix is a table displaying correlation coefficients that measure the strength
and direction of relationships between variables.
The matrix shows how all possible pairs of variables in a table are related to each other.
It is a powerful tool for summarizing a large data set and finding and showing patterns in the
data. It is often shown as a table, with each variable listed in both the rows and the columns
and the correlation coefficient between each pair of variables written in each cell.
The correlation coefficient ranges from -1 to +1, where -1 means a perfect negative
correlation, +1 means a perfect positive correlation, and 0 means there is no linear
correlation between the variables.
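
As a small illustration of the range described above, the following sketch builds a correlation matrix for a made-up table. The column names and values are hypothetical and are not taken from the California Housing dataset; they are chosen so that one pair is perfectly positively correlated, one is perfectly negatively correlated, and one has zero linear correlation.

```python
import pandas as pd

# Hypothetical toy data (not from the California Housing dataset).
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "double_x": [2, 4, 6, 8, 10],   # rises exactly in step with x
    "minus_x": [5, 4, 3, 2, 1],     # falls exactly as x rises
    "bowl": [4, 1, 0, 1, 4],        # depends on x, but not linearly
})

# DataFrame.corr() computes the Pearson correlation matrix by default.
toy_corr = toy.corr()
print(toy_corr.round(2))
```

The "bowl" column is fully determined by x, yet its coefficient with x is 0. A coefficient of 0 therefore rules out a linear relationship, not every kind of relationship.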

ii. Application of correlation matrix


A correlation matrix is a valuable tool for gaining insights into your dataset.
For example, if you’re trying to predict the price of a car based on factors like fuel type,
transmission, or age, the correlation matrix helps you understand the relationships between
these variables.
• A value close to +1 indicates a strong positive relationship between two variables.
• A value close to 0 suggests little or no linear relationship between them.
• A value close to -1 signals a strong negative (inverse) relationship.
Using a correlation matrix, you can easily analyze and visualize the connections in the
data.
This makes it an essential step for data scientists before building machine learning models.
Understanding which variables are correlated helps you identify the most influential factors
for your model.
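
One way to act on this is to rank the features by the absolute value of their correlation with the quantity being predicted. The sketch below is only illustrative: the data are synthetic, the columns `mileage` and `unrelated` and all the coefficients are hypothetical, and only numeric columns are used because the Pearson coefficient is defined for numeric variables.

```python
import numpy as np
import pandas as pd

# Synthetic, hypothetical car data; 'price' is built from 'age' and
# 'mileage' plus random noise, while 'unrelated' plays no part in it.
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(0, 15, n)
mileage = rng.uniform(0, 200, n)
unrelated = rng.normal(0, 1, n)
price = 30 - 1.5 * age - 0.05 * mileage + rng.normal(0, 1, n)

cars = pd.DataFrame({
    "age": age,
    "mileage": mileage,
    "unrelated": unrelated,
    "price": price,
})

# Correlation of every feature with the target, excluding the target itself.
corr_with_price = cars.corr()["price"].drop("price")

# Rank by absolute value: strong negative correlations matter as much
# as strong positive ones.
ranking = corr_with_price.abs().sort_values(ascending=False)
print(ranking)
```

A high rank here only indicates a strong linear association with the target. It does not show that the feature causes the target to change.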
iii. Calculation of correlation
The correlation between two attributes/features can be calculated as

r = (nΣXY – ΣXΣY) / sqrt((nΣX^2 – (ΣX)^2)(nΣY^2 – (ΣY)^2))


Where:
• r = correlation coefficient
• n = number of observations
• ΣXY = sum of the products of each pair of corresponding observations of the two
variables
• ΣX = sum of the observations of the first variable
• ΣY = sum of the observations of the second variable
• ΣX^2 = sum of the squares of the observations of the first variable
• ΣY^2 = sum of the squares of the observations of the second variable
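
The formula can be checked by hand on a few numbers. The following is a minimal sketch that evaluates it term by term using only the Python standard library; the function name `pearson_r` and the sample values are hypothetical and chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r computed from the sum-based formula."""
    if len(xs) != len(ys):
        raise ValueError("both variables need the same number of observations")
    n = len(xs)
    sum_x = sum(xs)                               # ΣX
    sum_y = sum(ys)                               # ΣY
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # ΣXY
    sum_x2 = sum(x * x for x in xs)               # ΣX^2
    sum_y2 = sum(y * y for y in ys)               # ΣY^2

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when a variable is constant")
    return numerator / denominator

# Hypothetical sample observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # prints 0.7746
```

For these values n = 5, ΣX = 15, ΣY = 20, ΣXY = 66, ΣX^2 = 55 and ΣY^2 = 86, so r = 30 / sqrt(50 × 30) ≈ 0.7746.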

Program:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Fetch the California Housing dataset.
# Note: .data holds only the eight input features; the target
# (median house value) is stored separately in .target.
california_housing = fetch_california_housing()
data = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)

# Compute the (Pearson) correlation matrix
correlation_matrix = data.corr()

# Visualize the correlation matrix using a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix Heatmap')
plt.show()

# Create a pair plot to visualize pairwise relationships between features
sns.pairplot(data)
plt.suptitle('Pair Plot of California Housing Features', y=1.02)
plt.show()
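
Besides reading the heatmap by eye, the strongest relationships can be listed directly from a correlation matrix. The sketch below is one possible way to do this. The helper name `strongest_pairs` and the synthetic columns `a` to `d` are hypothetical; synthetic data are used here so that the example runs without downloading the dataset.

```python
import numpy as np
import pandas as pd

def strongest_pairs(corr, top=3):
    """Return the `top` feature pairs with the largest absolute correlation."""
    # Positions strictly above the diagonal, so each pair is listed once
    # and the trivial self-correlations of 1.0 are skipped.
    rows, cols = np.triu_indices(len(corr), k=1)
    pairs = pd.Series(
        corr.to_numpy()[rows, cols],
        index=pd.MultiIndex.from_arrays([corr.index[rows], corr.columns[cols]]),
    )
    # Sort by magnitude, but keep the sign in the returned values.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top)

# Synthetic, hypothetical data with known relationships.
rng = np.random.default_rng(42)
a = rng.normal(0, 1, 300)
demo = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(0, 0.1, 300),    # almost a copy of a
    "c": -a + rng.normal(0, 0.5, 300),   # moves opposite to a
    "d": rng.normal(0, 1, 300),          # independent of the others
})

top_pairs = strongest_pairs(demo.corr())
print(top_pairs)
```

The same helper could be applied to the matrix computed in the program, for example `strongest_pairs(correlation_matrix, top=5)`.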

Output of program
Note: The above figures are to be drawn on a blank plain page of the journal in the output section.
