ML Lab Programs 1 & 2

The document outlines two machine learning lab exercises using the California Housing dataset. The first exercise involves creating histograms and box plots for numerical features to analyze their distributions and identify outliers. The second exercise focuses on computing and visualizing the correlation matrix and pair plots to explore relationships between features.

Uploaded by ahanasayeed.786

Machine Learning Lab

1. Develop a program to create histograms for all numerical features and analyze the distribution of
each feature. Generate box plots for all numerical features and identify any outliers. Use the
California Housing dataset.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

# Load the dataset
california_housing = fetch_california_housing()
data = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)
data['MedHouseVal'] = california_housing.target  # Add the target variable to the dataframe

# Plot histograms for all numerical features
data.hist(bins=30, figsize=(15, 10))
plt.suptitle('Histograms of Numerical Features')
plt.show()

# Plot box plots for all numerical features
# (3 x 3 grid: 8 features plus the target column)
plt.figure(figsize=(15, 10))
for i, column in enumerate(data.columns):
    plt.subplot(3, 3, i + 1)
    sns.boxplot(y=data[column])
    plt.title(column)
plt.suptitle('Box Plots of Numerical Features')
plt.tight_layout()
plt.show()

# Identify outliers using the IQR method:
# a value is flagged if it lies more than 1.5 * IQR below Q1 or above Q3
for column in data.columns:
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers = data[(data[column] < lower_bound) | (data[column] > upper_bound)]
    print(f"Outliers in {column}: {len(outliers)}")
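To see what the IQR rule in the loop above does on numbers small enough to check by hand, here is a minimal sketch. The helper name iqr_bounds and the toy values are made up for illustration and are not part of the lab program.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy data (illustrative only): nine ordinary values and one extreme value
toy = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

lower, upper = iqr_bounds(toy)
outlier_values = toy[(toy < lower) | (toy > upper)].tolist()

# Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5 and the fences are -3.5 and 14.5
print(lower, upper, outlier_values)
```

Only the value 100 falls outside the fences here. The factor k = 1.5 is the usual box-plot convention, which is why the points drawn beyond the whiskers in the box plots correspond to the rows counted by the loop.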

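Output: [Figures not reproduced: histograms and box plots for each numerical feature, followed by the printed outlier count per feature.]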
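The histograms in the program above show the shape of each distribution only visually. One possible numeric complement, sketched here on made-up toy columns rather than the real dataset, is to compare the mean with the median and to compute the sample skewness with pandas.

```python
import pandas as pd

# Toy columns (illustrative only): one symmetric, one with a long right tail
toy = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5, 6, 7],
    "right_tailed": [1, 1, 1, 2, 2, 3, 20],
})

# One row per column: mean, median and sample skewness
summary = pd.DataFrame({
    "mean": toy.mean(),
    "median": toy.median(),
    "skew": toy.skew(),
})
print(summary)
```

A skewness near zero with mean close to median suggests a roughly symmetric feature, while a clearly positive skewness with mean above median suggests a long right tail. Applying the same three calls to the data frame from the lab program should work unchanged, since all of its columns are numeric.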

Dept. of CSE, SJMIT Chitradurga



2. Develop a program to compute the correlation matrix to understand the relationships between
pairs of features. Visualize the correlation matrix using a heatmap to identify which variables have
strong positive or negative correlations. Create a pair plot to visualize pairwise relationships between
features. Use the California Housing dataset.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

# Load the dataset
california_housing = fetch_california_housing()
data = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)
data['MedHouseVal'] = california_housing.target  # Add the target variable to the dataframe

# Compute the correlation matrix
correlation_matrix = data.corr()
print("Correlation Matrix:")
print(correlation_matrix)

# Plot the heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix Heatmap')
plt.show()

# Create a pair plot
sns.pairplot(data)
plt.suptitle('Pair Plot of Numerical Features', y=1.02)
plt.show()
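The exercise asks which variables have strong positive or negative correlations, which the heatmap answers only by eye. A minimal sketch of one way to list them from a correlation matrix follows. The helper name strongest_pairs, the 0.5 threshold, and the toy columns are assumptions made for illustration, not part of the lab program.

```python
import numpy as np
import pandas as pd

def strongest_pairs(corr, threshold=0.5):
    """List each feature pair once, keeping |r| >= threshold, strongest first."""
    # Keep only the strict upper triangle so (a, b) and (b, a) are not both listed
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    pairs = pairs[pairs.abs() >= threshold]
    # Order by absolute correlation, largest first
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Toy data (illustrative only)
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 13],   # rises with a
    "c": [6, 5, 4, 3, 2, 1],     # falls as a rises
    "d": [1, -1, -1, 1, 1, -1],  # almost unrelated to the others
})

result = strongest_pairs(toy.corr())
print(result)
```

On the toy data the three pairs among a, b and c are kept and every pair involving d is dropped. Passing correlation_matrix from the lab program instead of toy.corr() should give the same kind of ranked list for the housing features.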

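Output: [Figures not reproduced: the printed correlation matrix, the correlation heatmap, and the pair plot.]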
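A pair plot over all nine columns draws 81 panels from 20,640 rows, which can be slow to render. One common workaround, sketched below, is to plot a random sample of rows and a subset of columns. The helper name sample_for_pairplot, the sample size, and the stand-in data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def sample_for_pairplot(df, columns, n=1000, seed=0):
    """Return at most n randomly chosen rows, restricted to the given columns."""
    subset = df[list(columns)]
    if len(subset) > n:
        subset = subset.sample(n=n, random_state=seed)
    return subset

# Stand-in data with the same number of rows as the California Housing dataset
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(20640, 4)), columns=["w", "x", "y", "z"])

small = sample_for_pairplot(toy, ["w", "x", "y"], n=500)
print(small.shape)  # (500, 3)
```

The reduced frame can then be passed to sns.pairplot in place of the full data. Fixing the seed keeps the sampled plot reproducible between runs, at the cost of showing only part of the data.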
