ML Lab Manual PRGM 2&3
2. Develop a program to load a dataset with at least two numerical columns (e.g., Iris, Titanic). Plot a
scatter plot of two variables and calculate their Pearson correlation coefficient. Write a program to
compute the covariance and correlation matrices for the dataset. Visualize the correlation matrix using a
heatmap to identify which variables have strong positive or negative correlations.
Solution:
• Pearson correlation coefficient
The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship
between two variables. It ranges from -1 (perfect negative linear relationship) through 0 (no linear
relationship) to +1 (perfect positive linear relationship).
• Covariance
Covariance indicates the direction of the relationship between two variables: whether they tend to move
together (positive covariance) or in opposite directions (negative covariance). Its magnitude depends on
the units of the variables, so it does not by itself show how strong the relationship is.
• Correlation Matrix
A correlation matrix is a table showing the Pearson correlation coefficients between every pair of
variables in a dataset. It helps in understanding the relationships among the different numerical variables.
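The definitions above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `covariance` and `pearson_r` and the toy lists are hypothetical and are not part of the manual's program, and the sample (n - 1) form of covariance is assumed because that is what pandas uses by default.

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical toy data (not taken from Iris): y rises with x, z falls with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
z = [10.0, 8.0, 6.0, 4.0, 2.0]

print(covariance(x, y))  # 5.0  (positive: the variables move together)
print(pearson_r(x, y))   # 1.0  (perfect positive linear relationship)
print(pearson_r(x, z))   # -1.0 (perfect negative linear relationship)
```

Dividing the covariance by both standard deviations removes the units, which is why r always lies between -1 and +1 while covariance does not.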
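import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Assumption: the data comes from seaborn's bundled Iris copy, whose column
# names match the ones used below (the loading step is missing from the listing)
df = sns.load_dataset('iris')

# Select two numerical columns for the scatter plot and correlation calculation
x_col = 'sepal_length'
y_col = 'petal_length'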
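The rest of the program is not shown in the listing, so what follows is only a sketch of how the remaining steps might look, not the manual's own code. It assumes a small hand-typed sample with Iris-style column names so that it runs offline; the values printed will therefore differ from the full-dataset output reported in this manual. The function name `plot_results` is hypothetical.

```python
import pandas as pd

# Small illustrative sample with Iris-style columns (assumption: used here
# only so the sketch runs offline; the manual works on the full dataset).
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7],
        "petal_length": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1],
        "petal_width": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9],
    }
)

x_col = "sepal_length"
y_col = "petal_length"

# Pearson correlation between the two selected columns
# (method="pearson" is the default for Series.corr).
r = df[x_col].corr(df[y_col])

# Keep only numeric columns, then compute both matrices.
numeric = df.select_dtypes(include="number")
cov_matrix = numeric.cov()
corr_matrix = numeric.corr()

print(f"Pearson r ({x_col} vs {y_col}): {r:.4f}")
print("Covariance Matrix:\n", cov_matrix)
print("Correlation Matrix:\n", corr_matrix)


def plot_results(data, x, y, corr):
    """Draw the scatter plot and the correlation heatmap side by side."""
    # Imported here so the numeric part above runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
    sns.scatterplot(data=data, x=x, y=y, ax=ax1)
    ax1.set_title(f"{x} vs {y}")
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax2)
    ax2.set_title("Correlation Matrix")
    plt.tight_layout()
    plt.show()
```

Calling `plot_results(df, x_col, y_col, corr_matrix)` would display both figures. Selecting numeric columns first matters when the dataset also carries a text column such as a species label, since recent pandas versions refuse to correlate non-numeric data.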
Covariance Matrix:
              sepal_length  sepal_width  petal_length  petal_width
sepal_length      0.685694    -0.042434      1.274315     0.516271
sepal_width      -0.042434     0.189979     -0.329656    -0.121639
petal_length      1.274315    -0.329656      3.116278     1.295609
petal_width       0.516271    -0.121639      1.295609     0.581006
Correlation Matrix:
              sepal_length  sepal_width  petal_length  petal_width
sepal_length      1.000000    -0.117570      0.871754     0.817941
sepal_width      -0.117570     1.000000     -0.428440    -0.366126
petal_length      0.871754    -0.428440      1.000000     0.962865
petal_width       0.817941    -0.366126      0.962865     1.000000
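The two matrices in the output are linked: each correlation is the corresponding covariance divided by the product of the two standard deviations. The sketch below checks this with NumPy, using only the covariance values printed in the output; the variable names are illustrative.

```python
import numpy as np

# Covariance matrix as reported in the output
# (order: sepal_length, sepal_width, petal_length, petal_width).
cov = np.array([
    [0.685694, -0.042434, 1.274315, 0.516271],
    [-0.042434, 0.189979, -0.329656, -0.121639],
    [1.274315, -0.329656, 3.116278, 1.295609],
    [0.516271, -0.121639, 1.295609, 0.581006],
])

std = np.sqrt(np.diag(cov))      # per-feature standard deviations
corr = cov / np.outer(std, std)  # r_ij = cov_ij / (std_i * std_j)

print(np.round(corr, 4))
```

Up to rounding, the result reproduces the reported correlation matrix. Reading that matrix, petal_length and petal_width have the strongest positive correlation (about 0.96), while sepal_width and petal_length have the strongest negative one (about -0.43).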
PROGRAM 3
3. Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality
of the Iris dataset from 4 features to 2.
Solution:
• Principal Component Analysis (Algorithm)
Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms a
high-dimensional dataset into a lower-dimensional space while preserving as much of the variance in the
data as possible. It simplifies data visualization and can improve computational efficiency in machine
learning models.
Steps in PCA Algorithm:
Step 1: Standardization of Data
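Since PCA is affected by scale, we standardize each feature of the dataset by subtracting its mean and
dividing by its standard deviation:
$$z = \frac{x - \mu}{\sigma}$$
where \mu is the mean and \sigma is the standard deviation of the feature.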
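A minimal sketch of this standardization step with NumPy follows. The function name `standardize` and the toy matrix are hypothetical, and the population standard deviation (ddof=0) is an assumption, chosen because scikit-learn's `StandardScaler` uses the same convention.

```python
import numpy as np


def standardize(X):
    """Return (X - mean) / std, computed column by column."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population std (ddof=0); an assumption, see lead-in
    return (X - mean) / std


# Hypothetical toy matrix: 4 samples, 2 features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])
Z = standardize(X)

print(Z.mean(axis=0))  # approximately [0, 0]
print(Z.std(axis=0))   # approximately [1, 1]
```

After standardization both columns carry the same values even though the raw scales differed by a factor of 100, so neither feature dominates the variance.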
Step 2: Compute the Covariance Matrix
To understand how variables relate to each other, we compute the covariance matrix:
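A short sketch of the covariance matrix computation, assuming the sample form that divides by n - 1. The function name `covariance_matrix` and the toy data are hypothetical; the result is compared against NumPy's own `np.cov` as a sanity check.

```python
import numpy as np


def covariance_matrix(Z):
    """Sample covariance of the columns of Z (divides by n - 1)."""
    Z = np.asarray(Z, dtype=float)
    centered = Z - Z.mean(axis=0)
    return centered.T @ centered / (Z.shape[0] - 1)


# Hypothetical toy data: 5 samples, 3 features.
Z = np.array([[2.0, 1.0, 0.5],
              [0.0, 3.0, 1.5],
              [1.0, 2.0, 1.0],
              [3.0, 0.0, 2.5],
              [4.0, 4.0, 2.0]])
C = covariance_matrix(Z)

print(C.shape)                                  # (3, 3)
print(np.allclose(C, np.cov(Z, rowvar=False)))  # True
```

The matrix is square with one row and column per feature, and it is symmetric because the covariance of features i and j equals that of j and i.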