DEPARTMENT OF
COMPUTER SCIENCE & ENGINEERING
Experiment 3.3
Student Name: Varad Nikam UID: 21BCS9038
Branch: CSE Section/Group: 648/A
Semester: 5th Date of Performance:
Subject: AIML Subject Code: 21CSH-316
_________________________________________________________________
Aim: To implement Exploratory Data Analysis (EDA) on any dataset.
Objective: To learn about different exploratory data analysis techniques.
Dataset: Iris dataset. This dataset is commonly used for practicing classification algorithms.
It includes measurements of four features for three species of iris flowers (setosa,
versicolor, and virginica). A typical example task is to demonstrate how the KNN algorithm
can be applied to classify instances into different classes based on their feature values.
Source: Fisher, R.A. "The use of multiple measurements in taxonomic problems" (1936).
Description: The Iris dataset consists of 150 samples of iris flowers from three different species
(setosa, versicolor, and virginica). Each sample includes measurements of sepal length, sepal
width, petal length, and petal width.
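The figures in the description can be checked directly against the copy of the dataset bundled with scikit-learn. The following is a minimal sketch, assuming that `load_iris()` is the same data the experiment uses; the variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the copy of the Iris dataset that ships with scikit-learn
iris = load_iris()

# Number of samples (rows) and measured features (columns)
n_samples, n_features = iris.data.shape

# Species names in the order of their integer class codes (0, 1, 2)
species_names = [str(name) for name in iris.target_names]

# Number of samples belonging to each class code
samples_per_species = np.bincount(iris.target)

print("Samples:", n_samples)
print("Features:", list(iris.feature_names))
print("Species:", species_names)
print("Samples per species:", samples_per_species.tolist())
```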
Use Case: Commonly used for practicing classification algorithms.
EDA: Exploratory Data Analysis (EDA) is applied to investigate the data and summarize the key
insights. It gives a basic understanding of the data, its distribution, null values, and
more.
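As a compact illustration of these checks, the sketch below counts null values, measures the skewness of each feature, and compares feature means across species. It is one possible approach, not the experiment's own code: the `species` column and the variable names are assumptions introduced here for readability.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Assumed helper column: map integer class codes to species names
df["species"] = iris.target_names[iris.target]

# Null values: number of missing entries in each column
null_counts = df.isnull().sum()

# Distribution: skewness of each numeric feature (0 means symmetric)
skewness = df.drop(columns="species").skew()

# Key insight: mean of every feature, broken down by species
species_means = df.groupby("species").mean()

print(null_counts)
print(skewness)
print(species_means)
```

Grouping by species often shows at a glance which features separate the classes, which a single overall `describe()` table can hide.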
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
iris = load_iris()
# Combine the four feature columns and the class label into one DataFrame
data = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])
print("\033[1mDATA INFO\033[0m")
data.info()  # info() prints its report directly and returns None
print("\033[1mDATA SUMMARY\033[0m")
print(data.describe())
print("\033[1mDistribution of categorical variables\033[0m")
print(data['target'].value_counts())
print("\033[1mMISSING DATA\033[0m")
print(data.isnull().sum())
print("\033[1mHISTOGRAM\033[0m")
plt.figure(figsize=(12, 8))
sns.histplot(data['sepal length (cm)'], bins=20, kde=True)
plt.title('Sepal Length Distribution')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Count')
plt.show()
plt.figure(figsize=(8, 6))
sns.countplot(data=data, x='target')
plt.title('Distribution of Iris Species')
plt.xlabel('Species')
plt.ylabel('Count')
plt.show()
# Plot correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(data.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
Output :
Learning Outcomes: Understood various plots and graphs.
Implemented various data analysis methods on the provided dataset.
Learnt about different data analysis tools.