Varad Aiml 3.3

Uploaded by

VARAD NIKAM

DEPARTMENT OF

COMPUTER SCIENCE & ENGINEERING

Experiment 3.3
Student Name: Varad Nikam UID: 21BCS9038
Branch: CSE Section/Group: 648/A
Semester: 5th Date of Performance:
Subject: AIML Subject Code: 21CSH-316
_________________________________________________________________
Aim: Implement Exploratory Data Analysis (EDA) on a dataset.

Objective: To learn about different exploratory data analysis techniques.


Dataset: Iris dataset. This dataset is commonly used for practicing classification algorithms and
includes four feature measurements for three species of iris flowers (setosa, versicolor, and
virginica). It is often used to demonstrate classifiers such as KNN; in this experiment it is
the subject of exploratory data analysis.
Source: Fisher, R.A. "The use of multiple measurements in taxonomic problems" (1936).
Description: The Iris dataset consists of 150 samples of iris flowers from three different species
(setosa, versicolor, and virginica). Each sample includes measurements of sepal length, sepal
width, petal length, and petal width.
Use Case: Commonly used for practicing classification algorithms.
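The figures in the description above can be checked directly against the copy of the dataset bundled with scikit-learn. The following is a minimal sketch, assuming scikit-learn is installed; the variable names (n_samples, samples_per_species) are illustrative and not part of the original experiment.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris data (a Bunch with .data, .target, .target_names, ...)
iris = load_iris()

# Number of rows (samples) and columns (features)
n_samples, n_features = iris.data.shape

# Count how many samples belong to each species code (0, 1, 2)
samples_per_species = {
    str(name): int((iris.target == code).sum())
    for code, name in enumerate(iris.target_names)
}

print(n_samples, n_features)
print(iris.feature_names)
print(samples_per_species)
```

The 150 samples are split evenly, 50 per species, which is why the class counts reported later in the experiment are balanced.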
EDA: Exploratory Data Analysis (EDA) is applied to investigate the data and summarize its key
insights. It gives a basic understanding of the data: its distribution, missing (null) values,
and more.

Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Load the Iris data and combine the features and the target into one DataFrame
iris = load_iris()
data = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])

print("\033[1mDATA INFO\033[0m")
print(data.info())

print("\033[1mDATA SUMMARY\033[0m")
print(data.describe())

print("\033[1mDistribution of categorical variables\033[0m")

print(data['target'].value_counts())
print("\033[1mMISSING DATA\033[0m")
print(data.isnull().sum())

print("\033[1mHISTOGRAM\033[0m")
plt.figure(figsize=(12, 8))
sns.histplot(data['sepal length (cm)'], bins=20, kde=True)
plt.title('Sepal Length Distribution')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Count')
plt.show()
plt.figure(figsize=(8, 6))
sns.countplot(data=data, x='target')
plt.title('Distribution of Iris Species')
plt.xlabel('Species')
plt.ylabel('Count')
plt.show()

# Plot correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(data.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
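In the listing above, the 'target' column holds the numeric codes 0.0, 1.0 and 2.0, so the count plot labelled 'Species' shows codes rather than names, and the heatmap treats those codes as if they were an ordered quantity. A possible refinement, sketched here under the assumption that species names are wanted, maps the codes to names, summarizes each feature per species, and computes the correlation over the four measurements only. The column name 'species' and the variables species_means and feature_corr are illustrative choices, not part of the original code.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Build a DataFrame of the four measurements only
data = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map numeric target codes (0, 1, 2) to species names
code_to_name = {code: str(name) for code, name in enumerate(iris.target_names)}
data["species"] = pd.Series(iris.target).map(code_to_name)

# Mean of every measurement for each species
species_means = data.groupby("species").mean()

# Correlation between the measurements, excluding the class label
feature_corr = data[iris.feature_names].corr()

print(species_means)
print(feature_corr.round(2))
```

Passing x='species' to sns.countplot and feature_corr to sns.heatmap would then produce plots with readable class labels and a correlation matrix restricted to the measured features.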

Output:

Learning Outcomes: Understood various plots and graphs.
Implemented various data analysis methods on the provided dataset.
Learnt about different data analysis tools.
