Ass 10 DSBDL

The document describes an experiment to visualize features of the Iris flower dataset using histograms and box plots. It includes: 1. Loading and examining the Iris dataset, which contains 4 numeric features and 1 categorical class feature for 150 samples. 2. Creating histograms of each numeric feature to illustrate their distributions. 3. Generating box plots of the numeric features to display their distributions and identify outliers. The document provides code samples to load the dataset, select specific features, generate histograms and box plots, and conclude with observations about visualizing the Iris data.

Uploaded by

Anvi
Copyright © All Rights Reserved
Experiment No. 10

Aim: Data Visualization III
Download the Iris flower dataset or any other dataset into a DataFrame
(e.g., https://archive.ics.uci.edu/ml/datasets/Iris). Scan the dataset and give the inference as:
1. List down the features and their types (e.g., numeric, nominal) available in the dataset.
2. Create a histogram for each feature in the dataset to illustrate the feature distributions.
3. Create a box plot for each feature in the dataset.
4. Compare distributions and identify outliers.

Introduction:

The Iris dataset is often called the "Hello World" of data science: anyone starting out in data
science or machine learning will likely practice basic ML algorithms on this famous dataset.
The dataset contains five columns: Petal Length, Petal Width, Sepal Length, Sepal Width and
Species Type.
Iris is a genus of flowering plants. Researchers measured these features on different iris
flowers and recorded them digitally.

Code: Reading the dataset “Iris.csv”.
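The listing for this step did not survive in the document. A minimal sketch of it might look like the following, assuming the file uses the column names that appear in the later code (Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species). A few rows are embedded inline as a stand-in for the file so the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for the contents of "Iris.csv" (assumed layout, only six rows).
SAMPLE_CSV = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
102,5.8,2.7,5.1,1.9,Iris-virginica
"""

# With the real file on disk this would be: data = pd.read_csv("Iris.csv")
data = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(data.head())  # first five rows
print(data.shape)   # (number of rows, number of columns)
```

With the full file, data.shape should report 150 rows.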


1. List down the features and their types (e.g., numeric, nominal) available in the
dataset.

Code: Displaying the number of features and names of the columns.


The columns attribute (an attribute, not a function call) holds the names of all columns of the dataset.
data.columns
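To answer task 1, the column names can be paired with their types. The sketch below is one possible way to do it, using a small hand-typed frame as a stand-in for the loaded file; the numeric/nominal labelling rule is an assumption made for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hand-typed stand-in for the DataFrame read from Iris.csv.
data = pd.DataFrame({
    "Id": [1, 51, 101],
    "SepalLengthCm": [5.1, 7.0, 6.3],
    "SepalWidthCm": [3.5, 3.2, 3.3],
    "PetalLengthCm": [1.4, 4.7, 6.0],
    "PetalWidthCm": [0.2, 1.4, 2.5],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

print(data.shape)   # (rows, columns)
print(data.dtypes)  # storage type of every column

# Label each column as numeric or nominal based on its dtype.
# Note: Id is stored as a number but is only a row identifier, not a measurement.
feature_types = {
    name: "numeric" if is_numeric_dtype(data[name]) else "nominal"
    for name in data.columns
}
for name, kind in feature_types.items():
    print(f"{name}: {kind}")
```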

Code: Displaying only specific columns.


Sometimes we need to work with only specific features or columns of a dataset.
This can be done with the following code.
specific_data = data[["Id", "Species"]]
# general form: data[["column_name1", "column_name2", "column_name3"]]

# now we will print the first 10 rows of the specific_data dataframe
print(specific_data.head(10))

Histograms

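A histogram shows the distribution of the values in a column: the value range is split into
bins, and the height of each bar is the number of observations that fall into that bin.
Histograms can be used for univariate as well as bivariate analysis.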
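The binning behind a histogram can be shown without any plotting. The following sketch counts a few made-up measurements into four bins of width 1 cm; both the values and the bin edges are chosen only for illustration.

```python
import numpy as np

# Made-up lengths in cm, used only to illustrate binning.
values = np.array([4.6, 4.9, 5.1, 5.4, 5.8, 6.3, 6.4, 7.1])

# Explicit bin edges: [4, 5), [5, 6), [6, 7), [7, 8]
edges = [4.0, 5.0, 6.0, 7.0, 8.0]
counts, used_edges = np.histogram(values, bins=edges)

for left, right, count in zip(used_edges[:-1], used_edges[1:], counts):
    print(f"{left:.0f} to {right:.0f} cm: {count} value(s)")
```

plt.hist performs the same counting and then draws one bar per bin.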

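Histograms with Distplot

Distplot (the distplot function of the seaborn library) is used for a univariate set of
observations and visualizes it through a histogram, i.e. it takes only one variable, so we
choose one particular column of the dataset. In recent seaborn releases distplot is
deprecated in favor of histplot and displot.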

The Box Plot:

The box plot displays the distribution of numeric data in terms of quartiles; a categorical
column can be used to draw one box per group. The line inside the box marks the median. The
lowest quarter of the data lies between the end of the lower whisker and the bottom of the
box (the first quartile, Q1). The second quarter lies between the bottom of the box and the
median line. The third quarter lies between the median line and the top of the box (the third
quartile, Q3), and the last quarter lies between the top of the box and the end of the upper
whisker. Points drawn individually beyond the whiskers are potential outliers.
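The quartiles that define the box can be computed directly. This is a small worked example on made-up numbers, using NumPy's default (linear interpolation) percentile method.

```python
import numpy as np

# Nine made-up values, already sorted, used only to illustrate quartiles.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Q1, median (Q2) and Q3 are the 25th, 50th and 75th percentiles.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range = height of the box

print("Q1 (bottom of box):", q1)
print("Median (line in box):", median)
print("Q3 (top of box):", q3)
print("IQR (box height):", iqr)
```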
Attribute information about the data set:
-> sepal length in cm
-> sepal width in cm
-> petal length in cm
-> petal width in cm
-> class:
   Iris Setosa
   Iris Versicolour
   Iris Virginica

Number of Instances: 150

Summary Statistics:
               Min  Max  Mean  SD    Class Correlation
sepal length:  4.3  7.9  5.84  0.83   0.7826
sepal width:   2.0  4.4  3.05  0.43  -0.4194
petal length:  1.0  6.9  3.76  1.76   0.9490 (high!)
petal width:   0.1  2.5  1.20  0.76   0.9565 (high!)

Class Distribution: 33.3% for each of 3 classes.
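A table like the one above (apart from the class correlation column) can be derived with pandas. The sketch below runs on a six-row, hand-typed stand-in, so its numbers will not match the full 150-row table; the column names are assumed to be those of the Iris.csv file used in this experiment.

```python
import pandas as pd

# Hand-typed stand-in for the DataFrame read from Iris.csv (six rows only).
data = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "PetalLengthCm": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "PetalWidthCm": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Species": ["Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
})

numeric_cols = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

# One row per feature, one column per statistic.
# pandas "std" is the sample standard deviation (ddof=1).
summary = data[numeric_cols].agg(["min", "max", "mean", "std"]).T
print(summary.round(2))

# Share of each class among all rows.
class_share = data["Species"].value_counts(normalize=True)
print(class_share)
```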


2. Create a histogram for each feature in the dataset to illustrate the feature
distributions.

Code #1: Histogram for Sepal Length

import matplotlib.pyplot as plt  # needed by all plotting code in this experiment

plt.figure(figsize = (10, 7))
x = data["SepalLengthCm"]
plt.hist(x, bins = 20, color = "green")
plt.title("Sepal Length in cm")
plt.xlabel("Sepal_Length_cm")
plt.ylabel("Count")
plt.show()

Code #2: Histogram for Sepal Width

plt.figure(figsize = (10, 7))
x = data["SepalWidthCm"]
plt.hist(x, bins = 20, color = "green")
plt.title("Sepal Width in cm")
plt.xlabel("Sepal_Width_cm")
plt.ylabel("Count")
plt.show()

Code #3: Histogram for Petal Length


plt.figure(figsize = (10, 7))
x = data["PetalLengthCm"]
plt.hist(x, bins = 20, color = "green")
plt.title("Petal Length in cm")
plt.xlabel("Petal_Length_cm")
plt.ylabel("Count")
plt.show()

Code #4: Histogram for Petal Width


plt.figure(figsize = (10, 7))
x = data["PetalWidthCm"]
plt.hist(x, bins = 20, color = "green")
plt.title("Petal Width in cm")
plt.xlabel("Petal_Width_cm")
plt.ylabel("Count")
plt.show()
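
The four histograms above differ only in the column name, so they can also be drawn in a loop on one figure, which makes the distributions easier to compare side by side. This is a sketch under assumptions: the data frame is a small hand-typed stand-in, the Agg backend is selected so the code runs without a display, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption); remove to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed stand-in for the DataFrame read from Iris.csv.
data = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "PetalLengthCm": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "PetalWidthCm": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

features = list(data.columns)

# One subplot per feature, arranged in a 2 x 2 grid.
fig, axes = plt.subplots(2, 2, figsize=(10, 7))
for ax, name in zip(axes.ravel(), features):
    ax.hist(data[name], bins=20, color="green")
    ax.set_title(name)
    ax.set_xlabel("cm")
    ax.set_ylabel("Count")

fig.tight_layout()
fig.savefig("iris_histograms.png")  # hypothetical output file name

titles = [ax.get_title() for ax in axes.ravel()]
print(titles)
```
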
3. Create a box plot for each feature in the dataset.

Code #5: Data preparation for Box Plot

# keeping only the four numeric feature columns (this drops Id and Species)
new_data = data[["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]]
print(new_data.head())

Code #6: Box Plot for Iris Data

plt.figure(figsize = (10, 7))
new_data.boxplot()
plt.show()
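
For task 4, the points that a box plot draws beyond its whiskers can also be listed numerically. The sketch below applies the common 1.5 x IQR rule (matplotlib's default whisker setting) to each column; the helper name iqr_outliers and the data values are made up for illustration.

```python
import pandas as pd

# Made-up values, chosen so that one column contains two outliers.
new_data = pd.DataFrame({
    "SepalWidthCm": [2.0, 2.9, 3.0, 3.0, 3.1, 3.2, 3.4, 4.4],
    "PetalLengthCm": [1.3, 1.4, 1.5, 4.5, 4.7, 5.1, 5.9, 6.0],
})


def iqr_outliers(column: pd.Series) -> pd.Series:
    """Return the values lying more than 1.5 * IQR outside the box."""
    q1 = column.quantile(0.25)
    q3 = column.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # lower fence
    upper = q3 + 1.5 * iqr  # upper fence
    return column[(column < lower) | (column > upper)]


outliers = {name: iqr_outliers(new_data[name]).tolist() for name in new_data.columns}
for name, values in outliers.items():
    print(f"{name}: {values}")
```

A column with an empty list has no points outside its whiskers.
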
Conclusion: Thus, we have studied data visualization on the Iris data set using
histograms and box plots.
