PR 10
Data Visualization III
Download the Iris flower dataset or any other dataset into a DataFrame (e.g.,
https://archive.ics.uci.edu/ml/datasets/Iris). Scan the dataset and give the inference as:
1. List down the features and their types (e.g., numeric, nominal) available in the dataset.
2. Create a histogram for each feature in the dataset to illustrate the feature distributions.
3. Create a boxplot for each feature in the dataset.
4. Compare distributions and identify outliers.
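For task 1, the feature types can also be read off the column dtypes programmatically. The following is a minimal sketch, not part of the original practical: the helper name feature_types is hypothetical, and the three rows are typed in by hand for illustration so the snippet runs without iris.csv. It assumes that any non-numeric column (here, variety) should be reported as nominal.

```python
import pandas as pd

# Hand-typed sample rows for illustration only; in the practical,
# df comes from pd.read_csv("iris.csv") instead.
df = pd.DataFrame({
    "sepal.length": [5.1, 7.0, 6.3],
    "sepal.width": [3.5, 3.2, 3.3],
    "petal.length": [1.4, 4.7, 6.0],
    "petal.width": [0.2, 1.4, 2.5],
    "variety": ["Setosa", "Versicolor", "Virginica"],
})

def feature_types(frame):
    """Label each column 'numeric' or 'nominal' based on its pandas dtype.

    Assumption: every non-numeric column is treated as nominal.
    """
    return {
        col: "numeric" if pd.api.types.is_numeric_dtype(frame[col]) else "nominal"
        for col in frame.columns
    }

types = feature_types(df)
for name, kind in types.items():
    print(f"{name}: {kind}")
```

On the full dataset the same call should report the four measurement columns as numeric and variety as nominal, which matches what df.info() shows as float64 and object columns.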
Step 1:
import pandas as pd
import numpy as np
import random
Step 2:
df=pd.read_csv("iris.csv")
df.describe()
Step 3:
df.info()
Step 4:
df.head()
Step 5:
df.tail()
Step 6:
df.isnull().sum()
Step 7:
a=df.groupby(['sepal.length','sepal.width','petal.length','petal.width']).count()
a
Step 8:
b=df.groupby(['variety']).count()
b
Step 9:
# Treat the variety column as a nominal (unordered) categorical
v=pd.Categorical(df['variety'],categories=['Setosa','Versicolor','Virginica'],ordered=False)
v
Step 10:
s1=df.groupby(['variety','sepal.length']).count()
s1
Step 11:
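import matplotlib.pyplot as m
# Pass the column values per variety, not the string 's1'
m.hist([g for _,g in df.groupby('variety')['sepal.length']])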
Step 12:
s2=df.groupby(['variety','sepal.width']).count()
s2
Step 13:
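# Pass the column values per variety, not the string 's2'
m.hist([g for _,g in df.groupby('variety')['sepal.width']])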
Step 14:
s3=df.groupby(['variety','petal.length']).count()
s3
Step 15:
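# Pass the column values per variety, not the string 's3'
m.hist([g for _,g in df.groupby('variety')['petal.length']])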
Step 16:
s4=df.groupby(['variety','petal.width']).count()
s4
Step 17:
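# Pass the column values per variety, not the string 's4'
m.hist([g for _,g in df.groupby('variety')['petal.width']])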
Step 18:
# Compare all four numeric features in one histogram
m.hist(df[['sepal.length','sepal.width','petal.length','petal.width']])
Step 19:
m.hist(data=df,x='sepal.length')
Step 20:
m.hist(data=df,x='sepal.width')
Step 21:
m.hist(data=df,x='petal.length')
Step 22:
m.hist(data=df,x='petal.width')
Step 23:
m.boxplot(df['sepal.length'],vert=False)
Step 24:
m.boxplot(df['sepal.width'],vert=False)
Step 25:
m.boxplot(df['petal.length'],vert=False)
Step 26:
m.boxplot(df['petal.width'],vert=False)
Step 27:
import seaborn as sns
sns.boxplot(x='variety',y='sepal.length',data=df)
Step 28:
sns.boxplot(x='variety',y='sepal.width',data=df)
Step 29:
sns.boxplot(x='variety',y='petal.length',data=df)
Step 30:
sns.boxplot(x='variety',y='petal.width',data=df)
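For task 4, the points drawn beyond the boxplot whiskers can also be listed numerically. The sketch below is an illustration rather than part of the original practical: it applies the 1.5 × IQR rule that matplotlib and seaborn use by default for whiskers, the helper name iqr_outliers is hypothetical, and the sample values are made up so the snippet runs without iris.csv.

```python
import pandas as pd

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Made-up widths for illustration only; in the practical,
# pass a real column such as df['sepal.width'].
sample = pd.Series([3.0, 3.1, 2.9, 3.2, 3.0, 2.8, 3.1, 4.4, 2.0])
flagged = iqr_outliers(sample)
print(flagged.tolist())
```

With the DataFrame from Step 2, iqr_outliers(df['sepal.width']) would list the outliers for the whole column, and df.groupby('variety')['sepal.width'].apply(iqr_outliers) should give them per variety, which is the view that corresponds to the seaborn boxplots in Steps 27 to 30.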