Part A Assignment 10
Data Visualization III
Download the Iris flower dataset or any other dataset into a DataFrame (e.g., https://archive.ics.uci.edu/ml/datasets/Iris). Scan the dataset and give the inference as:
1. How many features are there and what are their types (e.g., numeric, nominal)?
2. Create a histogram for each feature in the dataset to illustrate the feature distributions.
3. Create a boxplot for each feature in the dataset. Compare distributions and identify outliers.
import pandas as pd
col_names = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width','Species']
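The cell that actually reads the data into `df` is not shown above. A minimal sketch of that step, assuming the downloaded file uses the headerless, comma-separated layout of the UCI `iris.data` file: the `sample_csv` buffer below is a hypothetical stand-in holding a handful of rows from the dataset, so the example runs without the file on disk.

```python
import io
import pandas as pd

col_names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Species']

# Hypothetical stand-in for the downloaded file: six rows in the
# headerless, comma-separated layout assumed for iris.data.
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.4,3.2,4.5,1.5,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "5.8,2.7,5.1,1.9,Iris-virginica\n"
)

# header=None: the file has no header row; names= supplies the column labels.
df = pd.read_csv(sample_csv, header=None, names=col_names)
print(df.head())
```

With the real file, the buffer would be replaced by the path to wherever the dataset was saved.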
In [6]: df.head()
Out[6]:
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
Q1. How many features are there and what are their types?
column = len(list(df))
In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
In [9]: np.unique(df['Species'])
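One way to answer Q1 directly is to let pandas split the columns by dtype. The sketch below uses a tiny illustrative frame with the same column layout (three rows, not the full dataset); the variable names `n_features`, `numeric_cols`, `nominal_cols` and `classes` are introduced here for illustration only.

```python
import pandas as pd

# Tiny illustrative frame with the same column layout (not the full dataset).
df = pd.DataFrame({
    'Sepal_Length': [5.1, 7.0, 6.3],
    'Sepal_Width': [3.5, 3.2, 3.3],
    'Petal_Length': [1.4, 4.7, 6.0],
    'Petal_Width': [0.2, 1.4, 2.5],
    'Species': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
})

# Number of columns in the frame.
n_features = df.shape[1]

# Numeric columns are the measurements; everything else is treated as nominal.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
nominal_cols = df.select_dtypes(exclude='number').columns.tolist()

# Distinct class labels in the nominal column.
classes = sorted(df['Species'].unique())

print(n_features, numeric_cols, nominal_cols, classes)
```

Read this way, the frame has five columns: four numeric measurements and one nominal class label.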
Q2. Data Visualization: create a histogram for each feature in the dataset to illustrate the feature distributions. Plot each histogram.
The Seaborn library is built on top of Matplotlib and offers many advanced data visualization capabilities.
Seaborn can be used to draw a variety of charts, such as matrix plots, grid plots, and regression plots.
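import matplotlib.pyplot as plt
%matplotlib inline
# 2x2 grid of axes, one per numeric feature (the original subplots call is not shown; this layout is assumed from the indices below)
fig, axes = plt.subplots(2, 2)
axes[0,0].hist(df["Sepal_Length"]);
axes[0,1].hist(df["Sepal_Width"]);
axes[1,0].hist(df["Petal_Length"]);
axes[1,1].hist(df["Petal_Width"]);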
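The four histogram calls can also be written as a loop over the feature names, which makes it easy to title each panel. This is a sketch under stated assumptions: the small frame holds a few rows for illustration only, and `bins=10`, `figsize` and the axis labels are choices made here, not taken from the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook; omit with %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# A few illustrative rows with the same column layout (not the full dataset).
df = pd.DataFrame({
    'Sepal_Length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'Sepal_Width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'Petal_Length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'Petal_Width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})
features = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']

# One panel per feature in a 2x2 grid; axes.flat walks the grid row by row.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, name in zip(axes.flat, features):
    ax.hist(df[name], bins=10)
    ax.set_title(name)
    ax.set_xlabel("cm")
    ax.set_ylabel("Count")

fig.tight_layout()  # keep titles and labels from overlapping
```

Titling each panel matters here because the four histograms otherwise look alike and cannot be told apart.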
Q4. Create a boxplot for each feature in the dataset. All of the boxplots should be combined into a single plot. Compare distributions and identify outliers.
seaborn.set_style(style=None, rc=None)
Parameters
style: dict, or one of {darkgrid, whitegrid, dark, white, ticks}
A dictionary of parameters or the name of a preconfigured style.
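import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
fig, ax = plt.subplots()  # same result as fig = plt.figure(); ax = fig.add_subplot(111)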
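The cell that draws the boxplots is cut off above, so the following is only a sketch of one way to finish the task. It swaps in pandas' `DataFrame.boxplot` instead of Seaborn to keep the example dependency-light, and it lists outliers with the 1.5 × IQR rule, the same rule Matplotlib's default whiskers use. The helper `iqr_outliers` is a hypothetical name, and the values in the frame are made up for illustration; they are not rows from the Iris dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook; omit with %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative values (NOT taken from the Iris dataset).
df = pd.DataFrame({
    'Sepal_Width': [2.8, 3.0, 3.0, 3.1, 3.2, 3.4, 4.4],
    'Petal_Width': [0.2, 0.2, 0.4, 1.3, 1.5, 1.8, 2.5],
})

def iqr_outliers(series):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return series[(series < lower) | (series > upper)].tolist()

# Outlier values per feature, keyed by column name.
outliers = {name: iqr_outliers(df[name]) for name in df.columns}
print(outliers)

# All boxplots on a single set of axes, one box per column.
fig, ax = plt.subplots()
df.boxplot(ax=ax)
ax.set_ylabel("cm")
```

Points flagged by `iqr_outliers` correspond to the individual markers drawn beyond the whiskers, so the printed dictionary can be checked against the plot.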