I2IT DataVisualizationI - JupyterLab
*Data Visualization I*
*1. Use the inbuilt dataset 'titanic'. The dataset contains 891 rows with information about the
passengers who boarded the ill-fated Titanic. Use the Seaborn library to see if we can find
any patterns in the data.*
*2. Write code to check how the ticket price (column name: 'fare') is distributed across
passengers by plotting a histogram.*
In [3]: df = sns.load_dataset('titanic')
In [4]: df.notnull()
Out[4]: survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_t
0 True True True True True True True True True True True False
1 True True True True True True True True True True True True
2 True True True True True True True True True True True False
3 True True True True True True True True True True True True
4 True True True True True True True True True True True False
... ... ... ... ... ... ... ... ... ... ... ... ...
886 True True True True True True True True True True True False
887 True True True True True True True True True True True True
888 True True True False True True True True True True True False
889 True True True True True True True True True True True True
890 True True True True True True True True True True True False
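The boolean mask from `df.notnull()` is hard to read at a glance, so a per-column count of missing values can help. The following is a minimal sketch: the helper name `missing_counts` and the `demo` rows are made up for illustration, so that the block runs without downloading the dataset.

```python
import pandas as pd


def missing_counts(frame: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column."""
    return frame.isnull().sum()


# Made-up rows (not taken from the Titanic data), used only to exercise the helper.
demo = pd.DataFrame({
    "age": [22.0, None, 26.0],
    "deck": [None, "C", None],
    "fare": [7.25, 71.28, 7.92],
})
counts = missing_counts(demo)
print(counts)
```

In the notebook, `missing_counts(df)` should agree with the `df.info()` output: 891 minus each non-null count, for example 177 missing values for 'age', 688 for 'deck', and 2 for 'embarked'.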
In [5]: df.head(3)
Out[5]: survived pclass sex age sibsp parch fare embarked class who adult_male deck emb
In [6]: df.columns
In [7]: df['pclass'].unique()
In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 survived 891 non-null int64
1 pclass 891 non-null int64
2 sex 891 non-null object
3 age 714 non-null float64
4 sibsp 891 non-null int64
5 parch 891 non-null int64
6 fare 891 non-null float64
7 embarked 889 non-null object
8 class 891 non-null category
9 who 891 non-null object
10 adult_male 891 non-null bool
11 deck 203 non-null category
12 embark_town 889 non-null object
13 alive 891 non-null object
14 alone 891 non-null bool
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB
Inference: The age distributions, split by sex and embarkation town, show that most passengers
embarked at Southampton, and most of those were male. Cherbourg has the second-highest number of
passengers, followed by Queenstown.
In [12]: sns.displot(df, x = 'age', hue = 'sex', col = 'embark_town')
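The passenger counts behind this inference can also be read from a table rather than from the plot. The sketch below uses `pd.crosstab`; the helper name `passengers_by_town_and_sex` and the `demo` rows are hypothetical and exist only so the block runs on its own.

```python
import pandas as pd


def passengers_by_town_and_sex(frame: pd.DataFrame) -> pd.DataFrame:
    """Count passengers per embarkation town and sex, with row/column totals."""
    return pd.crosstab(frame["embark_town"], frame["sex"], margins=True)


# Made-up rows (not the real data), used only to exercise the helper.
demo = pd.DataFrame({
    "embark_town": ["Southampton", "Southampton", "Southampton",
                    "Cherbourg", "Queenstown"],
    "sex": ["male", "male", "female", "female", "male"],
})
table = passengers_by_town_and_sex(demo)
print(table)
```

Calling `passengers_by_town_and_sex(df)` in the notebook would give the counts to compare against the displot. Rows with a missing 'embark_town' are left out by `pd.crosstab`.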
Inference: This bar plot shows the average age for each passenger class. Average age decreases
with class: it is highest for Class 1 passengers (38.2 years), followed by Class 2 passengers
(29.9 years), and lowest for Class 3 passengers (25.1 years).
In [14]: sns.barplot(data = df, x = 'pclass', y = 'age')
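By default `sns.barplot` aggregates with the mean, so the bar heights can be checked with a pandas group-by. This is a minimal sketch: the helper name `mean_age_by_class` and the `demo` rows are made up for illustration.

```python
import pandas as pd


def mean_age_by_class(frame: pd.DataFrame) -> pd.Series:
    """Mean age per passenger class, rounded to one decimal place."""
    # mean() skips missing ages, so passengers without an age are ignored.
    return frame.groupby("pclass")["age"].mean().round(1)


# Made-up rows (not the real data), used only to exercise the helper.
demo = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3],
    "age": [40.0, 36.0, 30.0, None, 24.0],
})
means = mean_age_by_class(demo)
print(means)
```

In the notebook, `mean_age_by_class(df)` returns the per-class averages quoted in the inference.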
Inference: The bar plot of age by passenger class, split further by sex, shows a consistent
pattern: within every class, the average age of male passengers is higher than that of female
passengers.
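The cell that produced this plot does not appear in the cells shown here. One plausible reconstruction, assuming the earlier `sns.barplot` call was extended with `hue='sex'`, is sketched below. The helper name `plot_age_by_class_and_sex` and the `demo` rows are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside JupyterLab
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_age_by_class_and_sex(frame: pd.DataFrame):
    """Bar plot of mean age per passenger class, with one bar per sex."""
    fig, ax = plt.subplots()
    sns.barplot(data=frame, x="pclass", y="age", hue="sex", ax=ax)
    return ax


# Made-up rows (not the real data), used only so the sketch runs offline.
demo = pd.DataFrame({
    "pclass": [1, 1, 1, 1, 3, 3, 3, 3],
    "sex": ["male", "male", "female", "female"] * 2,
    "age": [45.0, 40.0, 35.0, 33.0, 28.0, 26.0, 22.0, 20.0],
})
ax = plot_age_by_class_and_sex(demo)
legend_labels = sorted(t.get_text() for t in ax.get_legend().get_texts())
```

In the notebook this would be called as `plot_age_by_class_and_sex(df)`.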
In [17]: sns.countplot(x='pclass',data=df,hue='survived')
In [20]: sns.countplot(x='pclass',data=df,hue='sex')
In [21]: sns.countplot(x='pclass',data=df,hue='embark_town')
In [24]: sns.catplot(x='embark_town',kind='count',data=df,hue='survived')
In [26]: sns.catplot(x='embark_town',kind='count',data=df,hue='survived',col='pclass')
Iris Dataset
In [27]: df = sns.load_dataset('iris')
In [28]: df.head(3)
In [29]: df.columns
Inference: The displot compares the sepal length distributions of the three Iris species: Setosa,
Versicolor, and Virginica. Setosa tends to have the smallest sepal length of the three.
Inference: The displot compares the sepal width distributions of the three Iris species: Setosa,
Versicolor, and Virginica. Setosa tends to have wider sepals than Versicolor and Virginica, whose
sepal widths are relatively similar.
Inference: The displot of 'petal_width', with a separate column for each species in the Iris
dataset, shows how petal width is distributed within each species. Setosa has the smallest petal
widths and Virginica the largest, with Versicolor in between.
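The species comparisons in these inferences can be backed by a small summary table. The sketch below groups by species and reports the minimum, mean, and maximum of each chosen column. The helper name `species_summary` and the `demo` values are hypothetical.

```python
import pandas as pd


def species_summary(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Min, mean, and max of the given measurement columns for each species."""
    return frame.groupby("species")[columns].agg(["min", "mean", "max"])


# Made-up measurements (not the real Iris data), used only to exercise the helper.
demo = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
    "petal_width": [0.2, 0.4, 1.0, 1.5, 1.8, 2.5],
    "sepal_width": [3.4, 3.6, 2.7, 2.9, 2.8, 3.2],
})
summary = species_summary(demo, ["petal_width", "sepal_width"])
print(summary)
```

In the notebook, `species_summary(df, ['sepal_length', 'sepal_width', 'petal_width'])` would give the figures to compare against the three displots.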
In [40]: sns.histplot(x='sepal_length',data=df)
Inference: The sepal width histogram is roughly normal. The most common sepal width is about
3.0 cm, with values spread across the range of 2.0 cm to 4.5 cm. There is a slight skew towards
higher sepal widths, visible as a slightly longer right tail.
In [41]: sns.histplot(x='sepal_width',data=df)
Out[41]: <Axes: xlabel='sepal_width', ylabel='Count'>
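The skew described in the inference can be checked numerically with `Series.skew()`, which returns the sample skewness. A positive value indicates a longer right tail. This is a minimal sketch: the helper name `skewness` and the `demo` values are made up for illustration.

```python
import pandas as pd


def skewness(frame: pd.DataFrame, column: str) -> float:
    """Sample skewness of one column; positive means a longer right tail."""
    return frame[column].skew()


# Made-up widths (not the real Iris data), chosen to have a longer right tail.
demo = pd.DataFrame({"sepal_width": [2.0, 3.0, 3.0, 3.0, 3.5, 4.5]})
right_skew = skewness(demo, "sepal_width")

# A symmetric sample, for contrast, has zero skewness.
symmetric = pd.DataFrame({"sepal_width": [1.0, 2.0, 3.0]})
no_skew = skewness(symmetric, "sepal_width")
print(right_skew, no_skew)
```

In the notebook, a positive result from `skewness(df, 'sepal_width')` would support the reading of a slightly longer right tail.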
Inference: The petal length histogram shows separated groups of values that correspond to the
iris species (the histogram itself is not split by species). Setosa flowers have the shortest
petals, clustering around 1-2 cm. Versicolor flowers have moderate petal lengths, mostly within
the range of 3-5 cm. Virginica flowers tend to have the longest petals, from 4.5 cm to 7.0 cm.
In [42]: sns.histplot(x='petal_length',data=df)
In [43]: sns.histplot(x='petal_width',data=df)