
ds9

The document contains Python code for analyzing the Titanic dataset using the Seaborn and Matplotlib libraries. It includes data loading, exploration, visualization of survival rates by passenger class and gender, and analysis of missing values and correlations among numeric variables. Various plots such as histograms, count plots, and heatmaps are generated to illustrate the findings.

aqz5xktfk

April 22, 2025

[19]: import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

[15]: titanic = sns.load_dataset('titanic')
titanic.head()

[15]: survived pclass sex age sibsp parch fare embarked class \
0 0 3 male 22.0 1 0 7.2500 S Third
1 1 1 female 38.0 1 0 71.2833 C First
2 1 3 female 26.0 0 0 7.9250 S Third
3 1 1 female 35.0 1 0 53.1000 S First
4 0 3 male 35.0 0 0 8.0500 S Third

who adult_male deck embark_town alive alone
0 man True NaN Southampton no False
1 woman False C Cherbourg yes False
2 woman False NaN Southampton yes True
3 woman False C Southampton yes False
4 man True NaN Southampton no True

[23]: titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 survived 891 non-null int64
1 pclass 891 non-null int64
2 sex 891 non-null object
3 age 714 non-null float64
4 sibsp 891 non-null int64
5 parch 891 non-null int64
6 fare 891 non-null float64
7 embarked 889 non-null object
8 class 891 non-null category
9 who 891 non-null object
10 adult_male 891 non-null bool
11 deck 203 non-null category
12 embark_town 889 non-null object
13 alive 891 non-null object
14 alone 891 non-null bool
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB
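
The non-null counts reported by `info()` above already imply how much is missing in each column. A minimal sketch in plain Python, using only the counts printed above (the dictionary and variable names are illustrative and not part of the notebook):

```python
# Non-null counts copied from the titanic.info() output above.
TOTAL_ROWS = 891
non_null = {"age": 714, "embarked": 889, "deck": 203, "embark_town": 889}

# Missing entries = total rows minus non-null entries.
missing = {col: TOTAL_ROWS - count for col, count in non_null.items()}
missing_pct = {col: round(100 * n / TOTAL_ROWS, 1) for col, n in missing.items()}

# List the columns from most to least incomplete.
for col in sorted(missing, key=missing.get, reverse=True):
    print(f"{col:12s} {missing[col]:4d} missing ({missing_pct[col]}%)")
```

On the loaded DataFrame itself, `titanic.isnull().sum()` gives the same counts directly.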

[25]: titanic.describe()

[25]: survived pclass age sibsp parch fare
count 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000
mean 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208
std 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429
min 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000
25% 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400
50% 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200
75% 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000
max 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200
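
Because `survived` is coded 0/1, its mean in the table above is the overall survival rate. The arithmetic below is a small sketch that uses only figures from the `describe()` output; the variable names are illustrative.

```python
# Figures copied from the describe() output above.
n_passengers = 891
survival_rate = 0.383838          # mean of the 0/1 'survived' column
fare_mean, fare_median = 32.204208, 14.454200

# Mean of a 0/1 column times the row count recovers the number of survivors.
n_survivors = round(survival_rate * n_passengers)

# A mean far above the median suggests a right-skewed distribution.
fare_ratio = fare_mean / fare_median

print(n_survivors, round(fare_ratio, 2))
```

The gap between mean and median fare is consistent with the long right tail visible in the fare histogram.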

[27]: plt.figure(figsize=(10, 6))
sns.histplot(data=titanic, x='fare', bins=30, kde=True)
plt.title('Distribution of Fare Prices on the Titanic')
plt.xlabel('Fare')
plt.ylabel('Number of Passengers')
plt.grid(True)
plt.show()

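[33]: # Relabel survived as 'No'/'Yes' for readable legends. This overwrites the
# column, so run the cell only once: a second run maps every value to NaN.
titanic['survived'] = titanic['survived'].map({0: 'No', 1: 'Yes'})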
plt.figure(figsize=(8, 6))
sns.countplot(data=titanic, x='pclass', hue='survived')
plt.title('Survival Count by Passenger Class')
plt.xlabel('Passenger Class')
plt.ylabel('Count')
plt.legend(title='Survived')
plt.show()
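
A count plot draws one bar per combination of `x` and `hue` value, with the bar height equal to the number of matching rows. The sketch below tallies those combinations by hand for the five rows printed by `titanic.head()` earlier, after the 0/1 to 'No'/'Yes' relabeling. It illustrates the counting only, not seaborn's implementation.

```python
from collections import Counter

# (pclass, survived) pairs for the five rows printed by titanic.head(),
# with survived relabeled 0 -> 'No', 1 -> 'Yes' as in the cell above.
head_rows = [(3, "No"), (1, "Yes"), (3, "Yes"), (1, "Yes"), (3, "No")]

# One tally per (class, outcome) pair: these are the bar heights.
counts = Counter(head_rows)

for (pclass, survived), n in sorted(counts.items()):
    print(f"class {pclass}, survived={survived}: {n}")
```

On the full DataFrame, `pd.crosstab(titanic['pclass'], titanic['survived'])` produces the same kind of table.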

[35]: plt.figure(figsize=(8, 6))
sns.countplot(data=titanic, x='sex', hue='survived')
plt.title('Survival Count by Gender')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.legend(title='Survived', labels=['No', 'Yes'])
plt.show()

[37]: plt.figure(figsize=(10, 6))
ax = sns.histplot(data=titanic, x='age', hue='survived', bins=30, kde=True, multiple='stack')
plt.title('Age Distribution by Survival')
plt.xlabel('Age')
plt.ylabel('Number of Passengers')
# Keep seaborn's own legend (its entries come from the data) and only set the title
ax.get_legend().set_title('Survived')
plt.show()

[39]: plt.figure(figsize=(12, 6))
sns.heatmap(titanic.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values in Titanic Dataset')
plt.show()

[43]: # Restore the 0/1 coding so that survived is picked up as a numeric column
titanic['survived'] = titanic['survived'].map({'No': 0, 'Yes': 1})
numeric_data = titanic.select_dtypes(include='number')

# Plot correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(numeric_data.corr(), annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of Numeric Variables')
plt.show()

/home/admin1/anaconda3/lib/python3.9/site-packages/seaborn/matrix.py:260:
FutureWarning: Format strings passed to MaskedConstant are ignored, but in
future may error or produce different behavior
annotation = ("{:" + self.fmt + "}").format(val)
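
`DataFrame.corr()` computes the Pearson correlation coefficient by default, so each annotated cell in the heatmap is one such coefficient. A minimal sketch of the formula in plain Python, applied to the `age` and `fare` values of the five rows printed by `titanic.head()` (the function name `pearson` is hypothetical, and five rows are far too few to say anything about the full dataset):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# age and fare for the five rows printed by titanic.head().
age = [22.0, 38.0, 26.0, 35.0, 35.0]
fare = [7.25, 71.2833, 7.925, 53.1, 8.05]

r = pearson(age, fare)
print(f"r = {r:.3f}")
```

On the full data, rows with a missing `age` have to be excluded first, which `DataFrame.corr()` does pair by pair.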

[50]: plt.scatter(titanic["age"],titanic["fare"])

[50]: <matplotlib.collections.PathCollection at 0x7f969f1448b0>

[56]: numeric_cols = titanic.select_dtypes(include='number')

numeric_clean = numeric_cols.dropna()

sns.pairplot(numeric_clean)
plt.suptitle('Pairwise Relationships in Titanic Numeric Data', y=1.02)
plt.show()

[60]: sns.pairplot(titanic,hue="sex")

[60]: <seaborn.axisgrid.PairGrid at 0x7f969c9625e0>

[64]: sns.boxplot(titanic["fare"])

[64]: <Axes: >
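
With default settings, the whiskers of a box plot reach the most extreme data points that lie within 1.5 times the interquartile range of the box, and points beyond that are drawn individually as outliers. A rough sketch of those limits for `fare`, using the quartiles from the `describe()` output earlier (the helper name `whisker_limits` is hypothetical):

```python
def whisker_limits(q1, q3, whis=1.5):
    """Return the (lower, upper) limits beyond which points count as outliers."""
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr

# 25% and 75% quartiles of 'fare' copied from the describe() output.
lower, upper = whisker_limits(7.9104, 31.0)
print(f"lower limit: {lower:.2f}, upper limit: {upper:.2f}")
```

The lower limit falls below zero while the minimum fare is 0, so only the upper limit matters here: every fare above it appears as an individual point.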

[74]: sns.displot(titanic["fare"])

[74]: <seaborn.axisgrid.FacetGrid at 0x7f968f91fc70>

[76]: plt.hist(titanic["fare"])

[76]: (array([732., 106., 31., 2., 11., 6., 0., 0., 0., 3.]),
array([ 0. , 51.23292, 102.46584, 153.69876, 204.93168, 256.1646 ,
307.39752, 358.63044, 409.86336, 461.09628, 512.3292 ]),
<BarContainer object of 10 artists>)
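
`plt.hist` returns the per-bin counts and the bin edges, which is the tuple shown above. With the default of 10 bins, the edges are evenly spaced between the smallest and largest fare. A short check using only the printed values (variable names are illustrative):

```python
# Counts and range copied from the plt.hist output above.
counts = [732, 106, 31, 2, 11, 6, 0, 0, 0, 3]
fare_min, fare_max = 0.0, 512.3292

# Ten equal-width bins need eleven evenly spaced edges.
n_bins = len(counts)
width = (fare_max - fare_min) / n_bins
edges = [fare_min + i * width for i in range(n_bins + 1)]

# Every passenger lands in exactly one bin, so the counts sum to the row count.
total = sum(counts)
share_first_bin = counts[0] / total
print(total, round(width, 5), round(100 * share_first_bin, 1))
```

Most passengers fall into the first bin, which is why the earlier `histplot` call with `bins=30` shows more detail at the low end.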

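[ ]: # x and y are an assumed pairing, the same columns as the scatter plot above
sns.jointplot(data=titanic, x='age', y='fare')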
