
Seaborn Plotting in Titanic

The document contains a Python script that utilizes pandas, numpy, and seaborn for data analysis and visualization of the Titanic dataset. It includes data loading, descriptive statistics, handling missing values, and various plots to analyze passenger survival based on different features such as age, fare, and class. The analysis highlights trends in survival rates and suggests further steps for data imputation and categorization.

Uploaded by

haridivya6650


In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

In [2]: df=pd.read_csv('titan.csv')
df.head()

Out[2]:
   PassengerId  Survived  Pclass  Name                                              Sex     Age   SibSp  Parch  Ticket            Fare     ...  Embarked  WikiId
0  1            0.0       3       Braund, Mr. Owen Harris                           male    22.0  1      0      A/5 21171         7.2500   ...  S         691.0
1  2            1.0       1       Cumings, Mrs. John Bradley (Florence Briggs Th... female  38.0  1      0      PC 17599          71.2833  ...  C         90.0
2  3            1.0       3       Heikkinen, Miss. Laina                            female  26.0  0      0      STON/O2. 3101282  7.9250   ...  S         865.0
3  4            1.0       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)      female  35.0  1      0      113803            53.1000  ...  S         127.0
4  5            0.0       3       Allen, Mr. William Henry                          male    35.0  0      0      373450            8.0500   ...  S         627.0

5 rows × 21 columns

In [3]: df.describe()

Out[3]:
       PassengerId  Survived    Pclass       Age          SibSp        Parch        Fare         WikiId
count  1309.000000  891.000000  1309.000000  1046.000000  1309.000000  1309.000000  1308.000000  1304.000000  13
mean   655.000000   0.383838    2.294882     29.881138    0.498854     0.385027     33.295479    658.534509
std    378.020061   0.486592    0.837836     14.413493    1.041658     0.865560     51.758668    380.377373
min    1.000000     0.000000    1.000000     0.170000     0.000000     0.000000     0.000000     1.000000
25%    328.000000   0.000000    2.000000     21.000000    0.000000     0.000000     7.895800     326.750000
50%    655.000000   0.000000    3.000000     28.000000    0.000000     0.000000     14.454200    661.500000
75%    982.000000   1.000000    3.000000     39.000000    1.000000     0.000000     31.275000    987.250000
max    1309.000000  1.000000    3.000000     80.000000    8.000000     9.000000     512.329200   1314.000000

In [4]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 1309 non-null int64
1 Survived 891 non-null float64
2 Pclass 1309 non-null int64
3 Name 1309 non-null object
4 Sex 1309 non-null object
5 Age 1046 non-null float64
6 SibSp 1309 non-null int64
7 Parch 1309 non-null int64
8 Ticket 1309 non-null object
9 Fare 1308 non-null float64
10 Cabin 295 non-null object
11 Embarked 1307 non-null object
12 WikiId 1304 non-null float64
13 Name_wiki 1304 non-null object
14 Age_wiki 1302 non-null float64
15 Hometown 1304 non-null object
16 Boarded 1304 non-null object
17 Destination 1304 non-null object
18 Lifeboat 502 non-null object
19 Body 130 non-null object
20 Class 1304 non-null float64
dtypes: float64(6), int64(4), object(11)
memory usage: 214.9+ KB

In [5]: df.shape

Out[5]: (1309, 21)

In [6]: df.isnull().sum()

Out[6]:
PassengerId       0
Survived        418
Pclass            0
Name              0
Sex               0
Age             263
SibSp             0
Parch             0
Ticket            0
Fare              1
Cabin          1014
Embarked          2
WikiId            5
Name_wiki         5
Age_wiki          7
Hometown          5
Boarded           5
Destination       5
Lifeboat        807
Body           1179
Class             5
dtype: int64
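
Raw counts are easier to compare as a share of the rows (Cabin, for instance, is missing in 1014 of the 1309 rows). The sketch below shows one way to get that share. It runs on a small made-up frame named `toy`, not on `titan.csv`, so the printed percentages are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data (values are made up).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.1],
})

# isnull() gives a True/False mask; the column mean of that mask is the
# fraction of missing entries, which is then scaled to a percentage.
missing_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

On the real frame the same expression would be `df.isnull().mean().mul(100)`; columns near the top of the sorted result are candidates for dropping rather than imputing.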

In [7]: df.sample(10)

Out[7]:
      PassengerId  Survived  Pclass  Name                                              Sex     Age    SibSp  Parch  Ticket    Fare     ...  Embarked  W
797   798          1.0       3       Osman, Mrs. Mara                                  female  31.00  0      0      349244    8.6833   ...  S
296   297          0.0       3       Hanna, Mr. Mansour                                male    23.50  0      0      2693      7.2292   ...  C
644   645          1.0       3       Baclini, Miss. Eugenie                            female  0.75   2      1      2666      19.2583  ...  C
1115  1116         NaN       1       Candee, Mrs. Edward (Helen Churchill Hungerford)  female  53.00  0      0      PC 17606  27.4458  ...  C
335   336          0.0       3       Denkoff, Mr. Mitto                                male    NaN    0      0      349225    7.8958   ...  S
169   170          0.0       3       Ling, Mr. Lee                                     male    28.00  0      0      1601      56.4958  ...  S
977   978          NaN       3       Barry, Miss. Julia                                female  27.00  0      0      330844    7.8792   ...  Q
366   367          1.0       1       Warren, Mrs. Frank Manley (Anna Sophia Atkinson)  female  60.00  1      0      110813    75.2500  ...  C
532   533          0.0       3       Elias, Mr. Joseph Jr                              male    17.00  1      1      2690      7.2292   ...  C
5     6            0.0       3       Moran, Mr. James                                  male    NaN    0      0      330877    8.4583   ...  Q

10 rows × 21 columns

UNIVARIATE ANALYSIS

KDE PLOT

In [8]: plt.figure(figsize=(4,3))
sns.kdeplot(data=df.PassengerId)
plt.show()

In [9]: plt.figure(figsize=(4,3))
sns.kdeplot(data=df.Age)
plt.show()

In [10]: plt.figure(figsize=(4,3))
sns.kdeplot(data=df.Fare)
plt.show()

HISTPLOT

In [11]: sns.histplot(df.Fare)
plt.show()

BOX PLOT

In [12]: sns.boxplot(df.Age)
plt.show()

In [13]: sns.boxplot(x='Embarked', y='Age', data=df)
plt.title("Age distribution as function of Embarked Port")
plt.show()

In [14]: sns.boxplot(x='Embarked', y='Fare', data=df)
plt.title("Fare distribution as function of Embarked Port")
plt.show()

MULTIVARIATE ANALYSIS

LINE PLOT

In [15]: sns.lineplot(x='Age', y='Fare', data=df)
plt.title('Age vs Fare')
plt.show()

PIE CHART

In [16]: df.columns

Out[16]:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked', 'WikiId', 'Name_wiki',
       'Age_wiki', 'Hometown', 'Boarded', 'Destination', 'Lifeboat', 'Body',
       'Class'],
      dtype='object')

In [17]: pclass_survived=df.groupby(['Pclass'])['Survived'].sum()

In [18]: pclass_survived

Out[18]:
Pclass
1    136.0
2     87.0
3    119.0
Name: Survived, dtype: float64

In [19]: sns.set_style('ticks')
pclass_survived.plot.pie()
plt.legend()
plt.show()

In [20]: pclass_sex_Survived=df.groupby(['Pclass','Sex'])['Survived'].sum()

In [21]: pclass_sex_Survived

Out[21]:
Pclass  Sex
1       female    91.0
        male      45.0
2       female    70.0
        male      17.0
3       female    72.0
        male      47.0
Name: Survived, dtype: float64

In [22]: pclass_sex_Survived.plot.pie(autopct = '%1.2f%%')
plt.legend(bbox_to_anchor=(1.5,1),loc='upper left',borderaxespad=0)
plt.show()

BAR CHART

In [23]: sns.countplot(x='Sex',data=df)

Out[23]: <Axes: xlabel='Sex', ylabel='count'>

In [24]: sns.catplot(x ="Sex", hue ="Survived",
            kind ="count", data = df);

COUNT PLOT

In [25]: sns.countplot(x='Embarked', hue='Pclass', data=df)
plt.title("Count of Passengers as function of Embarked Port")
plt.show()

In [26]: plt.figure(figsize=(4,3))
sns.set_style('darkgrid')
sns.countplot(x='Pclass',hue='Survived',data=df)
plt.title('Pclass:Survived vs Dead')
plt.show()

In [27]: plt.figure(figsize=(4,3))
sns.set_style('darkgrid')
sns.countplot(x='Pclass',hue='Sex',data=df)
# This plot counts passengers by sex within each class; it does not show survival
plt.title('Pclass: Passenger count by Sex')
plt.show()

VIOLIN PLOT

In [28]: # Violin plot: displays the distribution of data
# across all levels of a category.
sns.violinplot(x ="Sex", y ="Age", hue ="Survived",
               data = df, split = True)

Out[28]: <Axes: xlabel='Sex', ylabel='Age'>

This graph summarizes the age distribution of the men, women and children who survived and of those who did not. The survival rate is:

Good for children.

High for women in the age range 20-50.

Lower for men as age increases.

Since the Age column is important, its missing values (263 of 1309 rows) need to be filled, either from the Name column (estimating age from the salutation: Mr, Mrs, etc.) or with a regressor. After this step, another column, Age_Range (derived from Age), can be created and the data analyzed again.
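
A minimal sketch of the salutation-based option follows, under stated assumptions: it runs on a small made-up frame (`toy`), fills each missing age with the median age for that title, and uses arbitrary bin edges and labels for Age_Range. The `Title` and `Age_filled` column names are illustrative, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows in the same "Surname, Title. Given names" layout
# as the Name column; names and ages are made up for illustration.
toy = pd.DataFrame({
    "Name": [
        "Smith, Mr. John", "Smith, Mrs. Jane", "Brown, Miss. Ann",
        "Jones, Mr. Paul", "Brown, Miss. Eve", "Jones, Mrs. Mary",
    ],
    "Age": [40.0, 35.0, 8.0, np.nan, np.nan, 45.0],
})

# Pull out the salutation that sits between the comma and the first period.
toy["Title"] = toy["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Fill each missing age with the median age of passengers sharing that title.
title_median = toy.groupby("Title")["Age"].transform("median")
toy["Age_filled"] = toy["Age"].fillna(title_median)

# Bucket the filled ages; these bin edges and labels are arbitrary choices.
toy["Age_Range"] = pd.cut(
    toy["Age_filled"],
    bins=[0, 12, 20, 50, 120],
    labels=["child", "teen", "adult", "senior"],
)
print(toy[["Name", "Title", "Age_filled", "Age_Range"]])
```

A per-title median is only one possible choice; titles that occur once with no recorded age would stay missing and need a fallback such as the overall median.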

BAR PLOT

In [29]: plt.figure(figsize=(8,4))
sns.barplot(x='SibSp',y='Survived',data=df)
plt.title('SibSp & Survived')
plt.show()

In [30]: # Divide Fare into 4 quantile-based bins (quartiles)
df['Fare_Range'] = pd.qcut(df['Fare'], 4)

# Barplot: bar height is the mean of Survived in each bin,
# i.e. the survival rate for that fare range.
sns.barplot(x ='Fare_Range', y ='Survived',
            data = df)

Out[30]: <Axes: xlabel='Fare_Range', ylabel='Survived'>

Fare denotes the fare paid by a passenger. Because its values are continuous, they are grouped into bins (the same treatment suggested above for Age) to get a clearer picture. The bars suggest that passengers who paid a higher fare survived at a higher rate.
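
The bar heights can also be read off as numbers. The sketch below bins a fare column into quartiles with `pd.qcut` and takes the mean of `Survived` per bin, which is the same quantity `sns.barplot` draws by default. The `toy` frame is made up to keep the example self-contained, so its rates say nothing about the real data.

```python
import pandas as pd

# Hypothetical toy data (not taken from titan.csv).
toy = pd.DataFrame({
    "Fare": [5.0, 7.0, 8.0, 10.0, 15.0, 20.0, 40.0, 80.0],
    "Survived": [0, 0, 0, 1, 0, 1, 1, 1],
})

# Quartile bins: each interval holds roughly a quarter of the rows.
toy["Fare_Range"] = pd.qcut(toy["Fare"], 4)

# Mean of a 0/1 column is the share of ones, i.e. the survival rate per bin.
rate = toy.groupby("Fare_Range", observed=True)["Survived"].mean()
print(rate)
```

On the real frame, rows where `Survived` is missing are skipped by `mean()`, so the rate is computed over the labelled passengers only.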

PAIR PLOT

In [31]: sns.pairplot(data=df)
plt.show()

HEAT MAP

In [32]: # Correlate numeric columns only; text columns such as Name cannot be correlated
heat_map=df.corr(numeric_only=True)
sns.heatmap(heat_map)
plt.show()

In [33]: plt.scatter(df.Fare,df.Age);

STRIP PLOT

In [34]: sns.stripplot(x='Fare',y='Age',data=df)
plt.show()

In [35]: sns.stripplot(x='Fare',y='Age',data=df,size=4)
plt.show()
