
Experiment No 9

The document outlines an experiment focused on data visualization using the Titanic dataset, emphasizing exploratory data analysis techniques. It covers various methods for analyzing both categorical and numerical data, including box plots, count plots, and scatter plots, while providing code examples for implementation. Additionally, it discusses the importance of understanding data distribution and relationships between variables to enhance data storytelling.

Uploaded by

shreeharikasar


Experiment No. 9

Performance & Understanding | Innovation | Timely Completion | Total | Sign & Date
3                           | 1          | 1                 | 5     |

Aim: Data Visualization II:

1. Use the inbuilt dataset 'titanic' as used in the above problem. Plot a box plot of the distribution of
age for each gender, along with the information about whether the passenger survived or not
(column names: 'sex' and 'age').

2. Write observations on the inferences drawn from the above statistics.

Theory: Exploratory Data Analysis

There are various techniques for understanding your data. The basic prerequisite is knowledge of

NumPy for mathematical operations and pandas for data manipulation. We are using the Titanic

dataset. To demonstrate some of the techniques we will also use seaborn's built-in tips dataset,

which records the total bill and the tip paid by each restaurant customer, along with attributes such as sex and smoking status.

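Importing Libraries and Loading Data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from seaborn import load_dataset

# Titanic dataset: training CSV (Survived, Pclass, Sex, Age, Fare, Parch, ...),
# expected in the working directory
data = pd.read_csv("titanic_train.csv")
# tips dataset bundled with seaborn (fetched online and cached on first use)
tips = load_dataset("tips")

Univariate Analysis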

Univariate analysis is the simplest form of analysis: we explore a single variable at a time in order

to describe it better. Numerical and categorical variables are analyzed differently because each

calls for different kinds of plots.

Categorical Data:

A categorical variable holds labels that place each row in one of a limited set of groups, such as

Sex or Survived (coded 0/1). The following plots are useful for visualizing categorical data.

1) CountPlot:

A count plot is a frequency plot in the form of a bar graph: it draws a separate bar for each

category, whose height is the number of rows in that category. It is the visual form of pandas'

value_counts() applied to a column. In our data the target variable, Survived, is categorical, so

we plot its counts.

sns.countplot(x="Survived", data=data)
plt.show()
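The bar heights a count plot draws are the same numbers that pandas' value_counts() returns. A minimal sketch, using a small hypothetical Survived column instead of the real CSV:

```python
import pandas as pd

# Hypothetical sample of the Survived column (0 = did not survive, 1 = survived)
survived = pd.Series([0, 1, 1, 0, 0, 1, 0, 0], name="Survived")

# One count per category, sorted by label: these are the count plot's bar heights
counts = survived.value_counts().sort_index()
print(counts)
```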

2) Pie Chart:

A pie chart shows the same information as a count plot, but additionally gives the percentage

share of each category, i.e. how much weight each category has in the data. Let us check the Sex

column to see what percentage of the passengers were male and female.

data['Sex'].value_counts().plot(kind="pie", autopct="%.2f")
plt.show()
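The percentages printed on the pie slices can be computed directly: value_counts(normalize=True) returns each category's share as a fraction, and multiplying by 100 and rounding to two decimals mirrors autopct="%.2f". A sketch on a hypothetical Sex column:

```python
import pandas as pd

# Hypothetical sample of the Sex column
sex = pd.Series(["male", "female", "male", "male", "female", "male", "male", "female"])

# Fraction of rows in each category, converted to percentages with 2 decimals
percentages = (sex.value_counts(normalize=True) * 100).round(2)
print(percentages)
```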

Numerical Data:

Analyzing numerical data is important because understanding how a variable is distributed guides

how it should be processed further. Numerical columns often contain inconsistencies such as

missing values and outliers, so they need to be explored.

1) Histogram:

A histogram shows the distribution of a numerical column. It divides the range of values into bins

and plots how many values fall into each bin, so we can see whether values pile up at the low end,

at the high end, or around the center (mean), i.e. whether the distribution is skewed. Let us look at

the Age column.

# Age contains missing values, so drop them before binning
plt.hist(data['Age'].dropna(), bins=5)
plt.show()
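The binning a histogram performs can be reproduced with NumPy's np.histogram, which returns the count in each bin and the bin edges. The ages below are hypothetical, and the NaN stands in for a missing Age value that has to be dropped first:

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including one missing value as in the real Age column
age = pd.Series([22.0, 38.0, 26.0, 35.0, np.nan, 54.0, 2.0, 27.0, 14.0, 4.0])

# Drop missing values, then split the observed range into 5 equal-width bins
counts, edges = np.histogram(age.dropna(), bins=5)
print(counts)  # number of ages in each bin
print(edges)   # 6 edges delimit the 5 bins
```

Each bin here is (54 - 2) / 5 = 10.4 years wide, because np.histogram spans the minimum to the maximum of the non-missing values.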

2) Distplot:

Distplot is an improved version of the histogram: it overlays a KDE (Kernel Density Estimation)

curve on the histogram. The KDE is a smooth estimate of the PDF (Probability Density Function),

showing how densely the column's values are concentrated across its range. Since distplot is

deprecated from seaborn 0.11 onward, the code below uses histplot with kde=True, which draws the same combination.

sns.histplot(data['Age'], kde=True)
plt.show()
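To make the KDE idea concrete, here is a minimal Gaussian kernel density estimate in NumPy: a normal bump is centred on each observation and the bumps are averaged. The bandwidth uses Scott's rule (sample standard deviation times n^(-1/5)), which is an assumption of this sketch; seaborn's own implementation and default bandwidth may differ in detail. The ages and the function name gaussian_kde_1d are hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Estimate a probability density at each point of `grid` (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the kernel bandwidth
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardised distance of every grid point from every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # Gaussian kernel values, averaged over samples and rescaled by the bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

ages = np.array([22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0, 14.0, 4.0])  # hypothetical
grid = np.linspace(-40, 100, 2001)
density = gaussian_kde_1d(ages, grid)

# A density is not a probability per value: its total area is (about) 1
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```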

3) Boxplot:

A box plot draws the five-number summary of a variable: minimum, first quartile, median, third

quartile, and maximum. To read it we need a few terms.

• Median – the middle value of the series after sorting.
• Percentile – the value below which a given percentage of the observations lie. For example, the
25th percentile is the value below which 25% of the values fall.
• Minimum and Maximum – in a box plot these are not the actual minimum and maximum of the data.
They are the lower and upper boundaries computed from the interquartile range (IQR); the whiskers
extend to the most extreme data points inside these boundaries, and points beyond them are drawn
individually as outliers.

IQR = Q3 - Q1
Lower_boundary = Q1 - 1.5 * IQR
Upper_boundary = Q3 + 1.5 * IQR

Here Q1 and Q3 are the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
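The quantities a box plot draws can be computed directly with pandas using the formulas above. A minimal sketch on hypothetical data; the helper name box_plot_stats is made up for illustration, and pandas' default linear interpolation for quantiles may give slightly different Q1/Q3 values than other quartile conventions.

```python
import pandas as pd

def box_plot_stats(values):
    """Five-number summary plus IQR boundaries (hypothetical helper)."""
    s = pd.Series(values, dtype=float).dropna()
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_boundary = q1 - 1.5 * iqr
    upper_boundary = q3 + 1.5 * iqr
    inside = s[(s >= lower_boundary) & (s <= upper_boundary)]
    return {
        "Q1": q1, "median": median, "Q3": q3, "IQR": iqr,
        "lower_boundary": lower_boundary, "upper_boundary": upper_boundary,
        # Whiskers stop at the most extreme points inside the boundaries
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        # Anything outside the boundaries is plotted as an individual outlier
        "outliers": s[(s < lower_boundary) | (s > upper_boundary)].tolist(),
    }

stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```

Here the value 100 lies far above Q3 + 1.5 * IQR, so the upper whisker stops at 8 and 100 is reported as an outlier.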

Bivariate/ Multivariate Analysis:

So far we have studied plots for exploring a single categorical or numerical variable. Bivariate

analysis explores the relationship between two variables. This matters because, in the end, our

main task is to understand how variables relate to each other in order to build a strong model.

When we analyze more than two variables together, it is called multivariate analysis. Below we

look at plots for both bivariate and multivariate analysis.

Numerical and Numerical:

First, the plots for when both variables are numerical.

1) Scatter Plot:

A scatter plot is the simplest way to show the relationship between two numerical variables. Let

us look at the relationship between the total bill and the tip.

sns.scatterplot(x="total_bill", y="tip", data=tips)
plt.show()

Multivariate analysis with scatter plot:

A scatter plot can also show relationships among three or four variables. Suppose we want to

distinguish male and female customers while plotting total bill against tip.

sns.scatterplot(x="total_bill", y="tip", hue="sex", data=tips)

plt.show()

We can add a fourth variable with the style argument. Suppose that, along with gender, we also

want to know whether the customer was a smoker.

sns.scatterplot(x="total_bill", y="tip", hue="sex",
                style="smoker", data=tips)
plt.show()

Numerical and Categorical:

If one variable is numerical and the other is categorical, there are various plots we can use for

bivariate and multivariate analysis.

1) Bar Plot:

A bar plot places a categorical variable on the x-axis and a numerical variable on the y-axis to

explore the relationship between them. By default, each bar's height is the mean of the numerical

variable for that category, and the black line on top of each bar shows the confidence interval of

that mean. Let us explore Pclass against Age.

sns.barplot(x="Pclass", y="Age", data=data)
plt.show()
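Since a seaborn bar plot's default bar height is the mean of the y variable within each x category, the same numbers come from a pandas groupby. The rows below are hypothetical stand-ins for the Pclass and Age columns:

```python
import pandas as pd

# Hypothetical rows with the same column names as the Titanic CSV
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3],
    "Age":    [40.0, 50.0, 30.0, None, 20.0, 25.0, 30.0],
})

# Mean age per class (missing ages are skipped): the bar heights
mean_age = df.groupby("Pclass")["Age"].mean()
print(mean_age)
```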
Multivariate analysis using Bar plot:

The hue argument lets us analyze more than two variables. Here we plot Pclass against Fare and

split each class by gender.

sns.barplot(x="Pclass", y="Fare", hue="Sex", data=data)

plt.show()

2) Boxplot:

We have already studied box plots in the univariate analysis above. Here we draw a separate box

of the numerical variable for each category. Let us explore age by gender using a box plot.

sns.boxplot(x="Sex", y="Age", data=data)
plt.show()

Multivariate analysis with boxplot:

Along with age and gender, let us see who survived and who did not.

sns.boxplot(x="Sex", y="Age", hue="Survived", data=data)

plt.show()
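Each box in this plot summarizes the ages of one (Sex, Survived) group, so the numbers behind the boxes can be tabulated with groupby and describe(), which can help when writing the observations asked for in the Aim. A sketch with hypothetical rows using the Titanic CSV column names:

```python
import pandas as pd

# Hypothetical rows using the Titanic CSV column names
df = pd.DataFrame({
    "Sex":      ["male", "male", "male", "male", "female", "female", "female", "female"],
    "Survived": [0, 0, 1, 1, 0, 0, 1, 1],
    "Age":      [30.0, 40.0, 4.0, 10.0, 20.0, 30.0, 25.0, 35.0],
})

# count, mean, std, min, 25%, 50%, 75%, max of Age for every (Sex, Survived) box
summary = df.groupby(["Sex", "Survived"])["Age"].describe()
print(summary[["count", "25%", "50%", "75%"]])
```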

3) Distplot:

Distplot shows the PDF estimated with kernel density estimation. Distplot does not have a hue

parameter, but we can get the same effect by filtering the data and overlaying one curve per group.

Suppose we want to compare the age distribution of passengers who survived with that of

passengers who did not, to see which age ranges had better chances of survival. Because distplot

is deprecated, the code below uses kdeplot, which draws the same curve as distplot(hist=False).

sns.kdeplot(x=data.loc[data['Survived'] == 0, 'Age'], color="blue", label="Did not survive")
sns.kdeplot(x=data.loc[data['Survived'] == 1, 'Age'], color="orange", label="Survived")
plt.legend()
plt.show()

In the graph above, the blue curve is the estimated age density of passengers who did not survive
and the orange curve that of passengers who survived. For children the orange curve lies above the
blue one, i.e. children make up a larger share of the survivors than of the non-survivors, while for
older passengers the opposite holds. A small analysis like this can reveal important patterns in the
data and helps when preparing a data story.
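The curves compare age distributions; to read off survival directly, a common complement is the survival rate within age bands, computed with pd.cut and the mean of the 0/1 Survived column in each band. The band edges (12 and 60 years) and the rows below are hypothetical choices for illustration:

```python
import pandas as pd

# Hypothetical rows using the Titanic CSV column names
df = pd.DataFrame({
    "Age":      [2.0, 5.0, 8.0, 25.0, 30.0, 35.0, 62.0, 70.0],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 0],
})

# Assign each passenger to an age band (intervals are closed on the right)
bands = pd.cut(df["Age"], bins=[0, 12, 60, 100], labels=["child", "adult", "senior"])

# Mean of a 0/1 column = fraction of passengers in the band who survived
survival_rate = df.groupby(bands, observed=False)["Survived"].mean()
print(survival_rate)
```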

Categorical and Categorical:

Now we will work with pairs of categorical columns.

1) Heatmap:

If you have used pandas' crosstab function, a heatmap is simply a visual representation of its

output: it shows how often each category of one variable occurs together with each category of

another. Let us first build the crosstab and then draw it as a heatmap.

pd.crosstab(data['Pclass'], data['Survived'])

Now, as a heatmap showing how many passengers in each class survived and died (annot=True writes the count in each cell):

sns.heatmap(pd.crosstab(data['Pclass'], data['Survived']), annot=True, fmt="d")
plt.show()
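A crosstab counts how often each pair of categories occurs; passing normalize="index" turns each row into proportions, which gives the survival rate per class. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows using the Titanic CSV column names
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 0, 1],
})

# Raw counts: rows are classes, columns are Survived = 0 / 1
counts = pd.crosstab(df["Pclass"], df["Survived"])
# Row proportions: each row sums to 1
rates = pd.crosstab(df["Pclass"], df["Survived"], normalize="index")
print(counts)
print(rates)
```

Either table can be passed to sns.heatmap; the normalized one makes classes of different sizes easier to compare.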

2) Cluster map:

A cluster map can also be used to understand the relationship between two categorical variables.

It applies hierarchical clustering to the rows and columns of the table and draws dendrograms that

place categories with similar behavior next to each other (seaborn's clustermap requires SciPy).

sns.clustermap(pd.crosstab(data['Parch'], data['Survived']))
plt.show()
