Count plots and bar plots
INTRODUCTION TO DATA VISUALIZATION WITH SEABORN
Erin Case
Data Scientist
Categorical plots
Examples: count plots, bar plots
Involve a categorical variable
Useful for comparisons between groups
catplot()
Used to create categorical plots
Same advantages as relplot()
Easily create subplots with col= and row=
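A minimal sketch of the col= advantage, assuming a small hypothetical DataFrame `df` with made-up `day` and `time` values (not course data). Passing col="time" splits a single count plot into one subplot per unique value of `time`; the Agg backend is set only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data (not from the course)
df = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sat", "Sun", "Sun"],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner", "Dinner", "Lunch"],
})

# col="time" draws one subplot per unique value of "time", side by side
g = sns.catplot(x="day", data=df, kind="count", col="time")
print(g.axes.shape)  # (1, 2): one row, two columns
plt.close("all")
```

Swapping col= for row= would stack the same subplots vertically instead.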
countplot() vs. catplot()
import matplotlib.pyplot as plt
import seaborn as sns

# masculinity_data: survey responses in a pandas DataFrame (assumed already loaded)
sns.countplot(x="how_masculine",
              data=masculinity_data)
plt.show()
countplot() vs. catplot()
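import matplotlib.pyplot as plt
import seaborn as sns

# masculinity_data: survey responses in a pandas DataFrame (assumed already loaded)
# Same count plot as countplot(), created with the figure-level catplot()
sns.catplot(x="how_masculine",
            data=masculinity_data,
            kind="count")
plt.show()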
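To check that countplot() and catplot(kind="count") draw the same bars, here is a small sketch using a hypothetical DataFrame of made-up survey answers. The helper `bar_heights` is introduced here for illustration only; it reads each bar's height (its count) from left to right.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical survey answers (made-up values for illustration)
df = pd.DataFrame({"how_masculine": ["Very", "Somewhat", "Very",
                                     "Not very", "Very", "Somewhat"]})

def bar_heights(ax):
    """Hypothetical helper: bar heights (the counts) ordered left to right."""
    return [p.get_height() for p in sorted(ax.patches, key=lambda p: p.get_x())]

ax = sns.countplot(x="how_masculine", data=df)  # axes-level function
axes_level = bar_heights(ax)
plt.close("all")

g = sns.catplot(x="how_masculine", data=df, kind="count")  # figure-level function
figure_level = bar_heights(g.ax)
plt.close("all")

# Categories appear in order of first appearance: Very, Somewhat, Not very
print(axes_level, figure_level)
```

Both calls produce the same counts; the difference is that catplot() returns a FacetGrid, which is what enables col= and row=.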
Changing the order
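import matplotlib.pyplot as plt
import seaborn as sns

# Bars are drawn left to right in this order
category_order = ["No answer",
                  "Not at all",
                  "Not very",
                  "Somewhat",
                  "Very"]

# masculinity_data: survey responses in a pandas DataFrame (assumed already loaded)
sns.catplot(x="how_masculine",
            data=masculinity_data,
            kind="count",
            order=category_order)
plt.show()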
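A sketch of what order= changes, using hypothetical answers whose first-appearance order differs from the requested one. The bars follow `category_order`, and the same counts can be reproduced with pandas value_counts() reindexed into that order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical survey answers (made-up values for illustration)
df = pd.DataFrame({"how_masculine": ["Very", "Somewhat", "Very",
                                     "Not at all", "Very", "Somewhat"]})
category_order = ["Not at all", "Somewhat", "Very"]

g = sns.catplot(x="how_masculine", data=df, kind="count", order=category_order)
# Read bar heights left to right; they follow category_order, not data order
heights = [p.get_height() for p in sorted(g.ax.patches, key=lambda p: p.get_x())]
plt.close("all")

# The same counts from pandas, reindexed into the requested order
expected = df["how_masculine"].value_counts().reindex(category_order).tolist()
print(heights, expected)
```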
Bar plots
Displays the mean of a quantitative variable per category
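import matplotlib.pyplot as plt
import seaborn as sns

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
sns.catplot(x="day",
            y="total_bill",
            data=tips,
            kind="bar")
plt.show()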
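A minimal sketch confirming that each bar's height is the per-category mean, assuming a hypothetical DataFrame with two made-up bills per day. The bar heights are compared against a pandas groupby mean.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical bills, two per day (made-up values)
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 14.0, 20.0, 30.0, 16.0, 18.0],
})

g = sns.catplot(x="day", y="total_bill", data=df, kind="bar")
# Each bar's height is the mean of total_bill for that day
heights = [p.get_height() for p in sorted(g.ax.patches, key=lambda p: p.get_x())]
plt.close("all")

# Same numbers from pandas, keeping the order in which days first appear
means = df.groupby("day", sort=False)["total_bill"].mean().tolist()
print(heights, means)
```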
Confidence intervals
Lines show 95% confidence intervals for the mean
Shows uncertainty about our estimate
Assumes our data is a random sample
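By default, seaborn estimates these intervals by bootstrapping (resampling the data with replacement, 1,000 times by default via n_boot). The sketch below reproduces the idea with plain NumPy on hypothetical bill values; `bootstrap_ci` is an illustrative helper, not seaborn's internal code, so its numbers will not match seaborn's exactly.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw n_boot resamples (with replacement) and take the mean of each
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # The middle `level` percent of the resampled means forms the interval
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high

bills = [10.0, 12.0, 15.0, 20.0, 22.0, 25.0]  # hypothetical total_bill values
low, high = bootstrap_ci(bills)
print(f"mean = {np.mean(bills):.2f}, 95% CI roughly [{low:.2f}, {high:.2f}]")
```

Resampling from the observed data only stands in for the population if the data really is a random sample, which is why that assumption matters.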
Turning off confidence intervals
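import matplotlib.pyplot as plt
import seaborn as sns

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
# ci=None hides the error bars; seaborn >= 0.12 deprecates ci in favor of errorbar=None
sns.catplot(x="day",
            y="total_bill",
            data=tips,
            kind="bar",
            ci=None)
plt.show()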
Changing the orientation
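import matplotlib.pyplot as plt
import seaborn as sns

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
# Quantitative x with categorical y gives horizontal bars
sns.catplot(x="total_bill",
            y="day",
            data=tips,
            kind="bar")
plt.show()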
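A small sketch of the horizontal case, assuming a hypothetical DataFrame. With the axes swapped, the mean is encoded in each bar's width along the x-axis rather than its height.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical bills, two per day (made-up values)
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 14.0, 20.0, 30.0, 16.0, 18.0],
})

# Numeric x, categorical y: seaborn infers a horizontal orientation
g = sns.catplot(x="total_bill", y="day", data=df, kind="bar")
# For horizontal bars, the bar width along the x-axis encodes the mean
widths = sorted(p.get_width() for p in g.ax.patches)
plt.close("all")

expected = sorted(df.groupby("day")["total_bill"].mean())
print(widths, expected)
```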
Let's practice!
Creating a box plot
What is a box plot?
Shows the distribution of quantitative data
See median, spread, skewness, and outliers
Facilitates comparisons between groups
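The statistics a box plot draws can be computed by hand. This sketch uses hypothetical bill values with one large outlier and applies the 1.5 × IQR whisker rule, which is the default in both seaborn and matplotlib box plots.

```python
import numpy as np

# Hypothetical bills with one unusually large value
bills = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, median, q3 = np.percentile(bills, [25, 50, 75])  # box edges and center line
iqr = q3 - q1                                        # interquartile range: height of the box

# Default whisker rule: reach the most extreme points within 1.5 * IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
whisker_low = bills[bills >= lower_fence].min()
whisker_high = bills[bills <= upper_fence].max()

# Points beyond the fences are drawn individually as outliers
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]
print(median, q1, q3, whisker_low, whisker_high, outliers)
```

Here the value 100 lies far above the upper fence (14.5), so the upper whisker stops at 9 and 100 is shown as a separate point; the median (5.5) is unaffected by it.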
How to create a box plot
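import matplotlib.pyplot as plt
import seaborn as sns

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
g = sns.catplot(x="time",
                y="total_bill",
                data=tips,
                kind="box")
plt.show()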
Change the order of categories
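import matplotlib.pyplot as plt
import seaborn as sns

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
g = sns.catplot(x="time",
                y="total_bill",
                data=tips,
                kind="box",
                order=["Dinner",
                       "Lunch"])
plt.show()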
Omitting the outliers using `sym`
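import matplotlib.pyplot as plt
import seaborn as sns

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
# sym="" uses an empty marker for outliers, so they are not shown
g = sns.catplot(x="time",
                y="total_bill",
                data=tips,
                kind="box",
                sym="")
plt.show()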
Changing the whiskers using `whis`
By default, the whiskers extend to the most extreme data points within 1.5 * the interquartile range (IQR) of the box
Make them extend to 2.0 * IQR: whis=2.0
Show the 5th and 95th percentiles: whis=[5, 95]
Show min and max values: whis=[0, 100]
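matplotlib exposes the statistics behind its box plots through `matplotlib.cbook.boxplot_stats`, which accepts the same kinds of `whis` values. This sketch, on hypothetical data, assumes seaborn's box plots follow matplotlib's whisker rules for these options and prints the whisker ends and outliers for each setting listed above.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

bills = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # hypothetical values

results = {}
for label, whis in [("1.5", 1.5), ("2.0", 2.0), ("[5, 95]", [5, 95]), ("[0, 100]", [0, 100])]:
    stats = boxplot_stats(bills, whis=whis)[0]  # one dict per dataset
    # whislo/whishi are the whisker ends; fliers are the points drawn as outliers
    results[label] = (stats["whislo"], stats["whishi"], stats["fliers"].tolist())
    print(label, results[label])
```

With whis=[5, 95], the whiskers stop at the most extreme data points inside the 5th and 95th percentiles rather than at the percentile values themselves, so in this data the point 1 is also drawn as an outlier. With whis=[0, 100], nothing is flagged.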
Changing the whiskers using `whis`
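import matplotlib.pyplot as plt
import seaborn as sns

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
# whis=[0, 100]: whiskers reach the minimum and maximum values
g = sns.catplot(x="time",
                y="total_bill",
                data=tips,
                kind="box",
                whis=[0, 100])
plt.show()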
Let's practice!
Point plots
What are point plots?
Points show the mean of a quantitative variable
Vertical lines show 95% confidence intervals
Line plot: average level of nitrogen dioxide over time
Point plot: average restaurant bill, smokers vs. non-smokers
Point plots vs. line plots
Both show:
Mean of quantitative variable
95% confidence intervals for the mean
Differences:
Line plot has quantitative variable (usually time) on x-axis
Point plot has categorical variable on x-axis
Point plots vs. bar plots
Both show:
Mean of quantitative variable
95% confidence intervals for the mean
Creating a point plot
import matplotlib.pyplot as plt
import seaborn as sns

# masculinity_data: survey responses in a pandas DataFrame (assumed already loaded)
sns.catplot(x="age",
            y="masculinity_important",
            data=masculinity_data,
            hue="feel_masculine",
            kind="point")
plt.show()
Disconnecting the points
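import matplotlib.pyplot as plt
import seaborn as sns

# masculinity_data: survey responses in a pandas DataFrame (assumed already loaded)
# join=False removes the connecting lines; seaborn >= 0.13 deprecates it
# in favor of linestyle="none"
sns.catplot(x="age",
            y="masculinity_important",
            data=masculinity_data,
            hue="feel_masculine",
            kind="point",
            join=False)
plt.show()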
Displaying the median
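import matplotlib.pyplot as plt
import seaborn as sns

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
# Default estimator: each point is the mean of its category
sns.catplot(x="smoker",
            y="total_bill",
            data=tips,
            kind="point")
plt.show()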
Displaying the median
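import matplotlib.pyplot as plt
import seaborn as sns
from numpy import median

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
# estimator=median plots each category's median instead of its mean
sns.catplot(x="smoker",
            y="total_bill",
            data=tips,
            kind="point",
            estimator=median)
plt.show()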
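Because the estimator is applied separately within each category, the plotted values can be cross-checked with pandas. This sketch uses hypothetical bills in which one large non-smoker bill pulls that group's mean well above its median, the kind of skew where the median is the more robust summary.

```python
import pandas as pd

# Hypothetical bills; one very large non-smoker bill skews that group's mean
df = pd.DataFrame({
    "smoker": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "total_bill": [12.0, 15.0, 18.0, 10.0, 14.0, 60.0],
})

# One mean and one median per category, as a point plot would estimate them
summary = df.groupby("smoker")["total_bill"].agg(["mean", "median"])
print(summary)
```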
Customizing the confidence intervals
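import matplotlib.pyplot as plt
import seaborn as sns

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
# capsize adds horizontal caps to the ends of the confidence-interval lines
sns.catplot(x="smoker",
            y="total_bill",
            data=tips,
            kind="point",
            capsize=0.2)
plt.show()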
Turning off confidence intervals
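import matplotlib.pyplot as plt
import seaborn as sns

# Assumes tips is seaborn's example restaurant-bill dataset (fetched online on first use)
tips = sns.load_dataset("tips")
# ci=None hides the intervals; seaborn >= 0.12 deprecates ci in favor of errorbar=None
sns.catplot(x="smoker",
            y="total_bill",
            data=tips,
            kind="point",
            ci=None)
plt.show()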
Let's practice!