Seaborn

Seaborn is a Python library for creating statistical graphics, built on top of matplotlib and designed to work with pandas data structures. It simplifies the process of creating visually appealing plots with less code compared to matplotlib. The document also provides examples of various plot types such as bar plots, scatter plots, and heatmaps, along with installation instructions and a comparison with matplotlib.

Uploaded by imbilalbaig
Copyright © All Rights Reserved

https://seaborn.pydata.org/
Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and
integrates closely with pandas data structures.

How to install Seaborn?


Open the Anaconda Prompt and type "pip install seaborn" (an internet connection is required).
pip is Python's package manager.

Seaborn Vs Matplotlib
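Seaborn is smarter than matplotlib in the sense that it draws attractive statistical graphs with
less code. You can do the same work with matplotlib alone, but it is harder because matplotlib
has no built-in functions for common statistical tasks such as aggregating data by category,
estimating densities, or drawing a grid of plots from a DataFrame.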

In [1]: # Import Seaborn library
import seaborn as sns

In [2]: # Let's list seaborn's built-in example datasets (requires an internet connection)
sns.get_dataset_names()

Out[2]: ['anagrams',
'anscombe',
'attention',
'brain_networks',
'car_crashes',
'diamonds',
'dots',
'dowjones',
'exercise',
'flights',
'fmri',
'geyser',
'glue',
'healthexp',
'iris',
'mpg',
'penguins',
'planets',
'seaice',
'taxis',
'tips',
'titanic']

Tipping in a Restaurant Dataset

In [3]: tips = sns.load_dataset("tips")
tips

Out[3]:
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

244 rows × 7 columns

Bar Plot
Bar plots represent data as rectangular bars, one per category. In seaborn's barplot(), the
height of each bar is an estimate of the numeric variable for that category (by default, the
mean), and the error bar on top of it shows the uncertainty around that estimate.
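To see what a bar height means, the aggregation behind barplot(x="sex", y="tip") can be reproduced with pandas. This is a minimal sketch that assumes seaborn's default estimator (the mean) and uses only the ten tips rows printed above rather than all 244 rows, so its numbers will not match the real plot.

```python
import pandas as pd

# Only the ten rows printed above, not the full 244-row tips dataset
sample = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Male", "Female",
            "Male", "Female", "Male", "Male", "Female"],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
})

# barplot(x="sex", y="tip") draws one bar per category; with the default
# estimator each bar's height is the mean tip of the rows in that category
bar_heights = sample.groupby("sex")["tip"].mean()
print(bar_heights.round(3))
```

In the real plot, the line on each bar is an error bar; by default seaborn bootstraps a 95% confidence interval for the mean. Passing another function through the estimator parameter (for example estimator=np.median) changes what the bar height represents.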

In [4]: sns.barplot(data=tips, x="sex", y="tip", palette="YlGnBu")

Out[4]: <Axes: xlabel='sex', ylabel='tip'>
Scatter Plot
A scatter plot uses dots to represent the values of two numeric variables. The position of
each dot on the horizontal and vertical axes gives the values of one individual data point.
Scatter plots are used to observe relationships between variables.
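A scatter plot is usually read by eye, but the strength of a linear relationship can also be summarised by Pearson's correlation coefficient r, which lies between -1 and 1. This sketch computes r for tip and total_bill from its definition and checks it against pandas. It is an illustration only (scatterplot() does not compute r) and uses just the ten tips rows printed above.

```python
import numpy as np
import pandas as pd

# Ten rows from the tips output above (sample only)
sample = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59,
                   29.03, 27.18, 22.67, 17.82, 18.78],
})

x = sample["tip"].to_numpy()
y = sample["total_bill"].to_numpy()

# Pearson r = sum of co-deviations / sqrt(product of summed squared deviations)
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = sample["tip"].corr(sample["total_bill"])  # Pearson by default
print(f"r = {r_manual:.3f} (pandas: {r_pandas:.3f})")
```

A value near +1 means the dots lie close to a rising straight line; a value near 0 means no linear trend, although a non-linear pattern can still be visible in the plot.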

In [5]: sns.scatterplot(data=tips, x="tip", y="total_bill", hue="day", size="size", palette="YlGnBu")
# hue maps a variable (here: day) to dot color; size maps "size" (party size) to dot size

Out[5]: <Axes: xlabel='tip', ylabel='total_bill'>
Box Plot
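Box plots summarize how concentrated the data are: the box spans the interquartile range (IQR,
from the first to the third quartile) with a line at the median, and the whiskers extend to the
most extreme points within 1.5 × IQR of the box. Points farther out are drawn individually,
which shows how far the extreme values are from most of the data.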
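The box-and-whisker rule can be written out with NumPy. This sketch assumes the default whisker length of 1.5 × IQR (seaborn's boxplot() exposes it as the whis parameter) and NumPy's default linear-interpolation percentiles; plotting libraries may differ in small details. It uses the total_bill values of the ten tips rows printed above.

```python
import numpy as np

# total_bill values of the ten tips rows printed above (sample only)
bills = np.array([16.99, 10.34, 21.01, 23.68, 24.59,
                  29.03, 27.18, 22.67, 17.82, 18.78])

q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1                    # height of the box
lower_fence = q1 - 1.5 * iqr     # whis=1.5 assumed
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points still inside the fences
inside = bills[(bills >= lower_fence) & (bills <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the fences would be drawn as a separate outlier dot
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(f"Q1={q1:.2f}  median={median:.2f}  Q3={q3:.2f}  IQR={iqr:.2f}")
print(f"whiskers: {whisker_low:.2f} to {whisker_high:.2f}; outliers: {outliers}")
```

In this small sample no value lies beyond the fences, so the whiskers simply reach the minimum and maximum bill.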

In [6]: sns.boxplot(data=tips, x="day", y="total_bill", hue="sex", palette="YlGnBu")

Out[6]: <Axes: xlabel='day', ylabel='total_bill'>
Strip Plot
A strip plot is a single-axis scatter plot used to visualise the distribution of many
individual one-dimensional values. The values are plotted as dots along one axis, and dots
with the same value can overlap. To reveal overlapping values, the dots can be jittered
(seaborn's stripplot() adds a small random jitter along the categorical axis by default), their
opacity or colour can be changed, or a count plot can be used instead.
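The overlap problem and the jitter fix can be sketched with pandas and NumPy on the ten tips rows printed above: two Saturday tips of 2.00 would be drawn on exactly the same spot. The jitter width of 0.1 below is an assumed example value, not necessarily seaborn's default.

```python
import numpy as np
import pandas as pd

# day and tip of the ten tips rows printed above (sample only)
sample = pd.DataFrame({
    "day": ["Sun"] * 5 + ["Sat"] * 4 + ["Thur"],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
})

# Rows sharing the same (day, tip) pair would be drawn exactly on top of each other
overlapping = sample[sample.duplicated(["day", "tip"], keep=False)]
print(overlapping)

# Jitter: shift each dot by a small random amount along the categorical axis
rng = np.random.default_rng(seed=0)
codes = sample["day"].astype("category").cat.codes.to_numpy()  # one integer slot per day
x_positions = codes + rng.uniform(-0.1, 0.1, size=len(sample))
```

In seaborn the width of this random offset is set with stripplot()'s jitter argument, and a transparency such as alpha=0.5 (passed through to matplotlib) makes stacked dots look darker.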

In [7]: sns.stripplot(data=tips, x="day", y="tip", hue="sex", palette="YlGnBu", dodge=True)
# dodge=True draws each hue level (here: sex) as its own strip instead of overlapping them

Out[7]: <Axes: xlabel='day', ylabel='tip'>
Swarm Plot
A swarm plot is similar to stripplot(), but the points are adjusted (only along the categorical
axis) so that they don’t overlap. This gives a better representation of the distribution of values,
but it does not scale well to large numbers of observations. This style of plot is sometimes
called a “beeswarm”.
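Why a swarm plot scales poorly can be shown with a simplified greedy placement. This is a hypothetical sketch (fits() and swarm_offsets() are illustrative names, not seaborn's implementation): dots are placed in order of value, and each takes the first sideways offset (0, +d, -d, +2d, ...) that does not collide with a dot already placed. With many similar values the offsets keep growing; seaborn itself warns when some points cannot be placed in the available width.

```python
import itertools
import math


def fits(value, offset, placed, diameter):
    """True if a dot at (value, offset) does not overlap any placed dot."""
    return all(math.hypot(value - v, offset - o) >= diameter for v, o in placed)


def swarm_offsets(values, diameter):
    """Greedy beeswarm sketch: give each value a sideways offset so that no
    two dots of the given diameter overlap (simplified, not seaborn's code)."""
    placed = []
    offsets = [0.0] * len(values)
    for i in sorted(range(len(values)), key=lambda k: values[k]):
        # Try offsets 0, +d, -d, +2d, -2d, ... and keep the first free one
        candidates = (
            step * sign * diameter
            for step in itertools.count()
            for sign in ((1,) if step == 0 else (1, -1))
        )
        offsets[i] = next(c for c in candidates if fits(values[i], c, placed, diameter))
        placed.append((values[i], offsets[i]))
    return offsets


# Saturday tips from the rows printed above: the two 2.00 tips are pushed apart
print(swarm_offsets([5.92, 2.00, 2.00, 1.75], diameter=0.5))

# Twenty identical values: the swarm keeps getting wider
wide = swarm_offsets([3.0] * 20, diameter=1.0)
print(max(abs(o) for o in wide))
```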

In [8]: sns.swarmplot(data=tips, x="day", y="tip", hue="day", palette="YlGnBu")

Out[8]: <Axes: xlabel='day', ylabel='tip'>
Violin Plot
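A violin plot shows a kernel density estimate (KDE) of the underlying distribution, mirrored on
both sides of the category position; by default seaborn also draws a miniature box plot inside
each violin.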

In [9]: sns.violinplot(data=tips, x="day", y="tip", hue="smoker", palette="YlGnBu")

Out[9]: <Axes: xlabel='day', ylabel='tip'>
Distribution Plot (Histograms)
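displot() shows the distribution of a numerical variable, which makes it suitable for comparing
the range and shape of groups of numerical data. Its default kind is a histogram: bins sets the
number of bars, and kde=True overlays a smooth density curve.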
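Each histogram bar counts how many values fall into one bin. A minimal sketch of what bins=20 means, using np.histogram on the tip values of the ten tips rows printed above (a small sample so the example runs offline; the real plot uses all 244 rows, and seaborn's own binning may differ in details):

```python
import numpy as np

# tip values of the ten tips rows printed above (sample only)
tips_sample = np.array([1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00])

# bins=20 -> 20 equal-width bins from the smallest to the largest value
counts, edges = np.histogram(tips_sample, bins=20)

bin_width = edges[1] - edges[0]
print(f"bin width = {bin_width:.4f}")
for left, right, n in zip(edges[:-1], edges[1:], counts):
    if n:  # show only the non-empty bins
        print(f"{left:.2f} to {right:.2f}: {n}")
```

Every value lands in exactly one bin (the last bin also includes its right edge), so the bar heights add up to the number of observations.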

In [10]: sns.displot(data=tips, x="tip", bins=20, kde=True)

Out[10]: <seaborn.axisgrid.FacetGrid at 0x1419b102da0>
Count Plot
A count plot can be thought of as a histogram across a categorical, instead of quantitative,
variable. The basic API and options are identical to those for barplot(), so you can compare
counts across nested variables.
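The bar heights of countplot(x="day", hue="sex") are simply the number of rows in each (day, sex) combination. A minimal sketch with pd.crosstab on the ten tips rows printed above (sample only, so the counts are much smaller than in the real plot):

```python
import pandas as pd

# day and sex of the ten tips rows printed above (sample only)
sample = pd.DataFrame({
    "day": ["Sun"] * 5 + ["Sat"] * 4 + ["Thur"],
    "sex": ["Female", "Male", "Male", "Male", "Female",
            "Male", "Female", "Male", "Male", "Female"],
})

# One bar per (day, sex) pair; its height is the number of rows in that pair.
# Combinations that never occur get a count of 0.
counts = pd.crosstab(sample["day"], sample["sex"])
print(counts)
```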

In [11]: sns.countplot(data=tips, x="day", hue="sex", palette="YlGnBu")

Out[11]: <Axes: xlabel='day', ylabel='count'>
KDE & Rug Plot
A kernel density estimate (KDE) plot visualizes the distribution of observations in a dataset,
analogous to a histogram, but draws a smooth continuous curve instead of bars.

A rug plot is not really a separate plot. It is a one-dimensional display that you can add to existing
plots to illuminate information that is sometimes lost in other types of graphs. Like a strip
plot, it represents values of a variable by putting a symbol at various points along an axis.
However, it uses short lines to represent points.
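The KDE curve can be written out directly: place a Gaussian "bump" of width h (the bandwidth) on every observation and average the bumps. This sketch uses Scott's rule for h, which is the default bandwidth method of seaborn's kdeplot() (bw_method="scott"); seaborn may differ in details such as how far the curve extends (its cut parameter). The data are the tip values of the ten tips rows printed above, and each value is where a rug line would be drawn.

```python
import numpy as np

# tip values of the ten tips rows printed above (sample only)
data = np.array([1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00])
n = len(data)

# Scott's rule of thumb for the bandwidth in one dimension
h = data.std(ddof=1) * n ** (-1 / 5)


def kde(x):
    """Average of Gaussian bumps of width h centred on each observation."""
    z = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


grid = np.linspace(data.min() - 6 * h, data.max() + 6 * h, 2001)
density = kde(grid)

# A density curve encloses an area of 1 (simple Riemann-sum check)
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth = {h:.3f}, area under curve = {area:.4f}")
```

A larger h gives a smoother curve and a smaller h a bumpier one; kdeplot() lets you scale the bandwidth with bw_adjust.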

In [12]: sns.kdeplot(data=tips, x="tip")
sns.rugplot(data=tips, x="tip")

Out[12]: <Axes: xlabel='tip', ylabel='Density'>
Pair Plot
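A pair plot shows the relationship between every pair of numeric variables in a pandas
DataFrame: each off-diagonal panel is a scatter plot of two variables, and each diagonal panel
shows the distribution of one variable.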
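Which panels pairplot() draws can be worked out in advance: by default it uses only the numeric columns (the vars parameter selects others), and it draws one panel per ordered pair of them. This sketch rebuilds the ten tips rows printed above as a small DataFrame and lists the resulting grid.

```python
import itertools

import pandas as pd

# Ten rows from the tips output above (sample only)
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "sex": ["Female", "Male", "Male", "Male", "Female",
            "Male", "Female", "Male", "Male", "Female"],
    "day": ["Sun"] * 5 + ["Sat"] * 4 + ["Thur"],
    "size": [2, 3, 3, 2, 4, 3, 2, 2, 2, 2],
})

# Text columns such as sex and day are left out of the grid
numeric = list(sample.select_dtypes("number").columns)

# One panel per ordered pair: diagonal = one variable, off-diagonal = two variables
panels = list(itertools.product(numeric, repeat=2))
diagonal = [(a, b) for a, b in panels if a == b]
print(numeric)
print(f"{len(panels)} panels, {len(diagonal)} on the diagonal")
```

With the full tips data the same three numeric columns give the 3 × 3 grid seen in the output; hue="day" only colours the points.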

In [13]: sns.pairplot(data=tips, hue="day", palette="YlGnBu")

Out[13]: <seaborn.axisgrid.PairGrid at 0x1419d062f50>
Joint Plot
A joint plot shows the relationship between two variables together with the distribution of
each variable on its own. It consists of three plots: the large central plot shows the joint
distribution of x and y (their relationship), while the two smaller plots along the top and
right margins show the marginal distributions of x and y.
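The link between the central plot and the margins can be checked with NumPy: summing the joint counts over one variable gives that variable's marginal histogram. This sketch uses square bins from np.histogram2d (not the hexagonal cells of kind="hex"), an arbitrary 4 bins per axis, and the ten tips rows printed above.

```python
import numpy as np

# tip and total_bill of the ten tips rows printed above (sample only)
tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00])
bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59,
                 29.03, 27.18, 22.67, 17.82, 18.78])

# Joint distribution: counts in a 4 x 4 grid of (tip, total_bill) rectangles
joint, tip_edges, bill_edges = np.histogram2d(tip, bill, bins=4)

# Marginal distributions: sum the joint counts over the other variable
tip_marginal = joint.sum(axis=1)
bill_marginal = joint.sum(axis=0)

# These equal the 1-D histograms computed with the same bin edges
assert np.array_equal(tip_marginal, np.histogram(tip, bins=tip_edges)[0])
assert np.array_equal(bill_marginal, np.histogram(bill, bins=bill_edges)[0])
print(joint)
```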

In [14]: sns.jointplot(data=tips, x="tip", y="total_bill", kind="hex", palette="YlGnBu")  # The default kind is "scatter"
# Alternative: a filled 2D KDE
#sns.jointplot(data=tips, x="tip", y="total_bill", kind="kde", fill=True, cmap="YlGnBu")

Out[14]: <seaborn.axisgrid.JointGrid at 0x1419d711030>
Heatmap / Matrix Plot
A heatmap is a 2D graphical representation of data in which the individual values of a matrix
are represented as colours. The colour of each cell depends on its value: with a sequential
colormap, low values are shown in low-intensity colours and high values in high-intensity
colours, while a diverging colormap such as "coolwarm" uses two contrasting hues on either side
of a midpoint.
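Turning a number into a colour happens in two steps: the value is scaled into the range 0 to 1 using vmin and vmax, and the scaled value is looked up in the colormap. seaborn's heatmap() leaves this to matplotlib; a minimal sketch of the two steps with matplotlib's Normalize and the "coolwarm" colormap, assuming matplotlib 3.5 or later for matplotlib.colormaps:

```python
import matplotlib
from matplotlib.colors import Normalize

# Step 1: map values linearly onto [0, 1]; vmin=0 and vmax=21 mirror the
# global-warming task at the end of this document
norm = Normalize(vmin=0, vmax=21)

# Step 2: look the scaled value up in a colormap
cmap = matplotlib.colormaps["coolwarm"]

# 2000 emissions of India, Russia and the United States from that dataset
for value in (0.979870, 10.627121, 20.178751):
    red, green, blue, _alpha = cmap(norm(value))
    print(f"{value:9.6f} -> {float(norm(value)):.3f} -> RGB ({red:.2f}, {green:.2f}, {blue:.2f})")
```

Low values end up at the blue end of "coolwarm" and high values at the red end; without vmin and vmax, heatmap() uses the minimum and maximum of the data.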

Titanic Disaster Dataset

In [15]: titanic = sns.load_dataset("titanic")
titanic
Out[15]:
     survived  pclass     sex   age  sibsp  parch     fare embarked   class    who  adult_male deck  embar
0           0       3    male  22.0      1      0   7.2500        S   Third    man        True  NaN  Southa
1           1       1  female  38.0      1      0  71.2833        C   First  woman       False    C  Che
2           1       3  female  26.0      0      0   7.9250        S   Third  woman       False  NaN  Southa
3           1       1  female  35.0      1      0  53.1000        S   First  woman       False    C  Southa
4           0       3    male  35.0      0      0   8.0500        S   Third    man        True  NaN  Southa
..        ...     ...     ...   ...    ...    ...      ...      ...     ...    ...         ...  ...  ...
886         0       2    male  27.0      0      0  13.0000        S  Second    man        True  NaN  Southa
887         1       1  female  19.0      0      0  30.0000        S   First  woman       False    B  Southa
888         0       3  female   NaN      1      2  23.4500        S   Third  woman       False  NaN  Southa
889         1       1    male  26.0      0      0  30.0000        C   First    man        True    C  Che
890         0       3    male  32.0      0      0   7.7500        Q   Third    man        True  NaN  Quee

891 rows × 15 columns

In [16]: x = titanic.corr(numeric_only=True)
x

Out[16]:
            survived    pclass       age     sibsp     parch      fare  adult_male     alone
survived    1.000000 -0.338481 -0.077221 -0.035322  0.081629  0.257307   -0.557080 -0.203367
pclass     -0.338481  1.000000 -0.369226  0.083081  0.018443 -0.549500    0.094035  0.135207
age        -0.077221 -0.369226  1.000000 -0.308247 -0.189119  0.096067    0.280328  0.198270
sibsp      -0.035322  0.083081 -0.308247  1.000000  0.414838  0.159651   -0.253586 -0.584471
parch       0.081629  0.018443 -0.189119  0.414838  1.000000  0.216225   -0.349943 -0.583398
fare        0.257307 -0.549500  0.096067  0.159651  0.216225  1.000000   -0.182024 -0.271832
adult_male -0.557080  0.094035  0.280328 -0.253586 -0.349943 -0.182024    1.000000  0.404744
alone      -0.203367  0.135207  0.198270 -0.584471 -0.583398 -0.271832    0.404744  1.000000
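DataFrame.corr() computes Pearson correlation coefficients. Two details of the call above can be checked on five columns of the ten titanic rows printed above: numeric_only=True keeps only numeric columns (boolean columns such as adult_male and alone count as numeric, which is why they appear in the matrix, while pandas 2.0 and later raise an error for text columns such as sex if numeric_only is not set), and missing values such as the NaN age in row 888 are dropped pair by pair.

```python
import numpy as np
import pandas as pd

# Five columns of the ten titanic rows printed above (sample only)
sample = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
    "pclass": [3, 1, 3, 1, 3, 2, 1, 3, 1, 3],
    "sex": ["male", "female", "female", "female", "male",
            "male", "female", "female", "male", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0, 27.0, 19.0, np.nan, 26.0, 32.0],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 13.0, 30.0, 23.45, 30.0, 7.75],
})

# numeric_only=True keeps only numeric columns, so "sex" is skipped
corr = sample.corr(numeric_only=True)
print(corr.round(2))

# Missing values are handled pairwise: the age/fare entry uses only the
# 9 rows where both values are present
both = sample[["age", "fare"]].dropna()
assert np.isclose(corr.loc["age", "fare"], np.corrcoef(both["age"], both["fare"])[0, 1])
```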

In [17]: # Correlation matrix
sns.heatmap(x, annot=True, cmap="coolwarm")

Out[17]: <Axes: >
TASK:
Generate a heatmap for the "global_warming.csv" dataset. (For styling, set as many arguments
as you want.)

In [18]: # Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [19]: # Load the dataset
df = pd.read_csv("global_warming.csv")
df.head()

Out[19]:
   Country Name        Country Code  Indicator Name                          Indicator Code       2000       2001       2002       2003       2004  ...
0  United States       USA           CO2 emissions (metric tons per capita)  EN.ATM.CO2E.PC  20.178751  19.636505  19.613404  19.564105  19.658371  ...
1  United Kingdom      GBR           CO2 emissions (metric tons per capita)  EN.ATM.CO2E.PC   9.199549   9.233175   8.904123   9.053278   8.989140  ...
2  India               IND           CO2 emissions (metric tons per capita)  EN.ATM.CO2E.PC   0.979870   0.971698   0.967381   0.992392   1.025028  ...
3  China               CHN           CO2 emissions (metric tons per capita)  EN.ATM.CO2E.PC   2.696862   2.742121   3.007083   3.524074   4.037991  ...
4  Russian Federation  RUS           CO2 emissions (metric tons per capita)  EN.ATM.CO2E.PC  10.627121  10.669603  10.715901  11.090647  11.120627  ...

In [20]: # Drop the non-numeric columns and use the country name as the index
df = df.drop(columns=["Country Code", "Indicator Name", "Indicator Code"]).set_index("Country Name")
df.head()

Out[20]:
                         2000       2001       2002       2003       2004       2005       2006       2007       2008  ...
Country Name
United States       20.178751  19.636505  19.613404  19.564105  19.658371  19.591885  19.094067  19.217898  18.461764  ...
United Kingdom       9.199549   9.233175   8.904123   9.053278   8.989140   8.982939   8.898710   8.617164   8.424424  ...
India                0.979870   0.971698   0.967381   0.992392   1.025028   1.068563   1.121982   1.193210   1.310098  ...
China                2.696862   2.742121   3.007083   3.524074   4.037991   4.523178   4.980314   5.334910   5.701915  ...
Russian Federation  10.627121  10.669603  10.715901  11.090647  11.120627  11.253529  11.669122  11.672457  12.014507  ...

In [21]: # Set the figure size
plt.figure(figsize=(16, 9))

# Styling of the color bar (optional); these keys are passed to matplotlib's colorbar
cbar = {"orientation": "vertical",
        "shrink": 1,
        "extend": "min",
        "extendfrac": 0.1,
        "ticks": np.arange(0, 22),
        "drawedges": True,
        "label": "CO2 emissions (metric tons per capita)",
        }

# Create the heatmap: rows are countries, columns are years
sns.heatmap(df, vmin=0, vmax=21, cmap="coolwarm", annot=True, linewidths=2, cbar_kws=cbar)

plt.title("Who is responsible for global warming", fontsize=25)
plt.xlabel("Year", fontsize=20)
plt.ylabel("Country Name", fontsize=20)
plt.show()
