Essential Python Data Visualization Libraries 1687141550

Uploaded by

boda prasanth


4 Python Data Visualization Libraries You Can’t Do Without

Matplotlib, seaborn, Plotly, and pandas - the 4 Python data visualization libraries
you can’t do without. Learn how to use them with our code examples.

In this article, I will explain data visualization libraries in Python in detail. We'll explore some of
the most popular Python data visualization libraries, such as Matplotlib, seaborn, Plotly, and
pandas. We'll see the strengths and weaknesses of each library and provide practical examples
of how to use them to create compelling visualizations.

Data visualization means communicating your findings with graphs and presenting information
visually. It’s a must-have skill for any data scientist, as it’s a regular part of the data science
cycle. For each part of that cycle, there’s at least one Python library you should know, the one
that will help you do the required work (check out all the libraries that you should know here:
https://www.stratascratch.com/blog/top-18-python-libraries-a-data-scientist-should-know/).

Data visualization is preceded by data collection, for which you can use several data collection
libraries in Python. However, collecting data is just the first step in the process of making
predictions and gaining insights.

Now, let’s focus on data visualization and see how Python libraries help you here.

What is Data Visualization in Data Science

Data visualization is an essential tool for data science that helps data scientists explore,
analyze, and communicate data.

Data scientists use data visualization to examine large data sets and identify trends, patterns,
and relationships in the data.

These findings can then inform machine learning models or other analyses.

When visualizing data, you will create different types of graphs, like line plots, scatter plots, and
bar charts. They help data scientists understand and check the data trends and patterns.
Aside from choosing the right chart to visualize your data, you’ll also have to choose between
many design options. These include selecting different color schemes and labeling axes, titles,
and legends.

Use these chart and design options to draw attention to the most important information the chart
shows.
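To make the chart and design choices above concrete, here is a minimal sketch that draws a line plot, a scatter plot, and a bar chart side by side, each with its own color, title, and axis labels. The values and the variable names (months, sales, ad_spend) are made up for illustration and are not part of the article's data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt

# Illustrative values only (not from any real dataset)
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [10, 14, 9, 17]
ad_spend = [1.0, 1.8, 0.9, 2.2]

fig, (ax_line, ax_scatter, ax_bar) = plt.subplots(1, 3, figsize=(12, 3.5))

# Line plot: a trend over an ordered axis
ax_line.plot(months, sales, color="tab:blue", marker="o", label="Sales")
ax_line.set(title="Line plot", xlabel="Month", ylabel="Units sold")
ax_line.legend()

# Scatter plot: the relationship between two numeric variables
ax_scatter.scatter(ad_spend, sales, color="tab:orange")
ax_scatter.set(title="Scatter plot", xlabel="Ad spend", ylabel="Units sold")

# Bar chart: a comparison across categories
ax_bar.bar(months, sales, color="tab:green")
ax_bar.set(title="Bar chart", xlabel="Month", ylabel="Units sold")

fig.tight_layout()
```

Call plt.show() (with an interactive backend) or fig.savefig() to view the result.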

Choosing the Right Graph

Here, you can see an overview of the most typical graphs used in data science, along with their
characteristics. This will help you find a suitable visualization for your project.

4 Python Libraries for Data Visualization
In this section, I will discuss the four most common Python data visualization libraries. I’ll also
show you some examples that can help you learn how to visualize data.

Here’s the overview of these four libraries.

Let’s talk about each, and then we’ll go to the coding example to show you each library’s syntax
and usage.

Python Data Visualization Library #1: Matplotlib

Matplotlib is a widely used plotting library for creating visualizations in Python. John Hunter
created it in 2002.

Matplotlib can produce static, animated, and interactive visualizations.

It’s widely used in data science and scientific computing, and many other Python data
visualization libraries are built on top of it.

You can see its usage in data science below. It is a scatter graph that shows the population of
California cities.

Source: https://www.oreilly.com/library/view/python-data-science/9781491912126/

Here is the official Matplotlib library website (https://matplotlib.org/).

Data Visualization in Matplotlib
Let’s start with the basics. In this example, we will draw a stack plot to analyze World Cup
match data, such as goals, offsides, and fouls on different days.

Import the Libraries

We’ll import Matplotlib’s pyplot module as plt, together with NumPy and pandas.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Defining data
We’ll then add two variables.
The x variable is a list of strings that specifies the days for the plot.
The df variable is a pandas DataFrame containing the plot's statistical data.

x = ["Monday", "Tuesday", "Wednesday", "Thursday"]

df = pd.DataFrame({'goals': [18, 22, 19, 14],
                   'offsides': [5, 7, 7, 9],
                   'fouls': [11, 8, 10, 6]})

Create Figures and Plot Data

The columns of the DataFrame represent the different data series that will be plotted in the
stack plot.
The code creates a figure and an axes object using the plt.subplots() function, then plots the
data using the ax.stackplot() function. We pass the labels argument because it will be useful
later when we add a legend.
The x-axis labels and the y-axis data for each data series are passed as arguments to the
stackplot() function.

# Create a figure and axes
fig, ax = plt.subplots()

# Plot the data
ax.stackplot(x, df['goals'], df['offsides'], df['fouls'],
             labels=['Goals', 'Offsides', 'Fouls'])

Add Legend and Show The Plot

The ax.set() function is used to add a title and axis labels to the plot.
We will use the legend() function with the loc and fontsize arguments. These define where the
legend is placed and how large its text is.
Finally, the plt.show() function displays the plot.

# Add a title and axis labels
ax.set(title="World Cup Analysis", xlabel="Match Day",
       ylabel="Match Statistics")

# Add a legend
ax.legend(loc='upper right', fontsize=8)

# Show the plot
plt.show()

The resulting stack plot shows the cumulative values of the goals, offsides, and fouls statistics
for each match day.
The y-axis values represent each match day's total number of goals, offsides, and fouls.
The different data series are stacked on top of each other to show the cumulative values.
This code creates the stack plot using Matplotlib and pandas; NumPy is imported but not used in
this example.
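To see what "cumulative" means here, the following sketch recomputes the heights that the stack plot draws, using the same numbers as the example. It assumes the default stacking baseline of zero; the names layer_tops and totals are illustrative and not part of the article's code.

```python
import pandas as pd

# Same data as the stack plot example, indexed by match day
df = pd.DataFrame({'goals': [18, 22, 19, 14],
                   'offsides': [5, 7, 7, 9],
                   'fouls': [11, 8, 10, 6]},
                  index=["Monday", "Tuesday", "Wednesday", "Thursday"])

# Running totals across the columns give the top edge of each stacked layer
layer_tops = df.cumsum(axis=1)

# The top edge of the last layer equals the total per match day
totals = df.sum(axis=1)

print(layer_tops)
print(totals)
```

On Monday, for example, the goals layer ends at 18, the offsides layer at 23, and the fouls layer at 34, which is the overall height of the stack for that day.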

Output
Here is what the graph looks like.

Python Data Visualization Library #2: seaborn

seaborn is a Python data visualization library based on Matplotlib. It’s a higher-level library
designed explicitly for statistical visualization and is commonly used in conjunction with pandas
for data exploration and analysis. Michael Waskom created it in 2014.

When building a machine learning model in data science, you will often need to detect and handle
outliers, which can improve your model's performance. By drawing a distribution plot, you can
spot outliers and set filters to remove them.
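As a rough illustration of that workflow, the sketch below draws a box plot (one of several ways to look at a distribution) and then filters values with the common 1.5 × IQR rule. The data is made up, with 95 planted as an obvious outlier, and the IQR rule is an assumption of this sketch rather than a method prescribed by the article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import pandas as pd
import seaborn as sns

# Illustrative values only; 95 is planted as an obvious outlier
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 95]})

# A box plot draws points beyond the whiskers individually, so outliers stand out
ax = sns.boxplot(x=df["value"])

# 1.5 * IQR rule: keep values within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

filtered = df[df["value"].between(lower, upper)]
print(f"kept {len(filtered)} of {len(df)} rows; bounds = ({lower}, {upper})")
```

Whether removing such points is appropriate depends on the data; sometimes an outlier is a genuine observation worth keeping.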

Here is the official seaborn website (https://seaborn.pydata.org/).

Data Visualization in Seaborn
We’ll use the Iris dataset in this example. It’s a popular example dataset that contains sepal
and petal measurements for three species of iris flowers.

We will use a scatter plot to show the relationship between petal length and sepal length.

Import the Libraries

As always, first import the library.

import seaborn as sns

Load the Dataset

The iris variable is a DataFrame that contains the Iris flower dataset, which is loaded using the
sns.load_dataset() function.

iris = sns.load_dataset('iris')

Many other Python libraries also provide example datasets like this one. You can access them
through each library's own loading functions.

Set Style
The sns.set_style() function is used to set the plot style to "darkgrid". The scatter plot itself
is created in the next step with the sns.scatterplot() function.

sns.set_style("darkgrid")

Draw a Graph
The data argument specifies the DataFrame that contains the data for the plot, and the x and y
arguments specify the columns to use for the x-axis and y-axis data. The hue argument is used
to color the points by the values in the "species" column, and the legend argument is used to
show the full legend for the plot.

sns.scatterplot(data=iris, x='sepal_length', y='petal_length',
                hue="species", legend="full")

Output

The resulting scatter plot shows the sepal length and petal length for each flower in the dataset,
with each flower species represented by a different color. The legend shows the mapping of
colors to species.
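The same call can be tried without downloading anything. The sketch below builds a tiny hand-made DataFrame with the same column names as the Iris data (the six rows are illustrative, not the full dataset) and uses the Axes object returned by sns.scatterplot() to add a title and readable axis labels. The name iris_like is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import pandas as pd
import seaborn as sns

# Small illustrative sample with the same column names as the Iris dataset
iris_like = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

sns.set_style("darkgrid")

# scatterplot() returns a Matplotlib Axes, which can be customized further
ax = sns.scatterplot(data=iris_like, x="sepal_length", y="petal_length",
                     hue="species", legend="full")
ax.set(title="Petal length vs. sepal length",
       xlabel="Sepal length (cm)", ylabel="Petal length (cm)")
```

Because seaborn builds on Matplotlib, anything you can do with a Matplotlib Axes (titles, limits, annotations) also works on the object returned here.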

Python Data Visualization Library #3: Plotly

Plotly was created by Alex Johnson, Chris Parmer, and Jack Parmer in 2012. It is a library
commonly used in web applications and dashboarding and can be integrated with other
languages and frameworks, such as R, MATLAB, and Shiny.

You can use the Plotly library in data science to visualize PCA, see the regression line, draw
ROC and PR curves, and more.

Here is Plotly’s official website (https://plotly.com/).

Data Visualization in Plotly

Let's see one example of data visualization in Plotly to understand it better.

Fenerbahce and Galatasaray are two famous football (soccer) teams in Turkey. We will visualize
one of their match results, including ball possession, fouls, and offsides, by creating a stacked
bar chart in the Plotly library.

Import the libraries

First, let’s load Plotly.

import plotly.graph_objects as go

Create Figures
The fig variable is a Figure object representing the plot, and its add_trace() method is used to
add data series to the plot. The code creates two data series, one for the "Fenerbahce" team and
one for the "Galatasaray" team.

Add Traces
Each data series is added using the add_trace() function, which takes a bar trace built from the
x-axis and y-axis data, the name of the data series, and other properties, such as the marker
color and line style.

fig = go.Figure()

# Horizontal bars for Fenerbahce
fig.add_trace(go.Bar(
    y=['Fouls', 'Offsides', 'Ball possession percentage'],
    x=[20, 14, 52],
    name='Fenerbahce',
    orientation='h',
    marker=dict(
        color='rgba(21, 78, 139, 0.6)',
        line=dict(color='rgba(246, 78, 139, 1.0)', width=3)
    )
))

# Horizontal bars for Galatasaray
fig.add_trace(go.Bar(
    y=['Fouls', 'Offsides', 'Ball possession percentage'],
    x=[12, 18, 48],
    name='Galatasaray',
    orientation='h',
    marker=dict(
        color='rgba(210, 71, 80, 0.6)',
        line=dict(color='rgba(58, 71, 80, 1.0)', width=3)
    )
))

Update layout
The update_layout() function is then used to set the barmode argument to "stack", which
stacks the data series on top of each other to show the cumulative values. The title argument
sets the title of the plot.

fig.update_layout(barmode='stack',
                  title="Match Statistics between Fenerbahce and Galatasaray")

Finally, the show() function is used to display the plot.

fig.show()

Output

The resulting horizontal stacked bar chart shows the fouls, offsides, and ball possession
percentage for both teams.

The x-axis values represent the total number of fouls, offsides, and ball possession percentages
in the match.

The y-axis labels indicate which statistics are being plotted.

The different data series are stacked on top of each other to show the cumulative values.

Python Data Visualization Library #4: pandas

pandas was created by Wes McKinney in 2008. It’s a powerful tool for working with tabular data,
including data cleaning and analysis. Not only that, but it also works well for visualizing
data. It is often used with other Python data visualization libraries, such as Matplotlib and
seaborn, to create rich, informative plots and charts.

pandas provides somewhat less complex graphs than other visualization libraries. Yet, it can still
be used for different purposes in data science, like seeing data points with scatter plots or
looking at the distribution of the features with histograms.
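As a quick sketch of those two uses, the code below draws a scatter plot and a histogram directly from a DataFrame with the pandas plotting methods. The column names (height_cm, weight_kg) and the values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative values only (not from any real dataset)
df = pd.DataFrame({
    "height_cm": [158, 162, 165, 170, 171, 175, 180, 184],
    "weight_kg": [52, 58, 61, 66, 70, 72, 80, 85],
})

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(9, 3.5))

# Scatter plot: one point per row, using two columns as coordinates
df.plot.scatter(x="height_cm", y="weight_kg", ax=ax_scatter, title="Scatter plot")

# Histogram: distribution of a single column, split into 5 bins
df["height_cm"].plot.hist(bins=5, ax=ax_hist, title="Histogram")
ax_hist.set_xlabel("height_cm")

fig.tight_layout()
```

Both methods draw on Matplotlib axes, which is why the ax argument can place them into an existing figure.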

Here is the official pandas website (https://pandas.pydata.org/docs/user_guide/index.html).

Data Visualization in pandas

In this example, we will use pandas together with Matplotlib to draw a bar chart that shows the
salaries for different job titles.

Import Libraries
The code imports matplotlib.pyplot and renames it as plt. As I said, pandas is often used with
Matplotlib when drawing a graph.

The code also imports NumPy and pandas as np and pd, respectively.

import matplotlib.pyplot as plt

import numpy as np
import pandas as pd

Create Data
As a next step, the code creates a DataFrame object in pandas, which is a two-dimensional,
size-mutable, tabular data structure with rows and columns.

The DataFrame is constructed from a dictionary that defines two columns: "Job Title" and
"salary". The "Job Title" column contains the names of different job titles, and the "salary"
column shows the corresponding salary for each job title.

df = pd.DataFrame({
    'Job Title': ['Machine Learning Engineer', 'Data Scientist', 'Python Developer',
                  'Software Engineer', 'Full Stack Developer'],
    'salary': [130000, 120000, 50000, 70000, 125000]})

Draw a Graph
After the DataFrame is created, the code uses the df.plot.barh() function to create a horizontal
bar chart from the data. The x parameter specifies the column that holds the category labels, and
the y parameter specifies the column that holds the bar values.

In this case, the "Job Title" column supplies the labels, and the "salary" column supplies the bar
lengths. Because the bars are horizontal, the job titles appear along the vertical axis and the
salaries along the horizontal axis. The title parameter is used to specify the chart's title,
which in this case is "Salary According to Job Titles".

ax = df.plot.barh(x='Job Title', y='salary',
                  title="Salary According to Job Titles")

Finally, the code uses the xaxis attribute of the ax object (which represents the x-axis of the
chart) to set the major formatter for the x-axis labels.

This is used to specify the format of the x-axis labels. In this case, it formats the values as dollar
amounts with no decimal places.

ax.xaxis.set_major_formatter('${x:1.0f}')
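To see what such a format string produces, you can call a formatter directly on a value. The sketch below uses Matplotlib's StrMethodFormatter class with the article's format string, plus a variation with thousands separators. The variation is an assumption of this sketch, not something the article's code uses.

```python
from matplotlib.ticker import StrMethodFormatter

plain = StrMethodFormatter('${x:1.0f}')    # the format string used in the article
grouped = StrMethodFormatter('${x:,.0f}')  # assumed variation: thousands separators

# A formatter is callable: it receives a tick value and returns the label text
plain_label = plain(130000)
grouped_label = grouped(130000)

print(plain_label)    # $130000
print(grouped_label)  # $130,000
```

The x inside the braces is the tick value, and the part after the colon is a regular Python format specification.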

Alright, let's see the output.

Output

The bar chart shows the salary of different job titles, and the salary values are formatted as
dollar amounts on the x-axis (the horizontal axis).
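A common refinement, sketched below as an optional variation, is to sort the rows by salary before plotting so the bars appear in order. The sorting step, the ordered variable, the hidden legend, and the axis label are additions for illustration and are not part of the article's code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import pandas as pd
from matplotlib.ticker import StrMethodFormatter

df = pd.DataFrame({
    'Job Title': ['Machine Learning Engineer', 'Data Scientist', 'Python Developer',
                  'Software Engineer', 'Full Stack Developer'],
    'salary': [130000, 120000, 50000, 70000, 125000]})

# Sort ascending: barh draws the first row at the bottom, so the
# highest salary ends up at the top of the chart
ordered = df.sort_values('salary')

ax = ordered.plot.barh(x='Job Title', y='salary', legend=False,
                       title="Salary According to Job Titles")
ax.set_xlabel("Salary (USD)")

# Dollar amounts with thousands separators on the horizontal axis
ax.xaxis.set_major_formatter(StrMethodFormatter('${x:,.0f}'))
```

Sorting does not change the data, only the order in which the bars are drawn, which usually makes the ranking easier to read.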

Conclusion
In this article, I explained the most popular Python data visualization libraries.

Python libraries, in general, have an essential role in data science and are a vital tool for data
scientists.

Matplotlib, seaborn, Plotly, and pandas are some of Python's most important (and most used!) data
visualization libraries. You should be familiar with them if you’re serious about data science.

These libraries offer a wide range of features and allow customizations according to your project
needs. Whether you’re a beginner or an experienced data scientist, learning and mastering these
libraries can increase your ability to communicate data through data visualization.
