Summary: Introduction To Data Visualization Tools

Uploaded by

melikakhajeh94

Summary: Introduction to Data Visualization Tools

Congratulations! You have completed this module. At this point in the course, you know:

• Data visualization is the process of presenting data in a visual format, such as charts, graphs, and maps, to help people understand and analyze data easily.
• Data visualization has diverse use cases, such as in business, science, healthcare, and finance.
• It is important to follow best practices, such as selecting appropriate visualizations for the data being presented, choosing colors and fonts that are easy to read and interpret, and minimizing clutter.
• There are various types of plots commonly used in data visualization.
• Line plots capture trends and changes over time, allowing us to see patterns and fluctuations.
• Bar plots compare categories or groups, providing a visual comparison of their values.
• Scatter plots explore relationships between variables, helping us identify correlations or trends.
• Box plots display the distribution of data, showcasing the median, quartiles, and outliers.
• Histograms illustrate the distribution of data within specific intervals, allowing us to understand its shape and concentration.
• Matplotlib is a plotting library that offers a wide range of plotting capabilities.
• Pandas is a data analysis library that provides integrated plotting functionality.
• Seaborn is a specialized library for statistical visualizations, offering attractive default aesthetics and color palettes.
• Folium is a Python library that allows you to create interactive and customizable maps.
• Plotly is an interactive and dynamic library for data visualization that supports a wide range of plot types and interactive features.
• PyWaffle enables you to visualize proportional representation using squares or rectangles.
• Matplotlib is one of the most widely used data visualization libraries in Python.
• Matplotlib was initially developed as an EEG/ECoG visualization tool.
• Matplotlib's architecture is composed of three main layers: the Backend layer, the Artist layer, and the Scripting layer.
• The anatomy of a plot refers to the different components and elements that make up a visual representation of data.
• Matplotlib is a well-established data visualization library that can be integrated into different environments.
• Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, visualizations, and text.
• Matplotlib has a number of different backends available.
• You can easily add axis labels and a title to your plot with plt, using plt.xlabel(), plt.ylabel(), and plt.title().
• To start creating different types of plots of the data, you first need to import the data into a Pandas DataFrame.
• A line plot is a plot in the form of a series of data points connected by straight line segments.
• The line plot is one of the most basic types of chart and is common in many fields.
• You can generate a line plot by passing kind="line" to the plot() function.
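
The last few points can be sketched together in a few lines. This is a minimal illustration rather than the course's own lab code: the yearly counts are invented, and the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly counts, invented for illustration only.
df = pd.DataFrame({"count": [120, 150, 170, 160, 210]},
                  index=[2009, 2010, 2011, 2012, 2013])

# kind="line" asks pandas for a line plot; the call returns a Matplotlib Axes.
ax = df["count"].plot(kind="line")

# Labels and title are added through the plt scripting interface.
plt.title("Illustrative yearly counts")
plt.xlabel("Year")
plt.ylabel("Count")
```

In a Jupyter Notebook the figure would normally appear below the cell; in a script, plt.savefig() or plt.show() would be the usual next step.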

Data Visualization with Python

Cheat Sheet: Data Preprocessing Tasks in Pandas

| Task | Syntax | Description | Example |
|---|---|---|---|
| Load CSV data | pd.read_csv('filename.csv') | Read data from a CSV file into a Pandas DataFrame | df_can=pd.read_csv('data.csv') |
| Handling Missing Values | df.dropna() | Drop rows with missing values | df_can.dropna() |
| | df.fillna(value) | Fill missing values with a specified value | df_can.fillna(0) |
| Removing Duplicates | df.drop_duplicates() | Remove duplicate rows | df_can.drop_duplicates() |
| Renaming Columns | df.rename(columns={'old_name': 'new_name'}) | Rename one or more columns | df_can.rename(columns={'Age': 'Years'}) |
| Selecting Columns | df['column_name'] or df.column_name | Select a single column | df_can.Age or df_can['Age'] |
| | df[['col1', 'col2']] | Select multiple columns | df_can[['Name', 'Age']] |
| Filtering Rows | df[df['column'] > value] | Filter rows based on a condition | df_can[df_can['Age'] > 30] |
| Applying Functions to Columns | df['column'].apply(function_name) | Apply a function to transform values in a column | df_can['Age'].apply(lambda x: x + 1) |
| Creating New Columns | df['new_column'] = expression | Create a new column with values derived from existing ones | df_can['Total'] = df_can['Quantity'] * df_can['Price'] |
| Grouping and Aggregating | df.groupby('column').agg({'col1': 'sum', 'col2': 'mean'}) | Group rows by a column and apply aggregate functions | df_can.groupby('Category').agg({'Total': 'mean'}) |
| Sorting Rows | df.sort_values('column', ascending=True/False) | Sort rows based on a column | df_can.sort_values('Date', ascending=True) |
| Displaying First n Rows | df.head(n) | Show the first n rows of the DataFrame | df_can.head(3) |
| Displaying Last n Rows | df.tail(n) | Show the last n rows of the DataFrame | df_can.tail(3) |
| Checking for Null Values | df.isnull() | Check for null values in the DataFrame | df_can.isnull() |
| Selecting Rows by Index | df.iloc[index] | Select rows based on integer index | df_can.iloc[3] |
| | df.iloc[start:end] | Select rows in a specified range | df_can.iloc[2:5] |
| Selecting Rows by Label | df.loc[label] | Select rows based on label/index name | df_can.loc['Label'] |
| | df.loc[start:end] | Select rows in a specified label/index range | df_can.loc['Age':'Quantity'] |
| Summary Statistics | df.describe() | Generates descriptive statistics for numerical columns | df_can.describe() |
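
Several of the tasks in the cheat sheet can be chained on one small DataFrame. The sketch below assumes a made-up sales table (the Category, Quantity, and Price values are invented for illustration) and is meant only to show how the calls fit together, not as a recommended cleaning pipeline.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration: one row is
# duplicated and one price is missing.
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "B"],
    "Quantity": [2, 2, 1, 4, 3],
    "Price": [10.0, 10.0, None, 5.0, 2.0],
})

df = df.drop_duplicates()                    # remove the repeated row
df = df.fillna(0)                            # replace the missing price with 0
df["Total"] = df["Quantity"] * df["Price"]   # create a derived column

big = df[df["Total"] > 5]                                 # filter rows on a condition
summary = df.groupby("Category").agg({"Total": "mean"})   # group and aggregate
ordered = df.sort_values("Total", ascending=False)        # sort rows by a column

print(summary)
```

Each method returns a new object instead of changing df in place, which is why the results are assigned back or stored under new names.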

Cheat Sheet: Plot Libraries

| Library | Main Purpose | Key Features | Programming Language | Level of Customization | Dashboard Capabilities | Types of Plots Possible |
|---|---|---|---|---|---|---|
| Matplotlib | General-purpose plotting | Comprehensive plot types and variety of customization options | Python | High | Requires additional components and customization | Line plots, scatter plots, bar charts, histograms, pie charts, box plots, heatmaps, etc. |
| Pandas | Fundamentally used for data manipulation but also has plotting functionality | Easy to plot directly on pandas data structures | Python | Medium | Can be combined with web frameworks for creating dashboards | Line plots, scatter plots, bar charts, histograms, pie charts, box plots, etc. |
| Seaborn | Statistical data visualization | Stylish, specialized statistical plot types | Python | Medium | Can be combined with other libraries to display plots on dashboards | Heatmaps, violin plots, scatter plots, bar plots, count plots, etc. |
| Plotly | Interactive data visualization | Interactive web-based visualizations | Python, R, JavaScript | High | Dash framework is dedicated to building interactive dashboards | Line plots, scatter plots, bar charts, pie charts, 3D plots, choropleth maps, etc. |
| Folium | Geospatial data visualization | Interactive, customizable maps | Python | Medium | For incorporating maps into dashboards, it can be integrated with other frameworks/libraries | Choropleth maps, point maps, heatmaps, etc. |
| PyWaffle | Plotting waffle charts | Waffle charts | Python | Low | Can be combined with other libraries to display waffle charts on dashboards | Waffle charts, square pie charts, donut charts, etc. |


pandas is an essential data analysis toolkit for Python. API reference, from their website:
https://pandas.pydata.org/pandas-docs/stable/reference/index.html
pandas Basics:
The first thing we'll do is install openpyxl, a module that pandas requires to read .xlsx Excel files (older pandas versions used xlrd for this).
!mamba install openpyxl==3.0.9 -y
import pandas as pd

# Read the sheet, skipping the 20 header rows and the 2 footer rows
df_can = pd.read_excel(
    'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Canada.xlsx',
    sheet_name='Canada by Citizenship',
    skiprows=range(20),
    skipfooter=2)

print('Data read into a pandas dataframe!')

head(): Let's view the first five rows of the dataset using the head() function.
df_can.head()
When analyzing a dataset, it's always a good idea to start by getting basic information about your dataframe. We
can do this by using the info() method.

info(): df_can.info(verbose=False) prints a short summary of the dataframe.
columns: df_can.columns returns the column headers.
index: df_can.index returns the row indices.

Note: The default type of the instance variables index and columns is NOT list.
tolist(): To get the index and columns as lists, we can use the tolist() method.
df_can.columns.tolist()
df_can.index.tolist()
print(type(df_can.columns.tolist()))
print(type(df_can.index.tolist()))
shape: To view the dimensions of the dataframe, we use its shape instance variable.
df_can.shape
drop(): Let's clean the data set to remove a few unnecessary columns. We can use the pandas drop() method as follows:
df_can.drop(['AREA','REG','DEV','Type','Coverage'], axis=1, inplace=True)
df_can.head(2)

rename(): Let's rename the columns so that they make sense. We can use the rename() method by passing in a dictionary of old and new names as follows:
df_can.rename(columns={'OdName':'Country', 'AreaName':'Continent', 'RegName':'Region'}, inplace=True)
df_can.columns

Adding a column: We will also add a 'Total' column that sums up the total immigrants by country over the entire period 1980 - 2013, as follows:
df_can['Total'] = df_can.sum(axis=1, numeric_only=True)  # numeric_only skips text columns such as 'Country'
df_can['Total']

isnull(): We can check to see how many null objects we have in the dataset as follows:
df_can.isnull().sum()

describe(): To view summary statistics for each numeric column, use the describe() method:
df_can.describe()
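
The cleaning steps above can be tried without downloading the Excel file. The sketch below uses a tiny stand-in DataFrame: the column names follow the notes, but every value, and the name df_demo itself, is made up for illustration.

```python
import pandas as pd

# Tiny stand-in for the Canada dataset. Column names follow the notes;
# all values are invented.
df_demo = pd.DataFrame({
    "Type": ["Immigrants", "Immigrants"],
    "Coverage": ["Foreigners", "Foreigners"],
    "OdName": ["Country A", "Country B"],
    "AREA": [1, 2],
    "AreaName": ["Continent X", "Continent Y"],
    "REG": [10, 20],
    "RegName": ["Region X", "Region Y"],
    "DEV": [100, 200],
    1980: [5, 7],
    1981: [6, None],
})

# Drop the columns that are not needed, then give the rest clearer names.
df_demo.drop(["AREA", "REG", "DEV", "Type", "Coverage"], axis=1, inplace=True)
df_demo.rename(columns={"OdName": "Country", "AreaName": "Continent",
                        "RegName": "Region"}, inplace=True)

# Sum only the year columns so that text columns are never involved.
years = [1980, 1981]
df_demo["Total"] = df_demo[years].sum(axis=1)

null_counts = df_demo.isnull().sum()   # number of missing values per column

print(df_demo.columns.tolist())
print(df_demo.shape)
```

Selecting the year columns explicitly is one alternative to numeric_only=True; it makes clear which columns contribute to 'Total'. Missing values are skipped by sum() by default.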
pandas Intermediate: Indexing and Selection (slicing)
Select Column
There are two ways to filter on a column name:
Method 1: Quick and easy, but only works if the column name does NOT have spaces or special characters.

df.column_name # returns series

Method 2: More robust, and can filter on multiple columns.

df['column'] # returns series

df[['column 1', 'column 2']] # returns dataframe

Example: Let's try filtering on the list of countries ('Country').
df_can.Country # returns a series
Let's try filtering on the list of countries ('Country') and the data for years: 1980 - 1985.
df_can[['Country', 1980, 1981, 1982, 1983, 1984, 1985]] # returns a dataframe
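
The difference between the two methods is easy to check on a small frame. This is a sketch with invented values; only the kinds of column labels (a text label and integer year labels) mirror the notes.

```python
import pandas as pd

# Made-up values for illustration.
df = pd.DataFrame({"Country": ["Country A", "Country B"],
                   1980: [5, 7],
                   1981: [6, 8]})

by_attr = df.Country             # Method 1: attribute access, returns a Series
by_key = df["Country"]           # Method 2: bracket access, returns the same Series
subset = df[["Country", 1980]]   # list of labels, returns a DataFrame

print(type(by_attr).__name__, type(subset).__name__)
```

Integer labels such as 1980 cannot be reached with Method 1, because df.1980 is not valid Python syntax; they need the bracket form.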

Select Row
There are two main ways to select rows:

df.loc[label] # filters by the labels of the index/column

df.iloc[index] # filters by the positions of the index/column
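
A short sketch of the two accessors side by side, assuming a small frame whose row labels are invented country names:

```python
import pandas as pd

# Made-up values, with country names as the row labels.
df = pd.DataFrame({1980: [5, 7, 9], 1981: [6, 8, 10]},
                  index=["Country A", "Country B", "Country C"])

row_by_label = df.loc["Country B"]      # row whose index label is "Country B"
row_by_position = df.iloc[1]            # second row, counting from 0
cell = df.loc["Country C", 1981]        # row label first, then column label

# Label slices include the end label; position slices exclude the end position.
label_slice = df.loc["Country A":"Country B"]
position_slice = df.iloc[0:2]
```

Here both slices return the same two rows, which shows the inclusive/exclusive difference between loc and iloc.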

In Matplotlib, backends are the components that handle the rendering of plots. They determine how figures are displayed or saved,
and they can be categorized into two main types: interactive backends and non-interactive backends. Here’s a detailed explanation
of each type of backend and its role:

1. Interactive Backends
Interactive backends allow for real-time interaction with plots. They enable features like zooming, panning, and
updating plots dynamically. Here are some common interactive backends:

TkAgg:
Role: Uses the Tkinter library for creating GUI applications.
Usage: Suitable for desktop applications where you want to display plots in a window.
import matplotlib
matplotlib.use('TkAgg')  # select the backend before importing pyplot
import matplotlib.pyplot as plt

Qt5Agg:
Role: Utilizes the Qt framework for creating interactive applications.
Usage: Ideal for applications that require a modern GUI and advanced features.
import matplotlib
matplotlib.use('Qt5Agg')  # select the backend before importing pyplot
import matplotlib.pyplot as plt

GTK3Agg:
Role: Uses the GTK+ toolkit for creating graphical user interfaces.
Usage: Commonly used in Linux environments.
import matplotlib
matplotlib.use('GTK3Agg')  # select the backend before importing pyplot
import matplotlib.pyplot as plt

2. Non-Interactive Backends
Non-interactive backends are used for generating static images without displaying them on the screen. They are
useful for saving plots to files. Here are some common non-interactive backends:

Agg:
Role: A raster graphics backend that generates images in formats like PNG, JPEG, etc.
Usage: Ideal for saving plots to files without displaying them.
import matplotlib
matplotlib.use('Agg')  # select the backend before importing pyplot
import matplotlib.pyplot as plt

PDF:
Role: Generates vector graphics in PDF format.
Usage: Useful for creating high-quality documents and publications.
import matplotlib
matplotlib.use('PDF')  # select the backend before importing pyplot
import matplotlib.pyplot as plt

SVG:
Role: Generates vector graphics in SVG format.
Usage: Suitable for web applications and scalable graphics.

Choosing a Backend
For Interactive Use: Choose an interactive backend like TkAgg, Qt5Agg, or MacOSX.

For Saving Plots: Use a non-interactive backend like Agg, PDF, or SVG.
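
As a rough illustration of the "saving plots" case, the sketch below selects Agg and writes the same figure as PNG (raster) and SVG (vector). The data points are arbitrary, and in-memory buffers stand in for files so that nothing is written to disk; a file path would work in the same place.

```python
import io

import matplotlib
matplotlib.use("Agg")              # choose the backend before importing pyplot
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8])      # arbitrary points, only to have something to draw
ax.set_title("Backend demo")

# savefig picks the output renderer from the requested format, so one figure
# can be written both as a raster image and as vector graphics.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")

print(matplotlib.get_backend())    # reports the currently selected backend
```

No window opens at any point, which is what makes a non-interactive backend suitable for scripts and servers.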
