
lecture-week3

The document outlines essential objectives and techniques for data visualization using Python libraries such as Pandas and Seaborn. It covers various plot types, including line, scatter, bar, histogram, and box plots, along with methods to enhance these visualizations. Additionally, it emphasizes the importance of data visualization for understanding relationships, spotting outliers, and effectively communicating data insights.

Uploaded by trminhselflearn
Copyright © All Rights Reserved

Essentials for Data Visualization
SIT112 | Data Science Concepts
Lecture Week 3
Objectives
1. Use the Pandas plot() method to create these types of plots: line plot, area plot, scatter plot, bar plot, histogram, density plot, box plot, pie plot
2. Use the parameters of the Pandas plot() method to enhance a plot in these ways: add a title, x and y labels, and grid lines, rotate the tick labels, set the x- and y-axis limits
3. Use the parameters of the Pandas plot() method to create a plot that has subplots.
4. Chain the Pandas plot() method to methods that prepare the data for the plot() method.
5. Use the Seaborn methods to create these types of plots: line plot, scatter plot, bar plot, box plot, histogram, KDE plot, ECDF plot
6. Use Seaborn parameters to create a plot with subplots.
7. Use the methods of the Axes object to enhance a plot in these ways: add a title and the labels for the x- and y-axis, set the ticks for a plot, set the x and y limits, add grid lines, annotate a plot, set the color palette
Why Data Visualization?
● Understand your data more easily.
● See the relationships between variables.
● Spot unusual data points like outliers.
Data Visualization Libraries in Python
Data Visualization with Pandas
Data visualization with Pandas
● Get the data
● Line plot
● Rotating the y-axis label
● Scatter plot
● Area plot
● Bar plot
● Histogram
● Density plot
● Box plot
● Pie chart
● Subplots
● Changing units of measurement in figsize
● Chaining
Get the Data: Long vs Wide
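A minimal sketch of the long-versus-wide distinction, using a small hypothetical sales table (the column names year, city and sales are invented for illustration). In long data each row is one observation; in wide data each variable (here, each city) gets its own column. pivot() goes from long to wide and melt() goes back:

```python
import pandas as pd

# Hypothetical long-format data: one row per (year, city) observation
long_df = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021, 2022, 2022],
    "city": ["A", "B", "A", "B", "A", "B"],
    "sales": [10, 7, 12, 9, 15, 8],
})

# Long -> wide: one row per year, one column per city
wide_df = long_df.pivot(index="year", columns="city", values="sales")
print(wide_df)

# Wide -> long: reset_index() turns the year index back into a column, melt() stacks the city columns
back_to_long = wide_df.reset_index().melt(id_vars="year", var_name="city", value_name="sales")
print(back_to_long)
```

The wide shape suits the Pandas plot() method (one line per column); the long shape suits Seaborn's hue, col and row parameters.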
Line Plot of the Wide Data

• Useful for showing trends and changes in data over time.
• Also useful for showing the relationship between two variables.
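A sketch of a line plot of wide data that uses the plot() parameters listed in objective 2 (title, axis labels, grid lines, tick-label rotation, axis limits). The data are hypothetical, and xlabel/ylabel as plot() parameters assume a reasonably recent Pandas (1.1 or later):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import pandas as pd

# Hypothetical wide data: the index supplies the x values, each column becomes one line
wide_df = pd.DataFrame(
    {"A": [10, 12, 15, 14], "B": [7, 9, 8, 11]},
    index=[2019, 2020, 2021, 2022],
)

ax = wide_df.plot(
    kind="line",
    title="Sales by city",
    xlabel="Year",
    ylabel="Sales",
    grid=True,          # add grid lines
    rot=45,             # rotate the x tick labels by 45 degrees
    xlim=(2019, 2022),  # x-axis limits
    ylim=(0, 20),       # y-axis limits
    figsize=(6, 4),     # width, height in inches
)
```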
Line Plot of All Columns
Line Plot – Improve the Appearance
Line Plot – Improve the Appearance
How to rotate the y-axis label?

Ask ChatGPT ☺
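One possible answer, sketched on hypothetical data: the Pandas plot() method returns a Matplotlib Axes, so the y-axis label can be rotated with the Axes method set_ylabel(), which forwards rotation to the label text:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

df = pd.DataFrame({"sales": [10, 12, 15, 14]}, index=[2019, 2020, 2021, 2022])  # hypothetical data
ax = df.plot(legend=False)

# rotation=0 makes the y-axis label horizontal; labelpad moves it away from the tick labels
ax.set_ylabel("Sales", rotation=0, labelpad=20)

# The y tick labels (the numbers) can be rotated separately if needed
ax.tick_params(axis="y", labelrotation=45)
```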
Scatter Plot

• Useful for showing the relationship between two variables.
• They can be used to identify patterns or trends in the data.
Area Plot

• Useful for showing changes in data over time or comparing data across different groups.
Bar Plot

• Particularly useful for showing changes in data over time or comparing data across different groups.
Bar Plot - Horizontal
Histogram

• Useful for showing the distribution of data.
• Particularly useful for identifying the range and frequency of values in a dataset.
Density Plot
• A graphical representation of the probability density function of a continuous random variable.
• The main use of a density plot is to visualize the distribution of a continuous variable.
• It is similar to a histogram, but instead of showing the number of observations within each bin, it shows a smoothed estimate of the probability density of the variable at different values.
• It allows us to see the shape of the distribution, including its central tendency, spread, and skewness.
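The idea behind a density plot can be sketched in plain NumPy as a Gaussian kernel density estimate: place a small normal "bump" on every observation and average the bumps. Pandas' density plot delegates this computation to SciPy, and Seaborn's kdeplot uses the same technique; here the simulated data and the fixed bandwidth of 2.0 are assumptions for illustration, whereas the libraries choose a bandwidth automatically:

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Estimate the density at each grid point by averaging Gaussian kernels centred on the samples."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] is the distance from grid point i to sample j, in bandwidth units
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)  # standard normal density
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.normal(loc=50, scale=5, size=200)  # simulated, hypothetical measurements
grid = np.linspace(20, 80, 601)                  # x values at which the density is evaluated
density = gaussian_kde_1d(samples, grid, bandwidth=2.0)

# Unlike histogram counts, a density integrates to about 1
step = grid[1] - grid[0]
area = density.sum() * step
print(f"area under the curve: {area:.3f}")
```

A larger bandwidth gives a smoother curve; a smaller one follows individual observations more closely.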
Density Plot (Cont.)

• Also useful for comparing the distributions of different groups or variables; by overlaying multiple density plots, we can see how their distributions differ.
Box Plot
• The box represents the interquartile range (IQR), which is the range of values between the first and third quartiles (Q1 and Q3).
• The line inside the box represents the median (Q2).
• The whiskers extend from the box to the most extreme data points that lie within 1.5 times the IQR of the box, that is, between Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. Any values beyond the whiskers are considered potential outliers and are plotted as individual points.
• Useful for showing the distribution of data and identifying outliers.
• Particularly useful for comparing data between different groups.
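The quantities in these bullets can be computed directly with Pandas. A minimal sketch on a hypothetical Series; it uses the default linear-interpolation quantiles (other quantile methods can give slightly different Q1 and Q3):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

values = pd.Series([2, 4, 4, 5, 6, 7, 8, 9, 30])  # hypothetical data with one extreme value

q1 = values.quantile(0.25)  # first quartile
q3 = values.quantile(0.75)  # third quartile
iqr = q3 - q1               # height of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Each whisker stops at the most extreme observation that is still inside its fence
whisker_low = values[values >= lower_fence].min()
whisker_high = values[values <= upper_fence].max()
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={values.median()}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}; outliers: {outliers.tolist()}")

ax = values.plot(kind="box")  # 30 is drawn as an individual point beyond the upper whisker
```

Here Q1 = 4, Q3 = 8 and IQR = 4, so the fences are -2 and 14: the whiskers end at 2 and 9, and 30 is flagged as a potential outlier.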
Pie Chart
• Displays the relative sizes of different categories or parts of a whole as slices of a pie.
• Commonly used to show the composition of a categorical variable; useful for conveying simple, easily understandable information.
• Used in business, finance, and marketing to show sales figures, market share, and budget allocations.
• Pie charts can be less effective than other types of charts when trying to display a large amount of data, or when the categories are not easily distinguishable.
Subplots
Subplots (Cont.)
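A sketch of objective 3, creating subplots through plot() parameters alone, on hypothetical data: subplots=True draws each column in its own Axes, layout=(rows, columns) arranges them, and a list passed to title gives each subplot its own title:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

wide_df = pd.DataFrame(
    {"A": [10, 12, 15, 14], "B": [7, 9, 8, 11]},
    index=[2019, 2020, 2021, 2022],
)  # hypothetical data

axes = wide_df.plot(
    subplots=True,               # one Axes per column
    layout=(2, 1),               # 2 rows, 1 column
    sharex=True,                 # both subplots use the same x-axis
    figsize=(6, 5),
    title=["City A", "City B"],  # one title per subplot
)
```

With layout given, the returned value is a NumPy array of Axes shaped like the layout, so each subplot can be refined individually (for example axes[1, 0].set_ylim(0, 20)).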
How to change the units of measurement in figsize?

In the figsize parameter of the plot() method, the units of measurement are inches by default. However, it is possible to size a figure in a different unit of measurement. How?

Ask ChatGPT ☺
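One approach that works with any Matplotlib version is to convert the desired size to inches before passing it to figsize (1 inch = 2.54 cm). A sketch; cm_to_inches is a hypothetical helper, not a Pandas or Matplotlib function:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

CM_PER_INCH = 2.54

def cm_to_inches(width_cm, height_cm):
    """Hypothetical helper: convert a (width, height) size from centimetres to inches."""
    return (width_cm / CM_PER_INCH, height_cm / CM_PER_INCH)

df = pd.DataFrame({"sales": [10, 12, 15, 14]})  # hypothetical data
ax = df.plot(figsize=cm_to_inches(16, 10))      # a 16 cm x 10 cm figure

width_in, height_in = ax.figure.get_size_inches()
print(f"{width_in:.2f} x {height_in:.2f} inches")
```

The same idea works for pixels: divide the pixel size by the figure's dpi.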
Chaining
Chaining (Cont.)
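A sketch of objective 4, chaining plot() to the methods that prepare its data. Each step returns a new DataFrame or Series, so filtering, grouping, aggregating, sorting and plotting can be written as one expression; the data and column names are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

long_df = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021, 2022, 2022],
    "city": ["A", "B", "A", "B", "A", "B"],
    "sales": [10, 7, 12, 9, 15, 8],
})  # hypothetical data

ax = (
    long_df
    .query("year >= 2021")            # keep the recent rows only
    .groupby("city")["sales"]
    .sum()                            # one total per city
    .sort_values(ascending=False)     # largest total first
    .plot(kind="bar", title="Total sales since 2021", ylabel="Sales", rot=0)
)
```

Wrapping the chain in parentheses lets each method sit on its own line with a comment.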
Movie Time

Top 5 Python Libraries for Data Visualization


Data Visualization with Seaborn
Data Visualization with Seaborn
● Imports, Get the data
● Line plot, Subplot
● Set the ticks, x limits and y limits, Set the background style
● Save a plot to a file
● Scatter plot, Bar plot, Bar plot horizontal
● Box plot, Box plot horizontal
● Histogram, Histogram – custom bins
● Density function – Kernel Density Estimate (KDE)
● Empirical cumulative distribution function
● Enhanced distribution plots
● Annotate a plot, Set the plot size
● Custom titles for subplots
Imports
Get the Data
Line Plot
Seaborn has several built-in palettes that can be used to customize the color of a plot:
• deep
• muted
• pastel
• bright
• dark
• colorblind
Subplot
Set the ticks, x limits, and y limits
Set the Background Style
There are several other values that can be assigned to sns.set_style() to customize the style of Seaborn plots:
• "darkgrid": Sets a dark background with grid lines.
• "whitegrid": Sets a white background with grid lines.
• "dark": Sets a dark background with no grid lines.
• "white": Sets a white background with no grid lines.
• "ticks": Sets a white background with no grid lines and adds tick marks on the axes.
Save a Plot to a File
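A sketch of saving a Seaborn plot with Matplotlib's Figure.savefig(). The file is written to the system temp folder under an arbitrary example name, and the output format is inferred from the extension:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(5, 4))
sns.scatterplot(x=[1, 2, 3, 4], y=[2, 4, 5, 8], ax=ax)  # hypothetical points
ax.set_title("Scatter plot")

out_path = os.path.join(tempfile.gettempdir(), "scatter_example.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")  # tight bounding box trims extra whitespace
plt.close(fig)  # free the figure once it has been written
```

Figure-level functions such as sns.relplot() return a grid object that has its own savefig() method.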
Scatter Plot
Bar Plot
Bar Plot - Horizontal
Box Plot
Box Plot - Horizontal
Histogram
Histogram – Custom Bins
Density Function – Kernel Density Estimate (KDE)
Empirical Cumulative Distribution Function
• The x-axis represents the values of the variable.
• The y-axis represents the proportion of data points that have a value less than or equal to the corresponding x-value.
• The plot is useful for visualizing the distribution of a variable and identifying important features such as the median and the range of the data.
Enhanced Distribution Plots
Enhanced Distribution Plots (Cont.)
Annotate a Plot
Set the Plot Size
Custom Titles for Subplots
End of lecture …
