5.1 Data Visualization I
Joining Multiple Sheets
• Drag two sheets onto the canvas to join them on their common
field (primary key)
• This is similar to the JOIN and keys in SQL
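To make the SQL analogy concrete, here is a minimal Python sketch of an inner join on a common key. The two "sheets", their column names, and their values are hypothetical, and an inner join is only one of the join types a tool may offer.

```python
# Hypothetical "Orders" and "Customers" sheets, each represented as a list of rows.
orders = [
    {"order_id": 1, "customer_id": "C1", "sales": 120.0},
    {"order_id": 2, "customer_id": "C2", "sales": 80.0},
    {"order_id": 3, "customer_id": "C9", "sales": 45.0},  # no matching customer
]
customers = [
    {"customer_id": "C1", "region": "East"},
    {"customer_id": "C2", "region": "West"},
]

# Index the second sheet by the common field (its primary key).
customers_by_id = {row["customer_id"]: row for row in customers}

# Inner join: keep only the orders whose key also appears in the customer sheet,
# and merge the columns of both rows into one.
joined = [
    {**order, **customers_by_id[order["customer_id"]]}
    for order in orders
    if order["customer_id"] in customers_by_id
]

for row in joined:
    print(row)
```

The order with customer_id "C9" is dropped because it has no match, which is what an inner join does.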
Data Types
• If our column names aren’t ideal, we can click on the drop
down arrow to the right of the name and select rename.
• Clicking on the data type icon allows us to change the
default data type for that column.
Live versus Extract
• Connecting live leaves the data in the database or file
• This is best when we want to leverage a high performance
database’s capabilities, or to get up-to-the-second changes
in data visualized in Tableau
Continuous versus Discrete
Shelves & Cards
• Finally, we have the shelves (or cards).
• A view can be built by dragging and dropping fields from the
data pane into the canvas directly, or onto the shelves.
[Figure: Tableau workspace with the shelves, the view, and the cards labeled]
Demo I
Show a table of total sales and profits by region and by
market.
Basic Charts
Types of Charts
• There are so many different types of charts and graphs.
• How do you choose the right chart?
• The following page provides some good suggestions:
https://blog.hubspot.com/marketing/data-visualization-choosing-chart
• Some of the contents in the following slides are from the
above page
Bar Charts
• A bar chart (or a column chart) is used to show a
comparison among different items, or it can show a
comparison of items over time
• Design Best Practices for Bar Charts:
  ◦ Use consistent colors throughout the chart, selecting accent
    colors to highlight meaningful data points or changes over
    time.
  ◦ Use horizontal labels to improve readability.
  ◦ Start the y-axis at 0 to appropriately reflect the values in your
    graph.
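The course builds this chart in Tableau; as a rough illustration of the same practices, here is a matplotlib sketch. The category names and sales values are made up, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to use an interactive one
import matplotlib.pyplot as plt

# Made-up illustrative values, not taken from the course data set.
categories = ["Furniture", "Office Supplies", "Technology"]
sales = [410, 380, 470]

# One consistent color, with an accent color for the bar we want to highlight
# (here, the category with the lowest sales).
colors = ["tab:orange" if value == min(sales) else "tab:blue" for value in sales]

fig, ax = plt.subplots()
bars = ax.bar(categories, sales, color=colors)
ax.set_ylim(bottom=0)                      # start the y-axis at 0
ax.tick_params(axis="x", labelrotation=0)  # keep the labels horizontal
ax.set_ylabel("Sales")
ax.set_title("Sales by product category")
fig.savefig("bar_chart.png")
```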
Demo I
Which product category has the worst sales/profits?
Stacked Bar Charts
• A stacked bar chart should be used to compare many
different items and show the composition of each item
being compared.
• Design Best Practices for Stacked Bar Charts:
  ◦ Best used to illustrate part-to-whole relationships.
  ◦ Use contrasting colors for greater clarity.
  ◦ Make chart scale large enough to view group sizes in relation
    to one another.
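A stacked bar chart can be sketched in matplotlib by passing the first series as the `bottom` of the second. The ship modes and cost values below are invented for illustration, and the Agg backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Made-up shipping costs per year, split into two hypothetical ship modes.
years = ["2012", "2013", "2014"]
standard = [55, 60, 66]
first_class = [30, 35, 42]

fig, ax = plt.subplots()
# Contrasting colors make the two parts of each bar easy to tell apart.
ax.bar(years, standard, label="Standard", color="tab:blue")
# Stack the second series on top of the first by setting its bottom.
ax.bar(years, first_class, bottom=standard, label="First Class", color="tab:orange")
ax.set_ylabel("Shipping cost")
ax.legend()
fig.savefig("stacked_bar_chart.png")

# The height of each full bar is the total for that year.
totals = [s + f for s, f in zip(standard, first_class)]
```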
Demo I
Is shipping cost increasing over time?
Scatter Plots
• A scatter plot shows the relationship between two
different variables, or it can reveal distribution trends. It
should be used when there are many different data points
and you want to highlight similarities in the data set.
• This is useful when looking for outliers or for understanding
the distribution of your data.
• Design Best Practices for Scatter Plots:
  ◦ Include more variables, such as different sizes, to incorporate
    more data.
  ◦ Start y-axis at 0 to represent data accurately.
  ◦ If you use trend lines, only use a maximum of two to make
    your plot easy to understand.
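As an illustration of these practices outside Tableau, the following matplotlib sketch encodes a third variable as marker size and adds a single least-squares trend line. All values are made up; `numpy.polyfit` is used here as one simple way to fit a straight line, not as the method Tableau uses.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

# Made-up illustrative values.
discount = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
profit = np.array([50.0, 42.0, 30.0, 24.0, 15.0, 8.0])
quantity = np.array([3, 5, 2, 8, 4, 6])  # third variable, shown as marker size

fig, ax = plt.subplots()
ax.scatter(discount, profit, s=quantity * 20)

# Fit and draw one straight trend line (degree-1 polynomial).
slope, intercept = np.polyfit(discount, profit, 1)
ax.plot(discount, slope * discount + intercept, "r-")

ax.set_ylim(bottom=0)  # start the y-axis at 0
ax.set_xlabel("Discount")
ax.set_ylabel("Profit")
fig.savefig("scatter_plot.png")
```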
Demo I
What is the relationship between discount and profit?
Line Charts
• A line chart reveals trends or progress over time and can be
used to show many different categories of data. You should
use it when you chart a continuous data set.
• Line charts are often used for looking at how something
changes over time.
• Design Best Practices for Line Charts:
  ◦ Use solid lines only.
  ◦ Don't plot more than four lines to avoid visual distractions.
Demo I
Show monthly sales and forecast sales for the next year.
Dual Axis Charts
• A dual axis chart allows you to plot data using two y-axes
and a shared x-axis. It's used with three data sets, one of
which is based on a continuous set of data and another
which is better suited to being grouped by category.
• Design Best Practices for Dual Axis Charts:
  ◦ Use the y-axis on the left side for the primary
    variable because brains are naturally inclined to look left first.
  ◦ Use different graphing styles (for example, bars for one and a
    line for the other) to illustrate the two data sets.
  ◦ Choose contrasting colors for the two data sets.
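In matplotlib, a second y-axis that shares the x-axis can be created with `twinx()`. The sketch below uses made-up monthly values and assumes the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Made-up illustrative values.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [200, 240, 220, 260]
profit_ratio = [0.10, 0.12, 0.08, 0.15]

fig, ax_left = plt.subplots()
# Primary variable on the left y-axis, drawn as bars.
ax_left.bar(months, sales, color="tab:blue")
ax_left.set_ylabel("Sales")

# Second y-axis on the right that shares the same x-axis.
ax_right = ax_left.twinx()
# Different graphing style (a line) and a contrasting color for the second data set.
ax_right.plot(months, profit_ratio, color="tab:orange", marker="o")
ax_right.set_ylabel("Profit ratio")

fig.savefig("dual_axis_chart.png")
```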
Demo I
Show sales and profits in the same graph.
Histogram
• A histogram is a plot that lets you discover, and show, the
underlying frequency distribution of a set of continuous
data.
• This allows the inspection of the data for its underlying
distribution (e.g., normal distribution), outliers, skewness,
etc.
• To construct a histogram from a continuous variable you
first need to split the data into intervals, called bins.
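The binning step can be sketched in plain Python. The discount values (in percent) and the bin edges below are made up; each value is assigned to the bin whose interval contains it, and the counts per bin are what a histogram draws as bar heights.

```python
import bisect

# Made-up discount values in percent.
values = [5, 10, 12, 20, 22, 25, 31, 40, 45, 48]

# Bin edges define the intervals [0, 10), [10, 20), [20, 30), [30, 40), [40, 50].
bin_edges = [0, 10, 20, 30, 40, 50]
counts = [0] * (len(bin_edges) - 1)

for value in values:
    # bisect_right finds the first edge strictly greater than the value;
    # subtracting 1 gives the index of the bin that contains it.
    index = bisect.bisect_right(bin_edges, value) - 1
    # A value equal to the last edge belongs to the last bin.
    index = min(index, len(counts) - 1)
    counts[index] += 1

print(counts)  # [1, 2, 3, 1, 3]
```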
Demo I
What is the frequency of discounts across different product
categories?
Heat Maps
• A heat map shows the relationship between two items and
provides rating information, such as high to low or poor to
excellent.
• The rating information is displayed using varying colors or
saturation.
• Design Best Practices for Heat Maps:
  ◦ Use a basic and clear map outline to avoid distracting from
    the data.
  ◦ Use a single color in varying shades to show changes in data.
  ◦ Avoid using multiple patterns.
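One way to draw a grid-style heat map in matplotlib is `imshow` with a single-hue colormap such as "Blues". The categories, regions, and profit margins below are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

# Made-up profit margins: rows are product categories, columns are regions.
categories = ["Furniture", "Office Supplies", "Technology"]
regions = ["East", "West", "Central"]
margin = np.array([
    [0.05, 0.12, 0.02],
    [0.18, 0.20, 0.15],
    [0.22, 0.25, 0.10],
])

fig, ax = plt.subplots()
# A single color in varying shades: darker blue means a higher margin.
image = ax.imshow(margin, cmap="Blues")
ax.set_xticks(range(len(regions)))
ax.set_xticklabels(regions)
ax.set_yticks(range(len(categories)))
ax.set_yticklabels(categories)
fig.colorbar(image, ax=ax, label="Profit margin")
fig.savefig("heat_map.png")
```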
Demo I
What does the profit margin look like across different product
categories?
Pie Charts
• A pie chart shows a static number and how categories
represent part of a whole -- the composition of something.
• A pie chart represents numbers in percentages, and the
total sum of all segments needs to equal 100%.
• Design Best Practices for Pie Charts:
  ◦ Don't illustrate too many categories to ensure differentiation
    between slices.
  ◦ Ensure that the slice values add up to 100%.
  ◦ Order slices according to their size.
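The sketch below orders the slices by size and checks that the percentage shares add up to 100 before drawing the pie. The market names and profit values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Made-up profits per market.
profits = {"EU": 370, "APAC": 440, "LATAM": 220, "US": 290}

# Order slices according to their size, largest first.
ordered = sorted(profits.items(), key=lambda item: item[1], reverse=True)
labels = [name for name, _ in ordered]
values = [value for _, value in ordered]

# Each slice is a share of the whole; the shares must sum to 100%.
total = sum(values)
shares = [100 * value / total for value in values]

fig, ax = plt.subplots()
# Start at 12 o'clock and go clockwise so the largest slice comes first.
wedges, texts, autotexts = ax.pie(
    values, labels=labels, autopct="%1.1f%%", startangle=90, counterclock=False
)
fig.savefig("pie_chart.png")
```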
Demo I
Which market generates more profits?
Box Plots
• Box plots visually show the distribution and skewness of
numerical data by displaying the data quartiles (or percentiles)
and averages.
• They present five statistics: min, first quartile, median,
third quartile, max.
• They can also help spot outliers.
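The five statistics can be computed directly. The sketch below uses made-up profit margins (in percent) and the common 1.5 × IQR rule for flagging outliers, which is also matplotlib's default whisker setting; other tools may use a different rule.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

# Made-up profit margins in percent.
data = [2, 4, 4, 5, 7, 8, 9, 11, 30]

# Five-number summary: min, first quartile, median, third quartile, max.
five = [float(v) for v in np.percentile(data, [0, 25, 50, 75, 100])]
minimum, q1, median, q3, maximum = five

# Points more than 1.5 * IQR beyond the quartiles are flagged as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

fig, ax = plt.subplots()
result = ax.boxplot(data)  # outliers are drawn as individual points ("fliers")
ax.set_ylabel("Profit margin (%)")
fig.savefig("box_plot.png")

print(five)      # [2.0, 4.0, 7.0, 9.0, 30.0]
print(outliers)  # [30]
```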
Demo I
Use a box plot to show the distribution of profit margins in the
Canada and USA markets.
Bullet Graphs
• Bullet graphs are a variation of bar charts with the addition
of reference lines and reference areas.
• They help illustrate the relationship between two measures.
• They are usually used to compare actual value with target
value.
• For instance, if actual sales fall between 60% and 80% of
target sales, that can be a risk signal to decision
makers.
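A bullet graph can be approximated in matplotlib with stacked horizontal bars for the reference areas, a thinner bar for the actual value, and a vertical line for the target. The numbers and the band boundaries (60%, 80%, 120% of target) are assumptions chosen to match the example above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Made-up values.
target = 100.0
actual = 72.0

# Reference areas as (start, end, gray shade); "0.85" is a light gray in matplotlib.
bands = [
    (0.0, 0.6 * target, "0.85"),
    (0.6 * target, 0.8 * target, "0.70"),
    (0.8 * target, 1.2 * target, "0.55"),
]

fig, ax = plt.subplots(figsize=(6, 1.5))
for start, end, shade in bands:
    ax.barh(0, end - start, left=start, height=0.8, color=shade)
ax.barh(0, actual, height=0.3, color="black")   # actual value
ax.axvline(target, color="red", linewidth=2)    # reference line for the target
ax.set_yticks([])
ax.set_xlabel("Sales")
fig.savefig("bullet_graph.png", bbox_inches="tight")

# The actual value as a share of the target, and the risk check from the text.
ratio = actual / target
at_risk = 0.6 <= ratio < 0.8
```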
Demo I
Use a bullet graph to compare sales in 2013 with sales in 2012,
which serves as the base.
Dumbbell Charts
• Also called DNA charts. They are used to show the
changes/trends between two data points.
• A dumbbell connects two dots with a line, so it is built as a
dual axis chart.
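Outside Tableau, the same picture can be sketched in matplotlib with one horizontal line per item and a dot at each end. The regions and sales values below are made up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Made-up sales per region for the two years being compared.
regions = ["East", "West", "Central"]
sales_2012 = [120, 150, 90]
sales_2013 = [140, 145, 130]

positions = list(range(len(regions)))

fig, ax = plt.subplots()
# The line of each dumbbell connects the two data points.
ax.hlines(positions, sales_2012, sales_2013, color="gray")
# One dot per year at each end of the line.
ax.scatter(sales_2012, positions, color="tab:blue", label="2012", zorder=3)
ax.scatter(sales_2013, positions, color="tab:orange", label="2013", zorder=3)
ax.set_yticks(positions)
ax.set_yticklabels(regions)
ax.set_xlabel("Sales")
ax.legend()
fig.savefig("dumbbell_chart.png")

# The length of each dumbbell is the change between the two years.
change = [new - old for old, new in zip(sales_2012, sales_2013)]
```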
Demo I
Use a dumbbell chart to show sales in 2012 and sales in 2013.
Exercise
• Using “Music Sale_data.xlsx”, make visualizations that show the
best artists, the best-selling genre, etc.
• See some examples in the next two slides.
Tableau Resources
• Find datasets from the HKSAR
https://data.gov.hk/en/
• Find student resources from Tableau
https://community.tableau.com/docs/DOC-10635
• Student Viz Assignment Contest by Tableau
https://www.tableau.com/student-viz-assignment-contest
• View projects on Tableau Public
https://public.tableau.com/en-us/s/
Matplotlib
The SciPy Ecosystem
• You may first generate the data and then pass it as
arguments to the plotting function
plot Line Charts
import matplotlib.pyplot as plt
plt.plot(gdp)  # x axis defaults to 0, 1, 2, ... if not specified
plt.show()     # don't forget to show the plot
plot Line Charts
plt.plot(years, gdp)
plt.show()
plot Line Charts
args = '[color][marker][line]'
• More examples:
• https://matplotlib.org/stable/gallery/index.html
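A short sketch of the format string in use. The `years` and `gdp` values are made up, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Made-up illustrative values.
years = [2018, 2019, 2020, 2021]
gdp = [2.8, 2.9, 2.7, 3.0]

# 'ro--' = red color, circle markers, dashed line.
line_a, = plt.plot(years, gdp, "ro--")

# 'g^' = green color, triangle markers, and no connecting line
# because no line style is given.
line_b, = plt.plot(years, [value + 0.5 for value in gdp], "g^")

plt.savefig("format_strings.png")
```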
Format Charts
• xlabel() adds a label to the x axis
• ylabel() adds a label to the y axis
• title() adds a title to the entire chart
• text(x, y, text) adds text at location x, y in the chart
• grid(True) shows the grid lines in the chart
Format Charts
plt.hist(x, 50, color = 'b', density = True)
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(50, .025, r'$\mu=100,\ \sigma=15$')  # matplotlib is math-symbol friendly
plt.grid(True)
plt.show()
Writing mathematical expressions
• Any text element can use math text. You should use raw
strings (precede the quotes with an 'r'), and surround the
math text with a pair of dollar signs ($), as in TeX
• To make subscripts and superscripts, use the '_' and '^'
symbols
r'$\alpha_i > \beta_i$'
• See a simple tutorial here:
https://matplotlib.org/2.0.2/users/mathtext.html
Subplots
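• The subplot function adds additional subplots to the same
figure.
subplot(pos)
• pos is a three-digit integer: the number of rows, the number of
columns, and the index of the subplot. In a 2 × 3 grid (pos = 231
to 236), the indexes run left to right, top to bottom:
1 2 3
4 5 6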
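The numbering can be checked with a small sketch that creates all six subplots of a 2 × 3 grid and writes each index into its own panel. The output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig = plt.figure()
for position in range(1, 7):
    # plt.subplot(2, 3, position) is equivalent to plt.subplot(230 + position):
    # 2 rows, 3 columns, and the index of the subplot to activate.
    ax = plt.subplot(2, 3, position)
    ax.text(0.5, 0.5, str(position), ha="center", va="center", fontsize=20)
    ax.set_xticks([])
    ax.set_yticks([])

fig.savefig("subplot_positions.png")
```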
Subplots in One Chart
plt.subplot(131)
plt.plot('TV', 'sales', 'r.', data = advertising)
plt.xlabel('TV')
plt.ylabel('Sales')
plt.subplot(132)
plt.plot('radio', 'sales', 'bo', data = advertising)
plt.xlabel('Radio')
plt.subplot(133)
plt.plot('newspaper', 'sales', 'g>', data = advertising)
plt.xlabel('Newspaper')
plt.subplots_adjust(wspace = 0.8)
plt.suptitle('Impact of promotional strategies on Sales')
plt.show()
Save your figure
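• Before showing the figure, use savefig()
• show() behaves much like close(): once the window is closed the
figure is released, so calling savefig() afterwards typically
saves a blank image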
……
plt.savefig('fig.png')
plt.show()
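A complete version of this pattern might look like the sketch below. The plotted values are made up, and the temporary directory and the Agg backend are assumptions so the code runs anywhere without a display; in an interactive session you would drop the `matplotlib.use` line.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; show() does nothing here
import matplotlib.pyplot as plt

# Made-up illustrative values.
plt.plot([1, 2, 3], [2, 4, 8])
plt.title("Saved before shown")

# Save first, while the figure still holds the plot.
path = os.path.join(tempfile.gettempdir(), "fig.png")
plt.savefig(path)

# Show last.
plt.show()
```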
Summary
• Plot line charts
• More charts
• Format charts
• Subplots in one chart
• Save figures