Lecture 5. Visualization 2
The objective
• As life expectancy increases around the world, and as older people
remain more active, senior tourism can be a lucrative market for
companies that know how to find and appeal to potential customers.
The World Indicators sample data set contains the kind of data that
might help companies identify the countries or regions where there
are enough of the right kind of customers.
We can show clustering in any graph we choose; here, let’s show it on a map.
Population Urban: It is easier to market services in areas with greater population density.
Population 65+: The target population is older residents with the time and funds to travel.
TourismPerCapita: This is a measure that you must create as a named calculated field. The formula is:
SUM([Tourism Outbound])/SUM([Population Total])
Tourism Outbound aggregates the money (in US dollars) that residents of a country/region spend annually on international travel. But this total must be divided by the population of each country/region to determine the average amount each resident spends on international travel.
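As a rough illustration of what the calculated field computes, here is a minimal Python sketch. The country names and figures are invented for the example, and the helper name `tourism_per_capita` is hypothetical; it only mirrors the SUM/SUM formula above.

```python
# Hypothetical rows: (country, tourism outbound in USD, total population).
# These figures are invented for illustration, not taken from World Indicators.
rows = [
    ("A", 5_000_000_000, 10_000_000),
    ("B", 1_200_000_000, 60_000_000),
]

def tourism_per_capita(tourism_outbound, population_total):
    """Mirror SUM([Tourism Outbound]) / SUM([Population Total]) for a group of rows."""
    return sum(tourism_outbound) / sum(population_total)

# At country level, each group contains a single row.
per_country = {name: tourism_per_capita([spend], [pop]) for name, spend, pop in rows}
print(per_country)

# At a coarser level (both countries together), the formula is a ratio of sums,
# not an average of the per-country ratios.
combined = tourism_per_capita([r[1] for r in rows], [r[2] for r in rows])
print(round(combined, 2))
```

Because both the numerator and the denominator are aggregated before dividing, the result stays meaningful at whatever level of detail the view uses.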
1. Drag these five fields from the Data pane to Detail on the Marks card.
2. Click to open the Analytics pane.
3. Drag Cluster from the Analytics pane and drop it in the view.
4. Tableau then displays the Clusters dialog box and adds the measures in
the view to the list of variables.
It also updates the view by adding clusters to Color. In this case, Tableau finds two distinct clusters, and is unable to assign certain countries/regions (colored reddish-pink) to either cluster.
You may then decide that two clusters isn't enough: you don't have the resources to set up shop in half the countries/regions in the world. So you type 4 in the Number of Clusters field in the Clusters dialog box.
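To get a feel for what a clustering step does with a chosen number of clusters, here is a minimal sketch of plain k-means (Lloyd's algorithm) in Python. It is an illustration under assumptions, not Tableau's implementation: the data points are invented, the initialization is deliberately naive, and details such as variable scaling or automatic selection of the number of clusters are left out.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest center, then recompute centers."""
    centers = [list(p) for p in points[:k]]  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center becomes the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical data: (percent urban, percent aged 65+) for six made-up regions.
points = [(30, 4), (85, 17), (35, 5), (80, 16), (32, 6), (90, 18)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

Changing `k` here plays the same role as typing a new value in the Number of Clusters field: the same observations are regrouped into more or fewer clusters.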
Look at the statistics behind the clusters.
• Close the Clusters dialog box by clicking the X in its upper-right corner.
• Click the Clusters field on the Marks card and choose Describe Clusters.
The table at the bottom of the Models tab in the Describe Clusters dialog box shows the average value
for each variable in each cluster.
One cluster has the highest life expectancy (both male and female), the highest concentration of urban
population, and the highest expenditure for international tourism. The only variable for which this
cluster does not have the highest value is Population 65+. The cluster's label may differ depending on
how you place the variables: Tableau does not know your criteria and simply groups observations
based on their distance from each other. Let’s say it’s Cluster 4.
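The per-cluster averages described above can be sketched in a few lines of Python. The records, variable names, and values below are hypothetical and chosen only to show the idea of comparing cluster means to pick a cluster of interest.

```python
from collections import defaultdict

# Hypothetical records with a cluster label already assigned (values are invented).
records = [
    {"cluster": "Cluster 1", "population_urban": 30.0, "tourism_per_capita": 20.0},
    {"cluster": "Cluster 1", "population_urban": 40.0, "tourism_per_capita": 40.0},
    {"cluster": "Cluster 2", "population_urban": 80.0, "tourism_per_capita": 500.0},
    {"cluster": "Cluster 2", "population_urban": 90.0, "tourism_per_capita": 700.0},
]

def cluster_means(records, variables):
    """Average value of each variable within each cluster."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["cluster"]].append(rec)
    return {
        cluster: {v: sum(r[v] for r in rows) / len(rows) for v in variables}
        for cluster, rows in grouped.items()
    }

means = cluster_means(records, ["population_urban", "tourism_per_capita"])
print(means)

# Pick the cluster with the highest average tourism spending per resident.
best = max(means, key=lambda c: means[c]["tourism_per_capita"])
print(best)
```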
• You could attempt to pick out the Cluster 4 countries/regions from the map, but there is an easier way.
Close the Describe Clusters dialog box and then click Cluster 4 on the Color legend and choose Keep
Only.
• Choose Text Table from Show Me. You now see a list of the countries/regions in Cluster 4.
This list is not the end of the process. You might try clustering again with a somewhat different set of
variables and maybe a different number of clusters, or you might add some countries/regions to the list
and remove others, based on other factors. Clustering is a trial-and-error process of discovery.
Create a group from cluster results
1. If you drag a cluster to the Data pane, it becomes a group dimension
in which the individual members (Cluster 1, Cluster 2, etc.) contain
the marks that the cluster algorithm has determined are more similar
to each other than they are to other marks.
2. After you drag a cluster group to the Data pane, you can use it in
other worksheets.
3. Drag Clusters from the Marks card to the Data pane to create a
Tableau group.
4. After you create a group from clusters, the group and the original
clusters are separate and distinct. Editing the clusters does not affect
the group, and editing the group does not affect the cluster results.
[Figure: the sample mean compared with the null hypothesis H0: µ = 0]
Standard Normal (Z) Distribution
• Problem: Unlimited number of possible normal distributions
($-\infty < \mu < \infty$, $\sigma > 0$)
• Solution: Standardize the random variable to have mean 0
and standard deviation 1
$Y \sim N(\mu, \sigma) \Rightarrow Z = \dfrac{Y - \mu}{\sigma} \sim N(0, 1)$
• Probabilities of certain ranges of values and specific percentiles of interest can be obtained
through the standard normal (Z) distribution
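A small Python sketch of the standardization step, Z = (Y - µ)/σ, follows. The numbers (mean 100, standard deviation 15, observed value 130) are assumptions made up for the example; `statistics.NormalDist` from the standard library is used only to check that a probability for Y matches the corresponding probability for Z.

```python
import math
from statistics import NormalDist

def standardize(y, mu, sigma):
    """Convert a value y from N(mu, sigma) to the standard normal scale."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (y - mu) / sigma

# Hypothetical example: Y ~ N(100, 15), observed value 130.
mu, sigma, y = 100.0, 15.0, 130.0
z = standardize(y, mu, sigma)
print(z)

# P(Y <= 130) under N(100, 15) equals P(Z <= z) under N(0, 1).
p_original = NormalDist(mu, sigma).cdf(y)
p_standard = NormalDist(0.0, 1.0).cdf(z)
print(math.isclose(p_original, p_standard))
```

This is why a single table (or function) for the standard normal distribution is enough to answer probability questions about any normal distribution.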
P-value (aka Observed Significance Level)
• P-value - Measure of the strength of evidence the sample
data provides against the null hypothesis:
P-value: $p = P(Z \ge z_{obs})$
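The upper-tail probability in this definition can be computed directly from the standard normal CDF. The sketch below assumes a one-sided (upper-tail) test, as in the formula above; the helper name `upper_tail_p_value` and the observed value 1.96 are illustrative choices, not part of the lecture.

```python
from statistics import NormalDist

def upper_tail_p_value(z_obs):
    """P(Z >= z_obs) for a standard normal Z, i.e. 1 minus the CDF at z_obs."""
    return 1.0 - NormalDist().cdf(z_obs)

# Hypothetical observed test statistic.
p = upper_tail_p_value(1.96)
print(round(p, 4))
```

The smaller the p-value, the stronger the evidence the sample provides against the null hypothesis.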
When finished, click OK. The new calculated field is added to Measures in the Data pane because it returns a
number. An equal sign (=) appears next to the data type icon. All calculated fields have equal signs (=) next to
them in the Data pane.
Parameters
• A Parameter is a place-holder for a single global value, such as a
number, date, or string.
• For example, you may have a filter to show the top 10 products by profit. You
can replace the fixed value “10” in the filter by a dynamic parameter so you can
quickly look at the top 15, 20, and 30 products.
• The value of a Tableau parameter is global: if the value is
changed, every view and calculation in the workbook that references
the parameter will use the new value.
• You can use the parameter in filters, calculations, reference lines,
controls, etc.
• Creating a Tableau Parameter is similar to creating a calculated field.
Use parameter in a filter
1. Using the Superstore example that comes with Tableau, create a
sheet showing sales by customer.
2. Drag Customer Name to the Filters shelf and set the filter to show the top
10 customers by sales.
3. Edit the filter and select “create a new parameter…”.
4. Define the new parameter; the parameter control will
automatically show in the sheet.
5. Sort the chart by sales to make it easier to read.
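The effect of replacing a fixed "top 10" with a parameter can be sketched in Python as a function argument. The customer names and sales figures are invented, and `top_n` simply stands in for the Tableau parameter.

```python
# Hypothetical sales totals per customer (not from the Superstore data).
sales_by_customer = {
    "Ana": 1200.0,
    "Ben": 300.0,
    "Chen": 950.0,
    "Dee": 700.0,
    "Eli": 50.0,
}

def top_customers(sales, top_n):
    """Return the top_n (customer, sales) pairs, sorted by sales in descending order."""
    ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

top_n = 3  # plays the role of the parameter; change it to see a different cut-off
print(top_customers(sales_by_customer, top_n))
```

Changing `top_n` in one place changes every result that depends on it, which is the same idea as a global parameter feeding several views.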
Time series and predictive analysis
• Visualize time series data to spot trends and patterns
• Tableau’s forecasting function runs several different models by default
and selects the best one, automatically accounting for data issues such
as seasonality. Forecasting in Tableau uses a technique known as
exponential smoothing, which iteratively forecasts future values of a
time series from weighted averages of its past values.
• Note that when showing multiple data series in the same graph, you
can choose which one to bring to the front
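To make "weighted averages of past values" concrete, here is a minimal sketch of simple exponential smoothing in Python. It is the most basic member of the exponential smoothing family, with no trend or seasonality terms, so it should be read as an illustration of the idea rather than of the models Tableau actually fits. The series and the smoothing weight `alpha` are assumptions.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Each new level is a weighted average of the latest observation and the
    previous level, so older observations receive exponentially smaller weights.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [series[0]]  # start from the first observation
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels

# Hypothetical monthly values.
series = [10.0, 12.0, 14.0]
levels = simple_exponential_smoothing(series, alpha=0.5)
forecast = levels[-1]  # one-step-ahead forecast is the last smoothed level
print(levels)
print(forecast)
```

A larger `alpha` makes the forecast react faster to recent values; a smaller one gives a smoother, slower-moving forecast.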
XKCD Webcomic: Curve-Fitting
https://xkcd.com/2048/
Combine Tables in Tableau
• Tables are considered to be from the same database if they come from
the same database, the same workbook (for Excel), or the same
directory (for text files).
• Combining tables that are from the same database requires only a
single connection in the data source. Typically, joining tables from
the same database yields better performance.
• Cross-database joins require that you first set up a multi-connection
data source—that is, you create a new connection to each database
before you join tables.
Join Type: Result
Inner: Keep values that have matches in both tables.
Left: Keep all values from the left table and corresponding matches from the right table. When a value in the left table doesn't have a corresponding match in the right table, you see a null value in the data grid.
Right: Keep all values from the right table and corresponding matches from the left table. When a value in the right table doesn't have a corresponding match in the left table, you see a null value in the data grid.
Full outer: Keep all values from both tables. When a value from either table doesn't have a match with the other table, you see a null value in the data grid.
Union: Union is not a type of join; it combines two or more tables by appending rows of data from one table to another. Ideally, the tables should have the same number of fields, and those fields should have matching names and data types.
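The join types in the table can be mimicked on small in-memory tables. The sketch below is plain Python over lists of dictionaries, with `None` standing in for a null value; the tables, column names, and the `join` helper are all hypothetical and only illustrate which rows each join type keeps.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; columns without a match are filled with None."""
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}
    result = []
    matched_right = set()
    for lrow in left:
        matches = [(i, rrow) for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        for i, rrow in matches:
            matched_right.add(i)
            result.append({**lrow, **rrow})
        if not matches and how in ("left", "full"):
            # Keep the unmatched left row, with nulls for the right table's columns.
            result.append({**{col: None for col in right_cols}, **lrow})
    if how in ("right", "full"):
        for i, rrow in enumerate(right):
            if i not in matched_right:
                # Keep the unmatched right row, with nulls for the left table's columns.
                result.append({**{col: None for col in left_cols}, **rrow})
    return result

# Hypothetical tables sharing an "id" column.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
orders = [{"id": 2, "amount": 50}, {"id": 3, "amount": 75}]

for how in ("inner", "left", "right", "full"):
    print(how, join(customers, orders, "id", how))

# A union appends rows from tables that share the same fields.
january = [{"id": 1, "amount": 10}]
february = [{"id": 2, "amount": 20}]
union = january + february
print(union)
```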
Mismatch in Joins
• If the fields used in the join do not match, the join can return little or no data
• Mismatches are often caused by differences in format of the string
values or date values in the fields
• You can often resolve mismatches between the fields in your join by
using a calculation
One table has two columns:
first name and last name
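Resolving a mismatch with a calculation can be sketched as follows. The example assumes the other table stores a single full-name column (an assumption, since the slide only describes the first table), and that formatting differences are limited to letter case and extra spaces. All table contents and helper names are hypothetical.

```python
# Hypothetical tables: one splits the name into two columns, the other stores it whole.
people = [{"first_name": "Ana", "last_name": "Silva", "dept": "Sales"}]
badges = [{"full_name": "ana  silva ", "badge": 17}]

def name_key(text):
    """Normalize a name for matching: collapse whitespace and ignore case."""
    return " ".join(text.split()).lower()

def join_on_full_name(people, badges):
    """Inner join using a calculated key built from first and last name."""
    by_name = {name_key(b["full_name"]): b for b in badges}
    joined = []
    for person in people:
        key = name_key(person["first_name"] + " " + person["last_name"])
        if key in by_name:
            joined.append({**person, **by_name[key]})
    return joined

# A direct comparison of the raw strings finds no match.
raw_matches = [
    p for p in people for b in badges
    if p["first_name"] + " " + p["last_name"] == b["full_name"]
]
print(len(raw_matches))

# The calculated key resolves the mismatch.
print(join_on_full_name(people, badges))
```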
Entity-Relationship-Diagram
Tableau Joins