Tableau 2018.1
-Rakhesh
Digital World
The world will have 20-25 billion connected devices by 2020.
How does it affect us as individuals and as companies? It means
there will be enormous amounts of data (human and machine).
The ability to use that data and capitalize on that tremendous
opportunity is highly beneficial for both individuals and
companies.
Fast Analytics and Rapid-
fire Business Intelligence
for Everyone
Gartner Chart - 2017
For the Sixth year, Tableau is a leader in the Magic Quadrant for Business Intelligence and
Analytics Platforms Report.
Gartner Chart - 2018
What is Tableau?
“Tableau helps people see and
understand their data”
What is Data Visualization?
Data Visualization is, quite simply, the process of describing information through
visual rendering. Data visualization reveals unnoticed information, gives fast
answers and helps find insight in data. Visualization conveys information in a universal
manner and makes it simple to share ideas with others.
Visualization lets you see things that would otherwise go unnoticed. Any data contains
information, but without a visual representation you miss out on trends, behavior
patterns and dependencies.
Principle 3 : Memory plays a role in cognition, but our working memory is very limited
Why Tableau ?
1. Tableau is a groundbreaking data visualization tool
2. Tableau is an in-memory tool that leverages the
complete memory hierarchy, from disk to L1 cache
3. In traditional reporting tools, data is stored on disk or in
cubes, so it takes time to fetch data.
4. The software helps people easily explore and understand
data (Consumer)
5. Tableau helps anyone quickly analyze, visualize and
share information (Developer)
6. Tableau Software products: Tableau Desktop, Tableau
Server, Tableau Reader, Tableau Online and Tableau
Public
Tableau Products
1. Tableau Desktop
Backbone of Product Offering. Use Tableau Desktop to create your
workbooks
2. Tableau Server
A secured intranet portal where users can easily publish and share
their Tableau workbooks and users can easily access through web
browser and Mobile
3. Tableau Reader
Free tool that allows you to open, but not edit, Tableau workbooks. You
can only ‘read’ what others have created (like a PDF reader)
4. Tableau Online
Cloud-based ‘Tableau Server’ solution. Tableau hosts and manages your server,
enabling you to easily scale and publish content outside of the firewall.
5. Tableau Public
Is a free service that lets anyone publish interactive visualization to the
web.
Tableau Product Architecture
Tableau Server Architecture
Data Connectors
Tableau includes a number of optimized data connectors for
data sources such as Microsoft Excel, SQL Server, Oracle,
Teradata, Vertica, Cloudera Hadoop, and many more.
This leaves the detail data in the source system and sends the aggregate
results of queries to Tableau.
In-memory
Tableau offers a fast, in-memory Data Engine that is optimized for analytics.
You can connect to your data and then, with one click, extract your data to bring
it in-memory in Tableau.
Tableau’s Data Engine fully utilizes your entire system to achieve fast query
response on hundreds of millions of rows of data on commodity hardware.
The Data Engine can access disk storage as well as RAM and cache memory, so it is not
limited by the amount of memory on a system.
In-memory or Direct Connect?
Ease of use: Tableau desktop is an intuitive, drag and drop tool that lets
you see every change as you make it.
Any data (50 -60) : Tableau desktop can connect directly to databases,
cubes, data warehouses, files and spreadsheets and with live connection
you can see up to the minute data.
Perfect mashups: Provides data blending, can filter data from one data
source to another.
Tableau Server
Interactive Dashboards on the web: Combine and publish data on the web to filter,
highlight and drill down right in a browser.
Web Authoring: Tableau Server enables users to create new workbooks from
published data sources, and edit views directly on the server
1. Software installation
2. Software upgrades
3. Monitoring performance, server utilization, and system tuning
4. Processes that support security, backup and restore, and change
management
5. Managing users, groups, projects, workbooks, and data connections
Tableau Workspace
1. Workbook
2. Toolbar
3. Card & Shelves
4. Visualization
5. Show Me
6. Sheet Tabs
7. Status Bar
8. Data pane
9. Data window
10. Start Button
Tableau Workspace Cont…
The data source name (1): When you load data, you should provide the data
source with a name that identifies the contents. Once you have added several
data sources, you can condense their window in order to save space and then
select different sources from the drop-down menu.
The Dimensions pane (2): This includes categorical fields with qualitative data.
The Dimensions pane typically consists of a string field, a date field, and a field
that has geographical attributes, as well as unique identifiers, such as ID fields.
The Measures pane (3): This usually includes quantitative fields with
numerical data that can be aggregated. Tableau Public will automatically group
numerical fields, except the ones with the ID string in the name as measures.
The Sets pane (4): This includes user-defined, custom fields that interact just
like dimensions and measures do. Sets pane can also create subsets of data that
you can use just like dimensions.
The Parameters pane (5): This includes dynamic placeholders that can replace
constant values in calculated fields and filters. Parameters are unique to a
workbook and not a data source. You'll see the parameters available in your
workbook no matter which data source you are viewing.
Data Types
Tableau expresses fields and assigns data types automatically. If the data
type is assigned by the data source, Tableau will use that data type. If the
data source doesn’t specifically assign a data type, Tableau will assign one.
Tableau supports the following data types:
Getting Started
When you save your work in Desktop, the default save method creates a workbook (.twb) file. If you need to
share your work with people who don’t have a Tableau desktop license or don’t have access to the data source,
you can save your work as a packaged workbook (.twbx) by using the Save As option when saving your file.
Tableau Packaged Data Source File (.tdsx): potentially large. Used for sharing with
people who don’t have access to the original source files, typically via publishing
to Tableau Server. Contains all of the information in a .tds file as well as any
local file data sources, for those who do not have access to the local files.
Demo Dashboard
Malaria
How common is your birthday? Find out exactly
with an interactive heat map.
Preparing Data for Tableau
Data Prep with Text and excel files
1. Data Interpreter
2. Pivot Table
3. Meta Data management
The Data Interpreter
Tableau’s Data Interpreter was introduced in V9.0 and is intended to help you
deal with poorly formatted and unstructured spreadsheet data. It includes a
variety of tools to help fix problems and address issues with your data source:
1. Pivoting columns
2. Reformatting data
3. Renaming headings
4. Splitting cells
5. Changing data types
6. Hiding unneeded data
Connecting to Data
Connect to Data
The first step to getting started with Tableau Desktop is to connect to the data
you want to explore. There are several types of data you can connect to and
several ways to connect to your data. For example, you can connect to your data
through the start page, the toolbar, or the Data menu.
Live – Creates a direct connection to your data. The speed of your data source will
determine performance.
Extract – By default, this option imports the entire data source into Tableau's fast
Data Engine as an extract. The extract is saved with the workbook. If you prefer to
import a subset of the data, click the Edit link. This option requires you to specify
what data you want to extract using filters.
Add - Add data source filters to limit the visibility of fields contained in the
workbook.
Data Window
Connecting to varieties of Database
1. Excel Connection
2. Access Connection
3. Oracle Connection
4. MySql Connection
5. Google Analytics
6. TDE File
7. Tableau Server
8. EXASOL
Join Types
Determine which rows are selected
• Inner Join
• Left/Right Join
• Full Join
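A minimal Python sketch of how each join type decides which rows are kept. The two tables and their values are made up for illustration, and the `join` helper is hypothetical; Tableau runs joins in the data source or the Data Engine, not like this.

```python
# Hypothetical sample tables, both keyed on a customer id.
orders = [(1, "Chair"), (2, "Desk"), (4, "Lamp")]    # (customer_id, product)
customers = [(1, "Ann"), (2, "Bob"), (3, "Cara")]    # (customer_id, name)


def join(left, right, how="inner"):
    """Join two lists of (key, value) rows; None marks a missing side."""
    left_keys = {key for key, _ in left}
    rows = []
    for key, left_value in left:
        matches = [value for right_key, value in right if right_key == key]
        if matches:
            # Matching rows are kept by every join type.
            rows.extend((key, left_value, value) for value in matches)
        elif how in ("left", "full"):
            # Unmatched left rows survive only in left and full joins.
            rows.append((key, left_value, None))
    if how in ("right", "full"):
        # Unmatched right rows survive only in right and full joins.
        rows.extend((key, None, value) for key, value in right
                    if key not in left_keys)
    return rows


for how in ("inner", "left", "right", "full"):
    print(how, join(orders, customers, how))
```

An inner join keeps only customers 1 and 2; a left join adds the Lamp order with no customer; a right join adds Cara with no order; a full join adds both.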
Data Blending
Data blending is when you combine data from multiple data source types in a
single worksheet. The data is joined on common dimensions. Data Blending
does not create row level joins and is not a way to add new dimensions or rows
to your data.
Data blending should be used when you have related data in multiple data
sources that you want to analyze together in a single view. For example, you
may have Sales data collected in an Oracle database and Sales Goal data in an
Excel spreadsheet. To compare actual sales to target sales, you can blend the
data based on common dimensions to get access to the Sales Goal measure.
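The idea of blending on a common dimension can be sketched in Python as follows. This is only an illustration under assumptions: the sample rows, the Region linking field and the `blend` helper are all hypothetical. Each source is aggregated to the linking dimension first, and the secondary result is then attached to the primary one, so no row-level join is created.

```python
from collections import defaultdict

# Hypothetical primary source: row-level sales as (region, amount).
sales = [("East", 100), ("East", 150), ("West", 200), ("South", 50)]
# Hypothetical secondary source: one sales goal per region.
goals = [("East", 300), ("West", 150)]


def blend(primary, secondary):
    """Aggregate both sources by the linking dimension, then combine them."""
    actual = defaultdict(int)
    for region, amount in primary:
        actual[region] += amount
    target = defaultdict(int)
    for region, goal in secondary:
        target[region] += goal
    # Every region of the primary source is kept; a missing goal is None.
    return {region: (actual[region], target.get(region)) for region in actual}


print(blend(sales, goals))
```

South has sales but no goal, so it still appears, with an empty goal value.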
Exercise
1. Single Table Connection
2. Multiple Table Connection
3. Custom SQL
4. Connecting to Google Analytics
Field Types & Visual Cues
Dimension and Measures
When you connect to a data source, Tableau assigns each field in the data source
as playing one of two possible data roles: dimension or measure.
Dimensions
When you first connect to a data source, Tableau assigns any fields that contain
discrete categorical information (for example, fields where the values are strings
or Boolean values) to the Dimensions area in the Data pane.
When you click and drag a field from the Dimensions area to Rows or Columns,
Tableau creates column or row headers.
Measures
When you first connect to a data source, Tableau assigns any fields that contain
quantitative, numerical information (that is, fields where the values are numbers)
to the Measures area in the Data pane.
When you drag a field from the Measures area to Rows or Columns, Tableau
creates a continuous axis.
Continuous and Discrete
Continuous and discrete are mathematical terms. Continuous means "forming
an unbroken whole, without interruption"; discrete means "individually separate
and distinct."
Discrete fields draw headers; continuous fields draw axes
Continuous and Discrete
Blue pill = discrete, which means distinct values; a discrete field
generates headers. Green pill = continuous; a continuous field generates an axis.
• Cross tab
• Bar Charts
• Line Graphs
• Pie charts
• Heat Map
• Scatter Plots
• Treemaps, Word Clouds and Bubble Charts
• Dual Axis
• Edit axis
• Totals & Subtotals
Cross tab - Sometimes you do need to be able to look up exact values. A table is
an acceptable way to show data in that situation. On most dashboards, a table
shows details alongside summary charts.
Bar Chart - A bar chart uses length to represent a measure. Bars are widely
used in data visualization because they are often the most effective way to
compare categories. Bars can be oriented horizontally or vertically. Sorting
them can be very helpful because the most common task when bar charts are
used is to spot the biggest/smallest items.
Stacked Bar Chart - We can add another dimension to the above bar chart to
produce a stacked bar chart, which shows different colors in each bar. We
drag the dimension field named Segment to the Marks card and drop it on
Color. The resulting chart shows the distribution of each
segment in each bar.
Line Chart - Line charts usually show change over time. Time is represented
by position on the horizontal x-axis. The measures are shown on the vertical
y-axis. The height and slopes of the line let us see trends.
Pie Chart - A pie chart represents data as slices of a circle with different
sizes and colors. The slices are labeled, and the numbers corresponding to
each slice are also shown in the chart. We can select the Pie chart
option from the Marks card to create a pie chart.
Heat Map - A heat map is a great way to compare categories using color
and size. In this, you can compare two different measures.
Scatter Plot - A scatterplot lets you compare two different measures. Each
measure is encoded using position on the horizontal and vertical axes.
Scatterplots are useful when looking for relationships between two variables.
Tree Map - The tree map displays data in nested rectangles. Dimensions
define the structure of the tree map, and measures define the size or
color of the individual rectangles. The rectangles are easy to read because
both the size and the shade of color of each rectangle reflect the value of the
measure.
Bubble Chart - Bubble charts display data as a cluster of circles. Each of the
values in the dimension field represents a circle whereas the values of
measure represent the size of those circles. As the values are not going to be
presented in any row or column, we drag the required fields to different
shelves under the marks card.
Word Clouds - The word cloud (sometimes also referred to as a tag cloud)
displays members of a chosen dimension as text, but in varying sizes and
colors, depending on one or two measures. A common example of word
cloud usage is analyzing the effectiveness of search engine keywords in
website visit metrics, Twitter hashtags, speech analysis, and so on.
Analyzing Data
Basic Filters
There are a few ways to add filtering to your visualization. Dragging any
dimension or measure on to the Filters shelf provides filtering that is accessible
to the designer. Make that filter accessible to more people by turning it into a
quick filter. This places it on the desktop where it is accessible to anyone—even
those reading your report via Tableau Reader or Tableau Server. You can also
create conditional filters that operate according to rules you define.
• Source Filter
1. Extract Filter
2. Data Source Filter
• Traditional Filter
1. Normal Filter
2. Quick Filter
• Context Filter
What is Filtering
Narrowing based on Criteria
Dimensional:
o WHERE Region = “West”
o Ignore this specific value, like Null.
Measure Filters:
o WHERE restaurant Rating > 10
o Exclude Orders created before 2013
Date Filtering
1. Dates can act like either dimensions or measures. Time in Tableau can
be either Discrete or Continuous:
1. A Discrete date is like a block of time. (like 2014, January, or Thursday)
2. Continuous time represents all of the date on a continuum between the
first date and the last.
2. Additionally, a date can be filtered relative to a specific date:
• Relative date filters allow you to consider the dates in your data relative
to a specific anchor, such as Today, Now, or the last date in your data set.
In this way you can look at the last 2 years of data, or the next 4 months
of projected expenses.
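A relative date filter can be sketched in Python as a window measured from an anchor date. The sample rows, the `last_n_days` helper and the choice of an inclusive window are assumptions made for illustration, not a description of how Tableau evaluates the filter.

```python
from datetime import date, timedelta

# Hypothetical rows: (order date, amount).
rows = [
    (date(2018, 3, 1), 10),
    (date(2018, 3, 28), 20),
    (date(2018, 3, 31), 30),
]


def last_n_days(rows, n, anchor):
    """Keep rows dated within the n days that end on the anchor (inclusive)."""
    start = anchor - timedelta(days=n - 1)
    return [row for row in rows if start <= row[0] <= anchor]


# Anchor on a fixed date here; a report would typically anchor on today.
print(last_n_days(rows, 7, anchor=date(2018, 3, 31)))
```

Changing the anchor moves the whole window, which is why the same filter keeps working as new data arrives.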
Filter Options
Edit filter: Exposes the main filter menu
Remove filter: Removes the quick filter
Apply to worksheets: Applies the filter to all or selected worksheets
Customize: Turns on or off different filter controls
Show title: Turns off or on the quick filter title
Edit title: Modifies the text in the quick filter title
Only relevant values: Reduces the set members displayed in the filter so that only
values included in the filtered set are displayed
Include values: Causes selected items in the filter to be included in the view
Exclude values: Causes selected items in the filter to be excluded from view
Hide card: Removes the quick filter from view but leaves it on the filter shelf
These are the Quick Filter menu items that appear only if the quick filter
is on a dashboard:
Floating: If activated, allows the filter to float on top of other worksheet objects
Select layout container: Activates the layout container in the dashboard
Deselect: Removes the layout container selection in the dashboard
Remove from dashboard: Removes the quick filter from the dashboard
Analysis
• Implementing Hierarchies and Others
• Date Hierarchy
• Fiscal Year and Custom Dates
• Custom Hierarchy
• Sorting
• Grouping
• Simplifies large numbers of dimension members by
  combining them into higher-level categories.
Advanced Views
Blended Axes & Dual Axes
• Reference Lines
• Box & Whisker Plots
• Bullet Charts
• Bins
• Histograms
Dual Axes - Dual axes are useful when you have two measures that have
different scales. To add a measure as a dual axis, drag the field to the right
side of the view and drop it when you see a black dashed line appear. You can
also right-click (control-click on Mac) the measure on the Columns or Rows
shelf and select Dual Axis.
Box & Whisker Plots - Box-and-whisker plots tell us about the distribution of a
measure’s data values by indicating the important statistical values: the median
(Q2), upper quartile (Q3) and lower quartile (Q1), the
interquartile range (between Q3 and Q1), and the minimum and maximum values
of the measure. The median is the data value that splits the values into two
parts so that half of them are smaller than the median and the other half are
bigger. Three quarters of the data values are below the upper quartile (Q3), while
one quarter of the data values are above it. Q1 is analogous, at the bottom end.
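The values behind a box-and-whisker plot can be computed with Python's standard library, as in the sketch below. The sample values are made up, and the "inclusive" quartile method is an assumption; quartile definitions differ between tools, so Tableau may place Q1 and Q3 slightly differently on the same data.

```python
import statistics

# Hypothetical measure values.
values = [2, 4, 4, 5, 7, 8, 9, 11, 12]


def box_plot_summary(values):
    """Return the statistics a box-and-whisker plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range, the height of the box
    }


print(box_plot_summary(values))
```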
Bullet Charts - A bullet graph is a very powerful way to compare data against
historical performance or pre-assigned thresholds. A bullet is one of the best ways
to show actual versus target comparisons. The blue bar represents the actual
value, the black line shows the target value, and the areas of gray shading are
performance bands.
Bins
Tableau Bins are useful to create a Range of data. Instead of
aggregating the measure to calculate the average age, you can bin the
measure to define age groups: 0–5, 6–10, 11–15, and so on. Then you
can count the number of people in each age group.
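Binning and counting can be sketched in Python as below. The ages are made up, and the sketch assumes fixed-width bins labeled by their lower bound (0 covers 0 to 4, 5 covers 5 to 9, and so on), which differs slightly from the age ranges listed above.

```python
from collections import Counter

# Hypothetical ages.
ages = [3, 4, 7, 12, 14, 15, 21]


def bin_counts(values, size):
    """Count values per fixed-width bin; each bin is labeled by its lower bound."""
    counts = Counter((value // size) * size for value in values)
    return dict(sorted(counts.items()))


print(bin_counts(ages, 5))  # {0: 2, 5: 1, 10: 2, 15: 1, 20: 1}
```

Plotting these counts as bars, one per bin, gives a histogram.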
• Sets –
• A subset of your data that meets certain conditions based on existing
dimensions.
Tableau - Context Filters
The filters that you add to your visualization are independent of each other. This
means each filter reads all the rows from the source data and creates its own
result. But there may be scenarios where we want the second filter to process only
the records returned by the first filter. In that case the first filter is set as a
context filter, and the other filters are known as dependent filters because they
process only the data that passes through the context filter.
Create a dependent numerical or top N filter – You can set a context filter to
include only the data of interest, and then set a numerical or a top N filter.
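The difference a context filter makes to a top N filter can be sketched in Python. The rows and helper names are hypothetical; the sketch only mimics the behavior described above, where independent filters each read all rows while a dependent filter sees only what the context filter lets through.

```python
# Hypothetical rows: (category, product, sales).
rows = [
    ("Furniture", "Chair", 500),
    ("Furniture", "Desk", 300),
    ("Furniture", "Shelf", 100),
    ("Technology", "Phone", 900),
    ("Technology", "Laptop", 800),
]


def top_n(rows, n):
    """Return the n rows with the highest sales."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


def independent_filters(rows, category, n):
    """Both filters read all rows; a row must pass each of them."""
    top = set(top_n(rows, n))
    return [row for row in rows if row[0] == category and row in top]


def with_context_filter(rows, category, n):
    """The category filter is the context; top N sees only its output."""
    context = [row for row in rows if row[0] == category]
    return top_n(context, n)


print(independent_filters(rows, "Furniture", 2))   # []
print(with_context_filter(rows, "Furniture", 2))   # Chair and Desk
```

Without a context filter the top 2 products overall are both Technology, so filtering to Furniture leaves nothing; with the context filter the top 2 are computed within Furniture.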
Sets
A set is a subset of your data that meets certain conditions based on existing dimensions.
All sets can function as advanced, predefined filters or be placed on the Rows or
Columns shelves.
Sets defined in one worksheet can be reused in other worksheets as an item with
an automated filter based on the levels selected.
An existing filter can be saved as a set for later use, and sets can be used in
calculated fields as well.
A set can be added to a shelf or to the Filters shelf; wherever you add a set, it behaves
like a filter.
Sets
What are Sets in Tableau
• Sets are custom fields that define a subset of data based on some conditions
• They can be based on a computed condition or specific data points on your view
• You can even combine multiple sets
• Some date periods cannot be grouped, but you can still create sets from them
Tableau V9 Features
1. Place Search
2. Mapbox Maps (9.3)
3. Census Layers
4. Radial and Lasso Search
5. Local Language Selection
6. Detailed Geo Details
Plotting Your Own Locations on a Map
Formatting and Tooltip
A Calculated Field creates a new field in the view when existing data items are not
appropriate for certain calculations.
Calculated Fields are normally (but not always) executed at the database
level; where the heavy lifting happens depends on the type of functions
used in the formula. Calculated Fields can be used to generate numbers,
dates, date-times, strings, or Boolean (true/false) conditions.
Types of Operator
• General Operators – (+, - )
Precedence   Operator
1            - (negate)
2            ^ (power)
3            *, /, %
4            +, -
5            ==, >, <, >=, <=, !=
6            NOT
7            AND
8            OR
Functions
1. Number Functions
2. String Calculations
3. Date Calculations
4. Logical Functions
5. Table Calculations
6. LOD Expressions
Number Functions
ABS(number) - ABS(-7) = 7
MAX(number, number) - MAX(4,7) = 7, MAX(Sales, Profit)
ZN(expression) - Returns the expression if it is not null, otherwise returns zero.
Use this function to use zero values instead of null values.
String Calculations
CONTAINS(string, substring) - CONTAINS("Calculation", "alcu") = true
FIND(string, substring, [start]) - FIND("Calculation", "alcu") = 2
LEFT(string, number) - LEFT("Matador", 4) = "Mata"
RIGHT(string, number) - RIGHT("Calculation", 4) = "tion"
LEN(string) - LEN("Matador") = 7
REPLACE(string, substring, replacement) - REPLACE("Version8.5", "8.5", "9.0")
= "Version9.0"
SPLIT(string, delimiter, token number) - SPLIT('a-b-c-d', '-', 2) = 'b',
SPLIT('a|b|c|d', '|', -2) = 'c'
TRIM(string) - Returns the string with leading and trailing spaces removed. For
example, TRIM(" Calculation ") = "Calculation"
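For readers who know Python, the indexing rules of the string functions above can be mirrored as in this sketch. The helper names `find` and `split` are hypothetical stand-ins, and returning an empty string for an out-of-range token is an assumption. The point is that positions and token numbers count from 1, and a negative token number counts from the end.

```python
def find(string, substring):
    """1-based position of the substring, or 0 when it is absent."""
    return string.find(substring) + 1


def split(string, delimiter, token_number):
    """Return the token at a 1-based position; negative counts from the end."""
    tokens = string.split(delimiter)
    if token_number == 0:
        return ""  # assumption: there is no token number 0
    index = token_number - 1 if token_number > 0 else token_number
    try:
        return tokens[index]
    except IndexError:
        return ""  # assumption: out-of-range tokens give an empty string


print(find("Calculation", "alcu"))   # 2
print(split("a-b-c-d", "-", 2))      # b
print(split("a|b|c|d", "|", -2))     # c
print("Matador"[:4])                 # Mata, like LEFT("Matador", 4)
print("Calculation"[-4:])            # tion, like RIGHT("Calculation", 4)
```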
Date Functions
date_part     Values
'year'        Four-digit year
'quarter'     1-4
'month'       1-12 or "January", "February", and so on
'dayofyear'   Day of the year; Jan 1 is 1, Feb 1 is 32, and so on
'day'         1-31
'weekday'     1-7 or "Sunday", "Monday", and so on
'week'        1-52
'hour'        0-23
'minute'      0-59
'second'      0-60
Date example
DATEADD(date_part, increment, date)
DATEDIFF(date_part, date1, date2, [start_of_week])
DATENAME(date_part, date, [start_of_week]) – Returns String (Jan-Dec)
DATEPART(date_part, date, [start_of_week]) - Returns Int (1-12)
DATEPARSE(format, string)
DATETRUNC(date_part, date, [start_of_week])
MAKEDATE(year, month, day)
NOW( )
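What DATEPART and DATETRUNC return for a few date parts can be sketched in Python. The lowercase helpers below are hypothetical stand-ins that cover only some date parts; they are meant to show the idea (extract one part as a number, or snap a date back to the start of its period), not to reproduce every Tableau option such as start_of_week.

```python
from datetime import date


def datepart(date_part, d):
    """Return one part of the date as an integer."""
    parts = {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "dayofyear": d.timetuple().tm_yday,
        "day": d.day,
    }
    return parts[date_part]


def datetrunc(date_part, d):
    """Snap the date back to the first day of its year, quarter or month."""
    if date_part == "year":
        return date(d.year, 1, 1)
    if date_part == "quarter":
        return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)
    if date_part == "month":
        return date(d.year, d.month, 1)
    raise ValueError("unsupported date_part: " + date_part)


d = date(2018, 5, 17)
print(datepart("quarter", d))     # 2
print(datepart("dayofyear", d))   # 137
print(datetrunc("quarter", d))    # 2018-04-01
print(datetrunc("month", d))      # 2018-05-01
```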
Logical Functions
Logical calculations can be useful when you want to create specific views from
a data source without changing the source itself. For example, combine
different members of a dimension or filter a segment of the data out of a view
• Case expressions
• IF
• IIF
• ISDATE
• ISNULL
Table Calculations
Table calculations are computations that are applied to the values in the table.
These computations are unique in that they use data from multiple rows in the
database to calculate a value. To create a table calculation, you need to define both
what values you want to compute and what values to compute along. These are
defined in the Table Calculation dialog box using the Calculation Type and
Calculate Along drop-down menus.
Table calculations depend upon the data in the view, not the underlying data.
Table Calculations rely on two types of fields: addressing and partitioning
fields. The key to understanding Table Calcs is to know how these fields work.
These functions are fundamental to Table Calculations and are used for most
calculations. It’s important to remember that these functions will return different
values depending on how the calculation is addressed and partitioned.
1. Total
2. Lookup
3. Window
4. Running
5. Previous value
FIRST( )
This function returns a 0 for the first visible row and then negative incremental
numbers for the following rows or partitions.
LAST( )
This function, the opposite of FIRST(), returns a 0 for the last visible row
and then positive incremental numbers for the preceding rows.
Index() or Rank()
These two functions provide a unique incremental number for every row or partition
assigned in the scope and direction of the table calculation. Basically, this table
calculation will give you a rank field for every row or pane you can see in your viz.
The RANK() function has additional functionality allowing for addressing ties and
other groupings based on rank.
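The values these functions return along a single partition can be sketched in Python. The input list stands for the visible marks in one partition, in addressing order; the `table_calcs` helper is hypothetical, and the running sum and previous-value columns are included as the kind of result RUNNING_SUM and LOOKUP with an offset of -1 would give.

```python
def table_calcs(values):
    """Compute row-relative values for one partition, in addressing order."""
    count = len(values)
    rows = []
    running = 0
    for i, value in enumerate(values):
        running += value
        rows.append({
            "value": value,
            "first": -i,                # offset back to the first row
            "last": count - 1 - i,      # offset forward to the last row
            "index": i + 1,             # 1-based position in the partition
            "running_sum": running,     # total of this row and all before it
            "previous": values[i - 1] if i > 0 else None,  # value one row back
        })
    return rows


for row in table_calcs([10, 20, 30]):
    print(row)
```

Changing the partitioning or addressing changes which rows land in the list and in what order, which is why the same calculation can return different values in different views.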
LOOKUP( )
Unlike Data items, Parameters are not part of our data source but added by
us as needed
Parameters are dynamic values that can replace constant values in calculations,
filters, and reference lines. For example, you may create a calculated field that
returns true if Sales is greater than $500,000 and otherwise return false. You can
replace the constant value of “500000” in the formula with a parameter. Then
using the parameter control you can dynamically change the threshold in your
calculation. Alternatively, you may have a filter to show the top 10 products by
profit. You can replace the fixed value “10” in the filter with a dynamic parameter
so you can quickly look at the top 15, 20, and 30 products.
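In programming terms a parameter behaves like a function argument that replaces a hard-coded constant. The Python sketch below illustrates both examples from the text; the product names, figures and helper names are made up.

```python
# Hypothetical data.
sales = {"A": 600000, "B": 400000}
profit = {"A": 50, "B": 90, "C": 70, "D": 10}


def exceeds_threshold(sales_by_product, threshold):
    """True/false per product; threshold plays the role of the parameter."""
    return {name: value > threshold for name, value in sales_by_product.items()}


def top_products(profit_by_product, n):
    """Names of the n most profitable products; n plays the role of the parameter."""
    ranked = sorted(profit_by_product, key=profit_by_product.get, reverse=True)
    return ranked[:n]


print(exceeds_threshold(sales, 500000))   # {'A': True, 'B': False}
print(top_products(profit, 2))            # ['B', 'C']
print(top_products(profit, 3))            # ['B', 'C', 'A']
```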
Parameters – Usage
1. Constant Line
2. Average Line
3. Median
4. Totals
5. Average Line
6. Trend Line
7. Forecast
8. Reference Line
Forecasting
Forecasting is about predicting the future value of a measure. There are many
mathematical models for forecasting. Tableau uses the model known as
exponential smoothing. In exponential smoothing, recent observations are given
relatively more weight than older observations. These models capture the
evolving trend or seasonality of the data and extrapolate them into the future.
The result of a forecast can also become a field in the visualization created
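The weighting idea behind exponential smoothing can be sketched in a few lines of Python. This is the simplest form only (a single level, no trend or seasonality), with a hypothetical series and a hand-picked smoothing factor `alpha`; it is not Tableau's forecasting model, which handles trend and seasonality as described above.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Each new level mixes the newest observation (weight alpha) with the
    previous level (weight 1 - alpha), so older observations fade out.
    """
    level = series[0]
    levels = [level]
    for observation in series[1:]:
        level = alpha * observation + (1 - alpha) * level
        levels.append(level)
    return levels


history = [10, 20, 30]                 # hypothetical measure values over time
levels = exponential_smoothing(history, alpha=0.5)
print(levels)                          # [10, 15.0, 22.5]
print("next-period forecast:", levels[-1])
```

A larger `alpha` makes the forecast react faster to recent values; a smaller one smooths more heavily.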
Using R and Tableau
R is a language and environment that offers the latest statistical and machine
learning libraries and strong graphics. It is open source, distributed under the GNU
General Public License, and popular in academia and industry alike. It provides a wide
variety of statistical (linear and nonlinear modelling, classical statistical tests,
time-series analysis, classification, clustering, etc.) and graphical techniques, and is
highly extensible. Its rich ecosystem and its popularity among students help fill the
talent gap and make it a strong choice for the future.
Together, R and Tableau can be a potent combination for an organization's
end-to-end data discovery needs.
Installation R:
1. Install R (from CRAN) and then install and start Rserve
• install.packages("Rserve"); library(Rserve); Rserve()
2. In Tableau, go to Help > Settings and Performance > Manage R Connection
3. Enter the server name and connect (start Rserve before connecting)
Tableau 9+ Features
1. Performance
2. Updated Color(Pick Color)
3. Split Function
4. Tool tip (No lag)
5. Tableau App
6. Map box Integration
7. Total Control
8. Union
9. Version Control (Server)
10. LOD
LOD Expressions
For example, if the [State] dimension is on the Rows shelf, SUM([Sales]) gives the sum
of all transactions for each [State]. If [Product Type] is also on one of the
shelves, then SUM([Sales]) gives the sum of all transactions
within each [State] for each [Product Type]. The more dimensions in your
sheet or the more unique members each dimension contains, the more granular
your results will be. Since each result is drawn as a mark in your visualization,
the finer the level of detail for your sheet, the more marks you will have.
Types of LOD
FIXED LOD: Calculates the aggregation at the level of detail specified by the
list of dimensions, regardless of any dimensions in the view.
Ex - {FIXED [Department] : SUM([Sales])}
INCLUDE LOD: These level of detail expressions compute values using the
specified dimensions in addition to whatever dimensions are in the view.
Ex - {INCLUDE [Item] : AVG([Sales])}
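What the two example expressions compute can be sketched in Python on a tiny made-up table. The sketch assumes the view shows one mark per Department and that the INCLUDE result is averaged again when it is brought up to the view level; the rows and helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (department, item, sales).
rows = [
    ("Toys", "Car", 10),
    ("Toys", "Car", 30),
    ("Toys", "Doll", 40),
    ("Books", "Novel", 20),
]


def fixed_department_sum(rows):
    """Like {FIXED [Department] : SUM([Sales])}: one total per department."""
    totals = defaultdict(int)
    for department, _item, sales in rows:
        totals[department] += sales
    return dict(totals)


def include_item_avg(rows):
    """Like AVG({INCLUDE [Item] : AVG([Sales])}) in a view by Department."""
    per_item = defaultdict(list)
    for department, item, sales in rows:
        per_item[(department, item)].append(sales)
    per_department = defaultdict(list)
    for (department, _item), values in per_item.items():
        per_department[department].append(mean(values))  # average per item
    # Average the per-item averages back up to the department level.
    return {dept: mean(avgs) for dept, avgs in per_department.items()}


print(fixed_department_sum(rows))   # {'Toys': 80, 'Books': 20}
print(include_item_avg(rows))       # {'Toys': 30, 'Books': 20}
```

For Toys, a plain AVG([Sales]) would be 80 / 3; the INCLUDE version first averages each item (Car 20, Doll 40) and then averages those, giving 30.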
1. How many patients had how many readmissions? (e.g. I want to know that 500
patients were readmitted 1 time, 335 were readmitted 2 times, 23 were
readmitted 3 times, etc…)
2. How many students had how many classes? (e.g. 9 students took 6 classes, 5
students took 5 classes, etc…)
3. How many products had how many sales?
4. How many customers have made exactly how many purchases ?
5. Keeping % of total unchanged regardless of filter
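The "how many had how many" questions above are two-step aggregations: first count per entity, then count the counts. A Python sketch with made-up orders follows; in Tableau the first step is what a FIXED expression per customer would supply, which is an interpretation of the list above rather than something the slides spell out.

```python
from collections import Counter

# Hypothetical orders: (customer, order id).
orders = [
    ("ann", 1), ("ann", 2),
    ("bob", 3),
    ("cara", 4), ("cara", 5),
    ("dan", 6),
]


def purchases_histogram(orders):
    """Map 'number of purchases' to 'number of customers with that many'."""
    per_customer = Counter(customer for customer, _order in orders)
    return dict(Counter(per_customer.values()))


print(purchases_histogram(orders))  # {2: 2, 1: 2}
```

Here two customers made exactly two purchases and two customers made exactly one.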
LOD Syntax
Tableau Online provides similar benefits at a lower price point but requires that you
publish Tableau reports outside of your firewall.
Server administrator: Can access, interact with, publish, and manage all objects on the server
Site administrator: Can access, interact with, publish, and manage all objects within a site
Publisher: Can access, interact with, and publish objects (workbooks)
Interactor: Can access and interact with objects (workbooks)
Viewer: Can access and view objects (workbooks)
Unlicensed: Cannot sign in to the server
Tableau Server has a robust system for managing access.
Once subscriptions are activated, you will be able to subscribe to your favorite views and
receive regular e-mails.
Workbook Optimization