Tableau - Prep
This is a project-based course, for students looking for a practical, hands-on, and highly
engaging approach to learning Tableau Prep for business intelligence
Quizzes & Homework Exercises to reinforce key concepts, with step-by-step solutions
Bonus Projects to test your abilities and apply the skills developed throughout the course
3 Examining & Filtering: Build and organize your flow, review data types and size, and filter your data using values and calculations
4 Operations & Calculations: Leverage value and field operations like group, clean, convert, and split, and create custom calculations (LODs and more)
5 Combining & Pivoting: Combine and pivot your various data by leveraging aggregate, join, union and pivot tools
THE SITUATION
You’ve just been hired by Maven Charter Schools, an up-and-coming private education
institution. They have a wealth of public and private school data, but need help cleaning and
transforming it in order to expose meaningful patterns and insights.
THE BRIEF
Maven Charter Schools would like you to examine, clean, shape, combine and share competitive
education data from the Massachusetts education market.
All you’ve been given is a folder of excel/csv files containing information about teacher pay and
performance, student SAT scores, pupil expenditures, and graduation rates by school and district.
1 This course is designed to get you up & running with Tableau Prep
• Our goal is to provide a deep foundational understanding of Tableau Prep Builder; we won’t cover advanced
topics like R/Python or Tableau Prep Server integration in depth
2 What you see on your screen may not always match mine
• Tableau Prep updates on a monthly basis for minor releases and quarterly/yearly for major releases, so features
and functionality may change over time
Tableau Prep is a self-service data preparation tool, providing users with visual and intuitive
tools to combine, shape, and clean raw data for analysis
Tableau Prep is included as part of the Tableau Creator role, which includes Tableau Prep
Builder, Tableau Desktop, and one license of Tableau Server or Tableau Online
Connections Pane
Connect to local, server, or
published data sources
Profile Pane
Displays a summary of
each field in your data
sample
Data Grid
Displays a preview of the rows and columns in your source data
Tableau Prep uses visual indicators to represent steps, field types, and notifications within a flow;
familiarizing yourself with these indicators will help you interpret exactly how a flow functions
Input Steps: icons in the flow pane show the data source type
Data Source, Data Source with Wildcard Union, Excel, Excel with Wildcard Union, CSV, CSV with Wildcard Union, Tableau Extract
Clean Steps, Changes Pane & Toolbars: icons track changes made to data
Calculated Field, Change Data Type, Edit Value, Exclude Values, Filter Values, Group Values, Keep Only, Hide Profile Pane, Show Profile Pane, Merge Fields, Remove Fields, Rename Field, Search, Split Fields
Join Steps: icons define join types between data sources
Full Anti Join, Inner Join, Left Inner Join, Left Outer Join, Full Outer Join, Right Inner Join, Right Outer Join
Changes: Calculated Field, Change Data Type, Edit Value, Exclude Values, Filter Values, Group Values, Keep Only, Merge Fields, Remove Field, Rename Field, Search, Split Fields
Field types: Boolean Data Type, Date Data Type, Date Time Data Type, Numeric Data Type, Text Data Type
Data sources: CSV File, Published Data Source, Local Tableau Data Extract
Flow indicators: Shows when data is sampled, Hover to show exact row count, Run Flow
Notifications (identify problems, errors or alerts): No Notifications, Notification Alert, Error in the Step
[Figure: flow structure, from INPUT through Clean, Aggregation, Join, Union and Pivot steps to OUTPUT]
It’s important to think about data design before you begin to clean or transform your data, as design
needs will vary based on your audience, use case, and performance needs
• Row-heavy data, which is the most flexible structure for Tableau Desktop
• Ideal combo of good performance & dynamic aggregation
• Commonly used with transactional data and unique record data sets

• Highly dimensional data with many columns
• Allows for deep analysis and many “cuts” of data
• Most common with survey data

• Highly aggregated and curated views for best performance
• Ideal for executive-level visualizations and specific high-level use cases
Tableau Prep enables users to connect, clean, and configure raw data from virtually any source
Connect
• Connect to local files, databases or published sources
• Enhance connections with wildcard unions, SQL and more
Clean
• Clean your data upfront with tools like data interpreter
• Filter initial data down before the main flow
Configure
• Configure field names, data types, text settings, etc.
• Choose which fields to include or exclude from the flow
https://tableau.mavenanalytics.com
NOTE: Data Interpreter is available for text/csv files
NOTE: Data Interpreter is NOT available for database connections
NOTE: Data Interpreter is NOT available for Tableau Server connections
Wildcard unions allow you to combine files or tables within a folder or directory at the input stage
Search In
Select the directory/schema to use
to find files/tables for the union
Include Subfolders
Includes files contained in
subdirectories of the parent folder
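As a rough illustration of the idea outside Tableau Prep, the Python sketch below matches files in a folder by pattern, optionally walks subfolders, and stacks the rows. The function name `wildcard_union`, the `source_file` column, and the default pattern are assumptions made for this example; they are not Tableau Prep settings or internals.

```python
import csv
from pathlib import Path


def wildcard_union(search_in, pattern="*.csv", include_subfolders=False):
    """Stack rows from every file under `search_in` that matches `pattern`."""
    root = Path(search_in)
    # rglob also walks subdirectories; glob stays in the parent folder only.
    matches = root.rglob(pattern) if include_subfolders else root.glob(pattern)
    rows = []
    for path in sorted(matches):
        with path.open(newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle):
                # Hypothetical lineage column, so each row can be traced to its file.
                record["source_file"] = path.name
                rows.append(record)
    return rows
```

Calling `wildcard_union("data", include_subfolders=True)` on a hypothetical `data` folder would return one list of dictionaries covering every matching file.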
Joins can also be created at the input stage for certain database connections; if table relationships
are present, Linked Keys will be available to specify which fields to use for the join
Linked Keys
Unique Identifier (Primary Key)
Related Fields (Foreign Key)
Unique and Related Fields
Text files require additional configuration in the Settings tab to determine how they will be ingested
First Line Contains Header is the default, and pulls the first row as headers
Generate Field Names Automatically will generate generic headers (F1, F2, etc.)
Text Qualifier selects the character that encloses the values in a file
• NOTE: This defaults to automatic and gives ‘, “, and “none” as options
Character Set selects the character set that describes the file encoding (UTF-8, etc.)
Locale sets the geographic location to parse the file (important for dates, currency,
decimals/thousands separators, etc.)
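To make the header and text qualifier settings concrete, here is a minimal Python sketch using the standard `csv` module. The function `read_text_file` and its parameter names are hypothetical stand-ins for the Settings tab options; the `F1, F2, ...` naming follows the description above.

```python
import csv
import io


def read_text_file(text, first_line_is_header=True, qualifier='"', delimiter=","):
    """Parse delimited text, using row 1 as headers or generating F1, F2, ..."""
    if qualifier is None:
        # No text qualifier: quote characters are treated as ordinary text.
        reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                            quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                            quotechar=qualifier)
    rows = list(reader)
    if not rows:
        return []
    if first_line_is_header:
        header, body = rows[0], rows[1:]
    else:
        # Generate generic field names and keep the first row as data.
        header = [f"F{i}" for i in range(1, len(rows[0]) + 1)]
        body = rows
    return [dict(zip(header, row)) for row in body]
```

Note how the qualifier protects the comma inside `"Smith, John"`: without it, that value would be split into two fields.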
To optimize performance, Tableau Prep samples large data sets and returns a subset of records
Use all data: Retrieves all rows regardless of size (can cause performance issues)
• NOTE: Data is still limited to 1 million rows (Aggregate/Union) and 3 million (Join/Pivot)
Fixed number of rows: Select a custom number of rows (recommended <1 million)
Random sample: Returns the number of rows requested, but looks at all records
and returns a representative sample (may impact performance prior to cache)
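The difference between the last two options can be sketched in Python. This is only an illustration: `fixed_number_of_rows` and `random_sample` are hypothetical names, and reservoir sampling is one common way to draw a uniform sample in a single pass, not necessarily what Tableau Prep does internally.

```python
import random
from itertools import islice


def fixed_number_of_rows(rows, n):
    """Return the first n rows without reading the rest of the source."""
    return list(islice(rows, n))


def random_sample(rows, n, seed=None):
    """Reservoir sampling: scan every row once and keep n chosen uniformly."""
    rng = random.Random(seed)
    reservoir = []
    for index, row in enumerate(rows):
        if index < n:
            reservoir.append(row)
        else:
            # randint bounds are inclusive, so each row has an n/(index+1) chance.
            slot = rng.randint(0, index)
            if slot < n:
                reservoir[slot] = row
    return reservoir
```

The first function stops early, which is why it is cheap; the second must touch every record, which matches the note that random sampling may impact performance.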
If data changes while building a flow, you can refresh during the input stage using several methods:
OPTION 1: Refresh
For File Inputs, refresh using the refresh icon or the input step
OPTION 2: Edit Connection
Edit the data connection and return to the flow
THE SITUATION
Happy Hipsters, a lifestyle apparel company, wants to analyze World Happiness data to
support an upcoming marketing campaign, and has enlisted your help
THE BRIEF
The Happy Hipsters team has asked you to help clean and consolidate their raw data
into a single source, which will enable them to explore and analyze key global happiness
metrics for their new campaign
After connecting to sources, users can examine & filter data using Tableau Prep’s visual interface;
it’s important to conduct these steps before making any major changes to your data in the flow!
Examine
• Profile your data by looking at field value distributions
• Review data types, data size, and find specific fields or values
• Sort & Highlight values in your fields to find gaps or deficiencies
Filter
• Reduce the data being pulled, using various filtering tools
• Organize your flow’s tools and settings for optimal performance and clear documentation
One of the first steps in evaluating data is to examine data size, field types and unique values;
this can be done at several stages, but the simplest approach is to add a clean step
The profile pane allows you to visualize the distribution of your data, by plotting the frequency of
each distinct value as bins in a histogram; this is a great way to identify outliers and null values!
Summary View
Continuous view of values showing both the range and frequency in which they appear
Detail View
Discrete view of individual values within the column
NOTE: Click the distribution in the column
to skip to desired values
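The frequency view described above can be approximated in a few lines of Python. This is a sketch of the concept only; `profile_field` and `render_bins` are hypothetical helpers, and the text bars merely stand in for the profile pane's histogram.

```python
from collections import Counter


def profile_field(values):
    """Count how often each distinct value appears, nulls included."""
    counts = Counter(values)
    # Most frequent first; ties are broken by the value's text form.
    return sorted(counts.items(), key=lambda item: (-item[1], str(item[0])))


def render_bins(profile, width=20):
    """Draw one text bar per distinct value, scaled to the largest bin."""
    top = max(count for _, count in profile)
    return [f"{str(value):<10} {'#' * max(1, round(width * count / top))} {count}"
            for value, count in profile]
```

Printing the lines from `render_bins(profile_field(column))` makes a rare value or a `None` bin stand out in the same way an outlier or null does in the profile pane.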
Use the toolbar search or field search options to find specific fields or values in your data
Within the profile pane, you can sort bins by either frequency or alphabetical order (ascending or
descending), or click to drag and rearrange profile cards
Highlighting is a quick way to trace fields back through flow steps, see related values across
fields, and pinpoint identical values in your data
Trace Fields
Select a field to trace where
it was used or modified
within your flow
Related Values
Highlight related values by
selecting a value/bin in the
profile pane
NOTE: Related values are
highlighted in blue
Identical Values
Select a value in the data grid to highlight all identical values
There are several filtering methods in Tableau Prep, based on the field type and step chosen:
Keep or Exclude: Keeps or removes selected value or field (available for all field types: String, Number, Date, Date Time, etc.)
Calculation Filter: Filters values based on calculated field condition (available for all field types)
Selected Values Filter: Chooses values to keep or exclude even if they aren’t in the data source (available for all field types)
Range of Values Filter: Filters by minimum and maximum value parameters (available for Number field type)
Range of Dates Filter: Filters by minimum and maximum date value parameters (available for Date and Date Time field types)
Wildcard Match Filter: Filters by partial or whole matching text (available for String field type)
Null Values Filter: Keeps only Null or Non-Null Values (available for all field types)
Keep Only/Exclude: Single or multi-select values from the profile card to keep or exclude
Calculation Filter: Condition must be Boolean (only filter available in steps other than clean step)
Selected Values: Manually select values to keep/exclude (keyed values can be added even if not in data)
Range of Values: Filter numeric values within a specified lower/upper limit
Range/Relative Dates: Range of dates (upper/lower) or time period relative to today or an anchor date
Null Values: Filter to only null or non-null values
Wildcard Match: Keep/exclude values based on a pattern (filter results display on left pane)
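Three of the filters above can be sketched in Python to show the logic they apply to each row. All names here (`range_of_values`, `wildcard_match`, `null_values`) are hypothetical. Two choices are assumptions rather than documented Tableau Prep behavior: glob-style patterns stand in for the wildcard matching options, and null values are dropped by the range filter.

```python
from fnmatch import fnmatchcase


def range_of_values(rows, field, minimum=None, maximum=None):
    """Keep rows whose numeric field lies within the inclusive limits."""
    kept = []
    for row in rows:
        value = row.get(field)
        if value is None:
            continue  # assumption: nulls fall outside any range
        if minimum is not None and value < minimum:
            continue
        if maximum is not None and value > maximum:
            continue
        kept.append(row)
    return kept


def wildcard_match(rows, field, pattern, exclude=False):
    """Keep (or exclude) rows whose text matches a pattern such as 'rice*'."""
    def hit(row):
        value = row.get(field)
        return value is not None and fnmatchcase(value.lower(), pattern.lower())
    return [row for row in rows if hit(row) != exclude]


def null_values(rows, field, keep_nulls=True):
    """Keep only null or only non-null values of a field."""
    return [row for row in rows if (row.get(field) is None) == keep_nulls]
```

Each function returns a new list, so filters can be chained in the same way several filter changes stack up in a clean step.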
THE SITUATION
Your brother-in-law Sai just started his first business venture: a food truck specializing in
Indian desserts called Bengali Sweet Treats. As the family’s resident data nerd, you’ve
been enlisted to help him analyze popular Indian dishes to help him perfect his menu.
THE BRIEF
Sai needs you to examine a spreadsheet containing hundreds of Indian dishes, and profile
their ingredients, prep time, regional origin, and flavor profile.
You’ll need to connect, profile, and filter the data to give Sai some ideas for his award-winning
food truck!
• Clean & transform data using a range of value and field operations (group, filter, split, etc.)
• NOTE: Cleaning steps can be performed in multiple flow steps (except output)
• Perform logical, string, aggregate or level of detail calculations to create new fields
• Apply analytic functions (i.e. rank) across tables or partitions
Cleaning Operations
Accessible via the profile pane or drop-down menu
Clean
• Records: Filter
• Fields: Rename Field, Duplicate Field, Calculated Field
• Values: Convert Dates, Edit Values, Group Values, Split Values
Layout Options:
Data Grid
Shows detailed data view
List View
Shows columns in list form
Pause/Resume Updates
Options to pause or resume updates
Limited Features
Features which require visual representation of values (splitting,
filtering, grouping, etc.) are disabled while updates are paused
Value operations can be used to filter, clean, group or split values inside fields
Filter allows you to reduce the number of records using various filter criteria
Clean provides a list of quick cleaning operations which apply to all values in the field
Group Values replaces individual or multiple values with a new group value
PRO TIP: Use Tableau Prep’s recommendations (light bulb) to automatically clean your data
Use cleaning tools to change text case, remove specific characters, or trim spaces from strings
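A minimal Python sketch of these three quick clean operations follows. The helper `clean_value` and its parameters are hypothetical; the set of characters removed by default (all punctuation) is an assumption chosen for the example.

```python
import re
import string


def clean_value(value, make="upper", remove=string.punctuation, trim=True):
    """Apply quick clean operations to one text value."""
    if value is None:
        return None
    # Remove specific characters.
    cleaned = "".join(ch for ch in value if ch not in remove)
    # Trim leading/trailing spaces and collapse repeated inner whitespace.
    if trim:
        cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Change text case.
    if make == "upper":
        cleaned = cleaned.upper()
    elif make == "lower":
        cleaned = cleaned.lower()
    elif make == "title":
        cleaned = cleaned.title()
    return cleaned
```

Applying the same function to every value in a field mirrors how a quick clean operation applies to all values at once.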
Manual Grouping
Automatically group text values using fuzzy matching algorithms based on pronunciation,
common characters or spelling
Pronunciation
Find and group values which sound alike, and move
threshold slider to the left or right to adjust strictness
(left = fewer groups, right = more groups)
Common Characters
Find and group values with letters and/or numbers in
common (i.e. “John Smith” and “Smith, John” likely
refer to the same person)
Spelling
Find and group values which are spelled alike, and move
threshold slider to the left or right to adjust strictness
(left = fewer groups, right = more groups)
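To give a feel for how fuzzy grouping can work, the Python sketch below groups by common characters and by spelling. It is an illustration under stated assumptions: the token-sorting key and the `difflib` similarity ratio are stand-ins chosen for this example, not the algorithms Tableau Prep uses, and pronunciation matching is left out. The `threshold` parameter plays the role of the strictness slider.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def common_character_key(value):
    """Order-insensitive key built from the letters/numbers in a value."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    return " ".join(sorted(tokens))


def group_by_common_characters(values):
    """Group values that share the same set of word tokens."""
    groups = defaultdict(list)
    for value in values:
        groups[common_character_key(value)].append(value)
    return list(groups.values())


def group_by_spelling(values, threshold=0.85):
    """Greedy grouping: join the first group whose lead value is similar enough."""
    groups = []
    for value in values:
        for group in groups:
            ratio = SequenceMatcher(None, value.lower(), group[0].lower()).ratio()
            if ratio >= threshold:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups
```

Raising `threshold` makes matching stricter and therefore produces more, smaller groups, which is consistent with the slider behavior described above.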
Automatic Split
Splits values automatically using common delimiters
Custom Split
Define the delimiter and number of columns for the split
Calculated Split
Split text using a custom calculated field
NOTE: Calculations are automatically generated when
either split type (automatic or custom) is performed
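A custom split can be sketched in Python as follows. The function `custom_split` is hypothetical, and padding missing parts with `None` is an assumption about how short values would be handled.

```python
def custom_split(value, delimiter, columns, from_end=False):
    """Split a value on a delimiter into a fixed number of fields."""
    if value is None:
        return [None] * columns
    parts = value.split(delimiter)
    # Take the first (or last) `columns` parts and strip stray spaces.
    picked = parts[-columns:] if from_end else parts[:columns]
    picked = [part.strip() for part in picked]
    # Pad so every row yields the same number of new fields.
    return picked + [None] * (columns - len(picked))
```

For example, splitting `"Boston, MA, USA"` on `","` into two columns keeps `Boston` and `MA` and discards the rest.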
Double-Click
Double-click a value in the profile pane to edit it directly
(field turns into a group after the first try)
Right-Click
Right-click and choose “Edit Value” to edit or replace
the value with null
Convert dates to modify formats without the need for calculated fields or parsing functions
Date and Time
Convert date field to datetime format (ex. 1/23/2020, 11:14:02 PM)
Year Number
Convert date field to year number format (ex. 2010, 2015, 2020)
Quarter Number
Convert date field to quarter number format (ex. 1, 2, 3, 4).
Month Number
Convert date field to month number format (ex. 1, 2, 3, 4 … 11, 12)
Week Number
Convert date field to week number format (ex. 1, 2, 3, 4 … 52, 53)
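The date conversions listed above map to simple date-part arithmetic, sketched here in Python. The function `convert_date` and its `target` labels are hypothetical. The week number uses the ISO-8601 definition, which is an assumption: Tableau Prep may start weeks on a different day and so return a different number near year boundaries.

```python
from datetime import date, datetime, time


def convert_date(value, target):
    """Convert a date to one of the formats described above."""
    if target == "date_time":
        # Midnight is assumed for the missing time part.
        return datetime.combine(value, time.min)
    if target == "year":
        return value.year
    if target == "quarter":
        return (value.month - 1) // 3 + 1
    if target == "month":
        return value.month
    if target == "week":
        # ISO-8601 week number (weeks start on Monday).
        return value.isocalendar()[1]
    raise ValueError(f"unknown target: {target}")
```

The quarter formula maps months 1 to 3 to quarter 1, months 4 to 6 to quarter 2, and so on.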
Field types can be customized in every flow step except the output, and are used to assign fields
as numbers (decimal or whole values), dates (date or datetime) or text strings
Number (decimal)
Numeric value with decimal values (best for exact values like dollars, ratios, etc.)
Number (whole)
Numeric value with no decimal (best for quantity, date parts, ID fields, etc.)
Date
Date fields (best when date filtering and date calculations are needed – datediff, dateadd, etc.)
String
String fields (best for most dimensional values, text values that should be parsed, etc.)
Note that data types not only impact how fields are used in Tableau Prep, but also how
data visualization tools interact with data and users
Data roles represent standard sets of values, which can be used to validate the values within a field
URL
Web link-based role / URL fields
Email
Email role fields
Published Data Roles are used in Prep Builder in conjunction with Prep Conductor (not covered in this
course) to compare values in your flow against published standardized data values
Field cleaning operations can be used to modify, add or remove fields from the flow
Rename Field changes the field name referenced (can double-click name as well)
Duplicate Field creates a copy of the field (and adds a “-1” to the name)
Keep Only Field keeps only the selected field(s) in the flow.
• NOTE: Use Ctrl or Cmd to select more than one field to keep
Create Calculated Field creates a new calculated field with the selected field referenced
• NOTE: We’ll cover calculations in depth later in this section!
Number
Computation-based functions used on numerical fields
Level of detail (LOD) calculations are used to perform aggregations at different grains of data
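The idea of aggregating at a different grain can be sketched in Python: compute one value per group, then attach it to every detail row in that group. This is similar in spirit to a FIXED level of detail expression, but the function `fixed_lod` and the field names used with it are hypothetical.

```python
from collections import defaultdict


def fixed_lod(rows, group_by, field, aggregate=sum, new_field="lod_value"):
    """Compute an aggregate per group and attach it to every row in that group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_by]].append(row[field])
    totals = {key: aggregate(values) for key, values in buckets.items()}
    # The row count is unchanged; only a new field at the coarser grain is added.
    return [{**row, new_field: totals[row[group_by]]} for row in rows]
```

With school-level rows, grouping by a district field would let each school carry its district total, which makes ratios such as "share of district spend" easy to compute.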
Options include:
Visual Editor: • RANK()
• RANK_DENSE()
• RANK_MODIFIED()
• RANK_PERCENTILE()
• ROW_NUMBER()
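The ranking functions differ mainly in how they treat ties. The Python sketch below implements four of them under their commonly used definitions, assuming descending order by default; RANK_PERCENTILE is omitted because its exact formula is not given here. These helpers are illustrative and are not Tableau Prep's implementation.

```python
def rank(values, descending=True):
    """Competition rank: ties share a rank and the next rank is skipped (1, 2, 2, 4)."""
    ordered = sorted(values, reverse=descending)
    return [ordered.index(v) + 1 for v in values]


def rank_dense(values, descending=True):
    """Dense rank: ties share a rank with no gaps (1, 2, 2, 3)."""
    distinct = sorted(set(values), reverse=descending)
    return [distinct.index(v) + 1 for v in values]


def rank_modified(values, descending=True):
    """Modified competition rank: ties take the lowest position (1, 3, 3, 4)."""
    ordered = sorted(values, reverse=descending)
    return [len(ordered) - ordered[::-1].index(v) for v in values]


def row_number(values, descending=True):
    """Unique sequential number per row; ties keep their original order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    positions = [0] * len(values)
    for position, index in enumerate(order, start=1):
        positions[index] = position
    return positions
```

Each function returns the ranks in the original row order, so the result can be added to the data as a new field.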
Copy and paste individual elements within flows, including cleaning operations, fields or steps
Reusable flow steps can be created, saved and imported into other flows, and are commonly used
for steps which are used frequently or leveraged by other users
Publish to Server
Publish a flow to Tableau
Server using publisher
credentials
NOTE: Published flows which
utilize file-based input steps
are not yet supported
THE SITUATION
Your old boss at Tech Data Talent (TDT) contracted you for some data prep assistance.
You’ll need to use your Tableau Prep skills to make sure the TDT team is working with
clean and accurate data.
THE BRIEF
Your task is to clean survey response data to help the team accurately analyze mental health
trends in the tech industry. The key will be to clean and organize the data in a way that will
allow TDT’s analytics group to easily analyze and visualize patterns.
Data can be transformed and combined using several types of flow steps in Tableau Prep, including
Union, Join, Aggregate and Pivot
• Union and join are used to blend data together to create combined tables
• Union stacks records from common columns, and joining adds related fields from another table
• Change the granularity of your data using aggregate (i.e. daily to monthly)
• Group data by fields in your table to control the level of aggregation
• Transpose rows to columns (or columns to rows) using a pivot step
• Set up data outputs for optimal consumption using different table layouts
The union step appends (or “stacks”) records from multiple tables, based on matching columns
Drag to Union
Manually drag one step over another to
union, and use (+) to add more tables
PRO TIP: If you need to union 10+ tables, try using wildcard unions in the input step!
Resulting Fields
Count of total and
mismatched fields
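The stacking behavior and the mismatched-field count can be sketched in Python. The function `union` is hypothetical, and it assumes every row within a table has the same fields. Fields missing from a table are filled with `None`.

```python
def union(*tables):
    """Stack rows from several tables; fields missing from a table become None."""
    fields = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in fields:
                    fields.append(name)
    stacked = [{name: row.get(name) for name in fields}
               for table in tables for row in table]
    # A field is "mismatched" when at least one non-empty table lacks it.
    mismatched = [name for name in fields
                  if any(table and name not in table[0] for table in tables)]
    return stacked, fields, mismatched
```

A mismatched pair such as `Score` and `Happiness Score` is usually a sign that two fields should be merged after the union.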
Aggregate allows you to change the granularity of your data by summarizing values at higher levels
Additional Fields
Drag fields to the “Grouped Fields” or “Aggregated Fields” panes (NOTE: fields not selected will not pass through this step)
PRO TIP: Use “Group By” with no aggregation to create a unique list of dimensions
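A small Python sketch of the aggregate step: rows are grouped by the chosen fields and one function is applied per aggregated field. The function `aggregate` and the sample field names are hypothetical. As in the note above, any field that is neither grouped nor aggregated is dropped.

```python
from collections import defaultdict


def aggregate(rows, grouped_fields, aggregated_fields):
    """Group rows by `grouped_fields` and apply one function per aggregated field."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in grouped_fields)
        buckets[key].append(row)
    result = []
    for key, members in buckets.items():
        out = dict(zip(grouped_fields, key))
        for name, func in aggregated_fields.items():
            out[name] = func([member[name] for member in members])
        result.append(out)
    return result
```

Passing an empty dictionary for `aggregated_fields` returns one row per distinct group, which is the "unique list of dimensions" trick from the PRO TIP.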
Join is used to combine data between tables which share common or related fields
Review the join results in the profile pane to identify and resolve common issues, including mismatched
values or incorrect join types or clauses
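The effect of the join type, and the unmatched rows worth reviewing afterwards, can be sketched in Python. The function `join`, its `how` values, and the sample field names are hypothetical. For brevity, unmatched rows are kept as they are rather than padded with empty fields from the other table.

```python
def join(left, right, left_key, right_key, how="inner"):
    """Return joined rows plus the rows on each side that found no match."""
    right_index = {}
    for row in right:
        right_index.setdefault(row[right_key], []).append(row)
    joined, left_unmatched, matched_keys = [], [], set()
    for row in left:
        matches = right_index.get(row[left_key], [])
        if matches:
            matched_keys.add(row[left_key])
            joined.extend({**row, **match} for match in matches)
        else:
            left_unmatched.append(row)
            if how in ("left", "full"):
                joined.append(dict(row))
    right_unmatched = [row for row in right if row[right_key] not in matched_keys]
    if how in ("right", "full"):
        joined.extend(dict(row) for row in right_unmatched)
    return joined, left_unmatched, right_unmatched
```

Inspecting the two unmatched lists is the code equivalent of reviewing mismatched values in the join results: a long list often points to a wrong join clause or inconsistent key values.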
Pivoting transposes rows to columns (or vice versa), allowing you to create “wide” or “tall” tables
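A columns-to-rows pivot ("wide" to "tall") can be sketched in Python as follows. The function `pivot_columns_to_rows` and the output field names `pivot_name` and `pivot_value` are hypothetical choices for this example.

```python
def pivot_columns_to_rows(rows, id_fields, value_fields,
                          names_to="pivot_name", values_to="pivot_value"):
    """Turn each listed column into its own row ("wide" to "tall")."""
    tall = []
    for row in rows:
        for name in value_fields:
            # Repeat the identifying fields once per pivoted column.
            record = {field: row[field] for field in id_fields}
            record[names_to] = name
            record[values_to] = row[name]
            tall.append(record)
    return tall
```

A table with one column per year becomes a table with one row per country and year, which is usually easier to filter and aggregate.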
Descriptions
Add descriptive notation to steps
to provide details and clarity
Color Scheme
Customize colors to identify
related steps in the flow
Group Steps
Use groups to organize and compress large collections of
flow steps to make them easier to digest and share
THE SITUATION
As a leader of your local F1 racing fan club, you’re in charge of preparing data for the club’s
upcoming annual F1 fantasy draft.
THE BRIEF
You’ve been asked to gather data to help members accurately analyze driver stats, lap times,
and race results. The key will be to combine raw data into a centralized source that combines
all historical race data as well as peripheral driver and result information.
Tableau Prep allows you to configure options for sharing data outputs and updating flows
Share
• Share data outputs as local files, published data sources, or updated tables in databases
• Preview your data in Tableau Desktop prior to automating your flow to ensure your success criteria have been met
Update
• Refresh your flow and configure incremental update options
• Learn about the benefits of using Prep Conductor to fully automate prep flows
Save your flow locally to retain steps, bundle local data sources, and share flows with other users
Save Flow
Manually save your flow as a .tfl
file to retain your work
Save As
Use Save As to choose the
type of flow file saved
Use the Preview in Tableau Desktop option to preview the output while developing a flow
Create local extracts in Tableau Prep to output as either .csv or .hyper file formats
Name
Name the output extract
Write Options
Choose local write options (create
table or append to table)
Run Flow
Execute the flow on full data
Prep can write to external databases as a new table or append/replace data in an existing table
Database
Select a database schema
Table
Select an existing table or create a new one
Write Options
Create, append, or replace table data
Run Flow
Execute the flow on full data
PRO TIP: Select “enable incremental refresh” on input/output to only add new data!
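The write options and the idea behind incremental refresh can be sketched with Python's built-in `sqlite3` module. Everything here is an assumption made for illustration: the function names, the reading of "create" as drop-and-recreate and "replace" as delete-then-insert, and the use of a single tracking field to detect new rows. Table and column names are interpolated directly, which is acceptable only for a trusted, hand-written example like this one.

```python
import sqlite3


def write_table(conn, table, rows, option="create"):
    """Write rows using a 'create', 'append', or 'replace' style option."""
    columns = list(rows[0])
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    if option == "create":
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    elif option == "replace":
        conn.execute(f'DELETE FROM "{table}"')  # keep the table, swap the data
    elif option != "append":
        raise ValueError(f"unknown option: {option}")
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()


def incremental_rows(rows, field, last_value):
    """Keep only rows that are newer than the last value already written."""
    return [row for row in rows if row[field] > last_value]
```

A typical incremental run would look up the highest value of the tracking field in the output, pass it to `incremental_rows`, and then call `write_table` with `option="append"`.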
Publish data sources to Tableau Server to grant user access to data and enable automated refresh
Project
Select the project where your flow will be located
Name
Name your flow
Description
Give a brief description of what your flow does
Tags
Make the flow searchable on server using tags
Connections
Edit connections to embed credentials; local files
need to be uploaded (flat) or use direct connection
(refreshed on regular basis)
Tableau Prep Conductor can be used to automate and optimize flows in Tableau Server / Online
Schedule Flows
Schedule flows to automatically run on a
set day or at a specified refresh time
Administration
View performance and scheduling to
optimize flow runs
Alerts
Configure alerts and email notifications to
notify you of failed flows
THE SITUATION
Your friend Anna is a Director at Maven Financial, a local bank branch, and needs your help
extracting customer data from Tableau Prep.
THE BRIEF
Anna has asked you to set up outputs for various stakeholders, utilizing various file formats
and platforms. Your job is to deliver the data in a predictable and efficient way, to enable the
business to use it going forward.
*If you do not have access to Tableau Server, you can skip this step and review the solution video
*Copyright Maven Analytics, LLC