Alteryx Tools Sheet v11.3
Favorites
Browse
Description: Add one or more points in your data stream to review and verify your data.
Example: Allows users to get a look at their data anywhere in the process.

Filter
Description: Query records based on an expression to split data into two streams: True (records that satisfy the expression) and False (those that do not).
Example: If you are looking at a dataset of customers, you may want to only keep those customer records that have greater than 10 transactions or a certain level of sales.

Formula
Description: Create or update fields using one or more expressions to perform a broad variety of calculations and/or operations.
Example: For instance, if there is a missing or NULL value, you can use the Formula tool to replace that NULL value with a zero.
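For intuition, a rough R sketch of the Filter and Formula examples above (the data frame and field names are invented for illustration):

    # Toy customer data; NA stands in for a NULL value
    d <- data.frame(customer = c("A", "B", "C"),
                    transactions = c(12, 4, 25),
                    sales = c(120, NA, 87))

    keep <- d[d$transactions > 10, ]               # Filter: the "True" stream
    d$sales <- ifelse(is.na(d$sales), 0, d$sales)  # Formula: replace NULL/NA with zero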
Join
Description: Combines two inputs based on a commonality between the two tables. Its function is like a SQL join, but it gives the option of creating 3 outputs resulting from the join.
Example: Can be used to join customer profile data with transactional data, joining the two data sources based on a unique customer ID.
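A minimal R sketch of the three join outputs the entry describes (left-only, joined, right-only); the data and field names are invented:

    profiles <- data.frame(CustomerID = 1:4, Segment = c("A", "B", "A", "C"))
    txns     <- data.frame(CustomerID = c(2, 3, 3, 5), Amount = c(10, 25, 40, 7))

    j <- merge(profiles, txns, by = "CustomerID")              # J output: matched records
    l <- profiles[!profiles$CustomerID %in% txns$CustomerID, ] # L output: left-only records
    r <- txns[!txns$CustomerID %in% profiles$CustomerID, ]     # R output: right-only records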
Output
Description: Output the contents of a data stream to a file or database.
Example: Loading enriched data back into a database.

Sample
Description: Limit the data stream to a number, percentage, or random set of records.
Example: Choosing the first 10 records for each region of previously sorted data, so that you end up with the top 10 stores in each region.

Select
Description: Select, deselect, reorder and rename fields, change field type or size, and assign a description.
Example: If your workflow only requires 5 fields out of 50 that are read in from the file/database, you can deselect all but the 5 required fields to speed up processing downstream.

Sort
Description: Sort records based on the values in one or more fields.
Example: Allows you to sort data records into ascending/descending order, such as ranking your customers based on the amount they spend.

Summarize
Description: Summarize data by grouping, summing, counting, spatial processing, string concatenation, and much more. The output contains only the results of the calculation(s).
Example: You could determine how many customers you have in the state of NY and how much they have spent in total, or on average per transaction.
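A rough R analogue of that Summarize example (invented data):

    d <- data.frame(state = c("NY", "NY", "CA"), spend = c(100, 40, 75))
    # Group by state; count customers, total spend, and average per transaction
    aggregate(spend ~ state, data = d,
              FUN = function(x) c(customers = length(x), total = sum(x), avg = mean(x)))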
Comment
Description: Add annotation or images to the module canvas to capture notes or explain processes for later reference.
Example: This allows users to document what they did during a certain portion of the analysis, so other users have an understanding of what they were building in the workflow.

Text Input
Description: Manually add data which will be stored in the module.
Example: A lookup table where you are looking for certain words or codes to be replaced with new classifications. You could create a Find field and a Replace field to populate with the values needed.

Union
Description: Combine two or more data streams with similar structures based on field names or positions. In the output, each column will contain the data from each input.
Example: Transaction data stored in different files for different time periods, such as a sales data file for March and a separate one for April, can be combined into one data stream for further processing.
Input/Output
Browse
Description: Add one or more points in your data stream to review and verify your data.
Example: Allows users to get a look at their data anywhere in the process.

Date Time Now
Description: Input the current date and time at module runtime, in a format of the user's choosing. (Useful for adding a date-time header to a report.)
Example: This is a useful tool to easily add a date-time header for a report.

Directory
Description: Input a list of file names and attributes from a specified directory.
Example: Lists all files in a directory; can be used in conjunction with the Dynamic Input tool to bring in the most recent data file that is available.

Dynamic Input In-DB
Description: Take In-DB Connection Name and Query fields from a standard data stream and input them into an In-DB data stream.
Example: Use a Dynamic Input In-DB tool when creating an In-DB macro for predictive analysis.

Dynamic Output In-DB
Description: Output information about the In-DB workflow to a standard workflow for Predictive In-DB.
Example: Use a Dynamic Output In-DB tool to output information about the In-DB workflow to a standard workflow for Predictive In-DB.

Map Input
Description: Manually draw or select map objects (points, lines, and polygons) to be stored in the module.
Example: Pick a spatial object, either by drawing one or selecting one, to use in your module (app).

Output
Description: Output the contents of a data stream to a file or database.
Example: Output to most places Alteryx can read from, though some formats are read-only. Example: loading enriched data back into a database.

Text Input
Description: Manually add data which will be stored in the module.
Example: Manually store data or values inside the Alteryx module; for example, a lookup table where you have the values of segmentation groups and you want the description name.

XDF Input
Description: This tool enables access to an XDF format file (the format used by Revolution R Enterprise's RevoScaleR system to scale predictive analytics to millions of records) for either: (1) using the XDF file as input to a predictive analytics tool, or (2) reading the file into an Alteryx data stream for further data hygiene or blending activities.
Example: This can be used when building and running predictive analytics procedures on large amounts of data that open source R has difficulty computing (specifically Linear Regression, Logistic Regression, Decision Trees, Random Forests, Scoring, Lift Chart).

XDF Output
Description: This tool reads an Alteryx data stream into an XDF format file, the file format used by Revolution R Enterprise's RevoScaleR system to scale predictive analytics to millions of records. By default, the new XDF file is stored as a temporary file, with the option of writing it to disk as a permanent file, which can be accessed in Alteryx using the XDF Input tool.
Example: This can be used when building and running predictive analytics procedures on large amounts of data that open source R has difficulty computing (specifically Linear Regression, Logistic Regression, Decision Trees, Random Forests, Scoring, Lift Chart).
Preparation
Auto Field
Description: Automatically set the field type for each string field to the smallest possible size and type that will accommodate the data in each column.
Example: Trying to identify the best-fit field type for text-based inputs. Makes the data streaming into Alteryx as small as possible to limit processing time and ensure proper formats for downstream processes.

Filter
Description: Query records based on an expression to split data into two streams: True (records that satisfy the expression) and False (those that do not).
Example: Allows users to exclude records (all fields still come through the stream). For instance, if you are looking at a dataset of customers, you may want to eliminate customers with certain characteristics, such as race or sex.

Data Cleansing
Description: The Data Cleansing tool automatically performs common data cleansing with a simple check of a box.
Example: Remove nulls, eliminate extra white space, clear numbers from a string entry.

Date Filter
Description: The Date Filter macro is designed to allow a user to easily filter data based on date criteria using a calendar-based interface.
Example: Return transaction records by specifying a start and end date.

Formula
Description: Create or update fields using one or more expressions to perform a broad variety of calculations and/or operations.
Example: For instance, if there is a missing or NULL value, you can use the Formula tool to replace that NULL value with a zero.

Generate Rows
Description: Create new rows of data. Useful for creating a sequence of numbers, transactions, or dates.
Example: Creating data, specifically time series data; create 365 unique records, one for each day of the year.

Impute Values
Description: Update specific values in a numeric data field with another selected value. Useful for replacing NULL() values.
Example: For example, if you have a data set that is missing information, such as salary, and displays (NULL), then rather than just making it zero, you can use the mean or median to fill in the NULL, to improve the accuracy of the results.

Multi-Field Binning
Description: Group multiple numeric fields into tiles or bins, especially for use in predictive analysis.
Example: For instance, if you have transactional data, you can group records into different buyer personas, e.g. males between 30-35 who spend more than $1K per month.

Multi-Field Formula
Description: Create or update multiple fields using a single expression to perform a broad variety of calculations and/or operations.
Example: For instance, if there are missing or NULL values in multiple fields, you can use a single formula to replace those NULL values with a zero.

Multi-Row Formula
Description: Create or update a single field using an expression that can reference fields in subsequent and/or prior rows to perform a broad variety of calculations and/or operations. Useful for parsing complex data and creating running totals.
Example: Creating unique identifiers at a group level, or cross-row comparisons; e.g., sales volume for yr 1, yr 2, and yr 3 sits in different rows and you want to see the difference between the sales in each of these rows (see the sketch below).
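A rough R analogue of that cross-row comparison (invented data); the shifted vector plays the role of a [Row-1] reference:

    sales <- data.frame(year = 1:3, volume = c(10000, 15000, 25000))
    # Year-over-year change: current row minus the prior row's value
    sales$change <- sales$volume - c(NA, head(sales$volume, -1))
    sales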
Random % Sample
Description: Generate a random number or percentage of records passing through the data stream.
Example: If you want to base your analysis on 35% of the data, for instance, it will randomly return records.

Record ID
Description: Assign a unique identifier to each record.
Example: This can be used to assign a customer ID to a legacy transaction, allowing for more accurate direct marketing/promotional offerings in the future.

Sample
Description: Limit the data stream to a number, percentage, or random set of records.
Example: Allows you to select a subset of data/records for your analysis. Can be used to focus on a select group of related records or transactions, such as selecting all items in an online shopping cart.

Select
Description: Select, deselect, reorder and rename fields, change field type or size, and assign a description.
Example: Allows you to determine whether a specific subset of records should or should not be carried down through the analysis; for instance, if we are looking at customer transactional data and we want to eliminate all transactions that are less than $5K.

Select Records
Description: Select specific records and/or ranges of records, including discontinuous ranges. Useful for troubleshooting and sampling.
Example: If a user wants to find records that are less than $100 or in a range of $100-$150, it will return records in this range.

Sort
Description: Sort records based on the values in one or more fields.
Example: Allows you to sort data records into ascending/descending order, such as locating your top 1000 customers based on the amount they spend.

Tile
Description: Group data into sets (tiles) based on value ranges in a field.
Example: Creating logical groups of your data, using user-defined breaks or statistical breaks. Very good for bucketing high-value customers vs. low-value customers.

Unique
Description: Separate data into two streams, duplicate and unique records, based on the fields of the user's choosing.
Example: You only want to mail to one individual, based on a unique identifier (customer ID).
Join
Append Fields
Description: Append the fields from a source input to every record of a target input. Each record of the target input will be duplicated for every record in the source input.
Example: A small-to-big merge, adding a few values to a million records; e.g., adding time stamps, as well as the name of the users who last accessed them, onto your database records.

Find Replace
Description: Search for data in one field from one data stream and replace it with a specified field from a different stream. Similar to an Excel VLOOKUP.
Example: Think of this like Excel's find and replace: looking for something and then replacing it.

Join
Description: Combine two data streams based on common fields (or record position). In the joined output, each row will contain the data from both inputs.
Example: For instance, this can be used to join customer profile data with transactional data, joining the two data sources based on a unique customer ID.

Join Multiple
Description: Combine two or more inputs based on common fields (or record position). In the joined output, each row will contain the data from all inputs.
Example: For instance, this can be used to join customer profile data with transactional data, joining two or more data sources based on a unique customer ID.

Make Group
Description: The Make Group tool takes data relationships and assembles the data into groups based on those relationships.
Example: Used primarily with fuzzy matching: ID 1 can match 10 different values from source 2, and that becomes a group.

Fuzzy Match
Description: Identify non-identical duplicates in a data stream.
Example: Helps determine similarities in your data. For instance, if you have 2 different data sets with different IDs, you can look at names and addresses as a way to standardize and match them up based on these types of characteristics; it displays all of the IDs that match.

Dun & Bradstreet Business File Matching
Description: Match your customer or prospect file to the Dun & Bradstreet business file. (Requires Alteryx with Data Package and installation of the Dun & Bradstreet business location file.)
Example: Matching a business listing file to Dun and Bradstreet.

Union
Description: Combine two or more data streams with similar structures based on field names or positions. In the output, each column will contain the data from each input.
Example: Can be used to combine datasets with similar structures but different data. You might have transaction data stored in different files for different time periods, such as a sales data file for March and a separate one for April. Assuming that they have the same structure (the same fields), Union will join them together into one large file, which you can then analyze.
Parse
Date Time
Description: Transform date/time data to and from a variety of formats, including both expression-friendly and human-readable formats.
Example: Easy conversion between strings and actual date/time formats; for example, converting military time into standard time, or turning Jan 1, 2012 into 1.1.12, etc.

RegEx
Description: Parse, match, or replace data using regular expression syntax.
Example: An example would be someone trying to parse unstructured text-based files, such as weblogs or data feeds from Twitter; it helps arrange the data into rows and columns for analytical purposes (see the sketch below).
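An illustrative regex parse in R terms (the log line and pattern are invented):

    log <- "2017-03-01 12:04:55 GET /products?id=42 200"
    # Capture groups pull the line apart into columns
    m <- regmatches(log, regexec("^(\\S+ \\S+) (\\S+) (\\S+) (\\d+)$", log))[[1]]
    data.frame(timestamp = m[2], method = m[3], path = m[4], status = m[5])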
Text to Columns
Description: Split the text from one field into separate rows or columns.
Example: Allows you to bring in customer data, from an Excel file for instance, that contains first name and last name in one column, and split them into 2 columns so first name is in one column and last name is in the other; this makes the data easy to sort and analyze.

XML Parse
Description: Read in XML snippets and parse them into individual fields.
Example: Cleaning an XML file; parsing XML text.
Transform
Arrange
Description: Manually transpose and rearrange fields for presentation purposes.
Example: Used for staging data for reports.

Count Records
Description: Count the records passing through the data stream. A count of zero is returned if no records pass through.
Example: Returns a count of how many records are going through the tool.

Cross Tab
Description: Pivot the orientation of the data stream so that vertical fields are on the horizontal axis, summarized where specified.
Example: Think of it as a way to change an Excel spreadsheet that has a column of customer IDs next to a column of revenue: this will turn these two columns into two rows.
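A rough R analogue of this kind of pivot (invented data); xtabs() groups one field down the rows, spreads another across the columns, and sums the values:

    d <- data.frame(Region  = c("East", "East", "West"),
                    Month   = c("Mar", "Apr", "Mar"),
                    Revenue = c(100, 120, 90))
    xtabs(Revenue ~ Region + Month, data = d)  # Region rows, Month columns, summed revenue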
Running Total
Description: Calculate a cumulative sum per record in a data stream.
Example: Can take 3 columns of sales totals and summarize the 3-year totals of sales by row (i.e., yr 1 sales 10K, yr 2 15K, yr 3 25K).

Summarize
Description: Summarize data by grouping, summing, counting, spatial processing, string concatenation, and much more. The output contains only the results of the calculation(s).
Example: For instance, if you wanted to look at a certain group of customers of a certain age or income level, or get an idea of how many customers you have in the state of NY.

Transpose
Description: Pivot the orientation of the data stream so that horizontal fields are on the vertical axis.
Example: Think of it as a way to change an Excel spreadsheet that has a row of customer IDs and below that a row of revenue: this will turn these two rows into two columns.
Weighted Average
Description: Calculate the weighted average of a set of values where some records are configured to contribute more than others.
Example: So if you are calculating average spend, this will determine and "weight" whether certain customers' spending levels are contributing more to the average.
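For reference, the standard weighted-mean formula this tool computes (added here, not from the sheet), with weights w_i:

    \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}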
Report/Presentation
Charting
Description: Create a chart (Area, Column, Bar, Line, Pie, etc.) for output via the Render tool.
Example: Create bar, line, pie charts.

Email
Description: Send emails for each record, with attachments or e-mail generated reports if desired.
Example: Allows you to create dynamically updated email content.

Image
Description: Add an image for output via the Render tool.
Example: Add graphics/images that will be included in a report.

Layout
Description: Arrange two or more reporting snippets horizontally or vertically for output via the Render tool.
Example: How to arrange the pieces of your report.

Report Map
Description: Create a map for output via the Render tool.
Example: Create a map for a report.

Map Legend Builder
Description: Recombine the component parts of a map legend (created using the Map Legend Splitter) into a single legend table, after customization by other tools.
Example: Takes a customized legend and reassembles it.

Map Legend Splitter
Description: Split the legend from the Report Map tool into its component parts for customization by other tools. (Generally recombined by the Map Legend Builder.)
Example: Helps customize legends by adding symbols such as $ or %, for instance, or removing redundant text.

Overlay
Description: Arrange reporting snippets on top of one another for output via the Render tool.
Example: Allows you to specify how to put a map together, such as putting a legend inside a map, or overlaying a table and chart onto a map.

Render
Description: Output report snippets into presentation-quality reports in a variety of formats, including PDF, HTML, XLSX and DOCX.
Example: Saves reports out of Alteryx.

Report Footer
Description: Add a footer to a report for output via the Render tool.
Example: Apply a footer to the report.

Report Header
Description: Add a header to a report for output via the Render tool.
Example: Apply a header to the report.

Table
Description: Create a data table for output via the Render tool.
Example: Creates a table for selected data fields.

Report Text
Description: Add and customize text for output via the Render tool.
Example: Allows you to customize a title or other text-related aspects of your report.
Documentation
Comment
Description: Add annotation or images to the module canvas to capture notes or explain processes for later reference.
Example: This allows users to document what they did during a certain portion of the analysis, so other users have an understanding.

Explorer Box
Description: Add a web page or Windows Explorer window to your canvas.
Example: Display a web page for reference in the module, or use it to show a shared directory of macros.

Tool Container
Description: Organize tools into a single box which can be collapsed or disabled.
Example: Helps you organize your module.
Spatial
Buffer
Description: Expand or contract the extents of a spatial object (typically a polygon).
Example: Identify all of the businesses on a road by placing a buffer on that road, to determine who they are and where they are.

Create Points
Description: Create spatial points in the data stream using numeric coordinate fields.
Example: Finding a spatial reference for a longitude/latitude pair.

Distance
Description: Calculate the distance or drive time between a point and another point, line, or polygon.
Example: Creating the drive distance or drive time to a customer location.

Find Nearest
Description: Identify the closest points or polygons in one file to the points in a second file.
Example: As a customer, find me the nearest location to visit, optimizing my driving route.

Non Overlap Drivetime
Description: Create drive-time trade areas that do not overlap for a point file.
Example: Create drive-time trade areas that do not overlap, for a point file.

Poly-Build
Description: Create a polygon or polyline from sets of points.
Example: Build a trade area; build an object of where all of my customers are coming from. Building a polygon to fit a series of points.

Poly-Split
Description: Split a polygon or polyline into its component polygons, lines, or points.
Example: Break a polygon into a sequential set of points.

Smooth
Description: Round off sharp angles of a polygon or polyline by adding nodes along its lines.
Example: Crisp objects rendered on a map (e.g., a more detailed coastal view).

Spatial Info
Description: Extract information about a spatial object, such as area, centroid, bounding rectangle, etc.
Example: Getting the lat/lon of a point, or perhaps the area in square miles of a coverage area for telco/wireless.

Spatial Match
Description: Combine two data streams based on the relationship between two sets of spatial objects, to determine if the objects intersect, contain or touch one another.
Example: Finding all customers that fall within a defined trade area, based on their geographic proximity.

Spatial Process
Description: Create a new spatial object from the combination or intersection of two spatial objects.
Example: Remove overlap from intersecting trade areas.

Trade Area
Description: Define radii (including non-overlapping) or drive-time polygons around specified points.
Example: Defining boundaries for where your customers or prospects are coming from.
Data Investigation
Association Analysis
Description: Determine which fields in a database have a bivariate association with one another.
Example: For example, if the user is trying to determine who should be contacted as part of a direct marketing campaign, to estimate the probability a prospect will respond favorably if contacted in the marketing campaign.

Contingency Table
Description: Create a contingency table based on selected fields, to list all combinations of the field values with frequency and percent columns.
Example: For example, you can build a table of males and females and how many times they purchase certain products during a week's time.

Create Samples
Description: Split the data stream into two or three random samples, with a specified percentage of records in the estimation and validation samples. If the total is less than 100%, the remaining records fall in the holdout sample.
Example: For example, in the case of a direct marketing campaign, we want to know the probability that a prospect contacted as part of the campaign will respond favorably to it, before we include that prospect on the campaign's contact list. What we really care about in selecting a predictive model to implement in a business process is the model's ability to accurately predict new data; the model that does the best job of predicting data in the estimation sample does not do the best job of predicting new data, since it "overfits" the estimation sample. To assess model accuracy we need to know the actual outcomes, so we take data where we know the outcomes (perhaps from a test implementation of that campaign), use part of the data (the estimation sample) to create a set of candidate predictive models, and use another, separate part of the data (the validation sample) to compare the ability of the different candidate models to predict the outcomes for this second set of data, in order to select the model to put into production. At times the user may want to use a third portion of the available data (the holdout sample) to develop unbiased estimates of the economic implications of putting a model into a production business process.
Distribution Analysis
Description: Allows you to fit one or more distributions to the input data and compare them based on a number of goodness-of-fit statistics. Based on the statistical significance (p-values) of the results of these tests, the user can determine which distribution best represents the data.
Example: Helpful when trying to understand the overall nature of your data, as well as when making decisions about how to analyze it. For instance, data that fits a Normal distribution would likely be well-suited to a Linear Regression, while data that is Gamma distributed would be better suited to analysis via the Gamma Regression tool.

Field Summary Report
Description: Produce a concise summary report of descriptive statistics for the selected data fields.
Example: This tool provides a concise, high-level overview of all the fields in a database. This information can be invaluable to users in determining which fields they need to pay special attention to in their analysis. For instance, if State is a field in a customer database for an online retailer, and there is only a single customer from the state of Alaska, the analyst will quickly be able to determine that any analysis (ranging from simple means by state to more advanced predictive models) that involves the State field will result in very unreliable information for Alaska. Given this information, the user may decide not to use the State field, or to combine the Alaska customer with customers from another state (perhaps Hawaii) for analysis purposes.

Frequency Table
Description: Produce a frequency analysis for selected fields; the output includes a summary of the selected field(s) with frequency counts and percentages for each value in a field.
Example: For example: what is the distribution of a company's customers by income level? From the output, you might learn that 35% of your customers are high income, 30% are middle-high, 25% are middle-low, and 10% are low.

Heat Plot
Description: This tool plots the empirical bivariate density of two numeric fields, using colors to indicate variations in the density of the data for different levels of the two fields.
Example: Could be used for reports to visualize contingency tables for media usage. For example, a survey of how important Internet reviews were in making a purchase decision (on a 1 to 10 scale), and when the customer searched for this information (in time categories ranging from six or more weeks before the purchase to within one hour of the point of purchase). The heat plot allows the user to see that those who looked at Internet reviews five to six weeks from the point of purchase were most influenced by those reviews.

Histogram
Description: Provides a histogram plot for a numeric field. Optionally, it provides a smoothed empirical density plot. Frequencies are displayed when a density plot is not selected, and probabilities when this option is selected. The number of breaks can be set by the user, or determined automatically using the method of Sturges.
Example: Provides a visual summary of the distribution of values based on the frequency of intervals. For example, using US Census data on the time occupied by travel to work, the number of people who responded with travel times "at least 15 but less than 20 minutes" is higher than the numbers for the categories above and below it; this is likely due to people rounding their reported journey time.
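For reference, Sturges' method mentioned above sets the number of breaks k for n observations as (standard definition, added here):

    k = \lceil \log_2 n \rceil + 1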
Oversample Field
Description: Sample incoming data so that there is equal representation of data values, to enable effective use in a predictive model.
Example: In many applications, the behavior of interest (e.g., responding favorably to a promotion offer) is a fairly rare event (untargeted or poorly targeted direct marketing campaigns often have favorable response rates below 2%). Building predictive models directly with this sort of rare-event data is a problem, since models that predict that no one will respond favorably to a promotion offer will be correct in the vast majority of cases. To prevent this from happening, it is common practice to oversample the favorable responses so that there is a higher penalty for placing all customers into a non-responder category. This is done by taking all the favorable responders and a sample of the non-responders, to get the total percentage of favorable responders up to a user-specified percentage (often 50% of the sample used to create a model).

Pearson Correlation
Description: Replaces the Pearson Correlation Coefficient tool from previous versions. The Pearson coefficient is obtained by dividing the covariance of the two variables by the product of their standard deviations.
Example: For example, age and income are related: as age increases, so will income.
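That sentence as a formula (the standard definition, added here for reference):

    r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}
           = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                  {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}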
Plot of Means
Description: Take a numeric or binary categorical (converted into a set of zero and one values) field as a response field, along with a categorical field, and plot the mean of the response field for each of the categories (levels) of the categorical field.
Example: The best use of this tool is for gaining a basic understanding of the nature of the relationship between a categorical variable and a numeric variable. For instance, it allows us to visually examine whether customers in different regions of the country spend more or less on women's apparel in a year.

Scatterplot
Description: Produce enhanced scatterplots, with options to include boxplots in the margins, a linear regression line, a smooth curve via non-parametric regression, a smoothed conditional spread, outlier identification, and a regression line. The smooth curve can expose the relationship between two variables better than a traditional scatter plot, particularly in cases with many observations or a high level of dispersion in the data.
Example: Shows the relationship between two numeric variables, or a numeric variable and a binary categorical variable (e.g., Yes/No). In addition to the points themselves, the tool also produces lines that show the trends in the relationships. For instance, it may show us that household spending on restaurant meals increases with household income, but the rate of increase slows (i.e., shows "diminishing returns") as the level of household income increases.

Spearman Correlation Coefficient
Description: Assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any other assumptions about the particular nature of the relationship between the variables.
Example: For example, is there a correlation between people's income level and their level of education?
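For reference, the usual computing formula (standard definition, added here; it assumes all ranks are distinct), where d_i is the difference between the two ranks of observation i:

    \rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}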
Violin Plot
Description: Shows the distribution of a single numeric variable, and conveys the density of the distribution based on a kernel smoother that indicates the density of values (via width) of the numeric field. In addition to concisely showing the nature of the distribution of a numeric variable, violin plots are an excellent way of visualizing the relationship between a numeric and a categorical variable, by creating a separate violin plot for each value of the categorical variable.
Example: For example, it can create a plot of the distribution of the number of minutes of cell phone talk time used by different customer age group categories in a particular month. In this way, the tool allows a data artisan to gain a more complete understanding of a particular field, or of the relationship between two different fields (one categorical and one numeric).
AB Testing
AB Analysis
Description: Compare the percentage change in a performance measure to the same measure one year prior.
Example: For example, comparing Tuesday lunch traffic at a restaurant last year (when the test was not run) to Tuesday lunch traffic for the same week this year (when the test was run).

AB Controls
Description: The Control Select tool matches one to ten control units (e.g., stores, customers, etc.) to each member of a set of previously selected test units, on criteria such as seasonal patterns and growth trends for a key performance indicator, along with other user-provided criteria. The goal is to find the best set of control units (those units that did not receive the test treatment, but are very similar to a unit that did on important criteria) for the purposes of doing the best comparison possible.
Example: AB Controls takes criteria like seasonality and growth and, for the treatment stores, compares them against the control candidates (the other stores in the chain) that are nearest to those stores on seasonality and growth within a given drive time; so, compare a low-growth store to other low-growth stores within X distance.

AB Treatments
Description: Determine which group is the best fit for AB testing.
Example: For example, choosing which DMAs you would want to compare, using up to 5 criteria at the treatment observation level, as well as the spread between criteria across DMAs and within DMAs based on the criteria that you want. An absolute comparison would be against the average for the entire chain / customer set.

AB Trends
Description: Create measures of trend and seasonal patterns that can be used to help match treatment to control units (e.g., stores or customers) for A/B testing. The trend measure is based on period-to-period percentage changes in the rolling average (taken over a one-year period) of a performance measure of interest. The same measure is used to assess seasonal effects; in particular, the percentage of the total level of the measure in each reporting period is used to assess seasonal patterns.
Example: For example, it gives users the ability to cluster treatment observation units based on underlying trends over the course of a year (general month-to-month growth rate: high-growth, medium-growth, low-growth), or on specific seasonality patterns (climate zones), based on your measures (like traffic or sales volumes), frequency (daily, weekly, monthly), and a chosen period of time. It can use day-specific data (Mondays versus Tuesdays versus etc.).
Predictive
Boosted Model
Description: Provides generalized boosted regression models based on the gradient boosting methods of Friedman. It works by serially adding simple decision tree models to a model ensemble so as to minimize an appropriate loss function.
Example: Provides a visual output that enables an understanding of both the relative importance of different predictor fields on the target, and the nature of the relationship between the target field and each of the important predictor fields; such as the most important variables related to churn, or which variables to focus on in a targeted campaign.

Boosted Model*
Description and example as above. *Only available in Microsoft SQL Server 2016 and Teradata.

Count Regression
Description: Estimate regression models for count data (e.g., the number of store visits a customer makes in a year), using Poisson regression, quasi-Poisson regression, or negative binomial regression. The R functions used to accomplish this are glm() (from the R stats package) and glm.nb() (from the MASS package).
Example: Regression models for count data that are integer in nature (e.g., the number of phone numbers on a cell phone account, or the number of visits a customer makes to our store in a given year). Like linear/logistic regression, but typically used with small counts (visits to a doctor's office: always a positive number, and typically an integer value); it helps address the risk of biased results in linear regression where you have a relatively small number of possible positive integer values.
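The sheet names the R functions behind this tool; a minimal sketch of how those calls typically look (the toy data is simulated for illustration):

    library(MASS)  # for glm.nb(); glm() is in the stats package

    set.seed(1)                                   # invented data: visits vs. income
    d <- data.frame(income = runif(100, 30, 80))
    d$visits <- rpois(100, lambda = exp(0.02 * d$income))

    glm(visits ~ income, data = d, family = poisson)       # Poisson regression
    glm(visits ~ income, data = d, family = quasipoisson)  # quasi-Poisson
    glm.nb(visits ~ income, data = d)                      # negative binomial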
Decision Tree
Description: Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable, by constructing a set of if-then split rules that optimize a criterion. If the target variable identifies membership in one of a set of categories, a classification tree is constructed (based on the Gini coefficient) to maximize the 'purity' at each split. If the target variable is a continuous variable, a regression tree is constructed using the split criterion of minimizing the sum of the squared errors at each split.
Example: A decision tree creates a set of if-then rules for classifying records (e.g., customers, prospects, etc.) into groups based on the target field (the field we want to predict). For instance, in the case of evaluating a credit union's applicants for personal loans, the credit union can use the method with data on past loans it has issued and find that customers who had (a) an average monthly checking balance of over $1,500, (b) no outstanding personal loans, and (c) were between the ages of 50 and 59 had a default rate of less than 0.3%, and therefore should have their loan applications approved.
Logistic Regression
Description: Relate a binary (yes/no) variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable.
Example: For instance, what is the probability that someone who graduated five years ago in engineering from a university will make a donation to that university if they are included in a telephone-based fundraising campaign? How does this probability compare to the probability that someone who graduated from the university 20 years ago with a degree in education will donate to the same campaign (i.e., which of these two people represents a better donation prospect)?
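A minimal sketch of this kind of model in R (the data is simulated and the field names invented for illustration):

    set.seed(2)  # invented donor data: donation probability falls with years since graduation
    d <- data.frame(years_since_grad = sample(1:30, 200, replace = TRUE))
    d$donated <- rbinom(200, 1, plogis(1 - 0.1 * d$years_since_grad))

    fit <- glm(donated ~ years_since_grad, data = d, family = binomial)
    # Compare the 5-years-ago and 20-years-ago prospects from the example
    predict(fit, data.frame(years_since_grad = c(5, 20)), type = "response")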
Logistic Regression In-DB
Description: Uses the database's native language (e.g., R) to create an expression relating a binary (yes/no) variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable. Accessible via the regular predictive tool palette, and will automatically convert to the In-DB version of the tool if an In-DB connection exists.
Example: For instance, what is the likelihood that someone who filed for bankruptcy five years ago will default on a loan payment vs. someone who defaulted 15 years ago?

Neural Network
Description: This tool allows a user to create a feedforward perceptron neural network model with a single hidden layer. The neurons in the hidden layer use a logistic (also known as a sigmoid) activation function, and the output activation function depends on the nature of the target field. Specifically, for binary classification problems (e.g., the probability a customer buys or does not buy), the output activation function used is logistic; for multinomial classification problems (e.g., the probability a customer chooses option A, B, or C), the output activation function used is softmax; for regression problems (where the target is a continuous, numeric field), a linear activation function is used for the output.
Example: Can be used to help in financial risk assessment by scoring an applicant to determine the risk of extending credit, or to detect fraudulent transactions in an insurance claims database.

Score
Description: Calculate a predicted value for the target variable in the model. This is done by appending a 'Score' field to each record in the output of the data stream, based on the model.
Example: This tool takes a model and provides predicted model values for the target variable in new data. This is what actually enables a predictive model to be incorporated into a business process.
Time Series
TS ARIMA
Description: Estimate a univariate time series forecasting model using an autoregressive integrated moving average (ARIMA) method.
Example: The two most commonly used methods of univariate (single variable) time series forecasting are ARIMA and exponential smoothing. This tool implements the ARIMA model, and can be run in a fully automated way; in other words, the user can run the tool in fully automated mode or alter any of the method parameters used in creating the model.
TS ETS
Description: Estimate a univariate time series forecasting model using an exponential smoothing method.
Example: This tool implements the creation of univariate time series models using exponential smoothing. As with the TS ARIMA tool, the user can run the tool in a fully automated mode or alter any of the method parameters used in creating models. It can help you understand the effect that factors such as economic and market conditions, customer demographics, pricing decisions and marketing activities have on your business.
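For intuition, a rough R sketch of automated ARIMA and exponential smoothing using the forecast package (an assumption for illustration; the sheet does not name Alteryx's underlying library):

    library(forecast)

    y    <- AirPassengers        # built-in monthly series
    fit1 <- auto.arima(y)        # automated ARIMA (cf. TS ARIMA)
    fit2 <- ets(y)               # exponential smoothing (cf. TS ETS)
    forecast(fit1, h = 3)        # forecasts for 3 future periods (cf. TS Forecast)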
TS Filler
Description: This tool allows a user to take a data stream of time series data and "fill in" any gaps in the series.
Example: This tool is used primarily as a preparation step for using downstream time series-related tools and macros. Some time series tools will produce unexpected results or errors if the data stream contains gaps in the time series; e.g., you have a series of data that is supposed to contain measurements every 5 minutes, but you don't actually have measurements covering every 5 minutes.

TS Forecast
Description: Provide forecasts from either an ARIMA or ETS model for a specific number of future periods.
Example: This tool can be used for inventory management; for instance, based on past history and inventory levels, it can help predict what your inventory levels should be in the next 3 months. The forecasts are carried out using models created with either the TS ARIMA or TS ETS tools.

TS Plot
Description: Create a number of different univariate time series plots, to aid in understanding the time series data and determining how to develop a forecasting model.
Example: This tool allows a number of different, commonly used time series plots to be created. One particularly interesting plot is the Time Series Decomposition plot, which breaks a time series into longer-term trend, seasonal, and error components. This can be particularly useful in spotting changes in underlying trend (say, the use of a particular cell tower) in what is otherwise "noisy" data due to time-of-day and day-of-week effects.
Predictive Grouping
Append Cluster
Description: Appends the cluster assignments from a K-Centroids Cluster Analysis tool to a data stream containing the set of fields (with the same names, but not necessarily the same records) used to create the clusters.
Example: After you create clusters using a cluster analysis tool, you can then append the cluster assignments both to the database used to create the clusters and to new data not used in creating the set of clusters.

Market Basket Rules
Description: Step 1 of a Market Basket Analysis: take transaction-oriented data and create either a set of association rules or frequent item sets. A summary report of both the transaction data and the rules/item sets is produced, along with a model object that can be further investigated in an MB Inspect tool.
Example: For example, a market basket rule can state that if someone purchases beer, they are most likely to also purchase pizza, or that if they purchase fish, they are most likely to purchase white wine at the same time.

Market Basket Inspect
Description: Step 2 of a Market Basket Analysis: take the output of the MB Rules tool and provide a listing and analysis of those rules, which can be filtered on several criteria in order to reduce the number of returned rules or item sets to a manageable number.
Example: A tool for inspecting and analyzing association rules and frequent item sets; so, do the beer-and-pizza MB Rules really fit? You can:
- Filter association rules and review them graphically (two different visuals)
- Look at certain levels of support/confidence/lift
- Fine-tune rules (which ones to use/keep/focus on)
- Identify the rules that begin to make sense, and output them (yxdb stream)
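For intuition, a rough sketch of the two steps in R using the arules package (an assumption for illustration; the sheet does not name the underlying implementation):

    library(arules)

    data(Groceries)                       # built-in transaction data
    rules <- apriori(Groceries,           # step 1: mine association rules
                     parameter = list(support = 0.01, confidence = 0.5))
    inspect(head(sort(rules, by = "lift"), 3))  # step 2: inspect top rules by lift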
Principal Components
Description: Reduce the dimensions (number of numeric fields) in a database by transforming the original set of fields into a smaller set that accounts for most of the variance (i.e., information) in the data. The new fields are called factors, or principal components.
Example: Groups fields together by examining how a set of variables correlate or relate with one another, such as household income level and educational attainment for people living in different geographic areas. For a particular area we may have the percentage of people who fall into each of eight different educational groups and the percentage of households that fall into nine different income groups. It turns out that household income and educational attainment are highly related (correlated) with one another. Taking advantage of this, a principal components analysis may allow a user to reduce the 17 different education and income groups into one or two composite fields created by the analysis. These two composite fields would capture nearly all the information contained in the original 17 fields, greatly simplifying downstream analyses (such as a cluster analysis).
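A minimal sketch of principal components in R (illustrative only, using a built-in dataset rather than the income/education fields from the example):

    pca <- prcomp(USArrests, scale. = TRUE)  # scale fields before extracting components
    summary(pca)                             # proportion of variance per component
    head(pca$x[, 1:2])                       # the first two composite fields (scores)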
Connectors
Amazon S3 Download
Description: Read CSV, DBF and YXDB files from Amazon S3.
Example: Get data that is stored in Amazon S3; useful for the Analytics Gallery, because it is hosted in Amazon.

Amazon S3 Upload
Description: Write CSV, DBF and YXDB files to Amazon S3.
Example: Put data in Amazon S3.

Download
Description: Retrieve data from a specified URL, including an FTP site, for use in a data stream.
Example: Pull data off the web; for example, download competitors' store listings from their website.

Google Analytics
Description: Bring in data from Google Analytics.
Example: A company can combine Google Analytics data with other sources of data.

Marketo Input
Description: The Marketo Input tool reads Marketo records for a specified date range. Two types of Marketo records can be retrieved:
1. LeadRecord: these are lead records, and there will be one record for each lead.
2. ChangeRecord: these records track the activities for each lead. There are potentially many ChangeRecord records for each LeadRecord.
The Input tool retrieves records in batches of 1000 records, whereas the Marketo Append tool makes an API request for each record.
Example: Allows users to access data directly from Marketo.

Marketo Append
Description: The Marketo Append tool retrieves Marketo records and appends them to the records of an incoming data stream. Two types of Marketo records can be retrieved:
1. LeadRecord: these are lead records, and there will be one record for each lead.
2. ActivityRecord: these records track the activities for each lead. There can be many ActivityRecord records for each LeadRecord.
Both of these record types are retrieved by specifying a LeadKey, which must be supplied by an upstream tool. More information on LeadKeys can be found in the Append Tab section under Configuration Properties.
Example: Appends records accessed from Marketo.

Marketo Output
Description: The Marketo Output tool calls the Marketo API function syncLead(). Data is written back to Marketo using an 'Upsert' operation: if a record doesn't currently exist, it will be created; if a record currently exists, it will be updated.
Example: Allows users to write data back into Marketo.

MongoDB Output
Description: Write data to a MongoDB database. MongoDB is a scalable, high-performance, open source NoSQL database.
Example: Allows users to write data back into MongoDB.

MongoDB Input
Description: Read and query data from a MongoDB database. MongoDB is a scalable, high-performance, open source NoSQL database.
Example: Allows users to read data that is stored in MongoDB; can be used to enable big data analytics, such as e-commerce transaction analysis.

Salesforce Input
Description: Read and query data from Salesforce.com.
Example: Pull data from SFDC.
Address
CASS
Description: Standardize address data to conform to the U.S. Postal Service CASS (Coding Accuracy Support System) or Canadian SOA (Statement of Accuracy).
Example: This can be used to help in marketing optimization by improving the accuracy of the address and customer information; it can improve geocoding and fuzzy matching by standardizing address data. CASS certification is required by USPS for bulk mail discounts.

Canada Geocoder
Description: Determine the coordinates (latitude and longitude) of an address and attach a corresponding spatial object to your data stream. Uses multiple tools to produce the most accurate answer.
Example: Could be used for processing customer files to identify clustering of customers. Geocoding (assigning an address to a physical location) is the first crucial step in any spatial analysis.

US Geocoder
Description: Determine the coordinates (latitude and longitude) of an address and attach a corresponding spatial object to your data stream. Uses multiple tools to produce the most accurate answer.
Example: Could be used for processing customer files to identify clustering of customers. Geocoding (assigning an address to a physical location) is the first crucial step in any spatial analysis.

Parse Address
Description: Parse a single address field into different fields for each component part, such as: number, street, city, ZIP. Consider using the CASS tool for better accuracy.
Example: Take an address field and break it into different pieces; for example, an address in a single field/column broken out into city, ZIP, and street.

Street Geocoder
Description: Determine the coordinates (latitude and longitude) of an address and attach a corresponding spatial object to your data stream. Consider using the US Geocoder or Canadian Geocoder macros for better accuracy.
Example: Find the latitude and longitude of a customer. Could be used for processing customer files to identify clustering of customers. Geocoding (assigning an address to a physical location) is the first crucial step in any spatial analysis.

US ZIP9 Coder
Description: Determine the coordinates (latitude and longitude) of a 5, 7, or 9 digit ZIP code.
Example: Find the latitude and longitude of a ZIP+4.
Demographic Analysis
Allocate Append
Description: Append demographic variables to your data stream from the installed dataset(s).
Example: Append demographics to a trade area; for example, what is the population within 10 minutes of my store?

Behavior Metainfo
Description: Input behavior cluster names, IDs and other meta info from an installed dataset.
Example: Generates a list of all of the clusters in the segmentation system.

Cluster Code
Description: Append a behavior cluster code to each record in the incoming stream.
Example: Appends a lifestyle segment to records based on a geo-demographic code (i.e., block group code).

Compare Behavior
Description: Compare two behavior profile sets to output a variety of measures such as market potential index, penetration, etc.
Example: Calculate correlations and market potential between two profiles; e.g., market potential for new customers in a new market who fit the same profiles as current customers.

Create Profile
Description: Create behavior profiles from cluster information in an incoming data stream.
Example: It can show the distribution of customers across the clusters in the segmentation.

Report Comparison
Description: Generate a comparison report from two behavior profile sets for output via the Render tool.
Example: Shows market potential, i.e., the likelihood of certain purchasing behaviors, at the segment level.

Report Detail
Description: Generate a detailed report from a behavior profile set for output via the Render tool.
Example: Generates a report of distributions and penetration of customers by segment.

Report Rank
Description: Generate a rank report from a set of behavior profiles for output via the Render tool.
Example: Use to rank geographies based on the market potential of prospective customers.

Profile Input
Description: Input a behavior profile set from an installed dataset or external file.
Example: Used in conjunction with, or after, writing a behavior profile set; provides the ability to open previously saved profile sets and use them for analysis.

Profile Output
Description: Output a profile set (*.scd file) from behavior profile sets in an incoming data stream. Generally only used when using the standalone Solocast desktop tool.
Example: Taking a customer file that has a segment appended to it and saving it into an Alteryx format so that it can be used repeatedly.
Calgary
Cross Count
Description: Find the counts of predefined sets of values that occur in a Calgary database file.
Example: Very fast; find a count of how many times a value occurs in a file. This can be filtered or based on a user-defined condition.

Cross Count Append
Description: Find the counts of sets of values (from the incoming data stream) that occur in a Calgary database file.
Example: Very fast; find a count based on a filter passed into the tool from a previous process. Example: the number of people by segmentation group in a trade area (the advantage is you are getting a count, rather than extracting the data and getting the count yourself).

Calgary Input
Description: Input data from a Calgary database file with a query.
Example: Very fast input of data from an indexed file. This can be filtered or based on a user-defined condition.

Calgary Join
Description: Query a Calgary database dynamically based on values from an incoming data stream.
Example: Very fast extraction of data from a large indexed file, based on a condition passed to it from inside Alteryx. Example: extract all businesses with more than 1000 employees (where 1000 comes from an Alteryx calculation upstream).

Calgary Loader
Description: Create a highly indexed and compressed Calgary database, which allows for extremely fast queries.
Example: Create a highly indexed and compressed file for use with any of the Calgary tools (to enable the speed).
Developer
API Output
Description: Return the results of a data stream directly to an API callback function. For use with custom application development.
Example: Have an Alteryx process called from inside another program; for example, someone wants their own application interface but uses Alteryx for the processing (e.g., a mortgage calculator).

Base64 Encoder
Description: The Base64 Encoder macro issues a base 64 encoded string.
Example: Useful where there is a need to encode binary data that needs to be stored and transferred over media designed to deal with textual data; this ensures that the data remains intact, without modification, during transport. (http://en.wikipedia.org/wiki/Base64)
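For intuition, a base64 round trip sketched in R using the base64enc package (an assumption for illustration; the macro's internals are not described in the sheet):

    library(base64enc)

    raw_bytes <- charToRaw("Alteryx")     # binary data to protect in transit
    encoded   <- base64encode(raw_bytes)  # "QWx0ZXJ5eA=="
    rawToChar(base64decode(encoded))      # round-trips back to "Alteryx"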
Dynamic Input
Description: Read from input files or databases at runtime, using an incoming data stream to dynamically choose the data. Allows for dynamically generated queries.
Example: You have a process in Alteryx that can pull a value/extract from a database. For example, you have a process in Alteryx that determines which ZIP codes have the highest value, and then extracts data from another source based on those ZIP codes; this decreases the overhead of reading in the entire database.

Dynamic Rename
Description: Dynamically (using data from an incoming stream) rename fields. Useful when applying custom parsing to text files.
Example: If you bring in a CSV file where all of the column names have underscores, this could replace all of the underscores with spaces. It will not change the data, but will change the column names.

Dynamic Replace
Description: Replace data values in a series of fields (using a dynamically specified condition) with expressions or values from an incoming stream.
Example: Say you have a hundred different income fields and, instead of the actual value in each field, you want to represent the number with a code of A, B, C, D, etc. that represents a range. The Dynamic Replace tool can easily perform this task.

Dynamic Select
Description: Select or de-select fields by field type or an expression.
Example: Select the first 10 columns of a file regardless of what they contain.

Field Info
Description: Output the schema (field types and names, etc.) of a data stream.
Example: Gives users the types of the data that each column contains (a description of the data).

JSON Parse
Description: The JSON Parse tool separates JavaScript Object Notation text into a table schema for the purpose of downstream processing. It can be built back up into usable JSON format by feeding the output into the JSON Build tool.
Example: This tool is used in our social media tools, where we are pulling data from web APIs that return JSON via the Download tool. The JSON is parsed before it is presented to the user in a nicely formatted table.
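For intuition, the parse/build round trip sketched in R with the jsonlite package (an assumption for illustration; the sample JSON and fields are invented):

    library(jsonlite)

    txt <- '{"customer": "C-101", "spend": 250.5, "segments": ["A", "B"]}'
    parsed <- fromJSON(txt)             # JSON text -> R structure (cf. JSON Parse)
    parsed$spend <- 300                 # modify, as with a Formula-tool step
    toJSON(parsed, auto_unbox = TRUE)   # back to JSON text (cf. JSON Build)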
Message
Description: Write log messages to the Output Window. Generally used in authoring macros.
Example: Gives you extra messaging information in the output: the number of columns in a certain table, the number of rows, or errors.

R
Description: Execute an R language script and link incoming and outgoing data from Alteryx to R, an open-source tool used for statistical and predictive analysis.
Example: This free open-source programming language has 1000s of algorithms available, or you can write your own algorithm in R and bring it into Alteryx.
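A minimal R-tool script sketch; read.Alteryx()/write.Alteryx() are the helpers Alteryx exposes inside the R tool (treat the exact arguments, and the Sales field, as assumptions; this only runs inside the tool):

    df <- read.Alteryx("#1", mode = "data.frame")  # data from incoming anchor #1
    df$zscore <- as.numeric(scale(df$Sales))       # any custom R logic (invented field)
    write.Alteryx(df, 1)                           # send results to output anchor 1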
Run Command
Description: Run external programs as part of an Alteryx process.
Example: Call other programs from inside Alteryx; for example, curl.exe was widely used to scrape data before Alteryx had the Download tool.

Test
Description: Test assumptions in a data stream.
Example: Check conditions inside modules and raise errors if required. For example, if you want to make sure all of the records are joined before you start the process, the failing test would stop the module.
Social Media
Foursquare
Description: Search Foursquare venues by a location, with an option of filtering by a search term.
Example: A company can analyze location data of where users are checking in.

Twitter Search
Description: Search tweets of the last 7 days by given search terms, with location and user relationship as optional properties.
Example: A company can analyze what people are saying about their company over the last week.
Laboratory
Blob Convert
Description: The Blob Convert tool takes different data types and either converts them to a Binary Large Object (Blob) or takes a Blob and converts it to a different data type.
Example: Convert a PNG, GIF or JPG Blob to a report snippet; this allows the reporting tools to recognize the incoming images as report snippets for building reports.

Blob Input
Description: The Blob Input tool will read a Binary Large Object, such as an image or media file, by browsing directly to a file or passing a list of files to read.
Example: Read a list of documents from disk, either as blobs or encoded strings.

Blob Output
Description: The Blob Output tool writes out each record into its own file.
Example: Write a list of documents to disk, either as blobs or encoded strings.

JSON Build
Description: The JSON Build tool takes the table schema of the JSON Parse tool and builds it back into properly formatted JavaScript Object Notation.
Example: After parsing JSON data, send it through the Formula tool to modify it as needed, and then use the JSON Build tool to output properly formatted JSON.

Make Columns
Description: The Make Columns tool takes rows of data and arranges them by wrapping records into multiple columns. The user can specify how many columns to create and whether they want records laid out horizontally or vertically.
Example: This tool is useful for reporting or display purposes where you want to lay out records to fit nicely within a table. Arrange a table of 10 records into a table of 5 records spread across 2 columns.

Throttle
Description: The Throttle tool slows down the speed of the downstream tool by limiting the number of records that are passed through the Throttle tool.
Example: This is useful for slowing down requests sent per minute when there are limits to how many records can be sent in.
Interface
Icon Tool Description Example
The Action tool updates the configuration
of a module with values provided by
Action interface questions, when run as an app or
macro.
The Check Box tool will display a check box
option to the end user in an app or macro.
Check Box The resulting value, True (checked) or
False (unchecked), is passed to
downstream tools.
The Condition tool tests for the presence of
user selections. The state is either true or
Condition false.
In-Database
Connect In-DB
Description: Establish a database connection for an In-DB workflow.
Example: If you want to connect directly to Oracle or SQL Server.

Filter In-DB
Description: Filter In-DB records with a Basic filter, or with a Custom expression using the database's native language (e.g., SQL).
Example: If you are looking at a dataset of customers, you may want to only keep those customer records that have greater than 10 transactions or a certain level of sales, and want this process to run in a database.

Formula In-DB
Description: Create or update fields in an In-DB data stream with an expression using the database's native language (e.g., SQL).
Example: For instance, if there is a missing or NULL value, you can use the formula to replace that NULL value with a zero, and run this process in a database.

Data Stream Out
Description: Stream data from an In-DB workflow to a standard workflow, with an option to sort the records.
Example: If you already have an established Alteryx workflow and you want to bring data from a database into the workflow.

Summarize In-DB
Description: Summarize In-DB data by grouping, summing, counting, counting distinct fields, and more. The output contains only the result of the calculation(s).
Example: You could determine how many customers you have in the state of NY and how much they have spent in total, or on average per transaction, while all of this processing takes place in the database.

Union In-DB
Description: Combine two or more In-DB data streams with similar structures based on field names or positions. In the output, each column will contain the data from each input.
Example: Transaction data stored in different files for different time periods, such as a sales data file for March and a separate one for April, can be combined into one data stream for further processing inside the database.

Write In-DB
Description: Use an In-DB data stream to create or update a table directly in the database.
Example: If you need to update and save the results of the in-database blending into the database.