
SP141

This document has been provided for the purpose of giving you a takeaway
for the Spotfire training experience. The pages in this document contain
important concept reminders, organized by the key learning objectives for
this course.
Course SP141 TIBCO Spotfire Analyst Advanced Calculations - Key learning objectives:
SPOTFIRE EXPRESSIONS - Build expressions to incorporate functions and property controls to enhance visual analysis
Page 2 • Insert Calculated Column • Columns • Properties
• Custom Expression • Functions • Recent expressions
Page 3 • Property Controls • List box • Insert as Value
• Input field • Drop-down list • Insert as Text
• Slider • Label • $map and $esc functions
Page 4 • Binning functions • Logical functions • Operators
• Conversion functions • Math functions • Property functions
• Date and Time functions • Ranking functions • Spatial functions
Page 5 • Statistical functions • Text functions • Expression shortcuts
Page 6 • Expression syntax • Loose format • THEN, [Value]
Page 7 • OVER • Axis.Axis Name • NavigatePeriod
Page 8 • Node navigation • Previous() • Parent()
• All() • AllPrevious() • Intersect()

RELATIONSHIPS & PREDICTIONS – Forecast future values or predict columns based on existing values
Page 9 • Lines & Curves • Calculated lines • Drawn lines
Page 10 • Forecast • Holt-Winters • Confidence interval
Page 11 • Data Relationships • numerical vs. categorical • R-squared values
• numerical vs. numerical • categorical vs. categorical • p-values
Page 12 • Regression Modeling • Model Summary • Diagnostic Visualizations
Page 13 • Classification Modeling • Model Summary • Diagnostic Visualizations
STATISTICAL ENGINES - Configure data functions to use the functionality of R, S+, SAS, and MATLAB within Spotfire
Page 14 • Entering TERR script
Page 15 • Data Function • Statistical engines • Samples
Page 16 • Register Data Function • Script • Input and Output
MULTIVARIATE DATA ANALYSIS - Explore computational tools in order to bring order to multivariate data
Page 17 • Normalization • Empty values • Replace or break lines
Page 18 • Line Similarity • Correlation similarity • Euclidean distance
Page 19 • K-means Clustering • Interpret results • Additional information
Page 20 • Hierarchical Clustering • Dendrograms • Clustering methods
Page 21 • Pruning line • Cluster ID column • Clustering settings

© TIBCO Software Inc. Page 1 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
SPOTFIRE EXPRESSIONS

Calculate new columns or alter the expression applied to a visualization property using this dialog:

Insert ▼ Calculated Column … -or-

• Available columns can be added to the expression; use the search field to limit columns displayed.
• Properties are values stored outside the data table and can be used as part of the expression. Right-click to Insert as Value when property is part of a mathematical calculation.
• Functions can be found by viewing a specific functions Category, or use the search field to limit functions displayed.

Expressions dialog

Create expressions here using a combination of Available columns, Available properties and Functions.

Recent expressions allows you to insert an expression you have recently created, perhaps in another visualization or even another analysis session.

Display name: you may edit a custom expression on a visualization property just to change the title on the selector.

© TIBCO Software Inc. Page 2 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
SPOTFIRE EXPRESSIONS

Properties may be:
• Numerical values
• Dates
• Text
• Column names
• Column values
• Expressions
• Aggregations

Property controls can be created in order to collect or select values which are stored as:
• Document Properties
• Data Table Properties
• Column Properties

• Properties can be referenced in two ways in expressions:

 – PropertyType("property") denotes a property as a value, interpreted as the expression is run
   Example: [Euro]*DocumentProperty("CurrencyConversionFactor")

 – ${property}, wrapping in '${' and '}', denotes a property as preprocessed text
   Examples: ${AggregationMethod}([Sales])
             ${SelectedExpression}
             Sum([${ColumnName}])

• Other syntax you may encounter:

 – $map, used with list data types: where multiple values are input for a property, '$map' instructs Spotfire to loop through the list

 – $esc, used to add '[' and ']' around a property value to indicate that the property value is a column name
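The "preprocessed text" behaviour of ${property} can be pictured as plain text substitution that happens before the expression is evaluated. The Python sketch below only mimics that idea with the standard library's `string.Template`, which happens to share the `${name}` syntax; the `preprocess` helper and the property values are hypothetical, and this is not how Spotfire itself is implemented.

```python
from string import Template


def preprocess(expression, properties):
    """Replace every ${name} with the property's text, before any evaluation."""
    return Template(expression).substitute(properties)


# Hypothetical document properties, for illustration only.
properties = {"AggregationMethod": "Sum", "ColumnName": "Sales"}

print(preprocess("${AggregationMethod}([Sales])", properties))
print(preprocess("Sum([${ColumnName}])", properties))
```

Both calls produce the same expression text, which is then what would be evaluated against the data.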

© TIBCO Software Inc. Page 3 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
SPOTFIRE EXPRESSIONS

Spotfire functions are organized into categories:

Binning functions are an option for implementing binning instead of using: Insert ▼ Binned Column … -or-

Conversion functions change data types


select examples
String convert to a string value
SN substitute null (note "null" means empty value)

Date and Time functions


select examples
DateAdd adds an interval to a Date, Time or a DateTime
DateDiff calculates the difference* between two Date, Time or DateTime values
DatePart returns a specified part of a Date, Time or a DateTime (such as day, week, month year)
* TimeSpan is a data type describing the difference between two dates,
it has 5 possible fields: Days.Hours:Minutes:Seconds.Milliseconds

Logical functions
select examples
If tests a condition and returns a result
Case tests multiple conditions and returns result
Is Null part of an If or Case statement, tests if value is null or empty
!= part of an If or Case statement, tests if two values are not equal

Operators
select examples
+ add
- subtract
* multiply
/ divide
= equal to
> greater than
<= less than or equal to
<> not equal to
% return the remainder
& concatenate
And, Not, Or, Xor logical operators

Math functions
select examples
RandBetween returns a random number
Log, Log10, Ln converts to logarithmic values
Round round to a specific number of digits past the decimal
Abs returns the absolute value

Property functions
select example
RowId returns a unique identifier for each row

Ranking functions
select examples
Rank assigns an integer ranking (skips at replicates)
DenseRank assigns an integer ranking
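The difference between Rank and DenseRank only shows up when values repeat. The following Python sketch illustrates the two numbering schemes on a small hypothetical list; ascending order is assumed for the illustration, and the helper names are not Spotfire functions.

```python
def rank(values):
    """Ties share a rank and the following rank numbers are skipped."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def dense_rank(values):
    """Ties share a rank and no rank numbers are skipped."""
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]


scores = [10, 20, 20, 30]  # hypothetical column values with one replicate
print(rank(scores))        # the value after the tie jumps to 4
print(dense_rank(scores))  # the value after the tie continues at 3
```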

Spatial functions
select examples
GreatCircleDistance returns the shortest distance between two points, calculated on the surface of a unit sphere

© TIBCO Software Inc. Page 4 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
SPOTFIRE EXPRESSIONS

Spotfire functions are organized into categories:

Statistical functions
select examples
WeightedAverage include a weighting factor
TrimmedMean remove outlying values
ValueForMax another value associated with a maximum
UniqueCount replicates are counted only once

Text functions
select examples
UniqueConcatenate return only one from replicates
Substitute replace text
Trim removes whitespace from the beginning and end of string
Right, Left, Mid return specific characters from string
Upper, Lower change case
RXReplace, ~= functions are based on regular expressions

Expression Shortcuts

Cumulative Sum
  Sum([Amount]) THEN Sum([Value]) OVER (AllPrevious([Axis.X]))

Moving Average
  Sum([Amount])
  THEN Avg([Value]) OVER (LastPeriods(3,[Axis.X]))
  THEN If (Count() OVER (LastPeriods(3,[Axis.X]))=3,[Value], null )

Difference
  Sum([Amount]) THEN [Value] - First([Value]) OVER (NavigatePeriod([Axis.X],0,-1))

Difference %
  Sum([Amount]) THEN ([Value] / First([Value]) OVER (NavigatePeriod([Axis.X],0,-1))) -1

Difference Year Over Year
  Sum([Amount]) THEN [Value] - First([Value]) OVER (NavigatePeriod([Axis.X],"Year",-1))

Difference % Year Over Year
  Sum([Amount]) THEN ([Value] / First([Value]) OVER (NavigatePeriod( [Axis.X],"Year", -1))) -1

% of Total
  Sum([Amount]) THEN [Value] / Sum([Value]) OVER (All([Axis.X]))

Year to Date Total
  Sum([Amount]) THEN Sum([Value]) OVER (Intersect(AllPrevious([Axis.X]), NavigatePeriod([Axis.X],"Year", 0,0)))

Year to Date Growth
  Sum([Amount]) THEN Sum([Value]) OVER (Intersect(AllPrevious([Axis.X]), NavigatePeriod([Axis.X],"Year", 0,0)))
  THEN ([Value] / First ([Value]) OVER (NavigatePeriod([Axis.X],"Year", -1))) -1

Change Relative to Start
  Sum([Amount]) THEN ([Value] / Sum(If([CategoryIndex.X]=0,[Value],0)) OVER (All([Axis.X]))) -1

Change Relative to Fixed Point
  Sum([Amount]) THEN ([Value] / Sum(If(([X.Year]=2012) AND ([X.Quarter]=2),[Value],0)) OVER (All([Axis.X]))) -1

Compound Annual Growth Rate (CAGR)
  Sum([Amount]) THEN (Real([Value] / Sum ([Value]) OVER (NavigatePeriod([Axis.X],"Year", -1))) ^ (1 / ([X.Year] - First ([X.Year]) OVER (NavigatePeriod([Axis.X],"Year",-1))))) -1
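To make the arithmetic behind a few of these shortcuts concrete, here is a small Python sketch over a hypothetical list of per-category totals (standing in for Sum([Amount]) along the X axis). It covers Cumulative Sum, % of Total, Difference, Difference % and Change Relative to Start; the first category has no previous node, which is represented here as None. This illustrates the calculations only, not Spotfire's evaluation engine.

```python
from itertools import accumulate

# Hypothetical Sum([Amount]) for four consecutive X-axis categories.
amounts = [100.0, 120.0, 90.0, 150.0]

# Cumulative Sum: running total over the current and all previous categories.
cumulative = list(accumulate(amounts))

# % of Total: each value divided by the total over all categories.
total = sum(amounts)
percent_of_total = [a / total for a in amounts]

# Difference and Difference %: compare each value with the previous category.
pairs = list(zip(amounts, amounts[1:]))
difference = [None] + [cur - prev for prev, cur in pairs]
difference_pct = [None] + [cur / prev - 1 for prev, cur in pairs]

# Change Relative to Start: compare each value with the first category.
change_from_start = [a / amounts[0] - 1 for a in amounts]

print(cumulative)
print(difference)
```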

© TIBCO Software Inc. Page 5 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
SPOTFIRE EXPRESSIONS

Syntax notes for Spotfire expression language


• [column name]: wrapping in '[' and ']' forces Spotfire to interpret as a column name reference
  Examples: [Electronics]    [# of Visits!]
            [Customer ID]    [@company.com]

• <expression>: wrapping in '<' and '>' forces Spotfire to interpret as categorical
  Examples: <[Store Location]>
            <Quarter([Most Recent Purchase])>

• Function(Arg1, Arg2, Arg3): arguments are separated by ',' and may be columns, numbers, or text
  Examples: Sum([Electronics],[Furniture],[Toys])
            Avg([Profit])/Sqrt(3)
            Substitute("pound sign", [Label], "hash")

• Functions may not require parentheses; check the description box
  Examples: [Cents]/100
            case when [Ratio]>0 then "↑" when [Ratio]<0 then "↓" else "no change" end

• Functions may be nested as arguments within other functions
  Examples: If(Len([Name])>10,"Long","Short")
            If([Electronics]/[Toys] Is Error, null, [Electronics]/[Toys])

• Function([Secondary Data Table].[Column]): columns from other data tables can be included in custom expressions
  Examples: Avg([Math]) - Avg([Average SAT scores by state].[Math])
Spotfire expression language is loosely formatted:

 – spaces are ignored [except within column names]
   Examples: [Electronics]/[Toys]      Sum([Electronics])
             [Electronics] / [Toys]    Sum ( [Electronics] )

 – case is ignored
   Examples: AVG([TOYS])    Avg([toys])
             avg([Toys])    avg([toys])

 – single quotes and double quotes are treated the same
   Examples: Concatenate([City], ", " ,[State])
             Concatenate([City], ', ' ,[State])

• THEN , a keyword which breaks an expression into separate portions to facilitate processing on
database computational engines or perform more efficient calculations on in-memory data
Example: Sum([Sales])
THEN Avg([Value]) OVER (LastPeriods(3,[Axis.X]))
THEN If (Count() OVER (LastPeriods(3,[Axis.X]))=3,[Value], null )

• [Value] , a placeholder variable which represents the results of the previous THEN expression
Example: Sum([Sales]) THEN [Value] / Sum([Value]) OVER (All([Axis.X]))
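One way to read THEN and [Value] is as a staged pipeline, where each stage receives the previous stage's result under the name [Value]. The Python sketch below walks through the three-stage moving average example in that spirit. The monthly totals are hypothetical, and `last_periods` is an illustrative helper that returns the current node plus the two before it; it is a sketch of the idea, not of how Spotfire executes the expression.

```python
PERIODS = 3

# Stage 1: Sum([Sales]) per X-axis category (hypothetical monthly totals).
stage1 = [10.0, 20.0, 30.0, 40.0]


def last_periods(values, index, periods):
    """Values for the current node and the (periods - 1) nodes before it."""
    return values[max(0, index - periods + 1): index + 1]


# Stage 2: average of [Value] (the stage 1 result) over the last three periods.
windows = [last_periods(stage1, i, PERIODS) for i in range(len(stage1))]
stage2 = [sum(w) / len(w) for w in windows]

# Stage 3: keep [Value] (the stage 2 result) only where the window is full.
stage3 = [v if len(w) == PERIODS else None for v, w in zip(stage2, windows)]

print(stage2)
print(stage3)
```

The final stage is what suppresses the first two points, where fewer than three periods are available.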

© TIBCO Software Inc. Page 6 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
SPOTFIRE EXPRESSIONS

• OVER may be applied in calculated columns or custom expressions in four ways:

 – OVER ([Column Name])
   Example: Avg([Sales]) OVER([Region])

 – OVER ([Axis.Axis Name])
   Example: Sum([Sales]) / Sum([Sales]) OVER([Axis.Panels])

 – OVER (NodeNav([Axis.Axis Name]))
   Examples: Sum([Sales]) OVER (AllPrevious([Axis.X]))
             Sum([Sales]) OVER (LastPeriods(3,[Axis.X]))/3

 – OVER (NodeNav([Column Name]))
   Example: [Sales]-Sum([Sales]) OVER (Previous([Quarter]))

OVER functions redirect standard groupings for visualization properties.

Node Navigation
All, AllNext, AllPrevious, FirstNode, Intersect, LastNode, LastPeriods, NavigatePeriod, Next, NextPeriod, ParallelPeriod, Parent, Previous, PreviousPeriod

NavigatePeriod
Arg 1: Hierarchy to navigate
       [Axis.Name]
Arg 2: Level to move to in hierarchy
       0, 1, 2 (relative to leaf level)
       "Month", "Quarter", "Year"
Arg 3: Number of steps to move sideways in hierarchy
       -2, -1, 0, 1, 2
Arg 4 (optional): Level to move down to
       number of steps
       name of level
       When Arg 4 is omitted, you are navigated to the leaf level

Valid OVER groupings based on visualization types
Syntax is: [Axis.Axis Name]

© TIBCO Software Inc. Page 7 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
SPOTFIRE EXPRESSIONS

All() function

Sum([Sales]) / Sum([Sales]) OVER (All([Axis.X]))

Previous() function

Sum([Sales]) - Sum([Sales]) OVER (Previous([Axis.X]))

AllPrevious() function

Sum([Sales]) OVER (AllPrevious([Axis.X]))

Parent() function

Sum([Sales]) / Sum([Sales]) OVER (Parent ([Axis.X]))

Intersect() function

Sum([Sales]) OVER (Intersect(Parent([Axis.X]),AllPrevious([Axis.X])))
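The node navigation functions can be thought of as selecting which nodes feed the aggregation. As a rough Python sketch, assume a Year » Quarter hierarchy on the X axis and model each function as returning a set of row indices; the data and helper names are hypothetical, and AllPrevious is taken to include the current node.

```python
# Hypothetical (year, quarter, sales) rows in X-axis order.
rows = [
    (2012, "Q1", 10), (2012, "Q2", 20), (2012, "Q3", 30), (2012, "Q4", 40),
    (2013, "Q1", 15), (2013, "Q2", 25),
]


def all_nodes(i):
    """All(): every node on the axis."""
    return set(range(len(rows)))


def all_previous(i):
    """AllPrevious(): the current node and every node before it."""
    return set(range(i + 1))


def parent(i):
    """Parent(): every node sharing the current node's year."""
    return {j for j, row in enumerate(rows) if row[0] == rows[i][0]}


def total(indices):
    return sum(rows[j][2] for j in indices)


positions = range(len(rows))
running_total = [total(all_previous(i)) for i in positions]
share_of_year = [rows[i][2] / total(parent(i)) for i in positions]
# Intersect(Parent, AllPrevious): a running total that restarts each year.
year_to_date = [total(parent(i) & all_previous(i)) for i in positions]

print(running_total)
print(year_to_date)
```

Intersecting the two sets is what makes the cumulative sum reset at the start of 2013.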

© TIBCO Software Inc. Page 8 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
RELATIONSHIPS & PREDICTIONS

Visualization Properties ♦ Lines & Curves

Lines & Curves can be added to a variety of different visualization types. These lines and curves may be drawn based on values or expressions you provide, or calculated based upon standard aggregations, custom expressions, or data fitting algorithms.

• Horizontal Line or Vertical Line


– Straight line – fixed, aggregate, property or expression calculated
– Line from data table – draws line for each value in column
drawn
– Average and ± 1 Standard Deviation
• Curve Draw – draws line based upon a simple expression
• Curve from Data Table – draws line based upon expression which can include column values
• Line from column values – draws line based on X and Y columns
• Straight Line Fit
• Polynomial Curve Fit – select the degree, 2 through 5
• Logistic Regression Curve Fit – may fix max and/or min
• Power Curve Fit
• Logarithmic Curve Fit
• Exponential Curve Fit
• Gaussian Curve Fit – may fix position, width, and/or amplitude

For more information about each option above, click Help and scroll to the Add section.

© TIBCO Software Inc. Page 9 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
RELATIONSHIPS & PREDICTIONS

Visualization Properties ♦ Lines & Curves
• Forecast – Holt-Winters ...

If your time-series data is not spaced equally, you can apply one of the Time-series aggregation methods in order to make the visualization suitable for forecasting.

– Actual: this is the data which exists in your data table(s)
– Fitted: the actual data is submitted to a smoothing calculation
– Forecast: values are projected forward in time from the actual data
– Confidence: two lines are drawn which represent the boundaries of the confidence interval

© TIBCO Software Inc. Page 10 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
RELATIONSHIPS & PREDICTIONS

Tools ▼ Data Relationships …

The Data Relationships tool allows you to make pair-wise comparisons of data columns, in an effort to determine if there are any potential relationships between the data in those columns.

Table columns: COLUMN TYPES COMPARED | COMPARISON METHOD | PRIMARY EVALUATION MEASURE AND VISUALIZATION

* Parametric – assumes that the data is normally distributed and that the variances of the groups or errors are approximately equal

† Nonparametric – uses the rank order of the data rather than the actual values; is appropriate when the parametric assumption of normality and equality of variance is not met

Interpreting results:
• As R-squared values approach 1, the correlation or inverse correlation between X and Y variables is stronger.
• As p-values get smaller, the statistical significance of the comparison is stronger.
• Interacting with the results table will allow you to view the raw data columns for the marked rows in the corresponding results visualizations.
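The point that R-squared approaches 1 for both strong correlation and strong inverse correlation can be checked with a few lines of Python. This sketch assumes a simple linear relationship between two numerical columns, where R-squared equals the square of the Pearson correlation coefficient; the data is hypothetical, and the tool's own calculation is not reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
falling = [10, 8, 6, 4, 2]  # hypothetical Y values that fall as X rises

r = pearson_r(xs, falling)
r_squared = r ** 2
print(r, r_squared)  # r is negative, yet R-squared is close to 1
```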

© TIBCO Software Inc. Page 11 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
RELATIONSHIPS & PREDICTIONS

Using predictive modeling tools involves a three-step process:
1. Fitting the model
2. Evaluating the model
3. Predicting from the model

Tools ▼ Regression Modeling…
These statistics are based on R functions, and are calculated using the TIBCO Enterprise Runtime for R (TERR) statistical engine.

Linear Regression Method (parametric)

Fitting the model
• Model Summary
  – Residual standard error: low as possible
  – R-squared: 1.0 is ideal (ranges 0 to 1)
  – p-value: smaller is better
• Table of Coefficients
  – p-values: small indicates predictor is important
• Residuals vs Fitted
  – Residual values: randomly distributed around zero (no patterns)
• Normal Quantile-Quantile
  – Shape of curve: ideal is straight line (see help for other line shapes)
• Scale – Location
  – Square root of residuals: randomly distributed
• Cook’s Distance
  – High peaks: large impact on coefficients
• Response vs Fitted
  – Points approximate a line: ideal is slope of 1, through origin / 45-degrees

Evaluating the model
• Evaluation Summary
  – R² (R-squared): 1.0 is ideal (ranges 0 to 1)
  – SSE (Sum of Squares Error): smaller is better
• Residuals vs Predicted
  – Residual values: randomly distributed around zero (no patterns)
• Response vs Predicted
  – Points approximate a line: ideal is slope of 1, through origin / 45-degrees
• Normal Quantile-Quantile
  – Shape of curve: ideal is straight line (see help for other line shapes)

Regression Tree Method (nonparametric)

Fitting the model
• Model Summary
  – Node), split, n, deviance, yval for root, each branch and leaves (terminal nodes*)
• Residuals vs Fitted
  – Residual values: randomly distributed around zero (fitted = terminal nodes)
• Response vs Fitted
  – Points approximate a line: ideal is slope of 1, through origin / 45-degrees

Evaluating the model
• Evaluation Summary
  – R² (R-squared): 1.0 is ideal (ranges 0 to 1)
  – SSE (Sum of Squares Error): smaller is better
• Residuals vs Predicted
  – Residual values: randomly distributed around zero (no patterns)
• Response vs Predicted
  – Points approximate a line: ideal is slope of 1, through origin / 45-degrees
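For a feel of the two Evaluation Summary measures, the Python sketch below computes SSE and R-squared from hypothetical observed and predicted values. It assumes the common definition R-squared = 1 - SSE/SST; the exact formula used by the TERR-based tool is not stated in this guide.

```python
def sse(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """1 - SSE/SST, where SST is the spread of observed values around their mean."""
    mean = sum(observed) / len(observed)
    sst = sum((o - mean) ** 2 for o in observed)
    return 1 - sse(observed, predicted) / sst


observed = [3.0, 5.0, 7.0, 9.0]   # hypothetical response values
predicted = [2.5, 5.5, 7.0, 9.0]  # hypothetical model predictions

print(sse(observed, predicted))
print(r_squared(observed, predicted))
```

A smaller SSE pushes R-squared toward 1, which is why the two measures are read together.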

© TIBCO Software Inc. Page 12 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
RELATIONSHIPS & PREDICTIONS

Using predictive modeling tools involves a three-step process:
1. Fitting the model
2. Evaluating the model
3. Predicting from the model

Tools ▼ Classification Modeling…
These statistics are based on R functions, and are calculated using the TIBCO Enterprise Runtime for R (TERR) statistical engine.

Logistic Regression Method (parametric)

Fitting the model
• Model Summary
  – Deviance values: low as possible
  – AIC: used for comparison, lower means better model
• Table of Coefficients
  – p-values: small indicates predictor is important
• Residuals vs Fitted
  – Residual values: randomly distributed around zero
• Normal Quantile-Quantile
  – Shape of curve: ideal is straight line (see help for other line shapes)
• Predicted Probability
  – 2 plots, one for each result value: ideal is all values close to 1 / all values close to 0
• ROC Curve
  – Shape of curve: ideal is 0,0 to 0,1 then to 1,1

Evaluating the model
• Evaluation Summary
  – Accuracy: 1.0 is ideal (consider as a percentage)
  – Kappa: 1.0 is ideal (ranges between -1 and 1)
• Confusion Matrix
  – Matches between observed & predicted: more correct predictions is better
• Predicted Probability
  – 2 plots, one for each result value: ideal is all values close to 1 / all values close to 0
• ROC Curve
  – Shape of curve: ideal is 0,0 to 0,1 then to 1,1

Classification Tree Method (nonparametric)

Fitting the model
• Model Summary
  – Node), split, n, loss, yval (yprob) for root, each branch and leaves (terminal nodes*)
• Predicted Probability
  – 2 plots, one for each result value: ideal is all values close to 1 / all values close to 0
• ROC Curve
  – Shape of curve: ideal is 0,0 to 0,1 then to 1,1

Evaluating the model
• Evaluation Summary
  – Accuracy: 1.0 is ideal (consider as a percentage)
  – Kappa: 1.0 is ideal (ranges between -1 and 1)
• Confusion Matrix
  – Matches between observed & predicted: more correct predictions is better
• Predicted Probability
  – 2 plots, one for each result value: ideal is all values close to 1 / all values close to 0
• ROC Curve
  – Shape of curve: ideal is 0,0 to 0,1 then to 1,1
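Accuracy, Kappa and the Confusion Matrix are all derived from the same pairs of observed and predicted classes. The Python sketch below shows one way to compute them, assuming Kappa means Cohen's kappa (agreement corrected for chance); the labels are hypothetical and the tool's own output format is not reproduced.

```python
from collections import Counter


def confusion_matrix(observed, predicted):
    """Count each (observed, predicted) combination."""
    return Counter(zip(observed, predicted))


def accuracy(observed, predicted):
    """Share of rows where the prediction matches the observation."""
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def kappa(observed, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(observed)
    observed_counts, predicted_counts = Counter(observed), Counter(predicted)
    chance = sum(observed_counts[c] * predicted_counts[c] for c in observed_counts) / n ** 2
    return (accuracy(observed, predicted) - chance) / (1 - chance)


observed = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "yes", "no", "no", "no", "no", "yes", "yes"]

print(confusion_matrix(observed, predicted))
print(accuracy(observed, predicted), kappa(observed, predicted))
```

With balanced classes, half of the agreement here could arise by chance, so Kappa comes out lower than Accuracy.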

Other menu items/icons:
• Launch TIBCO Spotfire User’s Guide
• View ▼ Analytic models
• Edit model
• Duplicate model
• Evaluate model
• Predict from model (Insert ▼ Predicted Columns …)

© TIBCO Software Inc. Page 13 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
STATISTICAL ENGINES

TIBCO Enterprise Runtime for R (TERR)


TERR implementations

Entering TERR Script


1. Within the Spotfire Expression language: TERR_functions
   Your script may contain multiple input variables (input1, input2, ... inputN) but may result in only one output variable: (output)

2. Within the Spotfire Expression language: Create Expression Functions
   Edit ▼ Data Function Properties

3. As a Data Function

© TIBCO Software Inc. Page 14 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
STATISTICAL ENGINES

• Calculations based on scripts from different statistical engines
• Some engines require configuration of TIBCO Spotfire Statistics Services (TSSS)
• Some engines may require a working installation of third party software
• Function sources:
  1. Sample Data Functions are provided for each engine
  2. Can be defined from the existing functions in the corresponding package repository in TSSS
  3. You can write your own scripts

[Diagram labels: DATA FUNCTION; Library; TIBCO Spotfire Server]

© TIBCO Software Inc. Page 15 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
STATISTICAL ENGINES

Unless the data function you want is already registered on the Spotfire Library, the first step will be to register the data function. Even if the data function exists on the library, it may be a good idea to Open in the Register Data Functions dialog in order to view the Description, Script, and Input/Output Parameters.

Spotfire Server (DATA FUNCTION)
• Define and edit function
• Script, input, output parameters
• Save function to Spotfire Library

Tools ▼ Register Data Function …
1. Write script or select function
2. Define input variables
3. Define output variables

Types:
• S-PLUS function
• S-PLUS script
• R function - Open Source R
• R script - Open Source R
• R script – TIBCO Enterprise Runtime for R
• MATLAB® script
• SAS® script

Spotfire Client (DATA FUNCTION)
• Define input and output handling (relative to analysis)
• Refresh the function automatically, limit based on marking or filtering
• Function definition copy placed in Analysis document

Insert ▼ Data Function …
4. Select function
5. Define Input handling
6. Define Output handling
Run

If you have opened in the Register Data Functions dialog, all you have to do is Run in order to insert the data function into the analysis document (you do not have to visit the Insert menu).

When the data function is Run or Inserted, a copy is placed in the analysis document. Now, the statistical engine is still required, but the client no longer needs to be connected to the Spotfire Server.

Input and Output handling for the copy of the Function definition within a given Spotfire analysis can be edited:
Edit ▼ Document Properties ♦ Data Functions
• Decide if the function will automatically update, or will require you to click to update.
• You may limit data input based upon subsets of data defined by filtering or marking.

Spotfire Statistics Services -or- Local Adapter
• TIBCO Enterprise Runtime for R (TERR)
• S+ Engine
• Open source R Engine
• MATLAB® Engine
• SAS® Engine

© TIBCO Software Inc. Page 16 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
MULTIVARIATE DATA ANALYSIS

Normalization
Before initiating any computational multivariate data analysis techniques (Line Similarity, K-means Clustering, Hierarchical Clustering), consider whether any normalization needs to be applied.

• Heat Map Dendrograms, Hierarchical Clustering:
  Visualization Properties ♦ Dendrograms
• All multivariate techniques:
  File ▼ Add Data Tables … Transformations ♦ Normalization

Empty values
Before initiating any computational multivariate data analysis techniques, consider how empty values will be treated during the calculations:

• Heat Map Dendrograms, Hierarchical Clustering:
  Visualization Properties ♦ Dendrograms
• Line Similarity, K-means Clustering:
  Visualization Properties ♦ Appearance
  Uncheck to allow row interpolation, check to break lines and prevent row interpolation.

© TIBCO Software Inc. Page 17 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
MULTIVARIATE DATA ANALYSIS

LINE SIMILARITY

Tools ▼ Line Similarity …

Determine how you will define the Master line:
• Single marked line
• Average of multiple marked lines
• Custom options:
  – Ascending
  – Descending
  – Flat, then ascending
  – Ascending, then descending
  – Ascending, then flat
  – Descending, then flat
  – Descending, then ascending
  – Flat, then descending
  – Maximum value
  – Mean value
  – Minimum value

Select a Distance measure:

Interpreting results:
• Correlation similarity: compares the shape to the master line
  [Figure: example lines with Similarity = +1, Similarity = 0 and Similarity = -1]
• Euclidean distance: compares the distance between points to the master line
  [Figure: distances d between points, Similarity = ∑d; Similarity ~0]

Information about the master line and similarity calculation settings can be found in the resulting Line Similarity column:

Edit ▼ Column Properties … General ♦ Description
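The contrast between the two distance measures is easiest to see on a line that has the master line's shape but sits at a different level. The Python sketch below uses the Pearson correlation for correlation similarity and the standard square-root-of-summed-squares form for Euclidean distance; both formulas are assumptions for illustration (the guide's figure only summarizes the latter as ∑d), and the lines are hypothetical.

```python
from math import sqrt


def correlation_similarity(line, master):
    """Pearson correlation: +1 for the same shape, -1 for the mirrored shape."""
    n = len(line)
    mean_l, mean_m = sum(line) / n, sum(master) / n
    covariance = sum((a - mean_l) * (b - mean_m) for a, b in zip(line, master))
    spread_l = sum((a - mean_l) ** 2 for a in line)
    spread_m = sum((b - mean_m) ** 2 for b in master)
    return covariance / sqrt(spread_l * spread_m)


def euclidean_distance(line, master):
    """Point-by-point distance: close to 0 only when the lines nearly coincide."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(line, master)))


master = [1.0, 2.0, 3.0, 4.0]
shifted = [11.0, 12.0, 13.0, 14.0]  # same shape, higher level
mirrored = [4.0, 3.0, 2.0, 1.0]     # opposite shape, same level

print(correlation_similarity(shifted, master), euclidean_distance(shifted, master))
print(correlation_similarity(mirrored, master), euclidean_distance(mirrored, master))
```

The shifted line scores as a perfect match on shape while being far away in distance, which is why the choice of measure changes which lines count as similar.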

© TIBCO Software Inc. Page 18 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
MULTIVARIATE DATA ANALYSIS

K-MEANS CLUSTERING

Tools ▼ K-means Clustering …

Select a Distance measure:
Select a Max number of clusters:

Interpreting results:
Good K-means clustering meets two criteria:
1. Each cluster group has similar line patterns
2. Similar patterns do not appear in different cluster groups
If these criteria are not met, consider repeating the clustering with a different max number of clusters.

Additional information about the clustering settings and calculations can be found in the resulting K-means
Clustering column:
Edit ▼ Column Properties … General ♦ Description
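As a rough picture of what K-means clustering does with line patterns, the Python sketch below alternates between assigning each line to its nearest centroid and recomputing each centroid as the mean of its members. It assumes squared Euclidean distance and a simplistic start from the first k lines; both are illustrative choices, not Spotfire's settings, and the lines are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(lines, k, iterations=10):
    """Return a cluster index for each line."""
    centroids = [list(line) for line in lines[:k]]  # simplistic starting centroids
    assignment = []
    for _ in range(iterations):
        # Assign every line to the nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(line, centroids[c]))
            for line in lines
        ]
        # Move each centroid to the mean of the lines assigned to it.
        for c in range(k):
            members = [line for line, a in zip(lines, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


lines = [
    [1.0, 2.0, 3.0], [1.1, 2.1, 3.2],  # two rising lines
    [3.0, 2.0, 1.0], [3.2, 2.1, 0.9],  # two falling lines
]
clusters = k_means(lines, k=2)
print(clusters)
```

On this data the rising lines end up together and the falling lines end up together, matching the two criteria for a good clustering listed above.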

© TIBCO Software Inc. Page 19 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
MULTIVARIATE DATA ANALYSIS

HIERARCHICAL CLUSTERING

Tools ▼ Hierarchical Clustering … -or- Visualization Properties ♦ Dendrograms

Select a Distance measure:

Distance-based measures

Euclidean distance
• shortest distance (d) between two points

Square Euclidean & Half Square Euclidean
• Similar to Euclidean, only differs by subtle changes to algorithm

City block (aka. Manhattan metric)
• (d) measured like walking in a city
• effect of a large (d) in a single dimension may be diminished

Shape-based measures

Correlation
• Pearson Product Moment Correlation

Cosine correlation
• factors in a small effect of magnitude on the resulting value

Tanimoto
• ranges from 0 to +1
• only applicable for a binary variable

Select a Clustering method:

Need to calculate the Similarity value for CN (SimCN)

WPGMA
Weighted Pair-Group Method with Arithmetic mean
SimCN = ½ (SimAC + SimBC)

UPGMA
Unweighted Pair-Group Method with Arithmetic mean
SimCN = (6/10 x SimAC) + (4/10 x SimBC)

Single Linkage
Minimum distance between patterns
SimCN = Sim for most similar rows

Complete Linkage
Maximum distance between patterns
SimCN = Sim for least similar rows

Ward’s Method
Incremental sum of squares calculation (always uses Half-square Euclidean distance as distance measure)
SimCN = (13/17 x SimAC) + (11/17 x SimBC) – (7/17 x SimAB)
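The fractions in the UPGMA and Ward’s Method formulas are consistent with cluster sizes of 6 rows in A, 4 rows in B and 7 rows in C. Those sizes are inferred from the fractions, since the original figure did not survive, so treat them as an assumption. The Python sketch below shows how such weights follow from cluster sizes, using the standard update rules for these methods; the function names are hypothetical.

```python
from fractions import Fraction


def wpgma(sim_ac, sim_bc):
    """Weighted pair-group: both merged clusters count equally."""
    return Fraction(1, 2) * (sim_ac + sim_bc)


def upgma_weights(n_a, n_b):
    """Unweighted pair-group: each cluster is weighted by its number of rows."""
    merged = n_a + n_b
    return Fraction(n_a, merged), Fraction(n_b, merged)


def ward_weights(n_a, n_b, n_c):
    """Weights for SimAC, SimBC and the subtracted SimAB term."""
    total = n_a + n_b + n_c
    return Fraction(n_a + n_c, total), Fraction(n_b + n_c, total), Fraction(n_c, total)


def ward(sim_ac, sim_bc, sim_ab, n_a, n_b, n_c):
    w_ac, w_bc, w_ab = ward_weights(n_a, n_b, n_c)
    return w_ac * sim_ac + w_bc * sim_bc - w_ab * sim_ab


# Inferred cluster sizes: A has 6 rows, B has 4 rows, C has 7 rows.
print(upgma_weights(6, 4))    # equals 6/10 and 4/10
print(ward_weights(6, 4, 7))  # 13/17, 11/17 and 7/17
```

Single and Complete Linkage need no weights: they simply take the value for the most similar or least similar pair of rows.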

© TIBCO Software Inc. Page 20 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
MULTIVARIATE DATA ANALYSIS

HIERARCHICAL CLUSTERING

Interpreting results:
The advantage of applying Hierarchical clustering, over K-means clustering, is the fact that you can select the number of cluster groups after the clustering has been applied. The resulting dendrogram will show you a map of similarities - remember, shorter brackets indicate greater similarity. Move the Pruning line to select a number of cluster groupings. A new Row cluster IDs column is tied to the position of the pruning line.

[Figure callouts:]
• Hierarchical clustering settings
• Pruning line
• Clusters with only one heat map row
• Number of clusters
• Column values will change when pruning line is moved

Drill down
You can mouseover and mark dendrogram to drill down to get details about specific areas of the dendrogram or clustered heat map.

© TIBCO Software Inc. Page 21 TIBCO Spotfire Analyst Advanced Calculations – Quick Reference Guide
